Unexpected Error: AssertHandler::printMessage #843

@SkyIsland-CN

Description

Describe the bug

Unexpected Error: AssertHandler::printMessage

Dear respected developers,

Sorry to bother you! I’m just a tech enthusiast without any formal education in AI. After trying many methods and reading a lot of documentation, I still haven’t been able to solve this issue. My description might not be very professional or standardized—please forgive me! I’ll try to describe the problem as accurately as I can.

I’m attempting to port a project originally based on CUDA to Intel GPU (using IPEX), but I encountered an error that I couldn’t find any information about on Google.

I suspect this might be an internal error rather than a project-specific issue (just my guess—please correct me if I’m wrong, and I apologize if so):

  1. The project is written in Python, but based on my research, this error string seems to originate from native (C/C++) code.
  2. I only made simple changes, such as replacing model.to("cuda") with model.to("xpu") and torch.cuda with torch.xpu (a minimal sketch follows this list). I didn’t modify anything else.
  3. When I switched the device to "cpu" and ran inference on the CPU, everything worked fine without any issues.
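
To make item 2 concrete, here is a minimal sketch of the kind of substitution I made (illustration only; a tiny nn.Linear stands in for the real model, and my full modified files are in the gists linked below):

```python
# Minimal sketch of the CUDA -> XPU substitution I applied throughout.
# A tiny nn.Linear stands in for the real YourMT3 model.
import torch
import torch.nn as nn

device = "xpu" if torch.xpu.is_available() else "cpu"  # was: "cuda" / torch.cuda.is_available()

model = nn.Linear(8, 8).to(device)    # was: model.to("cuda")
x = torch.randn(1, 8, device=device)
with torch.no_grad():
    y = model(x)
if device == "xpu":
    torch.xpu.synchronize()           # was: torch.cuda.synchronize()
print(device, tuple(y.shape))
```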

I’m really sorry that I don’t have the ability to pinpoint the problem or construct a minimal reproducible example. I also couldn’t find any similar cases on Google. All I can do is describe what I did—apologies again.

My platform: i5-12490F + Arc A750 8GB
System: Windows 11 22H2 (22621.2283)
Environment: Python 3.10.5 and 3.11.9 (I tried both and hit the same issue; I didn’t use Anaconda and set up a clean development environment. I even reinstalled the OS for this.)
Torch version: v2.7.0+xpu
IPEX version: v2.7.10
Driver version: 101.6913
oneAPI version: 2025.02
Project: YourMT3, an audio-to-MIDI project built with PyTorch: https://huggingface.co/spaces/mimbres/YourMT3

After cloning the project locally and opening it in VSCode, I modified app.py and module_helper.py to replace the original CUDA-related calls with their XPU equivalents. I didn’t change anything else. When I ran the script, the error above occurred. I’ve been trying to fix it for three days but haven’t found the cause.

Here are the modified app.py and module_helper.py:
app.py :
https://gist.github.com/SkyIsland-CN/af26e877f4c26e7b65e04014e4acd1f1
module_helper.py :
https://gist.github.com/SkyIsland-CN/78900713d4d420bb7fff6f8e9a4216e0

Below is the terminal output after running:

PS D:\Projects\004> d:; cd 'd:\Projects\004'; & 'c:\Users\wyh\AppData\Local\Programs\Python\Python311\python.exe' 'c:\Users\wyh\.trae-cn\extensions\ms-python.debugpy-2025.6.0-win32-x64\bundled\libs\debugpy\launcher' '64895' '--' 'D:\Projects\004\app.py'
[W707 21:23:12.000000000 OperatorEntry.cpp:161] Warning: Warning only once for all operators, other operators may also be overridden.
Overriding a previously registered kernel for the same operator and the same dispatch key
operator: aten::geometric_(Tensor(a!) self, float p, *, Generator? generator=None) -> Tensor(a!)
registered at C:\actions-runner\_work\pytorch\pytorch\pytorch\build\aten\src\ATen\RegisterSchema.cpp:6
dispatch key: XPU
previous kernel: registered at C:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\VmapModeRegistrations.cpp:37
new kernel: registered at H:\frameworks.ai.pytorch.ipex-gpu\build\Release\csrc\gpu\csrc\gpu\xpu\ATen\RegisterXPU_0.cpp:186 (function operator ())
D:\Projects\004\amt\src\model\RoPE\RoPE.py:35: FutureWarning: torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
@autocast(enabled=False)
D:\Projects\004\amt\src\model\RoPE\RoPE.py:242: FutureWarning: torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
@autocast(enabled=False)
IPEX available: True
Number of Intel GPUs: 1
Current device name: Intel(R) Arc(TM) A750 Graphics
Resuming from amt/logs\2024\mc13_256_g4_all_v7_mt3f_sqr_rms_moe_wf4_n8k2_silu_rope_rp_b36_nops\checkpoints\last.ckpt
c:\Users\wyh\AppData\Local\Programs\Python\Python311\Lib\site-packages\lightning_fabric\connector.py:571: precision=16 is supported for historical reasons but its usage is discouraged. Please set your precision to 16-mixed instead!
c:\Users\wyh\AppData\Local\Programs\Python\Python311\Lib\site-packages\pytorch_lightning\trainer\connectors\accelerator_connector.py:513: You passed Trainer(accelerator='cpu', precision='16-mixed') but AMP with fp16 is not supported on CPU. Using precision='bf16-mixed' instead.
Using bfloat16 Automatic Mixed Precision (AMP)
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
Trainer(limit_train_batches=1.0) was configured so 100% of the batches per epoch will be used..
Trainer(limit_val_batches=1.0) was configured so 100% of the batches will be used..
Trainer(limit_test_batches=1.0) was configured so 100% of the batches will be used..
Task: mc13_full_plus_256, Max Shift Steps: 206
"add_melody_metric_to_singing": True
"add_pitch_class_metric": None
"audio_cfg": {'codec': 'spec', 'hop_length': 300, 'audio_backend': 'torchaudio', 'sample_rate': 16000, 'input_frames': 32767, 'n_fft': 2048, 'n_mels': 512, 'f_min': 50.0, 'f_max': 8000.0}
"base_lr": None
"eval_drum_vocab": None
"eval_subtask_key": default
"eval_vocab": None
"init_factor": None
"max_steps": None
"model_cfg": {'encoder_type': 'perceiver-tf', 'decoder_type': 'multi-t5', 'pre_encoder_type': 'conv', 'pre_encoder_type_default': {'t5': None, 'perceiver-tf': 'conv', 'conformer': None}, 'pre_decoder_type': 'mc_shared_linear', 'pre_decoder_type_default': {'t5': {'t5': None}, 'perceiver-tf': {'t5': 'linear', 'multi-t5': 'mc_shared_linear'}, 'conformer': {'t5': None}}, 'conv_out_channels': 128, 't5_basename': 'google/t5-v1_1-small', 'pretrained': False, 'use_task_conditional_encoder': True, 'use_task_conditional_decoder': True, 'd_feat': 128, 'tie_word_embeddings': True, 'vocab_size': 596, 'num_max_positions': 1034, 'encoder': {'t5': {'d_model': 512, 'num_heads': 6, 'num_layers': 8, 'dropout_rate': 0.05, 'position_encoding_type': 'sinusoidal', 'ff_widening_factor': 2, 'ff_layer_type': 't5_gmlp'}, 'perceiver-tf': {'num_latents': 26, 'd_latent': 128, 'd_model': 128, 'num_blocks': 3, 'num_local_transformers_per_block': 2, 'num_temporal_transformers_per_block': 2, 'sca_use_query_residual': True, 'dropout_rate': 0.1, 'position_encoding_type': 'rope', 'attention_to_channel': True, 'layer_norm_type': 'layer_norm', 'ff_layer_type': 'moe', 'ff_widening_factor': 4, 'moe_num_experts': 8, 'moe_topk': 2, 'hidden_act': 'silu', 'rotary_type_sca': 'pixel', 'rotary_type_latent': 'pixel', 'rotary_type_temporal': 'lang', 'rotary_apply_to_keys': False, 'rotary_partial_pe': False, 'rope_partial_pe': True, 'num_max_positions': 110, 'vocab_size': 596}, 'conformer': {'d_model': 512, 'intermediate_size': 512, 'num_heads': 8, 'num_layers': 8, 'dropout_rate': 0.1, 'layerdrop': 0.1, 'position_encoding_type': 'rotary', 'conv_dim': (512, 512, 512, 512, 512, 512, 512), 'conv_stride': (5, 2, 2, 2, 2, 2, 2), 'conv_kernel': (10, 3, 3, 3, 3, 3, 3), 'conv_depthwise_kernel_size': 31}}, 'decoder': {'t5': {'d_model': 512, 'num_heads': 6, 'num_layers': 8, 'dropout_rate': 0.05, 'position_encoding_type': 'sinusoidal', 'ff_widening_factor': 2, 'ff_layer_type': 't5_gmlp'}, 'multi-t5': {'d_model': 512, 'num_heads': 6, 'num_layers': 8, 'dropout_rate': 0.05, 'position_encoding_type': 'sinusoidal', 'ff_widening_factor': 2, 'ff_layer_type': 't5_gmlp', 'num_channels': 13, 'num_max_positions': 1034, 'vocab_size': 596}}, 'feat_length': 110, 'event_length': 1024, 'init_factor': 1.0}
"onset_tolerance": 0.05
"optimizer": None
"optimizer_name": adamwscale
"pretrained": False
"scheduler_name": cosine
"shared_cfg": {'PATH': {'data_home': '../../data'}, 'BSZ': {'train_sub': 12, 'train_local': 24, 'validation': 64, 'test': 16}, 'AUGMENTATION': {'train_random_amp_range': [0.8, 1.1], 'train_stem_iaug_prob': 0.7, 'train_stem_xaug_policy': {'max_k': 3, 'tau': 0.3, 'alpha': 1.0, 'max_subunit_stems': 12, 'p_include_singing': None, 'no_instr_overlap': True, 'no_drum_overlap': True, 'uhat_intra_stem_augment': True}, 'train_pitch_shift_range': [-2, 2]}, 'DATAIO': {'num_workers': 4, 'prefetch_factor': 2, 'pin_memory': True, 'persistent_workers': False}, 'CHECKPOINT': {'save_top_k': 4, 'monitor': 'validation/macro_onset_f', 'mode': 'max', 'save_last': True, 'filename': '{epoch}-{step}'}, 'TRAINER': {'limit_train_batches': 1.0, 'limit_val_batches': 1.0, 'limit_test_batches': 1.0, 'gradient_clip_val': 1.0, 'accumulate_grad_batches': 1, 'check_val_every_n_epoch': 1, 'num_sanity_val_steps': 0}, 'WANDB': {'save_dir': 'amt/logs', 'resume': 'allow', 'anonymous': 'allow', 'mode': 'disabled'}, 'LR_SCHEDULE': {'warmup_steps': 1000, 'total_steps': 100000, 'final_cosine': 1e-05}, 'TOKENIZER': {'max_shift_steps': 206, 'shift_step_ms': 10}}
"task_manager": <utils.task_manager.TaskManager object at 0x0000017107319B50>
"test_optimal_octave_shift": False
"test_pitch_shift_layer": None
"weight_decay": 0.0
"write_output_dir": amt/logs\2024\mc13_256_g4_all_v7_mt3f_sqr_rms_moe_wf4_n8k2_silu_rope_rp_b36_nops
"write_output_vocab": None
Running on local URL: http://127.0.0.1:7861
To create a public link, set share=True in launch().
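
(A side note on the two FutureWarnings above: RoPE.py still uses the deprecated torch.cuda.amp.autocast decorator, and I left it untouched. If I read the warning correctly, the replacement would look like the sketch below; whether "xpu" is the right device_type to pass there is an assumption on my part.)

```python
# Sketch of the non-deprecated decorator form the FutureWarning suggests.
# Assumptions: device_type="xpu" is accepted the same way "cuda" is, and
# "some_rope_fn" is just a placeholder; I have not modified RoPE.py this way.
from torch.amp import autocast

@autocast("xpu", enabled=False)   # was: @autocast(enabled=False) from torch.cuda.amp
def some_rope_fn(x):
    return x
```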

Everything seems normal up to this point: the Gradio web UI loads and opens correctly. The current call stack in the debugger:

[Screenshot: VSCode debugger call stack; 正在运行 means Running]

But once I upload an audio file and the GPU starts inference, I observe a brief spike in GPU memory usage, and then within about 2 seconds the script stops running (the call stack remains unchanged during this period). The terminal output is as follows:

c:\Users\wyh\AppData\Local\Programs\Python\Python311\Lib\site-packages\torchaudio\_backend\soundfile_backend.py:71: UserWarning: The MPEG_LAYER_III subtype is unknown to TorchAudio. As a result, the bits_per_sample attribute will be set to 0. If you are seeing this warning, please report by opening an issue on github (after checking for existing/closed ones). You may otherwise ignore this warning.
warnings.warn(
⏰ converting audio: 0m 0s 35.23ms
AssertHandler::printMessage

*Note: it’s not an out-of-memory issue; I’ve seen that kind of error before and its output looks different.
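
In case it helps localize the failure, below is the kind of wrapper I could add around the suspect calls. It assumes torch.xpu.synchronize behaves like torch.cuda.synchronize (blocking until queued kernels finish, so an asynchronous failure surfaces at the step that launched it); the checked helper and its usage are hypothetical, not part of YourMT3:

```python
# Hypothetical debugging helper: synchronize after each step so an
# asynchronous XPU kernel failure is reported at the step that caused it.
import torch

def checked(label, fn, *args, **kwargs):
    result = fn(*args, **kwargs)
    torch.xpu.synchronize()   # block until all queued XPU work completes
    print(f"ok: {label}")
    return result

# usage sketch: wrap each stage of inference one at a time, e.g.
#   tokens = checked("model forward", model, spectrogram)
```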

Apologies: English is not my native language, and I used AI to help translate this issue. If there are any mistakes or mistranslations, please let me know. Also, I’m not very familiar with GitHub, so if anything I’ve done here goes against the rules or etiquette, please feel free to point it out.

Thank you for reading. I’ve also reported this issue to the PyTorch team, since I’m not sure where the problem lies.

Thank you for your hard work, and please don’t mind my lack of professionalism.

Versions

The information printed to the terminal by running collect_env.py:

PyTorch version: 2.7.0+xpu
PyTorch CXX11 ABI: No
IPEX version: 2.7.10+xpu
IPEX commit: 0e47515
Build type: Release

OS: Microsoft Windows 11 Pro (10.0.22621 64-bit)
GCC version: N/A
Clang version: N/A
IGC version: N/A
CMake version: N/A
Libc version: N/A

Python version: 3.11.9 (tags/v3.11.9:de54cf5, Apr 2 2024, 10:12:12) [MSC v.1938 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22621-SP0
Is XPU available: True
DPCPP runtime: N/A
MKL version: N/A

GPU models and configuration onboard:

  • Intel(R) Arc(TM) A750 Graphics

GPU models and configuration detected:

  • [0] _XpuDeviceProperties(name='Intel(R) Arc(TM) A750 Graphics', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero', type='gpu', driver_version='1.6.33890', total_memory=7934MB, max_compute_units=448, gpu_eu_count=448, gpu_subslice_count=56, max_work_group_size=1024, max_num_sub_groups=128, sub_group_sizes=[8 16 32], has_fp16=1, has_fp64=0, has_atomic64=1)

Driver version:

  • 32.0.101.6913 (20250621000000.***+)

CPU:
Description: Intel64 Family 6 Model 151 Stepping 2
Manufacturer: GenuineIntel
Name: 12th Gen Intel(R) Core(TM) i5-12490F
NumberOfCores: 6
NumberOfEnabledCore: 6
NumberOfLogicalProcessors: 12
ThreadCount: 12

Versions of relevant libraries:
[pip] dpcpp-cpp-rt==2025.0.5
[pip] intel-cmplr-lib-rt==2025.0.5
[pip] intel-cmplr-lib-ur==2025.0.5
[pip] intel-cmplr-lic-rt==2025.0.5
[pip] intel_extension_for_pytorch==2.7.10+xpu
[pip] intel-opencl-rt==2025.0.5
[pip] intel-openmp==2025.0.5
[pip] intel-pti==0.10.1
[pip] intel-sycl-rt==2025.0.5
[pip] mkl==2025.0.1
[pip] mkl-dpcpp==2025.0.1
[pip] numpy==1.26.4
[pip] onemkl-sycl-blas==2025.0.1
[pip] onemkl-sycl-datafitting==2025.0.1
[pip] onemkl-sycl-dft==2025.0.1
[pip] onemkl-sycl-lapack==2025.0.1
[pip] onemkl-sycl-rng==2025.0.1
[pip] onemkl-sycl-sparse==2025.0.1
[pip] onemkl-sycl-stats==2025.0.1
[pip] onemkl-sycl-vm==2025.0.1
[pip] pytorch-lightning==2.5.2
[pip] pytorch-triton-xpu==3.3.0
[pip] torch==2.7.0+xpu
[pip] torchaudio==2.7.0+xpu
[pip] torchmetrics==1.7.4
[pip] torchvision==0.22.0+xpu
[pip] transformers==4.45.1
