After I switched my code to a new dataset, training failed:
/tmp/pip-req-build-_tx3iysr/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:310: operator(): block: [0,0,0], thread: [59,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/tmp/pip-req-build-_tx3iysr/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:310: operator(): block: [0,0,0], thread: [60,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/tmp/pip-req-build-_tx3iysr/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:310: operator(): block: [0,0,0], thread: [61,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/tmp/pip-req-build-_tx3iysr/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:310: operator(): block: [0,0,0], thread: [62,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/tmp/pip-req-build-_tx3iysr/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:310: operator(): block: [0,0,0], thread: [63,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
Traceback (most recent call last):
  File "train.py", line 337, in <module>
    train(args, train_loader, eval_loader)
  File "train.py", line 189, in train
    sounds = aug(sounds)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 881, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/sanbai/birds_sound_classification/utils/augment.py", line 13, in forward
    image = (image - image.mean()) / image.std()
RuntimeError: CUDA error: device-side assert triggered
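The error “RuntimeError: CUDA error: device-side assert triggered” is common, and its cause is terribly hard to find. CUDA kernels run asynchronously, so the Python traceback often points at a line that has nothing to do with the failure (here, a harmless normalization step in augment.py). Someone on GitHub recommends a method: set CUDA_LAUNCH_BLOCKING=1 before the program, for example CUDA_LAUNCH_BLOCKING=1 python train.py. This makes every kernel launch synchronous, so the error is reported at the operation that actually caused it.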
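If changing the launch command is inconvenient, the same variable can probably be set from inside the script. This is only a minimal sketch: it assumes the lines sit at the very top of train.py, before torch is imported or any CUDA call is made, because the variable is read when CUDA initializes.

```python
import os

# Assumption: this runs before the first CUDA call
# (safest: before "import torch" anywhere in the program).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import torch only after the variable is set
```

Synchronous launches slow training down, so the setting is best removed once the bug is found.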
With that setting, the real error behind the RuntimeError shows up: the number of categories I had set for the model was wrong for the new dataset.
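A quick check on the CPU can catch this kind of mismatch before any CUDA kernel runs. The sketch below is not from my training code: find_out_of_range_labels is a hypothetical helper, and the label values and the class count of 10 are made up for illustration. It assumes the labels are plain integers (a tensor could be converted with .tolist() first).

```python
def find_out_of_range_labels(labels, num_classes):
    """Return the sorted distinct labels that fall outside [0, num_classes)."""
    return sorted({int(y) for y in labels if not 0 <= int(y) < num_classes})


# Hypothetical data: the model is configured for 10 classes,
# but the new dataset also contains labels 10 and 12.
labels = [0, 3, 9, 10, 12, 3]
bad = find_out_of_range_labels(labels, num_classes=10)
print(bad)  # [10, 12]
```

An empty list means every label fits the number of categories given to the model; anything else is a label that would trigger the "index out of bounds" assertion on the GPU.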