A program that uses CUDA for GPU computation reported a memory error:

terminate called after throwing an instance of 'std::runtime_error'
  what():  [CUDA] an illegal memory access was encountered LightGBM/src/treelearner/cuda_tree_learner.cpp 239

For an ordinary C++ program we would debug with gdb; for a CUDA program we should use cuda-gdb instead. Make sure to compile the CUDA code with debug information (for nvcc, -g covers host code and -G covers device code), then run:

/usr/local/cuda-11.0/bin/cuda-gdb python3
(cuda-gdb) run test.py

After a while, cuda-gdb stops at the exact location in the code where the illegal memory access occurred:

CUDA Exception: Warp Illegal Address
The exception was triggered at PC 0x1668b2f0 (histogram_16_64_256.cu:182)

Thread 1 "python3" received signal CUDA_EXCEPTION_14, Warp Illegal Address.
[Switching focus to CUDA kernel 0, grid 10, block (2163,0,0), thread (0,0,0), device 0, sm 0, warp 3, lane 0]
0x000000001668b380 in LightGBM::histogram16<<<(7360,1,1),(16,1,1)>>> () at LightGBM/src/treelearner/kernels/histogram_16_64_256.cu:185
185            feature = (feature >> ((ind & 1) << 2)) & 0xf;
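
To read the expression on line 185, it helps to run it on the host. The sketch below is a plain C++ illustration, not LightGBM code: unpack_nibble and index_in_bounds are hypothetical helper names, and the layout (two 4-bit feature values per byte, with the byte at index ind >> 1) is an assumption made only for this example. The expression itself only selects the low or high 4 bits of a value that is already loaded, so the bad address most likely comes from a memory access near it, for example an index that runs past the end of a buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper that mirrors the expression on line 185:
//     feature = (feature >> ((ind & 1) << 2)) & 0xf;
// (ind & 1) << 2 is 0 for an even ind and 4 for an odd ind, so the
// result is the low nibble for even ind and the high nibble for odd ind.
constexpr std::uint8_t unpack_nibble(std::uint8_t packed, unsigned ind) {
    return static_cast<std::uint8_t>((packed >> ((ind & 1u) << 2)) & 0xFu);
}

// Hypothetical bounds check, ASSUMING two 4-bit values are packed per byte
// and that the byte for element ind lives at index ind >> 1. The real
// kernel may index its buffers differently.
constexpr bool index_in_bounds(std::size_t ind, std::size_t num_packed_bytes) {
    return (ind >> 1) < num_packed_bytes;
}
```

With packed = 0xAB, unpack_nibble(packed, 0) gives 0xB and unpack_nibble(packed, 1) gives 0xA. In the debugger, printing the index and pointer variables of the faulting thread and comparing them against the buffer size is the equivalent check.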