One of my team members ran some tests on using a GPU for LightGBM training. The results were quite good: the GPU accelerated training to about twice the speed.
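For reference, here is a minimal sketch of how such a GPU training run could be set up. The synthetic data and parameter values are my own assumptions for illustration, and LightGBM must be built with GPU (OpenCL) support for `device` to accept `'gpu'`.

```python
import numpy as np
import lightgbm as lgb

# Synthetic data, purely for illustration
X = np.random.rand(100_000, 50)
y = np.random.rand(100_000)
train_set = lgb.Dataset(X, label=y)

params = {
    "objective": "regression",
    "device": "gpu",   # requires a LightGBM build with GPU (OpenCL) support
    "max_bin": 63,     # smaller bin counts are commonly suggested for GPU runs
}

booster = lgb.train(params, train_set, num_boost_round=100)
```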

But this also raised my curiosity about how LightGBM uses the GPU for training. The GBDT algorithm only uses operations like condition checking, data sorting, and split-point searching; it doesn't involve the matrix operations that are the strong point of GPUs.

Then Jimmy (one of my colleagues) sent me a paper. This is exactly how LightGBM uses the GPU: it runs the histogram algorithm on the GPU. The story is: to find the best split point for a feature (i.e. a column of the dataset), LightGBM first collects the feature values into bins covering different value ranges. This binning and histogram construction can be executed concurrently across samples, so it can be offloaded to the GPU, as the sketch below illustrates.
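To make the idea concrete, here is a rough sketch, in plain NumPy rather than the actual OpenCL kernel, of what histogram-based split finding does for a single feature: bucket the values into bins, accumulate per-bin gradient and hessian sums (the embarrassingly parallel part), then scan the bins for the best split. The bin count, binning scheme, and gain formula are simplified assumptions, not LightGBM's exact implementation.

```python
import numpy as np

def histogram_best_split(feature, grad, hess, num_bins=64, lam=1.0):
    """Sketch of histogram-based split finding for one feature.

    Every sample falls into exactly one bin, and the per-bin accumulation
    is independent per sample, which is why it parallelizes well on a GPU.
    The final scan over the bins is cheap and sequential.
    """
    # 1. Bucket feature values into equal-width bins (LightGBM actually
    #    uses pre-computed bin boundaries, not equal-width ones).
    edges = np.linspace(feature.min(), feature.max(), num_bins + 1)
    bin_idx = np.clip(np.searchsorted(edges, feature, side="right") - 1,
                      0, num_bins - 1)

    # 2. Build the histogram: per-bin sums of gradients and hessians.
    grad_hist = np.bincount(bin_idx, weights=grad, minlength=num_bins)
    hess_hist = np.bincount(bin_idx, weights=hess, minlength=num_bins)

    # 3. Scan bins left to right with a simplified split-gain formula.
    total_g, total_h = grad_hist.sum(), hess_hist.sum()
    best_gain, best_bin = -np.inf, None
    g_left = h_left = 0.0
    for b in range(num_bins - 1):
        g_left += grad_hist[b]
        h_left += hess_hist[b]
        g_right, h_right = total_g - g_left, total_h - h_left
        gain = (g_left ** 2 / (h_left + lam)
                + g_right ** 2 / (h_right + lam)
                - total_g ** 2 / (total_h + lam))
        if gain > best_gain:
            best_gain, best_bin = gain, b
    return best_bin, best_gain
```

In the GPU version, step 2 is the part the OpenCL kernels handle: many work-items accumulate partial histograms in parallel and then merge them.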

So a GPU can be used not only for heavy matrix operations but also for highly parallel workloads in general. Thanks to Jimmy, the paper resolved my doubt.

The code for the GPU kernels of LightGBM is in three files:

src/treelearner/ocl/histogram16.cl
src/treelearner/ocl/histogram64.cl
src/treelearner/ocl/histogram256.cl

Since the implementation uses the OpenCL framework, LightGBM can train on both NVIDIA and AMD GPUs.
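When a machine exposes more than one OpenCL platform (for example both an NVIDIA and an AMD driver), the target can be chosen explicitly through LightGBM's GPU parameters. The ids below are placeholders; the right values depend on the machine.

```python
params = {
    "objective": "regression",
    "device": "gpu",
    "gpu_platform_id": 0,  # which OpenCL platform (vendor driver) to use; machine-dependent
    "gpu_device_id": 0,    # which device within that platform; machine-dependent
}
```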