I just received an email from NVIDIA about RAPIDS. Although cuDF and cuML look fantastic for a data scientist, I am still doubtful about them.
In our daily work, we usually process small DataFrames with pandas, so cuDF would be too expensive, since it requires a GPU. And even when we need to join two large DataFrames, we tend to use BigQuery, since it is distributed and relatively cheap. The only proper use case for cuDF, I think, is running heavy operations on less than 8 GB of data, small enough to fit in GPU memory. Who needs that many heavy operations on a DataFrame? I don't know.
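To make that concrete, here is a minimal sketch of what such a heavy operation would look like in cuDF (assuming a machine with an NVIDIA GPU and the RAPIDS packages installed; the toy data is made up for illustration). cuDF is designed to mirror the pandas API, so typically only the import changes:

```python
# Minimal cuDF sketch: the same groupby-aggregate you would write in pandas,
# but executed on the GPU. Requires an NVIDIA GPU with RAPIDS/cuDF installed.
import cudf  # GPU DataFrame library from RAPIDS

# Made-up toy data; a real workload would load far more,
# e.g. with cudf.read_csv(...).
df = cudf.DataFrame({
    "key":   ["a", "b", "a", "b", "a"],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Same API shape as pandas: group on the GPU and aggregate.
result = df.groupby("key").agg({"value": "mean"})
print(result)
```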
As for cuML, it is more like a GPU version of scikit-learn. In practice, for tabular data we use XGBoost/LightGBM, and for unstructured data we use PyTorch/TensorFlow. Who even uses scikit-learn anymore, let alone cuML?
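For context, the "GPU scikit-learn" claim means cuML estimators copy scikit-learn's fit/predict interface. A minimal sketch, with made-up toy data and assuming an NVIDIA GPU with RAPIDS installed:

```python
# Minimal cuML sketch: a scikit-learn-style estimator running on the GPU.
# Requires an NVIDIA GPU with the RAPIDS packages (cudf, cuml) installed.
import cudf
from cuml.linear_model import LogisticRegression  # same name as in sklearn

# Made-up toy data, kept on the GPU as cuDF objects.
X = cudf.DataFrame({
    "x1": [0.0, 1.0, 2.0, 3.0],
    "x2": [1.0, 0.0, 1.0, 0.0],
})
y = cudf.Series([0, 0, 1, 1])

# fit/predict mirror scikit-learn's estimator API.
clf = LogisticRegression()
clf.fit(X, y)
print(clf.predict(X))
```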