Memory Issues with Sparse Vectors in XGBoost4j-Spark: Disabling Sparse-to-Dense Conversion #11467
Replies: 4 comments 2 replies
-
cc @wbo4958
-
Hey @trivialfis and @wbo4958, the lack of proper handling for sparse vectors is blocking me here. Can we expect this issue to be fixed in the near future, or should I consider giving up on distributed training for now?
-
Before 3.0, the XGBoost JVM packages accepted sparse vectors, but that could result in an inaccurate model because XGBoost itself lacks proper sparse-vector support. So starting from 3.0, we no longer use sparse vectors until sparse vectors are supported in XGBoost.
-
There's a mismatch between a traditional sparse matrix and decision trees: for decision trees, 0 is a valid value, not a missing one.
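To make that distinction concrete, here is a minimal sketch (assuming only Spark's `org.apache.spark.ml.linalg` on the classpath, with a made-up 5-dimensional vector) of why an implicit zero in a `SparseVector` is not the same thing as a missing value to a tree booster:

```scala
import org.apache.spark.ml.linalg.{SparseVector, Vectors}

object SparseVsMissing {
  def main(args: Array[String]): Unit = {
    // 5-dimensional vector with non-zero entries only at indices 1 and 3.
    val sv = Vectors.sparse(5, Array(1, 3), Array(2.0, 7.0)).asInstanceOf[SparseVector]

    // Materialised, the same point is (0.0, 2.0, 0.0, 7.0, 0.0): the omitted
    // entries are real zeros, not absent measurements.
    println(Vectors.dense(sv.toArray))

    // A training path that only iterates the stored entries would see just
    // indices 1 and 3; the implicit zeros would be treated as *missing* and
    // routed down each tree node's default branch instead of being compared
    // against the split threshold as the value 0.0.
    println(sv.indices.mkString(", "))
  }
}
```

So silently treating the omitted entries as missing can change which branch every row takes, which is the accuracy problem mentioned above.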
-
Hi everyone,
I’m using sparse vectors with about 10 non-zero features out of a possible 50 million. However, the conversion to dense vectors is causing heap exhaustion. Is there a way to disable the sparse-to-dense conversion?
Right now, I can’t even train on a small batch of vectors without running into memory issues, but I ultimately need to train on 200 million rows.
Any help would be greatly appreciated. I’m using XGBoost4j-Spark version 3.0.0 with the Java API.
Thanks!
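For scale, a back-of-the-envelope sketch (plain Scala, using the 50-million-feature figure from the question and a hypothetical batch of 100 rows) of why densifying these vectors exhausts the heap:

```scala
object DensifyCost {
  def main(args: Array[String]): Unit = {
    val dim            = 50000000L   // feature-space size from the question
    val bytesPerDouble = 8L          // a dense double[] stores every slot
    val rowBytes       = dim * bytesPerDouble

    // One densified row already needs ~381 MB of contiguous heap,
    // even though only ~10 of its 50 million entries are non-zero.
    println(f"one dense row ≈ ${rowBytes / (1024.0 * 1024.0)}%.0f MB")

    // A hypothetical "small batch" of 100 rows is already ~37 GB,
    // which is why training fails long before reaching 200 million rows.
    val batch = 100L
    println(f"$batch rows ≈ ${batch * rowBytes / (1024.0 * 1024.0 * 1024.0)}%.1f GB")
  }
}
```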