Multi-gpu KNN build for UMAP using all-neighbors API #6654
Conversation
The hints in the Python docs are awesome. But again, there is currently no way for a UMAP user to choose a multi-GPU build, and we must not do this automatically.

I am working on a multi-GPU resource on the RAFT side. The current plan is to wrap it in pylibraft and advise users to explicitly pass it as the handle. If nothing is passed as the handle, the default handle is used, which results in a single-GPU run. Do you think there is a better way to let users opt into a multi-GPU build? Any advice would be greatly appreciated :)
@jinsolp thanks for the explanation, that API looks good to me. Opt-in behavior is perfect. We need to be very explicit about the fact that this is not multi-GPU UMAP, but rather a multi-GPU KNN step within UMAP. That's because users of cuml-dask expect that their data can be distributed across GPUs while we still produce a trained model, whereas in this use case the data will need to reside on a single GPU for the other parts of UMAP.
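The opt-in behavior agreed on above (no handle passed means the default handle and a single-GPU run; an explicitly passed multi-GPU handle enables the multi-GPU KNN build) can be sketched as plain dispatch logic. This is an illustration only: the `handle` parameter and the `is_multi_gpu` attribute are hypothetical names, not the actual RAFT/pylibraft API.

```python
def choose_knn_backend(handle=None):
    """Illustrative sketch of the opt-in dispatch discussed in this PR.

    `is_multi_gpu` is a hypothetical attribute used for illustration;
    it is not the real pylibraft handle API.
    """
    if handle is None:
        # Nothing passed: fall back to the default handle, i.e. a single-GPU run.
        return "single-gpu"
    # The multi-GPU KNN build is strictly opt-in: it is selected only when the
    # user explicitly passes a multi-GPU (SNMG) resource handle.
    return "multi-gpu" if getattr(handle, "is_multi_gpu", False) else "single-gpu"
```

This keeps the default path unchanged for existing users, which is the key property the reviewers asked for.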
Force-pushed from d2383e9 to 888dd7a
Single-GPU C++ and Python LGTM, but I'd like @csadorf to assign a Python reviewer as well. For multi-GPU, I'd like to see tests added, perhaps in the dask module, to ensure the environment has enough GPUs.
Thanks @jinsolp! Just two small comments.

I would like to review this PR prior to merge.

[After discussing offline]
C++ approval only. LGTM!

@jcrist Can you approve this PR, assuming your concerns were addressed?
Changed the docs (content staying almost the same, change in style) in the .pyx file because of a docs build failure in CI.
This PR is blocked on some changes from rapidsai/cuvs#944
A combination of `nnd_n_clusters>1` and `data_on_host=False` (the default) will currently break user code, because batching is not supported on device. The agreed mitigation approach is to auto-set `data_on_host=True` with a deprecation warning in 25.06, and then require `data_on_host=True` in combination with `nnd_n_clusters>1` as of 25.08.
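The agreed two-release mitigation can be sketched as a small validation helper. This is a hypothetical illustration of the policy described above, not cuML's actual implementation; the function name and the `version` parameter are assumptions.

```python
import warnings


def resolve_data_on_host(nnd_n_clusters, data_on_host, version=(25, 6)):
    """Hypothetical sketch of the agreed mitigation, not cuML's actual code.

    Batched NN Descent (nnd_n_clusters > 1) only supports host data, so
    data_on_host=False would break user code in that configuration.
    """
    if nnd_n_clusters > 1 and not data_on_host:
        if version < (25, 8):
            # 25.06 behavior: override automatically, but warn about the
            # upcoming requirement.
            warnings.warn(
                "nnd_n_clusters > 1 requires data_on_host=True; overriding "
                "automatically. This will become an error in 25.08.",
                FutureWarning,
            )
            return True
        # 25.08 behavior: the combination is rejected outright.
        raise ValueError("nnd_n_clusters > 1 requires data_on_host=True")
    return data_on_host
```

The intermediate warning release gives existing pipelines one cycle to adapt before the hard requirement lands.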
Just a tiny suggestion for language, otherwise LGTM.
/merge
Description
Allows multi-GPU KNN graph building in UMAP using the all-neighbors API.
PRs that need to be merged before this one:
- `device_resources_snmg` (raft#2666)

Changes in cuML UMAP usage
Closes #6729