[Core] Support Intel GPU #38553
Conversation
Please check this PR instead: https://github.com/ray-project/ray/pull/36493

@xwu99

Also updated previous comments:
python/ray/tests/test_basic.py (Outdated)

    def test_disable_xpu_devices():
        script = """
    import ray
Maybe indent the quoted script:

    script = """
    import ray
    ...

LGTM otherwise.
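One common way to keep such an embedded script indented with the surrounding test while still passing clean source to the subprocess is `textwrap.dedent`. This is an illustrative sketch, not the PR's actual code:

```python
import textwrap


def build_script():
    # The embedded script is indented to match the test body for
    # readability, then dedented before being handed to the runner.
    script = textwrap.dedent("""
        import ray
        ray.init()
    """).strip()
    return script


print(build_script())
```

With this pattern the reviewer's indentation suggestion and valid top-level Python in the script are no longer in conflict.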
abhilash1910 left a comment
LGTM! Thanks
Previous comments are in https://github.com/ray-project/ray/pull/36493

OK, I'll reach you on Slack.
python/ray/_private/resource_spec.py (Outdated)
It's better to not change the original format.
python/ray/_private/resource_spec.py (Outdated)
Better to rephrase like: "The GPU types within a node should be the same, but different nodes can have different types of GPUs."
python/ray/_private/resource_spec.py (Outdated)
Remove the redundant comment.
python/ray/_private/utils.py (Outdated)
The long description should move up to the first paragraph.
python/ray/_private/utils.py (Outdated)
The above block can be removed, since ONEAPI_DEVICE_SELECTOR is already applied to dpctl.
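For context, `ONEAPI_DEVICE_SELECTOR` restricts which devices the SYCL runtime (and therefore dpctl) can see, playing a role similar to `CUDA_VISIBLE_DEVICES` for Nvidia GPUs. The helper below is a rough sketch of parsing a simple selector string; the real selector grammar is richer (multiple backends, ranges, `!` exclusions) and is interpreted by the SYCL runtime itself, not by code like this:

```python
def visible_gpu_indices(selector: str):
    """Parse a selector like 'level_zero:0,1' into device indices.

    Illustrative only: real ONEAPI_DEVICE_SELECTOR handling lives in the
    SYCL runtime; this sketch covers only simple 'backend:indices' terms.
    """
    indices = []
    for term in selector.split(";"):
        backend, _, devices = term.partition(":")
        if backend != "level_zero" or not devices:
            continue
        for d in devices.split(","):
            if d.strip().isdigit():
                indices.append(int(d))
    return indices


print(visible_gpu_indices("level_zero:0,1"))
```

Because the runtime already honors the variable, a second layer of filtering in Ray would be redundant, which is the reviewer's point.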
jjyao left a comment
Could you create a test_intel_gpu.py file and add some tests? You can see test_tpu.py as an example.
Lint failed:
This won't test anything. Since we didn't mock IntelGPUAcceleratorManager.get_current_node_num_accelerators, both nodes will have Nvidia GPUs.
Signed-off-by: harborn <[email protected]>
Tests failed on Windows.
Why are these changes needed?
Intel also provides general-purpose compute GPUs.
Intel's internal benchmarks show that Intel GPUs perform well on LLM training/inference workflows.
This PR aims to support Intel GPUs in Ray.
We add two device types as GPUs: INTEL_MAX_1550 and INTEL_MAX_1100.
This upgrade allows users to use Intel GPUs almost seamlessly, just like Nvidia's different GPU devices.
Usage of different GPU types in a Ray cluster
To use different GPUs in a Ray cluster:
- If the cluster has only one GPU type and you don't set accelerator_type in the task/actor options, Ray will automatically use that one GPU type.
- If the cluster has multiple GPU types and you don't set accelerator_type in the options, Ray will raise a ValueError, since Ray can't decide which GPU to run the task/actor on.
For example:
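A toy model of that decision logic may help; this is plain Python illustrating the behavior described above, not Ray's scheduler, and the function name and error messages are hypothetical:

```python
def pick_gpu_type(cluster_gpu_types, accelerator_type=None):
    """Decide which GPU type a task/actor should use.

    Illustrative sketch of the rules above, not Ray's implementation.
    """
    if accelerator_type is not None:
        if accelerator_type not in cluster_gpu_types:
            raise ValueError(f"No {accelerator_type} GPU in the cluster")
        return accelerator_type
    if len(cluster_gpu_types) == 1:
        # Only one GPU type in the cluster: use it automatically.
        return next(iter(cluster_gpu_types))
    raise ValueError(
        "Multiple GPU types in the cluster; please specify accelerator_type"
    )


assert pick_gpu_type({"INTEL_MAX_1550"}) == "INTEL_MAX_1550"
assert pick_gpu_type(
    {"INTEL_MAX_1550", "INTEL_MAX_1100"}, accelerator_type="INTEL_MAX_1100"
) == "INTEL_MAX_1100"
```

In real Ray code the accelerator type would be passed via the task/actor options (e.g. an `accelerator_type=...` argument) rather than a helper like this.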
The changes include 2 parts:
1. Upgrades to the GPU detection process of ray.init
   ray.init will autodetect all kinds of GPUs. The GPU info is detected during ray.init() and stored in the resources field of the options.
2. Upgrades to Ray tasks and actors
- Only one accelerator type in the current Ray service
- Multiple accelerator types in the current Ray service
  - accelerator type not specified
  - accelerator type specified
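The detection flow in part 1 can be sketched as merging per-vendor detectors into the node's resources dict. The detector shapes and resource keys below are hypothetical stand-ins for what ray.init() populates, not Ray's internals:

```python
def autodetect_resources(detectors):
    """Merge per-vendor GPU detectors into a resources dict.

    `detectors` maps an accelerator-type label to a zero-argument callable
    returning the number of devices found. Hypothetical sketch of how
    ray.init() could fill the `resources` field.
    """
    resources = {"GPU": 0}
    for accel_type, count_fn in detectors.items():
        n = count_fn()
        if n > 0:
            resources["GPU"] += n
            # Mark the node with its accelerator type so tasks/actors
            # requesting that type can be scheduled onto it.
            resources[f"accelerator_type:{accel_type}"] = 1
    return resources


# Example: a node with two Intel Max 1550 cards and no Nvidia GPUs.
resources = autodetect_resources({
    "INTEL_MAX_1550": lambda: 2,
    "NVIDIA_A100": lambda: 0,
})
assert resources == {"GPU": 2, "accelerator_type:INTEL_MAX_1550": 1}
```

Keeping the per-type marker separate from the GPU count is what lets one cluster mix GPU vendors across nodes while each node stays homogeneous.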
Related issue number
- #36493: previous implementation
- #37998: auto-detect AWS accelerators
Checks
- I've signed off every commit (by using git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I've added a new method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.