[SYCL] optimized calling UR functions #20776
base: sycl
steffenlarsen
left a comment
Overall LGTM! Just some small suggestions.
b13c83a to c865b9f
ldorau
left a comment
UR changes LGTM
@intel/llvm-gatekeepers , please merge
Looks like the UR testing is still running. Is it thought to be stuck, or can we wait for it to finish?
I don't see it running now. It was running in the morning, and devops told me it is stuck. It is the end of the day, so I don't think it is worth waiting for it any longer.
Change summary
The checkUrResult function was made as short as possible: all error handling was moved into a separate non-inlined function, which keeps the common success path tiny, and that path is additionally optimized with the compiler builtin __builtin_expect. A few more simple inlines were also added.
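The pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `Result`, `reportError`, and `checkResult` are hypothetical stand-ins for the UR result type and the real `checkUrResult`, and the attributes assume a GCC- or Clang-compatible compiler.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a UR-style status code (not the real UR type).
enum class Result : int { Success = 0, InvalidValue = 1, OutOfResources = 2 };

// Cold path: kept out of line so that the message formatting and the throw
// are not duplicated into every call site.
__attribute__((noinline, cold, noreturn)) void reportError(Result R) {
  throw std::runtime_error("call failed with code " +
                           std::to_string(static_cast<int>(R)));
}

// Hot path: a single compare and a branch the compiler is told to expect
// not to be taken; small enough to be inlined everywhere.
inline void checkResult(Result R) {
  if (__builtin_expect(R != Result::Success, 0))
    reportError(R);
}
```

With this split, a successful call costs the caller only the comparison and a not-taken branch, while everything related to building and throwing the error lives in one shared function.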
Performance impact
The checkUrResult function is widely used on all paths, including the common path. The positive performance impact of this optimization is visible in almost all metrics.
Some examples are below:
SYCL instructions reduced from 153.9k to 151.8k, against a UR baseline of 130.9k; that is, the overhead over UR was reduced by 9.1%.

SYCL instructions reduced from 133.5k to 132.2k, against a UR baseline of 119.2k; that is, the overhead over UR was reduced by 9.1%.

SYCL instructions reduced from 145.9k to 143.9k, against a UR baseline of 118.4k; that is, the overhead over UR was reduced by 7.3%.

SYCL time reduced from 202.1 to 194.6, against a UR baseline of 155.0; that is, the overhead over UR was reduced by 15.9%.
