
Update/cubecl to client #4866

Open

Charles23R wants to merge 204 commits into main from update/cubecl-to-client

Conversation

@Charles23R
Contributor

Pull Request Template

Checklist

  • Confirmed that cargo run-checks command has been executed.
  • Made sure the book is up to date with changes in this PR.

Related Issues/PRs

Depends on tracel-ai/cubecl#1304

Changes

  • Updates to cubecl's to_client API
  • Updates to cubecl collective ops initialization
  • Updates to fusion server/client

Testing

Unit tests + text-cla + benchmarks

@codecov

codecov Bot commented Apr 22, 2026

Codecov Report

❌ Patch coverage is 0% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 65.57%. Comparing base (b23c1d9) to head (ae789a9).
⚠️ Report is 3 commits behind head on main.

Files with missing lines                Patch %   Lines
crates/burn-fusion/src/client.rs        0.00%     5 Missing ⚠️
crates/burn-cubecl/src/tensor/base.rs   0.00%     4 Missing ⚠️
crates/burn-fusion/src/server.rs        0.00%     2 Missing ⚠️
crates/burn-cubecl/src/ops/base.rs      0.00%     1 Missing ⚠️

❌ Your patch check has failed because the patch coverage (0.00%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.
❌ Your project check has failed because the head coverage (65.57%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4866      +/-   ##
==========================================
- Coverage   65.66%   65.57%   -0.09%     
==========================================
  Files        1153     1154       +1     
  Lines      168981   169569     +588     
==========================================
+ Hits       110953   111202     +249     
- Misses      58028    58367     +339     

☔ View full report in Codecov by Sentry.
Comment on lines +393 to +403
let utilities = self
    .server
    .utilities()
    .downcast::<FusionUtilities>()
    .expect("Can downcast to `FusionUtilities`");
let id = CommunicationId::from(device_ids);
// On the first collective call for this communication id, flush the queued
// fusion operations before the blocking initialization, then record it.
if !utilities.initialized_comms.read().unwrap().contains(&id) {
    self.flush_queue();
    let mut initialized_comms = utilities.initialized_comms.write().unwrap();
    initialized_comms.insert(id);
}
Member


Would it be possible to call ensure_collective_init using the inner backend?

Contributor Author


We need to initialize the communication on the first call to a collective operation. Initializing is blocking for the server, so we need to make sure to flush right away so that other devices don't end up stuck on an initialization call from another device.

This is already handled by cubecl, but since fusion adds another layer of streams and asynchronous submits, we also needed to add some logic here to flush the fusion server.

Maybe there is a way to avoid this by design when handling collective calls? We can also chat about this offline as it can be quite complex/confusing.
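
For reference, a minimal sketch of the flush-before-blocking-init pattern described above. All names here (Utilities, Client, ensure_collective_init, blocking_collective_init) are illustrative placeholders, not the actual burn-fusion or cubecl API:

use std::collections::HashSet;
use std::sync::RwLock;

// Hypothetical stand-ins for the fusion state touched in this PR; the real
// types (FusionUtilities, CommunicationId, the fusion client) differ.
struct Utilities {
    initialized_comms: RwLock<HashSet<u64>>, // CommunicationId simplified to u64
}

struct Client {
    utilities: Utilities,
}

impl Client {
    fn flush_queue(&self) {
        // Submit every queued fusion operation so other devices are not left
        // waiting on this client's queue while one device blocks on init.
    }

    fn blocking_collective_init(&self, _id: u64) {
        // Placeholder for the blocking communication setup handled by cubecl.
    }

    fn ensure_collective_init(&self, id: u64) {
        // Fast path: this communication id was already initialized.
        if self.utilities.initialized_comms.read().unwrap().contains(&id) {
            return;
        }
        // First collective call for this id: flush before the blocking
        // initialization, then record that it is done.
        self.flush_queue();
        self.blocking_collective_init(id);
        self.utilities.initialized_comms.write().unwrap().insert(id);
    }
}

fn main() {
    let client = Client {
        utilities: Utilities {
            initialized_comms: RwLock::new(HashSet::new()),
        },
    };
    client.ensure_collective_init(0); // flushes and initializes
    client.ensure_collective_init(0); // no-op on the fast path
}

The key point of the sketch is the ordering: the queue is flushed before the blocking initialization, so no other device ends up waiting on work still sitting in this client's queue.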

