Update/cubecl to client by Charles23R · Pull Request #4866 · tracel-ai/burn

Charles23R · 2026-04-22T15:54:51Z

Pull Request Template

Checklist

Confirmed that cargo run-checks command has been executed.
Made sure the book is up to date with changes in this PR.

Related Issues/PRs

Depends on tracel-ai/cubecl#1304

Changes

Updates to cubecl's to_client api
Updates to cubecl collective ops initialization
Updates to fusion server/client

Testing

Unit tests + text-cla + benchmarks

codecov · 2026-04-22T17:20:22Z

Codecov Report

❌ Patch coverage is 0% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 65.57%. Comparing base (b23c1d9) to head (ae789a9).
⚠️ Report is 3 commits behind head on main.

Files with missing lines	Patch %	Lines
crates/burn-fusion/src/client.rs	0.00%	5 Missing ⚠️
crates/burn-cubecl/src/tensor/base.rs	0.00%	4 Missing ⚠️
crates/burn-fusion/src/server.rs	0.00%	2 Missing ⚠️
crates/burn-cubecl/src/ops/base.rs	0.00%	1 Missing ⚠️

❌ Your patch check has failed because the patch coverage (0.00%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.
❌ Your project check has failed because the head coverage (65.57%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4866      +/-   ##
==========================================
- Coverage   65.66%   65.57%   -0.09%     
==========================================
  Files        1153     1154       +1     
  Lines      168981   169569     +588     
==========================================
+ Hits       110953   111202     +249     
- Misses      58028    58367     +339

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

nathanielsimard · 2026-04-22T17:46:31Z

+        let utilities = self
+            .server
+            .utilities()
+            .downcast::<FusionUtilities>()
+            .expect("Can downcast to `FusionUtilities`");
+        let id = CommunicationId::from(device_ids);
+        if utilities.initialized_comms.read().unwrap().contains(&id) {
+            self.flush_queue();
+            let mut initialized_comms = utilities.initialized_comms.write().unwrap();
+            initialized_comms.insert(id);
+        }


Would it be possible to call ensure_collective_init using the inner backend?

We need to initialize the communication on the first call to a collective operation. Initializing is blocking for the server, so we need to make sure to flush right away so that other devices don't end up stuck on an initialization call from another device.

This is already handled by cubecl, but since fusion adds another layer of streams and asynchronous submits, we also needed to add some logic here to flush the fusion server.

Maybe there is a way to avoid this by design when handling collective calls? We can also chat about this offline as it can be quite complex/confusing.

Charles23R added 30 commits April 2, 2026 09:42

local dev

62ee896

text-cla params

8474db3

sync_collective timing

df3dd39

print model

76ed1d3

add random test

70637de

cleanup 1

c5f9c94

Merge remote-tracking branch 'origin/main' into improve-ddp

03ca438

cleanup 2

720bb2d

update cubecl versions

a5436ef

Merge branch 'main' into improve-ddp

902cfff

lock

d2f4b13

change all_reduce op api

efe4175

Merge branch 'improve-ddp' into fusion-all-reduce

e307573

stream swap thing

3c1631e

test smtg fusion

7d901b6

async collective api

ec712c7

fusion distributed with new api

0096756

swap stream thing in fusion

1d8616f

streamid executes

6c94c36

dispatch update + all_red backend tests

10783da

test tolerance

4fe1216

flatten

d93ea87

add multithread test all_reduce

9216ef7

Merge remote-tracking branch 'origin/main' into fusion-all-reduce

605de72

Merge remote-tracking branch 'origin/main' into fusion-all-reduce

5f452d5

fix change_server_float

b5c7cda

stuff to test

c698b80

forgot

a901ab2

test prints

3b2bb74

remove streamid executes in fusion

d039094

Charles23R added 24 commits April 20, 2026 16:42

cargo file

165d30d

update revs

0bc8949

import

6f9fc67

fusion client changes

d6f0dac

fix test

d9905ff

update all_reduce test

2483a4a

update all_reduce test

fb4e865

update all_reduce test

667c17f

add prints

0571696

flag

aa62337

error

416ef19

local dev

d188f81

go back to original test

1b2e590

Merge remote-tracking branch 'origin/main' into send-recv-test

a6849a8

rev + flex

f59a8ad

distributed ndarray

7da632b

Merge remote-tracking branch 'origin/main' into send-recv-test

3e2ba69

rev

d6e823e

call to ensure init

a133efa

prints

7a78e79

debug lock

f846a09

remove prints

27bda6b

cleanup

cd7a064

typos

4b8edbc

Charles23R requested review from laggui and nathanielsimard April 22, 2026 16:15

nathanielsimard reviewed Apr 22, 2026

View reviewed changes

Charles23R added 2 commits April 22, 2026 16:17

update rev

89eb0a4

update revs

ae789a9

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update/cubecl to client#4866

Update/cubecl to client#4866
Charles23R wants to merge 204 commits intomainfrom
update/cubecl-to-client

Charles23R commented Apr 22, 2026

Uh oh!

codecov Bot commented Apr 22, 2026 •

edited

Loading

Uh oh!

nathanielsimard Apr 22, 2026

Uh oh!

Charles23R Apr 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Charles23R commented Apr 22, 2026

Pull Request Template

Checklist

Related Issues/PRs

Changes

Testing

Uh oh!

codecov Bot commented Apr 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

nathanielsimard Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

Charles23R Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codecov Bot commented Apr 22, 2026 •

edited

Loading