Handle Inconsistent Response in Miner Api #221
base: main
Conversation
I've set execution mode to fallback here, although I'm not convinced this is the best way to handle this situation. I think setting the health to partial content is probably sufficient. Let me know what you think!
8a2b679 to 6ea2e32
Since #201 was closed, I cherry-picked this back onto main.
src/proxy.rs
Outdated
    builder_client.forward(builder_req, builder_method),
    service.l2_client.forward(l2_req, method)
);
if builder_res.is_ok() != l2_res.is_ok() {
If both builder_res and l2_res are Err, shouldn't we still update execution mode and health?
No, because we can return that error to the caller, and they can handle it as they wish.
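A minimal sketch of the outcome handling being discussed; the names Outcome and classify are illustrative, not from this PR. Only a mismatch between the builder and L2 results flips execution mode or health, while a double failure is returned to the caller unchanged.

```rust
/// Illustrative only: the three cases discussed above.
enum Outcome {
    /// Both calls succeeded: nothing to do.
    Consistent,
    /// Both calls failed: propagate the error to the caller unchanged.
    BothFailed,
    /// Exactly one side failed: disable execution / degrade health.
    Mismatch,
}

fn classify<T, E>(builder_res: &Result<T, E>, l2_res: &Result<T, E>) -> Outcome {
    match (builder_res.is_ok(), l2_res.is_ok()) {
        (true, true) => Outcome::Consistent,
        (false, false) => Outcome::BothFailed,
        _ => Outcome::Mismatch,
    }
}
```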
src/proxy.rs
Outdated
let mut execution_mode = service.execution_mode.lock();
if *execution_mode == ExecutionMode::Enabled {
    *execution_mode = ExecutionMode::Disabled;
    // Drop before acquiring health lock
The health status here isn't sticky. It will switch back to Healthy on the next health check interval in the HealthHandle if the builder's unsafe head is up to date. So I'm not sure it's even worth updating the probes unless you can tell the HealthHandle not to update the health until the DA limits discrepancy is resolved.
Although it will likely trigger a conductor failover as is, so maybe it's worth it.
Ideally the health status here would stick until there is no discrepancy in DA limits between the sequencer and the builder.
I think letting health status revert should be okay, as long as a new leader election is triggered.
I will check in more detail next week, but can we add a unit test to check this behaviour?
@ferranbt Yeah, will definitely add tests; I was just waiting for us to settle on the solution. Just pushed a couple of commits, and I think this is currently as good as we can do. I added a retry layer to our HTTP client, which should hopefully prevent us from ever running into this situation.
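A minimal sketch of the kind of bounded retry described above; the helper name, attempt count, and 200ms backoff are illustrative assumptions, not the PR's actual retry layer.

```rust
use std::{future::Future, time::Duration};

/// Retry an async operation a fixed number of times with a short pause
/// between attempts. Panics if `attempts` is zero.
async fn with_retries<F, Fut, T, E>(mut op: F, attempts: usize) -> Result<T, E>
where
    F: FnMut() -> Fut,
    Fut: Future<Output = Result<T, E>>,
{
    let mut last_err = None;
    for _ in 0..attempts {
        match op().await {
            Ok(value) => return Ok(value),
            Err(e) => {
                // A tracing::warn! here would also cover the "error logging"
                // notes later in this thread.
                last_err = Some(e);
                tokio::time::sleep(Duration::from_millis(200)).await;
            }
        }
    }
    Err(last_err.expect("attempts must be non-zero"))
}
```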
};

let clone = manager.clone();
tokio::spawn(async move {
Consider extracting this future into a separate async function, something like async fn handle_requests.
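A sketch of the suggested extraction; RequestManager and the function body are placeholders, only the shape matters.

```rust
/// Placeholder standing in for the real manager type in this PR.
#[derive(Clone)]
struct RequestManager;

impl RequestManager {
    /// The body of the spawned `async move { ... }` block moves here.
    async fn handle_requests(self) {
        // forward requests, update execution mode, etc.
    }
}

fn spawn_handler(manager: &RequestManager) {
    let clone = manager.clone();
    tokio::spawn(clone.handle_requests());
}
```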
let parts_clone = parts.clone();
let body_clone = body.clone();
let res_sender_clone = res_tx.clone();
tokio::spawn(async move {
The result of this future is immediately awaited, so why are we spawning a task here?
I think I see - is it to isolate the inner part in case of a task abort?
If that's the case, maybe we should drop a comment explaining it? Or consider an explicit cancellation token.
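A sketch of the explicit-cancellation alternative using tokio_util's CancellationToken; this is one possible approach under that assumption, not code from the PR.

```rust
use tokio_util::sync::CancellationToken;

/// Run the inner work, but stop cooperatively when the parent cancels the
/// token instead of relying on an abrupt task abort.
async fn run_inner(token: CancellationToken) {
    tokio::select! {
        _ = token.cancelled() => {
            // parent requested shutdown; clean up gracefully here
        }
        _ = do_work() => {
            // inner work finished on its own
        }
    }
}

async fn do_work() {
    // the part that previously lived in the spawned task
}
```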
let mut manager = self.clone();
let parts_clone = parts.clone();
let body_clone = body.clone();
match tokio::spawn(
let has_disabled_execution_mode = Arc::new(AtomicBool::new(false));

let manager = Self {
Consider putting the manager inside an Arc and sharing the same instance across many tasks.
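A sketch of the sharing pattern being suggested; Manager here is a stand-in, not the PR's type.

```rust
use std::sync::Arc;

/// Stand-in for the real manager state.
#[derive(Default)]
struct Manager;

fn spawn_workers(manager: Arc<Manager>, workers: usize) {
    for _ in 0..workers {
        // Cheap pointer clone: every task sees the same Manager instance.
        let manager = Arc::clone(&manager);
        tokio::spawn(async move {
            let _shared = &*manager;
            // handle requests using the shared state
        });
    }
}
```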
let mut manager = self.clone();
let parts_clone = parts.clone();
let body_clone = body.clone();
let res_sender_clone = res_tx.clone();
Unnecessary clone? We could just move res_tx into the task.
Are there meant to be any assertions in this test? It would be nice to have it cover the case where the builder fails to respond to the call, but I'm happy to have that as a follow-up.
req_rx.mark_unchanged();
res_rx.mark_unchanged();

let has_disabled_execution_mode = Arc::new(AtomicBool::new(false));
why is this initially set to false? shouldn't it be derived from the execution mode?
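A sketch of deriving the initial flag from the current mode instead of hard-coding false; the two-variant ExecutionMode here is a simplification of whatever the real enum contains.

```rust
use std::sync::{atomic::AtomicBool, Arc};

/// Simplified stand-in for the real execution mode enum.
#[derive(PartialEq)]
enum ExecutionMode {
    Enabled,
    Disabled,
}

/// Initialise the flag from the mode the proxy actually starts in.
fn initial_disabled_flag(mode: &ExecutionMode) -> Arc<AtomicBool> {
    Arc::new(AtomicBool::new(*mode == ExecutionMode::Disabled))
}
```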
let l2_req = HttpRequest::from_parts(parts, HttpBody::from(body_bytes));
info!(target: "proxy::call", message = "forward request to default execution client", ?method);
service.l2_client.forward(l2_req, method).await
if method == MINER_SET_MAX_DA_SIZE {
It would be useful to add docs here explaining why MINER_SET_MAX_DA_SIZE is treated differently.
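One possible wording for the requested comment, based on the audit findings at the end of this thread; the exact phrasing is an assumption, not text from the PR.

```rust
// MINER_SET_MAX_DA_SIZE changes the DA limits used for block building. If
// the builder and the default L2 execution client end up with different
// limits, the builder can produce blocks the L2 client will reject, so this
// method needs stricter success/failure handling than the generic
// forwarding path.
if method == MINER_SET_MAX_DA_SIZE {
    // ...
}
```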
    Ok(())
}
Err(e) => {
    let msg = format!("failed to send request to l2: {e}");
Should self.probes.set_health be set to unhealthy here?
drop(execution_mode);
self.has_disabled_execution_mode
    .store(false, std::sync::atomic::Ordering::SeqCst);
info!(target: "proxy::call", message = "setting execution mode to Enabled");
Should self.probes.set_health be reset here if the call failed previously?
Err(e) => {
    let msg = format!("failed to send request to l2: {e}");
    res_tx.send(Some(Err(e)))?;
    Err(eyre!(msg))
error logging would be useful here
{
    Ok(_) => return Ok(()),
    Err(_) => {
        tokio::time::sleep(Duration::from_millis(200)).await;
error logging would be useful here
Audit findings:
Not all forwarded requests should be treated equally. Specifically, there are eth_ and miner_ requests. If for some reason a miner_ request succeeds on the L2 EL but fails on the builder (or vice versa), it can result in a discrepancy in information that is critical for consensus, such as GasPrice and GasLimit. This can lead to the builder creating a block with a higher gas limit than the L2 EL expects. As a result, the L2 EL will mark this block as invalid and potentially flag all future builder blocks as invalid as well, effectively disabling builder functionality.
Recommendation(s): Increase the robustness of the forwarding mechanism. A miner_ request should be forwarded to the builder only if the call to the L2 EL succeeds. If the builder fails to process the request, the failure should be logged. In addition, the system should set the execution mode to Disabled to prevent further issues.
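A minimal sketch of the recommended flow, with stand-in types (the real PR wires this through its own service and client types): the L2 EL is called first, the builder is only called afterwards, and a builder-side failure is logged and disables execution.

```rust
use std::sync::{Arc, Mutex};

use eyre::Result;

/// Stand-ins for the sketch only; the field names mirror the diff above.
#[derive(Clone, Copy, PartialEq)]
enum ExecutionMode {
    Enabled,
    Disabled,
}

#[derive(Clone)]
struct RpcClient;

impl RpcClient {
    async fn forward(&self, _req: &str) -> Result<String> {
        Ok("ok".to_string())
    }
}

struct ProxyService {
    l2_client: RpcClient,
    builder_client: RpcClient,
    execution_mode: Arc<Mutex<ExecutionMode>>,
}

impl ProxyService {
    /// Forward a miner_ call to the L2 EL first; mirror it to the builder
    /// only if the L2 EL accepted it. A builder failure is logged and the
    /// execution mode is set to Disabled so the builder cannot keep building
    /// with stale DA/gas limits.
    async fn forward_miner_call(&self, req: &str) -> Result<String> {
        let l2_res = self.l2_client.forward(req).await?;

        if let Err(e) = self.builder_client.forward(req).await {
            tracing::error!(target: "proxy::call", error = %e, "builder failed miner_ call");
            *self.execution_mode.lock().unwrap() = ExecutionMode::Disabled;
        }

        Ok(l2_res)
    }
}
```

With this ordering, an inconsistent miner_ state can only arise when the builder call fails, and that case is surfaced loudly and disables execution rather than silently diverging.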