Handle Inconsistent Response in Miner Api #221
Conversation
I've set execution mode to fallback here, although I'm not convinced this is the best way to handle this situation. I think setting the health to partial content is probably sufficient. Let me know what you think!
Force-pushed from 8a2b679 to 6ea2e32
Since #201 was closed I cherry-picked this back onto main.
src/proxy.rs (outdated)
    builder_client.forward(builder_req, builder_method),
    service.l2_client.forward(l2_req, method)
);
if builder_res.is_ok() != l2_res.is_ok() {
If both builder and l2 are not_ok(), shouldn't we still update execution mode and health?
No, because we can return that error to the caller, and they can handle it as they wish.
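For readers following the thread, here is a minimal, self-contained sketch of the three outcomes under discussion. The `ForwardOutcome` enum and `classify` helper are illustrative placeholders, not the proxy's actual types:

```rust
/// Sketch only: classify the result of forwarding the same request to the
/// builder and to the L2 EL. These names are placeholders, not repo types.
#[derive(Debug, PartialEq)]
enum ForwardOutcome {
    /// Both backends accepted the request.
    BothOk,
    /// Both backends failed; the error can simply be returned to the caller.
    BothErr,
    /// Exactly one backend failed; the builder and the L2 EL may now
    /// disagree on state such as DA limits, so execution mode and health
    /// need to react.
    Inconsistent,
}

fn classify<T, E>(builder_res: &Result<T, E>, l2_res: &Result<T, E>) -> ForwardOutcome {
    match (builder_res.is_ok(), l2_res.is_ok()) {
        (true, true) => ForwardOutcome::BothOk,
        (false, false) => ForwardOutcome::BothErr,
        _ => ForwardOutcome::Inconsistent,
    }
}
```

Only the `Inconsistent` case triggers the execution-mode and health changes discussed in the rest of this review.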
src/proxy.rs (outdated)
let mut execution_mode = service.execution_mode.lock();
if *execution_mode == ExecutionMode::Enabled {
    *execution_mode = ExecutionMode::Disabled;
    // Drop before acquiring health lock
The health status here isn't sticky. It will switch back to Healthy on the next health check interval in the HealthHandle if the builder's unsafe head is up to date. So I'm not sure it's worth updating the probes unless you can tell the HealthHandle not to update the health until the DA limits discrepancy is resolved.
Although it will likely trigger a conductor failover as is, so maybe worth it.
Ideally the health status here sticks until there is no discrepancy in DA limits between the sequencer and the builder.
I think letting health status revert should be okay, as long as a new leader election is triggered.
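To make the locking concern above concrete, here is a minimal sketch under assumptions: `Health`, `Probes`, and `on_inconsistent_response` are hypothetical names, and the standard-library `Mutex` stands in for whatever lock type the proxy actually uses:

```rust
use std::sync::Mutex;

#[derive(PartialEq)]
enum ExecutionMode { Enabled, Disabled }

#[derive(PartialEq)]
enum Health { Healthy, PartialContent }

struct Probes { health: Mutex<Health> }

/// Sketch: disable execution first, then mark the probe as degraded,
/// dropping the execution-mode lock before acquiring the health lock so
/// the two locks are never held at the same time.
fn on_inconsistent_response(execution_mode: &Mutex<ExecutionMode>, probes: &Probes) {
    {
        let mut mode = execution_mode.lock().unwrap();
        if *mode == ExecutionMode::Enabled {
            *mode = ExecutionMode::Disabled;
        }
    } // execution-mode lock dropped here, before acquiring the health lock

    *probes.health.lock().unwrap() = Health::PartialContent;
}
```

As the thread notes, this health flip is not sticky on its own: the periodic health check can revert it, which is acceptable as long as a new leader election has already been triggered.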
I will check in more detail next week, but can we add a unit test to check this behaviour?
@ferranbt Yeah, will definitely add tests. Was just waiting to get settled on the solution. Just pushed a couple of commits, and I think this is currently as good as we can do. Added a retry layer into our HTTP client, which should hopefully prevent ever running into this situation.
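The retry layer itself isn't shown in this thread, so here is a generic bounded-retry helper as a rough sketch of the idea; the function name, attempt count, and backoff schedule are all assumptions, not the PR's actual implementation:

```rust
use std::time::Duration;

/// Sketch of a bounded-retry wrapper around an async operation. Assumes a
/// tokio runtime; none of these names come from the repository.
async fn with_retries<T, E, F, Fut>(max_attempts: usize, mut op: F) -> Result<T, E>
where
    F: FnMut() -> Fut,
    Fut: std::future::Future<Output = Result<T, E>>,
{
    let mut attempt = 0usize;
    loop {
        match op().await {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt + 1 >= max_attempts => return Err(err),
            Err(_) => {
                attempt += 1;
                // Back off briefly before retrying the forwarded request.
                tokio::time::sleep(Duration::from_millis(100 * attempt as u64)).await;
            }
        }
    }
}
```

A call site would then wrap each forwarded request (call shape assumed), e.g. `with_retries(3, || client.forward(req.clone())).await`, so a transient failure on one backend is far less likely to produce a one-sided response.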
Force-pushed from d6d1d9a to 20538b1
tests/miner_set_max_da_size.rs (outdated)
    params: Value,
    _result: Value,
) -> Pin<Box<dyn Future<Output = Option<Value>> + Send>> {
    println!("method: {:?}", method);
tests/miner_set_max_da_size.rs (outdated)
// restart the builder
client.unpause_container(harness.builder.id()).await?;

// let _res = engine_api.set_max_da_size(val, val).await?;
stray comment?
Fixed, TY!
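Alongside the container-based integration test above, a minimal unit-level check of the inconsistency detection could look like the following; `responses_are_inconsistent` is a stand-in for the proxy's real logic, not code from the repository:

```rust
#[cfg(test)]
mod tests {
    /// Stand-in for the proxy's mismatch check: exactly one side failing
    /// means the builder and the L2 EL may now disagree.
    fn responses_are_inconsistent<T, E>(builder: &Result<T, E>, l2: &Result<T, E>) -> bool {
        builder.is_ok() != l2.is_ok()
    }

    #[test]
    fn flags_builder_failure_when_l2_succeeds() {
        // Simulate miner_setMaxDASize succeeding on the L2 EL but failing
        // on the (paused) builder.
        let l2: Result<(), &str> = Ok(());
        let builder: Result<(), &str> = Err("builder unreachable");

        assert!(responses_are_inconsistent(&builder, &l2));
    }

    #[test]
    fn does_not_flag_when_both_fail() {
        // Both sides failing is just an error for the caller, not a
        // builder/L2 discrepancy.
        let l2: Result<(), &str> = Err("l2 down");
        let builder: Result<(), &str> = Err("builder down");

        assert!(!responses_are_inconsistent(&builder, &l2));
    }
}
```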
Audit findings:
Not all forwarded requests should be treated equally; specifically, there are eth_ and miner_ requests. If for some reason a miner_ request succeeds on the L2-EL but fails on the builder (or vice versa), it can result in a discrepancy in information critical for consensus, such as GasPrice and GasLimit. This can lead to the builder producing a block with a higher gas limit than the L2-EL expects. As a result, the L2-EL will mark this block invalid and potentially flag all future builder blocks as invalid as well, effectively disabling builder functionality.
Recommendation(s): Increase the robustness of the forwarding mechanism. A miner_ request should be forwarded to the builder only if the call to the L2-EL succeeds. If the builder fails to process the request, the failure should be logged. In addition, the system should set the execution mode to Disabled to prevent further issues.
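A rough sketch of that recommendation, with every name below a placeholder rather than the project's actual API: forward to the L2-EL first, mirror to the builder only on success, and disable execution if the builder rejects the request:

```rust
use std::sync::Mutex;

#[derive(PartialEq)]
enum ExecutionMode { Enabled, Disabled }

/// Sketch of the recommended ordering for miner_ requests. The closures
/// stand in for "forward this JSON-RPC request to the L2-EL / builder".
async fn forward_miner_request<F1, F2, Fut1, Fut2>(
    forward_to_l2: F1,
    forward_to_builder: F2,
    execution_mode: &Mutex<ExecutionMode>,
) -> Result<String, String>
where
    F1: FnOnce() -> Fut1,
    F2: FnOnce() -> Fut2,
    Fut1: std::future::Future<Output = Result<String, String>>,
    Fut2: std::future::Future<Output = Result<String, String>>,
{
    // 1. The L2-EL is the source of truth: if it rejects the miner_ call,
    //    return that error without ever contacting the builder.
    let l2_response = forward_to_l2().await?;

    // 2. Mirror the call to the builder. If it fails, log the failure and
    //    disable the builder path so it cannot keep building blocks with
    //    stale DA limits or gas settings.
    if let Err(err) = forward_to_builder().await {
        eprintln!("builder rejected miner_ request: {err}; disabling builder");
        *execution_mode.lock().unwrap() = ExecutionMode::Disabled;
    }

    Ok(l2_response)
}
```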