Skip to content

Enable trace methods#2294

Open
tmeinlschmidt wants to merge 18 commits into
0xPolygon:developfrom
tmeinlschmidt:enable-trace-methods
Open

Enable trace methods#2294
tmeinlschmidt wants to merge 18 commits into
0xPolygon:developfrom
tmeinlschmidt:enable-trace-methods

Conversation

@tmeinlschmidt

Copy link
Copy Markdown

Summary

Adds a Parity/OpenEthereum-compatible trace_* RPC namespace to Bor, targeting archive-node operators who need OpenEthereum-style execution traces for indexing, debugging, and MEV tooling. Six methods are implemented: trace_block, trace_transaction, trace_replayTransaction, trace_replayBlockTransactions, trace_call, and trace_callMany — covering all three Parity trace types (trace, stateDiff, vmTrace). Output is validated byte-identical against a live Polygon Erigon v3.3.7 reference (Chainstack) across 520 random mainnet samples spanning the full chain history: 499/520 identical (95.96%); all mined-tx methods (trace_transaction, trace_replayTransaction, trace_replayBlockTransactions) reach 120/120 (100%). The remaining 21 diffs are two accepted gaps documented below. The namespace is opt-in only via --rpc.enabletrace (default off) so existing nodes are unaffected.

Executed tests

  • Unit tests: go test -race ./eth/tracers/... — all pass. Table-driven tests cover parity trace conversion, stateDiff encoding (created/deleted/modified/storage), vmTrace push and mem regions, gas accounting, and the feeless sender correction.
  • 520-sample validation campaign: scripts_trace_validate.py drew random blocks across the full mainnet chain and compared every method's output byte-for-byte against Chainstack Polygon Erigon v3.3.7. Results:
Method Identical Total
trace_transaction 120 120
trace_replayTransaction 120 120
trace_replayBlockTransactions 40 40
trace_block ~79/80* 80
trace_call / trace_callMany ~140/160* 160

* trace_block diffs = pre-Madhugiri state-sync gap (see Rollout notes). trace_call diffs = underfunded-sender bailout gap + 3 Chainstack-side "first run for tx" errors.

  • Manual spot-checks: blocks 0x4cf4912, 0x2277c00, 0x3545626, 0x4acf9d3 inspected individually with jq diff vs reference for edge cases (SELFDESTRUCT, CREATE revert, precompile calls, state-sync tx).

Rollout notes

Not consensus-affecting. All new code is in eth/tracers/ (RPC layer only); no changes to block production, state transition, or consensus. Existing call sites (debug_traceTransaction, eth_call, etc.) are byte-for-byte unchanged.

Backwards-compatible. Opt-in only. The trace namespace is not registered unless --rpc.enabletrace is set (or JsonRPC.EnableTrace = true in TOML). No operator impact without the flag.

Requires archive node. --gcmode archive is a hard prerequisite; pruned nodes will return errors on historical state lookups.

Recommended companion flags:

--rpc.enabletrace --rpc.evmtimeout 60s --gcmode archive

Known gaps (documented, not bugs):

  1. trace_block omits the state-sync pseudo-tx for pre-Madhugiri blocks (< 80,084,800 mainnet). Post-Madhugiri (all recent blocks) is correct.
  2. trace_call/trace_callMany errors on underfunded simulated senders instead of returning a bailout trace (gas bailout deferred — requires changes to core/state_transition.go).

Tom Meinlschmidt added 4 commits July 1, 2026 13:48
Add .env/.env.local (the local validation RPC endpoint lives there and is never
committed) and CLAUDE.local.md to .gitignore.
Implement trace_block, trace_transaction, trace_call, trace_callMany,
trace_replayTransaction and trace_replayBlockTransactions in the
Parity/OpenEthereum format, with trace, stateDiff and vmTrace outputs.

Output is validated byte-for-byte against Polygon erigon across the block,
transaction and replay methods, including Bor specifics: real coinbase via
CalculateCoinbase, base-fee burn to the burnt contract, exact intrinsic gas
and gross gasUsed, precompile-call filtering, and state-sync transaction
tracing. Includes unit and RPC integration tests.
The Parity trace_* methods are expensive and archive-oriented, so they are not
exposed by default. Add the rpc.enabletrace flag (default off); when it is unset
the trace namespace is not registered.

@claude claude Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@pratikspatil024 pratikspatil024 requested a review from Copilot July 1, 2026 12:12
@pratikspatil024

Copy link
Copy Markdown
Member

@claude review once

Comment thread eth/tracers/parity_statediff.go Fixed
Comment thread eth/tracers/parity_statediff.go Fixed

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an opt-in Parity/OpenEthereum-compatible trace_* RPC namespace to Bor’s tracer stack, aimed at archive-node operators who need Parity-style execution traces (trace/stateDiff/vmTrace) for indexing and debugging.

Changes:

  • Introduces trace_* RPC methods via a new trace namespace, gated by --rpc.enabletrace / enabletrace config.
  • Implements Parity-compatible trace conversion, stateDiff, and opcode-level vmTrace, plus extensive unit tests.
  • Updates operator-facing documentation, default config testdata, version patch, and gitignore.

Reviewed changes

Copilot reviewed 23 out of 24 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
README.md Documents the new opt-in trace_* namespace and operational requirements.
params/version.go Bumps patch version.
internal/cli/server/testdata/default.toml Adds enabletrace default value in JsonRPC test config.
internal/cli/server/server.go Registers trace RPC APIs only when EnableTrace is set.
internal/cli/server/flags.go Adds --rpc.enabletrace CLI flag.
internal/cli/server/config.go Adds JsonRPC.EnableTrace config field and default.
eth/tracers/parity.go Adds Parity replay result types and mined-tx trace implementation.
eth/tracers/parity_vmtrace.go Adds Parity-compatible opcode-level vmTrace tracer implementation.
eth/tracers/parity_vmtrace_values_test.go Adds targeted vmTrace value assertions (push/mem).
eth/tracers/parity_vmtrace_test.go Adds end-to-end vmTrace tests for contract + EOA cases.
eth/tracers/parity_test.go Adds unit tests for traceTypes parsing and metadata stripping.
eth/tracers/parity_statediff.go Implements Parity-compatible stateDiff conversion and feeless adjustments.
eth/tracers/parity_statediff_test.go Adds unit + end-to-end tests for stateDiff behavior.
eth/tracers/parity_replay_tx.go Implements trace_replayTransaction.
eth/tracers/parity_replay_tx_test.go Tests trace_replayTransaction behavior and errors.
eth/tracers/parity_replay_block.go Implements trace_replayBlockTransactions.
eth/tracers/parity_replay_block_test.go Tests trace_replayBlockTransactions behavior and errors.
eth/tracers/parity_gas_test.go Locks gas accounting semantics against expected Parity/Erigon behavior.
eth/tracers/parity_calltracer.go Adds a wrapper tracer to capture post-execution gas-left for gross gasUsed.
eth/tracers/parity_call.go Implements trace_call and trace_callMany.
eth/tracers/parity_call_test.go Adds tests for trace_call / trace_callMany and input decoding.
eth/tracers/api.go Adds Parity trace types, TraceAPI registration, and trace_block parity implementation; adjusts default tracer timeout.
eth/tracers/api_test.go Adds parity trace tests and RPC registration tests; adjusts a global-directory test to avoid parallel races.
.gitignore Ignores local env secret files.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread README.md
`trace_callMany`). These methods are **disabled by default** because they are
expensive and require an archive node.

Enable with the `--rpc.enabletrace` flag (or `JsonRPC.EnableTrace = true` in TOML):
Comment thread eth/tracers/api.go Outdated
Comment on lines +53 to +57
// by default before being forcefully aborted. Parity-style trace_* replay of
// heavy historical transactions (deep account-abstraction call trees) can
// exceed a few seconds, so the default is generous; the trace namespace is
// opt-in (--rpc.enabletrace) and intended for archive nodes.
defaultTraceTimeout = 60 * time.Second
Comment thread eth/tracers/api.go Outdated
Comment on lines +1720 to +1722
last := addr[common.AddressLength-1]
if last < 0x01 || last > 0x0a {
return false
Comment thread eth/tracers/api.go Outdated
Comment on lines +1620 to +1623
// Result / error follow erigon semantics:
// - success: result present, no error
// - revert: result present (gasUsed + revert output) AND error "Reverted"
// - other error (out of gas, ...): result omitted, error = raw EVM string
Comment thread README.md
Comment on lines +63 to +66
```toml
[JsonRPC]
EnableTrace = true
```
Comment thread eth/tracers/parity_vmtrace.go
Comment thread internal/cli/server/testdata/default.toml
@tmeinlschmidt

Copy link
Copy Markdown
Author

checking on the issues and will push changes after then, thanks

@tmeinlschmidt

Copy link
Copy Markdown
Author

@BlockReq-AmyPan yes, addressed in my last commits (including tests). Gonna check sonar issues now

Tom Meinlschmidt added 3 commits July 1, 2026 15:27
- Extract parityFrameCtx struct (9-param fn → 3-param), fixing param count warning
- Split TestConvertCallFrameToParityTraces (298 lines, complexity 91) into 9 focused per-scenario tests
- Add mustConvertFrame/parityTestCtx helpers to reduce test boilerplate duplication
- Extract anonymous structs in parity_vmtrace_values_test.go as named types
- Add explanatory comment inside OnTxEnd empty function body
…layResult

- Add parityExecInput struct bundling (tx, msg, txctx, vmctx, statedb, config)
  so the 7-param sequence is not repeated in parityTraceTx, parityStateDiffFor
  and parityVMTraceFor signatures
- Add withStateCopy() helper for pre-execution state snapshots
- Extract buildReplayResult() helper: the stateDiff/vmTrace/trace dispatch
  block was repeated in ReplayTransaction, replayBlockTransactions, and
  traceCallExec; now lives in one place
- Update all three callers to use buildReplayResult
trace_block (traceBlockParityByHash) and trace_replayBlockTransactions
(replayBlockTransactions) each carried an identical copy of the block
pre-state setup and per-transaction message/context construction, which
SonarCloud flagged as duplicated new code.

Extract the shared logic into parityBlockExec plus setupParityBlockExec
(parent state + beacon-root/parent-hash system calls) and txInput
(message + per-tx Context). Both methods now build on the single copy;
the diverging per-tx loop bodies (flatten vs ReplayResult) remain local.

Also apply gofmt to api_test.go.
@tmeinlschmidt tmeinlschmidt force-pushed the enable-trace-methods branch from 8fae1ad to b99944c Compare July 2, 2026 13:26
Tom Meinlschmidt added 2 commits July 3, 2026 10:00
Move all Parity trace_* additions out of api.go into dedicated files
(parity_block.go: types, TraceAPI, trace_block; parity_convert.go: call
frame conversion), returning api.go to its upstream size. TraceTransaction
and transactionParity now share canonicalTxTraceEnv, removing their
duplicated mined-tx setup (SonarCloud duplication).

Split the four functions over the cognitive-complexity threshold:
convertCallFrameToParityTraces into per-concern builders (frame accessors,
trace type, action, result, error), buildParityStateDiff into per-account-
state helpers (deleted/created/modified), parityTraceTx into config/decode/
gas/context helpers, and vmTraceMemRegion into a table-driven opcode spec.

Add unit tests for the extracted helpers (mem-region table, push counts,
tracer interrupt/sub-frame/mem capture, stateDiff field encoding, root gas
boundary, callMany tuple decoding, block resolution errors, feeless gas
debit) to lift the diffguard mutation score above the tier thresholds.
The parity test files each carried a near-identical copy of the chain
setup boilerplate (funded accounts, per-block value transfer, backend
construction), which SonarCloud flagged as duplicated new code.

Consolidate them in parity_harness_test.go: newTransferChainAPI (value-
transfer chain), newPrefundedContractAPI (genesis-deployed contract call),
replayVMTraceFrame (vmTrace replay + decode) and newRegisteredRPCServer
(in-proc RPC registration). Test assertions are unchanged.
@sonarqubecloud

sonarqubecloud Bot commented Jul 3, 2026

Copy link
Copy Markdown

@adamdossa

Copy link
Copy Markdown
Contributor

@claude review once

Comment on lines +86 to +103
func (api *API) TraceBlockParity(ctx context.Context, number rpc.BlockNumber, config *TraceConfig) ([]*ParityTrace, error) {
block, err := api.blockByNumber(ctx, number)
if err != nil {
return nil, fmt.Errorf("failed to get block: %w", err)
}

return api.traceBlockParityByHash(ctx, block.Hash(), config)
}

// TraceBlockParityByNumber returns Parity-format traces for a block by number.
// This method is exposed as debug_traceBlockParityByNumber in the RPC API.
func (api *API) TraceBlockParityByNumber(ctx context.Context, number rpc.BlockNumber, config *TraceConfig) ([]*ParityTrace, error) {
return api.TraceBlockParity(ctx, number, config)
}

// TraceBlockParityByHash returns Parity-format traces for a block by hash.
// This method is exposed as debug_traceBlockParityByHash in the RPC API.
func (api *API) TraceBlockParityByHash(ctx context.Context, hash common.Hash, config *TraceConfig) ([]*ParityTrace, error) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 The three block-level Parity methods (TraceBlockParity, TraceBlockParityByNumber, TraceBlockParityByHash at eth/tracers/parity_block.go:86-108) are defined on *API rather than *TraceAPI, so they are registered under the debug namespace unconditionally via tracers.APIs() at server.go:296 — i.e., they are reachable as debug_traceBlockParity[ByNumber|ByHash] without setting --rpc.enabletrace. The five tx/call-level methods (Transaction, Call, CallMany, ReplayTransaction, ReplayBlockTransactions) live on *TraceAPI and are correctly gated. This contradicts the PR summary's claim that "the namespace is opt-in only via --rpc.enabletrace" — strictly, only the trace namespace is opt-in. Practical exposure is limited (debug is not in the default --http.api list), and the behavior appears intentional (doc comments name the debug_ methods explicitly and TestDebugTraceBlockParityRPCRegistration at api_test.go:1788 asserts the debug registration), so this is a documentation/consistency inconsistency rather than a code defect. Consider either moving the three methods to *TraceAPI (dropping the debug alias) or updating the README to note that block-level Parity traces are also reachable via debug.

Extended reasoning...

What's happening

Three of the six Parity-format trace operations are exposed via two RPC surfaces, and only one of them is gated by --rpc.enabletrace:

  • func (api *API) TraceBlockParity(...) at eth/tracers/parity_block.go:86debug_traceBlockParity
  • func (api *API) TraceBlockParityByNumber(...) at eth/tracers/parity_block.go:97debug_traceBlockParityByNumber
  • func (api *API) TraceBlockParityByHash(...) at eth/tracers/parity_block.go:103debug_traceBlockParityByHash

These are methods on *API. In internal/cli/server/server.go:296, tracers.APIs(srv.backend.APIBackend) is registered unconditionally, and tracers.APIs returns *API under the debug namespace. Only after this does the code check config.JsonRPC.EnableTrace (server.go:302) to conditionally register tracers.TraceAPIs — which returns *TraceAPI under the trace namespace. Since Go's RPC framework reflects on the receiver and exports every RPC-compatible exported method, any exported method on *API becomes debug_<lowerFirstName> automatically.

The other five Parity operations (Transaction, Call, CallMany, ReplayTransaction, ReplayBlockTransactions) are methods on *TraceAPI, so they only appear when --rpc.enabletrace is set. The gating is therefore inconsistent for the block-level trio.

Why this contradicts the PR summary

The PR summary states: "The namespace is opt-in only via --rpc.enabletrace (default off) so existing nodes are unaffected." Strictly, only the trace namespace is opt-in. An operator who exposes the debug namespace (e.g., --http.api=eth,net,web3,debug on an archive node) gets three block-level Parity trace operations without ever setting --rpc.enabletrace, and would reasonably not expect them based on the summary.

Step-by-step proof

  1. Start bor with the debug namespace exposed and --rpc.enabletrace unset (or absent): bor server --http --http.api=eth,net,web3,debug.
  2. tracers.APIs(...) runs at server.go:296 → registers a {Namespace: "debug", Service: NewAPI(backend)} entry.
  3. The subsequent if config.JsonRPC.EnableTrace guard at server.go:302 is skipped, so tracers.TraceAPIs is not registered — the trace namespace is absent (as intended).
  4. Go's RPC framework reflects on the *API service and, because TraceBlockParity(ctx, number, config) has an RPC-compatible signature (returns ([]*ParityTrace, error)) and starts with an uppercase letter, exports it as debug_traceBlockParity. Same for the ByNumber/ByHash variants.
  5. A call to debug_traceBlockParity" 0x1 succeeds (or returns an execution error), whereas trace_block "0x1" returns method-not-found. The test TestDebugTraceBlockParityRPCRegistration at api_test.go:1788 baselines exactly this behavior (registers only APIs, asserts debug_traceBlockParity is present).

Why the refutations are correct on intent but the finding still stands

The refuting verifiers point out — accurately — that this is deliberate:

  • Each of the three method's doc comments explicitly says "This method is exposed as debug_traceBlockParity in the RPC API." (parity_block.go:85, 96, 102).
  • TestDebugTraceBlockParityRPCRegistration at api_test.go:1788 exercises the default (APIs) API set and asserts that debug_traceBlockParity is registered — this is a positive assertion of the debug exposure.
  • The debug namespace is off by default (default --http.api" = "eth,net,web3,txpool,bor"), and it already contains equally expensive tracers (debug_traceBlock, debug_traceTransaction, etc.).

Given all three signals — comments, tests, and existing debug tracers — this is intentional dual-exposure, not a bug. The remaining concern is purely a documentation/consistency one: the PR summary's blanket "opt-in only" claim is misleading for the three block-level methods.

Impact

  • No new attack surface: operators already accept expensive tracers when they enable the debug namespace, and debug_traceBlock is comparable in cost.
  • No functional regression: the code paths and outputs match the gated trace_block path.
  • The only real risk is operator surprise: an archive operator reading the PR summary may not realize that block-level Parity traces are already reachable via debug if they enable that namespace for any other reason.

Suggested fix (either is fine)

  1. Move the three methods to *TraceAPI, dropping the debug_ alias. This makes gating uniform across all six Parity methods and matches the PR summary.
  2. Keep the current design and update the README's rollout notes to mention that the three block-level methods are additionally reachable as debug_traceBlockParity[ByNumber|ByHash] for operators who cannot flip --rpc.enabletrace. Update TestDebugTraceBlockParityRPCRegistration's comment to spell out why the debug exposure is intentional.

Comment thread README.md
Comment on lines +67 to +70

**Requirements:**
- Archive node (`--gcmode archive`)
- Sufficient RPC timeout for deep blocks (`--rpc.evmtimeout` ≥ 60s recommended)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 The README's --rpc.evmtimeout ≥ 60s recommended line under the trace namespace is misleading: --rpc.evmtimeout maps to RPCEVMTimeout, which is only read by eth_call / eth_simulate / graphql — never by anything under eth/tracers/. Setting it has no effect on trace_* methods (they use their own defaultParityTraceTimeout = 60s internally). Consider dropping the line entirely (the 60s trace-phase default already gives operators what the README claims to want), or replacing it with the knob that actually bounds a long trace call's wall-clock — the HTTP server write timeout (--http.timeout.write, default 30s).

Extended reasoning...

What is wrong

README.md:69 tells operators enabling the new Parity trace namespace to set --rpc.evmtimeout ≥ 60s. That flag is wired to ethconfig.RPCEVMTimeout and is consumed only by:

  • internal/ethapi/api.go (DoCalleth_call, eth_estimateGas)
  • internal/ethapi/simulate.go (eth_simulate)
  • graphql/graphql.go

grep -r RPCEVMTimeout eth/tracers/ returns zero hits. Neither debug_trace* nor the new trace_* methods read it.

How the trace path actually times out

The tracers ship their own constants:

  • eth/tracers/api.go:54defaultTraceTimeout = 5 * time.Second (used by debug_trace* when the caller does not set TraceConfig.Timeout).
  • eth/tracers/parity.go:28defaultParityTraceTimeout = 60 * time.Second (used by parityTraceConfig for every trace_* call, since the Parity RPC methods do not expose Timeout to the caller).

So the trace phase is already capped at 60s internally, regardless of --rpc.evmtimeout. The only external knob that governs a long trace_* call is the HTTP server write timeout (--http.timeout.write, default 30s), which will cut the RPC response off well before the 60s tracer cap on default settings.

Step-by-step proof

  1. Operator follows README: bor server --rpc.enabletrace --rpc.evmtimeout 60s --gcmode archive.
  2. --rpc.evmtimeout 60s sets JsonRPC.RPCEVMTimeout = 60s in the config, threaded into ethconfig.Config.RPCEVMTimeout.
  3. Operator calls trace_replayBlockTransactions for a heavy historical block.
  4. Request routes to TraceAPI.ReplayBlockTransactionsreplayBlockTransactionsbuildReplayResultparityTraceTxtraceTx(cfg = parityTraceConfig(userCfg)).
  5. In parityTraceConfig (parity.go:184), cfg.Timeout is set to defaultParityTraceTimeout.String() (60s). RPCEVMTimeout is not read anywhere on this path.
  6. In traceTx the timeout comes from cfg.Timeout — the 60s constant — not from api.b.RPCEVMTimeout().
  7. Meanwhile the HTTP handler enforces its own write timeout (default 30s, jsonrpc.timeouts.write); if the reply is not written in time the client sees a broken connection well before the 60s tracer cap.

The operator has set a flag that changes literally nothing about their trace timeouts.

Impact

Documentation-only. The trace namespace works correctly out of the box because the internal 60s default is already what the README claims to be recommending. Nothing crashes, no wrong data is returned; the advice is simply a no-op for the namespace it targets. The practical harm is that an operator debugging a hung trace_* call will look at the flag they set (--rpc.evmtimeout) instead of the flag that actually matters (--http.timeout.write).

How to fix

Two good options in README.md (and in the PR description, which repeats the same line under "Recommended companion flags"):

  • Drop the --rpc.evmtimeout bullet entirely. The internal defaultParityTraceTimeout = 60s already gives the intended behavior.
  • Replace it with --http.timeout.write (default 30s), which is the real ceiling on a long trace_* call. Suggested wording: "Sufficient HTTP write timeout for deep blocks (--http.timeout.write ≥ 60s recommended, since heavy trace_replay* calls can take longer than the 30s default before the internal 60s trace cap trips)."

Severity

nit — documentation error, not a code-correctness bug. Merging as-is does not break the feature. Worth fixing before release notes go out so operators are not chasing a no-op flag.

Comment on lines +239 to +248
func (api *API) parityStateDiffFor(ctx context.Context, in parityExecInput, feeless bool) (parityStateDiff, error) {
prestate := "prestateTracer"
cfg := &TraceConfig{
Tracer: &prestate,
TracerConfig: json.RawMessage(`{"diffMode":true}`),
}
if in.config != nil {
cfg.Reexec = in.config.Reexec
cfg.Timeout = in.config.Timeout
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 The Parity trace namespace applies a 60s per-tx timeout (defaultParityTraceTimeout) only inside parityTraceTx via parityTraceConfig. The stateDiff phase at eth/tracers/parity_statediff.go:239-248 and the vmTrace phase at eth/tracers/parity_vmtrace.go:446-452 build their own TraceConfig and only copy Timeout when in.config.Timeout is non-nil — but every trace_* RPC entry point (ReplayTransaction, replayBlockTransactions, traceCallExec, Block) constructs parityExecInput with a nil config, so those two phases fall through to defaultTraceTimeout = 5s. Effect: trace_replayTransaction(hash, ["trace","stateDiff","vmTrace"]) on a heavy tx passes the 60s trace phase but times out on the heavier stateDiff/vmTrace phases at 5s — the very case the 60s default was introduced for. RPC signatures do not accept a caller Timeout, so there is no workaround. Fix: reuse parityTraceConfig's default-timeout logic in both helpers (or thread all three configs through one builder) so all phases share the 60s ceiling.

Extended reasoning...

What's wrong

parityTraceConfig at eth/tracers/parity.go:211-227 is explicit about the design intent — its comment states "Use the Parity-specific (longer) timeout when the caller hasn't set one explicitly. This avoids inflating defaultTraceTimeout for debug_trace callers."* — and it applies the 60s defaultParityTraceTimeout when cfg.Timeout == nil. That default is applied only where parityTraceConfig is called, which is only inside parityTraceTx.

The two peer phases build their own configs without reusing this helper:

  • parityStateDiffFor (eth/tracers/parity_statediff.go:239-248) — cfg := &TraceConfig{Tracer: &prestate, TracerConfig: ...}; then if in.config != nil { cfg.Reexec = in.config.Reexec; cfg.Timeout = in.config.Timeout }. No fallback if the copied Timeout is nil.
  • parityVMTraceFor (eth/tracers/parity_vmtrace.go:446-452) — same shape.

When either in.config or in.config.Timeout is nil, traceTx at eth/tracers/api.go:1039-1042 falls back to defaultTraceTimeout = 5 * time.Second.

Every RPC path leaves in.config nil

  • trace_replayTransactioneth/tracers/parity_replay_tx.go:60: in := parityExecInput{tx: tx, msg: msg, txctx: txctx, vmctx: vmctx, statedb: statedb} (no config).
  • trace_replayBlockTransactionseth/tracers/parity_replay_block.go:73: same shape, no config.
  • trace_call / trace_callManyeth/tracers/parity_call.go:176: in := parityExecInput{tx: tx, msg: msg, txctx: new(Context), vmctx: blockCtx, statedb: statedb} (no config).
  • trace_block sets config on parityExecInput in parity_block.go:145, but the RPC-visible Block method at parity_block.go:69 calls TraceBlockParity(ctx, number, nil), so the config is nil in practice.

The RPC method signatures do not accept a TraceConfig, so a caller cannot override the timeout to work around this.

Step-by-step proof

  1. Operator calls trace_replayTransaction(0xdeadbeef, ["trace","stateDiff","vmTrace"]) on a heavy historical tx (deep account-abstraction call tree) whose real execution costs, say, ~10s.
  2. ReplayTransaction builds in with config: nil and calls buildReplayResult(..., set, false, false).
  3. buildReplayResult runs the requested tracers in order. stateDiff is first: parityStateDiffFor executes with cfg.Timeout == nil. traceTx sees a nil timeout and sets timeout = defaultTraceTimeout = 5s (api.go:1039-1042). After 5s the deadline fires, tracer.Stop(errors.New("execution timeout")) runs, and traceTx returns "tracing failed: execution timeout".
  4. buildReplayResult propagates the error; the whole RPC call returns an error with no result — even though the trace phase (had it run) would have completed within the intended 60s budget.

Why this is a real gap, not just a doc issue

defaultParityTraceTimeout's own comment says "Parity-style replay of heavy historical transactions (deep account-abstraction call trees) can take well over 5s". vmTrace is the heaviest of the three tracers (it captures every opcode's stack/memory/storage), so it's ironically the one most likely to hit the 5s ceiling — the opposite of the design.

Impact

  • Non-consensus, opt-in namespace (--rpc.enabletrace, off by default).
  • Deterministic error (context.DeadlineExceeded), not silent bad data — an operator gets a clear failure.
  • No workaround from the RPC surface: the three methods don't accept a caller-supplied Timeout.
  • The failure mode contradicts the PR's stated 60s design contract for heavy traces.

Fix

Replace the ad-hoc cfg := &TraceConfig{...} in both helpers with a call to a small shared builder (analogous to parityTraceConfig) that sets Tracer/TracerConfig for the specific tracer and applies the same default-timeout logic. One-liner alternative in each site: after copying from in.config, add

if cfg.Timeout == nil {
    s := defaultParityTraceTimeout.String()
    cfg.Timeout = &s
}

so all three phases share the 60s ceiling.

Comment on lines +172 to +180
parent := t.stack[len(t.stack)-1]
prefix := t.rootPrefix
if parent.pending != nil {
// Attach the sub-frame to the call op that opened it.
parent.pending.op.Sub = frame
prefix = parent.pending.op.Idx
}
t.stack = append(t.stack, &vmTraceState{frame: frame, prefix: prefix})
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 🟡 The Parity vmTrace tracer produces an empty ops list ({"code":"0x...","ops":[]}) for post-Madhugiri Bor state-sync transactions when replayed via trace_replayTransaction / trace_replayBlockTransactions with the "vmTrace" trace type. WrapStateSyncHooks injects a synthetic OnEnter(depth=0) at OnTxStart so a root frame exists, but no OnOpcode ever fires at that scope — parityVMTracer attaches sub-frames only via parent.pending.op.Sub, so every real per-event OnEnter creates an orphan frame that is pushed onto the stack, populated, then popped without ever being linked into t.root. Fix: inject a synthetic call op into the root when the first child arrives without a pending op, or detect the state-sync case at OnTxStart and route event frames as top-level ops. Contradicts the PR summary byte-identical parity claim for post-Madhugiri state-sync txs with vmTrace, but the trace namespace is opt-in (--rpc.enabletrace, default off) and non-consensus.

Extended reasoning...

What is wrong

parityVMTracer.OnEnter (eth/tracers/parity_vmtrace.go:172-179) attaches a sub-frame to its parent only via parent.pending.op.Sub, and parent.pending is set exclusively inside OnOpcode. For a normal transaction this is fine because the root call runs real EVM ops before any sub-call fires, so an opening call op is always pending when a child arrives. Bor state-sync transactions break that invariant.

For a post-Madhugiri state-sync tx, traceTx (eth/tracers/api.go) wraps the tracer via WrapStateSyncHooks. That wrapper does two things at OnTxStart (state_sync_tracing_hooks.go:66-83): forwards to inner.OnTxStart and then synthetically fires inner.OnEnter(depth=0, CALL, BorSystemAddress, stateReceiverAddress) to give the tracer a single root frame. Every real subsequent OnEnter/OnExit/OnOpcode is shifted by +1 in depth. But the synthetic root scope is never actually executed — ApplyStateSyncEvents iterates through each event and calls vmenv.Call at unwrapped depth 0, which becomes depth 1 after shift. Nothing ever calls OnOpcode at depth 0, so root_state.pending stays nil for the entire tx.

Step-by-step proof

  1. OnTxStart fires. WrapStateSyncHooks forwards to parityVMTracer.OnTxStart (captures statedb), then synthetically calls inner.OnEnter(depth=0, CALL, ...). parity_vmtrace.go:166 branches on depth==0: sets t.root = frame and t.stack = [{root_state}]. root_state.pending is nil.
  2. ApplyStateSyncEvents processes the first event. vmenv.Call triggers OnEnter(depth=0, ...); wrapper shifts to depth=1.
  3. parity_vmtrace.go:172-179 runs with parent = root_state, parent.pending == nil. The if parent.pending != nil block at line 174-178 is SKIPPED — parent.pending.op.Sub = frame never executes. The frame is only pushed to t.stack.
  4. The event EVM execution now fires OnOpcodes into the orphan frame — its Ops slice fills, but nothing references it from t.root.
  5. Event ends → OnExit(depth=0) shifted to depth=1 → parity_vmtrace.go:184 pops the orphan off t.stack. No reference reachable from t.root remains.
  6. Pattern repeats for every subsequent state-sync event. All frames orphaned.
  7. OnTxEnd → synthetic OnExit(depth=0) pops the root off the stack.
  8. GetResult marshals t.root: {"code":"0x...<stateReceiverContract code>","ops":[]}. Ops is [], and no ops[i].Sub subtree references any of the recorded event data.

Why other paths work

The callTracer wrapped by parityCallTracer (used for the "trace" and "stateDiff" trace types) builds its call tree by depth alone — a +1 depth shift is sufficient, so state-sync txs are correctly represented in that output. The vmTrace tracer uses a fundamentally different attachment model (pending-op.Sub), which needs an owning op at the synthetic root — one that does not exist.

The 520-sample validation campaign reports 40/40 for trace_replayBlockTransactions, but that either did not include a post-Madhugiri state-sync tx with vmTrace requested, or Erigon returns the same empty ops and it appeared as a symmetric match. The unit tests (parity_vmtrace_*_test.go) all use hand-driven scopes or transfer-chain harnesses — none exercise WrapStateSyncHooks + vmTrace.

Impact

Silent correctness bug. Trigger scenario: post-Madhugiri (mainnet block ≥ 80,084,800) archive node with --rpc.enabletrace, calling trace_replayTransaction or trace_replayBlockTransactions with ["vmTrace"] on a state-sync tx hash returns {"vmTrace":{"code":"0x...","ops":[]}, ...} instead of the actual per-event opcode sequence. Downstream indexers reading Bor vmTrace of state-sync txs receive an empty ops list and will not detect the missing data unless they cross-check against another client.

Fix

Three viable options: (1) at OnEnter, when parent.pending is nil and depth > 0, add a synthetic call op record to the parent frame and use it as pending for the child sub-frame attachment; (2) detect the state-sync case at OnTxStart (via tx.Type()) and treat the shifted depth-1 frames as top-level ops rather than nested subs; (3) add a subframes []*vmTraceFrame field to vmTraceFrame and populate it directly when there is no pending op. Option (1) is the least invasive.

Severity

nit. Opt-in namespace (--rpc.enabletrace, default off), non-consensus, narrow surface (only state-sync tx + vmTrace subset — a real Parity-parity divergence but only on Bor pseudo-transactions). Fixing it is an incremental improvement, not a merge blocker.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants