
perf: push socket events host→guest instead of guest wait=0 polling #178

Description

@NathanFlurry

Problem

Guest networking pays a ~2–4ms quantum per socket event (accept, data, close) because the embedded node polyfill (crates/execution/src/node_import_cache.rs:4750-4752) calls the net.poll / net.server_poll sync RPCs with waitMs = 0 and paces itself with guest-side setTimeout timers (scheduleSocketPoll). The latency is set by the guest timer cadence, not by sidecar work.

Measured 2026-07-01 (3-layer differential bench, p50; the same program run in the guest VM and on host node):

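| op | host node (p50) | guest VM (p50) | tax (guest ÷ host) |
|---|---|---|---|
| udp loopback echo | 0.04ms | 21ms | 525× |
| unix-socket echo | 0.24ms | 21ms | 88× |
| unix accept (connect+close) | 0.14ms | 4ms | 29× |
| tcp 64KiB echo | 0.32ms | 6ms | 19× |
| node:http loopback GET | 0.46ms | 5ms | 11× |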

Cost scales with event count, not payload size: one accept is one ~4ms tick, and a full echo is ≈ 5 events ≈ 21ms. The sidecar-side wait improvements (#172, #173) are correct but never exercised on this path: the guest always passes waitMs = 0, so the sidecar never waits.
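Written as a rough cost model (an approximation that ignores host-side work, which is under 0.5ms in every row above, and takes t_tick ≈ 4ms, the upper end of the ~2–4ms quantum):

```math
\begin{aligned}
t_{\text{guest}} &\approx N_{\text{events}} \cdot t_{\text{tick}}, \quad t_{\text{tick}} \approx 4\,\text{ms} \\
\text{accept: } t &\approx 1 \times 4\,\text{ms} = 4\,\text{ms} \quad (\text{measured } 4\,\text{ms}) \\
\text{echo: } t &\approx 5 \times 4\,\text{ms} = 20\,\text{ms} \quad (\text{measured } 21\,\text{ms})
\end{aligned}
```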

Constraint: why not waitMs > 0

The polyfill cannot block: callSync parks the whole V8 isolate. The sidecar cannot block either: it services these RPCs synchronously on a single-threaded tokio runtime (new_current_thread().block_on, stdio.rs), so a sync RPC with waitMs > 0 would park the entire sidecar event loop and serialize all VM traffic behind it. Event push is the only shape that blocks neither loop.
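A toy model of why waitMs > 0 is ruled out on the sidecar side: one thread that runs each sync handler to completion, roughly what a current-thread runtime does with a handler that blocks. `Rpc` and `serve` are hypothetical, `std::thread::sleep` stands in for a blocking readiness wait, and this is not the sidecar's actual code.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical sync RPC: `wait` models a `waitMs > 0` poll that blocks
/// until readiness or timeout (here it always runs to the full timeout).
struct Rpc {
    vm: &'static str,
    wait: Duration,
}

/// Services RPCs in arrival order on the calling thread and records when
/// each one finished, relative to the start of the loop.
fn serve(queue: Vec<Rpc>) -> Vec<(&'static str, Duration)> {
    let start = Instant::now();
    let mut finished = Vec::new();
    for rpc in queue {
        // Blocking here parks the only thread: no other VM's RPC can run.
        thread::sleep(rpc.wait);
        finished.push((rpc.vm, start.elapsed()));
    }
    finished
}

fn main() {
    let queue = vec![
        Rpc { vm: "vm-a", wait: Duration::from_millis(50) }, // waitMs = 50
        Rpc { vm: "vm-b", wait: Duration::ZERO },            // waitMs = 0, unrelated VM
    ];
    for (vm, at) in serve(queue) {
        println!("{vm} finished after {at:?}");
    }
}
```

vm-b asks for no wait at all, yet it cannot finish before 50ms because it is queued behind vm-a's blocking poll.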

Fix direction

Push socket readiness/connection events host→guest asynchronously over the stream-callback channel the bridge already uses for stdio, and have the polyfill resolve pending accepts/reads/writes from those events instead of re-arming timers. The sidecar-side readiness plumbing from #172 (wait_fd_readable_until) already gives the sidecar an efficient way to learn which readiness events to forward.
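A minimal sketch of the guest-side half, written in Rust as a language-neutral model of what the JS polyfill would do. The event shape (`SocketEvent`), the `Dispatcher` type, and its `wait_for`/`on_push` methods are hypothetical, and an `mpsc` channel stands in for a JS promise resolver. Each pending operation registers a waiter keyed by socket id; each pushed event resolves the oldest live waiter or is buffered until one registers, so no timer is ever re-armed.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::mpsc;

/// Hypothetical event pushed host→guest; the real wire format over the
/// stream-callback channel is not specified in this issue.
#[derive(Debug, Clone, PartialEq)]
enum SocketEvent {
    Accept { server: u32, conn: u32 },
    Data { socket: u32, bytes: Vec<u8> },
    Close { socket: u32 },
}

impl SocketEvent {
    /// The socket whose waiters this event should resolve.
    fn socket(&self) -> u32 {
        match self {
            SocketEvent::Accept { server, .. } => *server,
            SocketEvent::Data { socket, .. } | SocketEvent::Close { socket } => *socket,
        }
    }
}

/// Guest-side table of pending accepts/reads, keyed by socket id.
#[derive(Default)]
struct Dispatcher {
    waiters: HashMap<u32, VecDeque<mpsc::Sender<SocketEvent>>>,
    buffered: HashMap<u32, VecDeque<SocketEvent>>,
}

impl Dispatcher {
    /// Register a pending operation (e.g. an accept or read) on `socket`.
    /// If an event already arrived, the receiver is ready immediately.
    fn wait_for(&mut self, socket: u32) -> mpsc::Receiver<SocketEvent> {
        let (tx, rx) = mpsc::channel();
        match self.buffered.get_mut(&socket).and_then(|q| q.pop_front()) {
            Some(ev) => {
                let _ = tx.send(ev); // rx is still alive, so this cannot fail
            }
            None => self.waiters.entry(socket).or_default().push_back(tx),
        }
        rx
    }

    /// Called once per event pushed by the host; no timer is involved.
    fn on_push(&mut self, mut ev: SocketEvent) {
        let id = ev.socket();
        if let Some(queue) = self.waiters.get_mut(&id) {
            while let Some(tx) = queue.pop_front() {
                match tx.send(ev) {
                    Ok(()) => return,
                    // Waiter was dropped (operation cancelled): try the next one.
                    Err(mpsc::SendError(back)) => ev = back,
                }
            }
        }
        // Nobody is waiting yet: keep the event for the next wait_for.
        self.buffered.entry(id).or_default().push_back(ev);
    }
}

fn main() {
    let mut d = Dispatcher::default();
    // A guest-side read registers first; the host's push then resolves it.
    let pending_read = d.wait_for(7);
    d.on_push(SocketEvent::Data { socket: 7, bytes: b"pong".to_vec() });
    println!("read resolved: {:?}", pending_read.try_recv());
    // An accept pushed before anyone listens is buffered, not lost.
    d.on_push(SocketEvent::Accept { server: 1, conn: 2 });
    println!("accept resolved: {:?}", d.wait_for(1).try_recv());
}
```

Buffering is the important design choice: a push can land before the polyfill registers its read, and dropping it would trade the timer latency for a lost-wakeup hang.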

Follow-up (smaller, same theme)

Sidecar-internal waits can also become push-based: for example, the loopback peer-pairing wait (#173, currently deadline-plus-backoff polling) could subscribe to a socket-table insert notification, using the same Condvar/Notify + bounded-timeout-fallback pattern as #174.
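A minimal sketch of that follow-up, assuming a blocking sidecar thread and std's `Condvar` (in async code, `tokio::sync::Notify` would play the same role). `SocketTable`, `insert`, and `wait_for_peer` are hypothetical names, and the exact pattern used in #174 is not reproduced here. Inserts notify all waiters; each waiter re-checks its own address, and every park is capped at `slice` so a missed notification costs one slice rather than the whole deadline.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the sidecar socket table; it only records
/// which loopback addresses currently have a bound peer.
#[derive(Default)]
struct SocketTable {
    bound: Mutex<HashSet<String>>,
    inserted: Condvar,
}

impl SocketTable {
    /// Insert a peer and notify every waiter; each re-checks its own address.
    fn insert(&self, addr: &str) {
        self.bound.lock().unwrap().insert(addr.to_string());
        self.inserted.notify_all();
    }

    /// Block until `addr` is bound or `deadline` elapses. Each wait is
    /// capped at `slice` (the bounded-timeout fallback), so an insert path
    /// that forgets to notify degrades to slow polling, not a full stall.
    fn wait_for_peer(&self, addr: &str, deadline: Duration, slice: Duration) -> bool {
        let start = Instant::now();
        let mut bound = self.bound.lock().unwrap();
        loop {
            if bound.contains(addr) {
                return true;
            }
            let elapsed = start.elapsed();
            if elapsed >= deadline {
                return false;
            }
            let timeout = slice.min(deadline - elapsed);
            // Releases the lock while parked; wakes on notify or timeout.
            bound = self.inserted.wait_timeout(bound, timeout).unwrap().0;
        }
    }
}

fn main() {
    let table = Arc::new(SocketTable::default());
    let writer = Arc::clone(&table);
    let peer = thread::spawn(move || {
        thread::sleep(Duration::from_millis(20));
        writer.insert("127.0.0.1:4000");
    });
    let found = table.wait_for_peer(
        "127.0.0.1:4000",
        Duration::from_secs(2),
        Duration::from_millis(100),
    );
    peer.join().unwrap();
    println!("peer found: {found}");
}
```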

Regression gates

Standing bench rows in the agent-os scripts/benchmarks fuzz-perf lane: net/udp_echo_small, net/unix_echo_small, net/http_loopback_get, net/tcp_*, perf-finding/unix_accept_latency (agentos#1570/#1571).
