Conversation

@Bennethxyz

This PR introduces batched sampling to reduce per-token overhead when multiple requests reach the last shard concurrently.

What
- Add an async batcher in Node to group sampling calls within a short window (default 5 ms) or until a maximum batch size is reached (default 8).
- Stack logits and sample once for the whole batch; on failure, fall back to per-request sampling.
- Emit per-request token callbacks and forward the sampled token for continued generation, preserving current behavior.

Why
- Incremental progress toward full forward-pass batching requested in #1 ([BOUNTY - ] Batched Requests). Sampling is a measurable hotspot and can benefit from batching with minimal risk.

Notes
- No changes to public APIs or the gRPC schema; fully backward compatible.
- Future work: extend batching earlier in the pipeline (prompt encode and forward passes), combining per-request caches into batch-aware caches.

Config
- Maximum batch size (default 8)
- Batch window in milliseconds (default 5)

I’m happy to iterate on full tensor-forward batching next (MLX/Tinygrad cache semantics).
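The description above maps fairly directly onto a small asyncio batcher. The sketch below is illustrative only, not this PR's actual code: `SamplingBatcher`, `sample_batch_fn`, and `sample_single_fn` are hypothetical names, and the engine's sampling functions are treated as plain callables over NumPy logits for brevity.

```python
import asyncio
import numpy as np


class SamplingBatcher:
    """Group concurrent sample() calls into a single batched call.

    A flush happens when `max_batch_size` requests are pending, or when
    `window_ms` has elapsed since the first request of the window,
    whichever comes first.
    """

    def __init__(self, sample_batch_fn, sample_single_fn,
                 max_batch_size: int = 8, window_ms: float = 5.0):
        self._sample_batch = sample_batch_fn    # (B, vocab) logits -> B tokens
        self._sample_single = sample_single_fn  # (vocab,) logits -> 1 token
        self.max_batch_size = max_batch_size
        self.window_ms = window_ms
        self._pending = []          # list of (logits, future) pairs
        self._flush_task = None     # timer task for the current window
        self._lock = asyncio.Lock()

    async def sample(self, logits: np.ndarray) -> int:
        """Enqueue one request's logits and await its sampled token."""
        fut = asyncio.get_running_loop().create_future()
        async with self._lock:
            self._pending.append((logits, fut))
            if len(self._pending) >= self.max_batch_size:
                # Batch is full: cancel the timer and flush immediately.
                if self._flush_task is not None:
                    self._flush_task.cancel()
                    self._flush_task = None
                self._flush(self._drain())
            elif self._flush_task is None:
                # First request of a new window: schedule a timed flush.
                self._flush_task = asyncio.create_task(self._flush_after_window())
        return await fut

    async def _flush_after_window(self):
        try:
            await asyncio.sleep(self.window_ms / 1000.0)
        except asyncio.CancelledError:
            return
        async with self._lock:
            self._flush_task = None
            batch = self._drain()
        self._flush(batch)

    def _drain(self):
        batch, self._pending = self._pending, []
        return batch

    def _flush(self, batch):
        if not batch:
            return
        logits_list, futures = zip(*batch)
        try:
            # One sampling call for the whole batch: stack to (B, vocab).
            tokens = self._sample_batch(np.stack(logits_list))
            if len(tokens) != len(futures):
                raise ValueError("batched sampler returned wrong token count")
        except Exception:
            # Any failure: fall back to per-request sampling.
            tokens = [self._sample_single(l) for l in logits_list]
        for fut, tok in zip(futures, tokens):
            if not fut.done():
                fut.set_result(int(tok))
```

On the last shard, the per-request sampling path would then `await batcher.sample(last_token_logits)` instead of calling the engine directly, and each request's token callback fires when its future resolves.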

…-token overhead

- Add async batch queues with a short timeout and max batch size
- Stack logits and call engine.sample once for the batch (with per-request fallback)
- Forward sampled tokens and emit callbacks per-request

This is an incremental step toward full forward-pass batching as requested in exo-explore#1.
… to per-request

- Prevent passing batched logits to TinygradDynamicShardInferenceEngine.sample
- Safety-check the token count; fall back to per-request sampling on mismatch
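One plausible shape for the guard described in this commit is a routing helper that keeps stacked (B, vocab) logits away from engines that only sample one request at a time. This is a sketch under assumptions, not the PR's code: `supports_batched_sampling` is a hypothetical capability flag (an `isinstance` check against the Tinygrad engine would work equally well), and `engine.sample` is treated as synchronous for brevity.

```python
import numpy as np


def sample_with_guard(engine, stacked_logits: np.ndarray):
    """Sample a (B, vocab) batch without handing batched logits to an
    engine that only supports per-request sampling."""
    def per_request():
        # Split the stack into single (vocab,) rows and sample each one.
        return [engine.sample(row) for row in stacked_logits]

    if not getattr(engine, "supports_batched_sampling", False):
        return per_request()  # e.g. TinygradDynamicShardInferenceEngine
    tokens = engine.sample(stacked_logits)
    if len(tokens) != stacked_logits.shape[0]:
        # Safety check: token count must match the batch size.
        return per_request()
    return tokens
```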