[Flow Control] Create Benchmarking Guide for the Flow Control Layer #1801

@LukeAVanDrie

What documentation is needed: A new guide that provides users with a standard set of scenarios and methodologies for benchmarking the Flow Control layer. This will allow users to understand the performance characteristics and trade-offs of the feature in a reproducible way.

Proposed scenarios:

  • Scenario 1: Single Workload Saturation

      • Goal: Demonstrate the tail latency (p99) benefits of shifting Head-of-Line blocking from the model server to the EPP.

      • Method: A single, high-QPS workload that pushes the pool just beyond saturation. Compare p99 latency with and without the Flow Control layer enabled.

  • Scenario 2: Unsaturated Overhead

      • Goal: Measure the baseline latency overhead added by the Flow Control layer when the system is not under load.

      • Method: A single, low-QPS workload that does not saturate the pool. Compare p50/p90 latency with and without the Flow Control layer enabled.

  • Scenario 3: Multi-Tenancy (Fairness)

      • Goal: Demonstrate the fairness policy's ability to provide isolation between competing tenants.

      • Method: N tenants with identical, non-sheddable priority sending traffic simultaneously to saturate the pool. Measure the throughput and latency for each tenant to validate equitable distribution.

  • Scenario 4: Multi-Tenancy (Priority)

      • Goal: Demonstrate the strict priority enforcement and load shedding behavior.

      • Method: N tenants with different priorities (e.g., P=100, P=0, P=-10) sending traffic to saturate the pool. Verify that P=100 requests are always dispatched first and that P=-10 requests are shed.

  • Scenario 5: Multi-Tenancy (Mixed)

      • Goal: Demonstrate a complex, realistic scenario combining fairness and priority.

      • Method: Multiple tenants at a high priority level and multiple tenants at a default priority level. Validate that fairness is applied correctly within each priority band while the higher band is strictly preferred.
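
To make the comparisons above concrete, here is a rough sketch of how the collected samples could be reduced to the numbers each scenario asks for. It is only an illustration: the function names are hypothetical, the nearest-rank percentile is one of several valid definitions, and Jain's fairness index is a common way to score "equitable distribution" that this issue does not prescribe.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile of latency samples; p is an integer in 1..100."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in 1..100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based nearest rank
    return ordered[rank - 1]


def latency_delta(baseline, candidate, p):
    """Candidate minus baseline at percentile p.

    Scenarios 1 and 2: baseline = Flow Control disabled, candidate = enabled.
    A negative value means the candidate run was faster at that percentile.
    """
    return percentile(candidate, p) - percentile(baseline, p)


def jain_index(per_tenant_throughput):
    """Jain's fairness index (assumed metric, not mandated by this issue).

    Returns 1.0 when every tenant gets the same throughput and 1/N when a
    single tenant out of N gets everything.
    """
    n = len(per_tenant_throughput)
    total = sum(per_tenant_throughput)
    sum_sq = sum(x * x for x in per_tenant_throughput)
    if n == 0 or sum_sq == 0:
        raise ValueError("need at least one tenant with non-zero throughput")
    return total * total / (n * sum_sq)


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not measurements.
    disabled_ms = [120, 130, 125, 900, 140]
    enabled_ms = [121, 131, 126, 400, 141]
    print("p99 delta (ms):", latency_delta(disabled_ms, enabled_ms, 99))
    print("fairness:", jain_index([50.0, 48.0, 52.0]))
```

For Scenarios 3 and 5 the index would be computed per priority band, since fairness is only expected among tenants at the same priority.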

Labels: needs-triage
