Description
What documentation is needed: A new guide that provides users with a standard set of scenarios and methodologies for benchmarking the Flow Control layer. This will allow users to understand the performance characteristics and trade-offs of the feature in a reproducible way.
Proposed scenarios:
- Scenario 1: Single Workload Saturation
  - Goal: Demonstrate the tail latency (p99) benefits of shifting head-of-line blocking from the model server to the EPP.
  - Method: A single, high-QPS workload that pushes the pool just beyond saturation. Compare p99 latency with and without the Flow Control layer enabled.
- Scenario 2: Unsaturated Overhead
  - Goal: Measure the baseline latency overhead added by the Flow Control layer when the system is not under load.
  - Method: A single, low-QPS workload that does not saturate the pool. Compare p50/p90 latency with and without the Flow Control layer enabled.
- Scenario 3: Multi-Tenancy (Fairness)
  - Goal: Demonstrate the fairness policy's ability to provide isolation between competing tenants.
  - Method: N tenants, all at the same non-sheddable priority, send traffic simultaneously to saturate the pool. Measure throughput and latency per tenant to validate that capacity is distributed equitably.
- Scenario 4: Multi-Tenancy (Priority)
  - Goal: Demonstrate strict priority enforcement and load-shedding behavior.
  - Method: N tenants with different priorities (e.g., P=100, P=0, P=-10) send traffic to saturate the pool. Verify that P=100 requests are always dispatched first and that P=-10 requests are shed.
- Scenario 5: Multi-Tenancy (Mixed)
  - Goal: Demonstrate a complex, realistic scenario combining fairness and priority.
  - Method: Multiple tenants at a high priority level and multiple tenants at the default priority level send traffic to saturate the pool. Validate that fairness is applied correctly within each priority band while the higher band is strictly preferred.
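
To make the comparisons in these scenarios concrete, here is a minimal sketch of the analysis side only; it is not a load generator and not part of the project. It assumes each run yields per-request latencies plus, for the multi-tenant scenarios, per-tenant throughput and a per-request record with hypothetical `priority` and `shed` fields. Jain's fairness index is one possible way to quantify "equitable distribution"; the scenarios above do not prescribe a specific metric.

```python
import math
from collections import defaultdict


def percentile(samples, p):
    """Nearest-rank percentile (p in 0..100) of a non-empty list of numbers."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Nearest-rank: the smallest value with at least p% of samples at or below it.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def jain_index(values):
    """Jain's fairness index (assumed metric): 1.0 when all shares are equal,
    1/n when a single tenant receives everything."""
    sum_sq = sum(v * v for v in values)
    if sum_sq == 0:
        raise ValueError("all values are zero")
    return sum(values) ** 2 / (len(values) * sum_sq)


def shed_rate_by_priority(records):
    """Fraction of requests shed at each priority level.

    Each record is a dict with hypothetical keys "priority" (int) and
    "shed" (bool); adapt these to whatever the benchmark tool emits.
    """
    total = defaultdict(int)
    shed = defaultdict(int)
    for record in records:
        total[record["priority"]] += 1
        if record["shed"]:
            shed[record["priority"]] += 1
    return {priority: shed[priority] / total[priority] for priority in total}
```

For Scenarios 1 and 2, `percentile` would be applied to the latencies of the run with the Flow Control layer enabled and the run without it, and the two values compared. For Scenarios 3 and 5, `jain_index` would be computed over per-tenant throughput within a single priority band. For Scenario 4, `shed_rate_by_priority` shows whether shedding is concentrated in the lowest band.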