Description
Request command:
export OPENAI_API_KEY=EMPTY && export OPENAI_API_BASE=http://127.0.0.1:26894/v1/ && python3 /workspace/llmperf/token_benchmark_ray.py --model Qwen/Qwen2.5-7B-Instruct --mean-input-tokens 2048 --stddev-input-tokens 1024 --mean-output-tokens 2048 --stddev-output-tokens 1024 --max-num-completed-requests 20 --timeout 36000 --num-concurrent-requests 20 --results-dir './' --llm-api openai --additional-sampling-params '{"ignore_eos": true}'
Expected behavior
The benchmark should complete 20 concurrent requests, with input lengths randomly drawn from a Gaussian distribution (mean 2048, stddev 1024).
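For reference, the expected sampling behavior looks roughly like the sketch below. This is an illustration of Gaussian length sampling under the parameters in the command above, not llmperf's actual implementation:

```python
import random

def sample_input_lengths(mean, stddev, n, seed=0):
    """Draw n token lengths from a Gaussian, clamped to at least 1 token.

    mean/stddev mirror --mean-input-tokens / --stddev-input-tokens;
    the seed is only here to make the sketch reproducible.
    """
    rng = random.Random(seed)
    return [max(1, int(rng.gauss(mean, stddev))) for _ in range(n)]

# 20 requests, as in the benchmark run above: many distinct lengths,
# not just two fixed values.
lengths = sample_input_lengths(2048, 1024, 20)
```

With a proper Gaussian draw, 20 samples should yield far more than two distinct input lengths, which is what makes the observed behavior below suspicious.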
Issue
- The server actually received 40 requests:
......
26 INFO: 127.0.0.1:42840 - "POST /v1/chat/completions HTTP/1.1" 200 OK
27 INFO: 127.0.0.1:42846 - "POST /v1/chat/completions HTTP/1.1" 200 OK
28 INFO: 127.0.0.1:42862 - "POST /v1/chat/completions HTTP/1.1" 200 OK
29 INFO: 127.0.0.1:42878 - "POST /v1/chat/completions HTTP/1.1" 200 OK
30 INFO: 127.0.0.1:42894 - "POST /v1/chat/completions HTTP/1.1" 200 OK
31 INFO: 127.0.0.1:42902 - "POST /v1/chat/completions HTTP/1.1" 200 OK
32 INFO: 127.0.0.1:42916 - "POST /v1/chat/completions HTTP/1.1" 200 OK
33 INFO: 127.0.0.1:42924 - "POST /v1/chat/completions HTTP/1.1" 200 OK
34 INFO: 127.0.0.1:42938 - "POST /v1/chat/completions HTTP/1.1" 200 OK
35 INFO: 127.0.0.1:42948 - "POST /v1/chat/completions HTTP/1.1" 200 OK
36 INFO: 127.0.0.1:42956 - "POST /v1/chat/completions HTTP/1.1" 200 OK
37 INFO: 127.0.0.1:42966 - "POST /v1/chat/completions HTTP/1.1" 200 OK
38 INFO: 127.0.0.1:42982 - "POST /v1/chat/completions HTTP/1.1" 200 OK
39 INFO: 127.0.0.1:42994 - "POST /v1/chat/completions HTTP/1.1" 200 OK
40 INFO: 127.0.0.1:43008 - "POST /v1/chat/completions HTTP/1.1" 200 OK
- The log file Qwen-Qwen2-5-7B-Instruct_2048_2048_individual_responses.json shows that only 20 requests were counted (while the server actually received 40), and those 20 requests were fixed at just two input lengths (4896 and 5369) instead of following a Gaussian distribution.
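The symptom can be checked by histogramming the input lengths in the per-request results file. The snippet below is a sketch: the field name `number_input_tokens` is an assumption about the llmperf result schema and should be adjusted to whatever key the JSON actually uses.

```python
import json
from collections import Counter

def input_length_histogram(path, field="number_input_tokens"):
    """Count how many recorded responses share each input length.

    `field` is an assumed key in the llmperf individual-responses JSON;
    adjust it to the actual schema.
    """
    with open(path) as f:
        responses = json.load(f)
    return Counter(r[field] for r in responses)

# Synthetic data mirroring the reported symptom: 20 entries collapsed
# onto only two distinct input lengths instead of a Gaussian spread.
sample = [{"number_input_tokens": 4896}] * 10 + \
         [{"number_input_tokens": 5369}] * 10
hist = Counter(r["number_input_tokens"] for r in sample)
```

A healthy run should produce a histogram with many distinct lengths and counts close to 1; two distinct lengths across all 20 requests points to the same prompt being reused (and, together with the doubled server count, possibly each request being sent twice).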