Description
Request command:
export OPENAI_API_KEY=EMPTY && export OPENAI_API_BASE=http://127.0.0.1:26894/v1/ && python3 /workspace/llmperf/token_benchmark_ray.py --model Qwen/Qwen2.5-7B-Instruct --mean-input-tokens 2048 --stddev-input-tokens 1024 --mean-output-tokens 2048 --stddev-output-tokens 1024 --max-num-completed-requests 20 --timeout 36000 --num-concurrent-requests 20 --results-dir './' --llm-api openai --additional-sampling-params '{"ignore_eos": true}'
Expected behavior
The benchmark should complete 20 concurrent requests, with input lengths randomly drawn from a Gaussian distribution (mean 2048, stddev 1024).
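For reference, the expected sampling behavior looks roughly like the sketch below. This is an illustration of Gaussian length sampling under the parameters in the command above, not llmperf's actual implementation:

```python
import random

def sample_input_lengths(mean, stddev, n, seed=0):
    """Draw n token lengths from a Gaussian, clamped to at least 1 token.

    mean/stddev mirror --mean-input-tokens / --stddev-input-tokens;
    the seed is only here to make the sketch reproducible.
    """
    rng = random.Random(seed)
    return [max(1, int(rng.gauss(mean, stddev))) for _ in range(n)]

# 20 requests, as in the benchmark run above: many distinct lengths,
# not just two fixed values.
lengths = sample_input_lengths(2048, 1024, 20)
```

With a proper Gaussian draw, 20 samples should yield far more than two distinct input lengths, which is what makes the observed behavior below suspicious.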
Issue
- The server actually received 40 requests:
......
26 INFO: 127.0.0.1:42840 - "POST /v1/chat/completions HTTP/1.1" 200 OK
27 INFO: 127.0.0.1:42846 - "POST /v1/chat/completions HTTP/1.1" 200 OK
28 INFO: 127.0.0.1:42862 - "POST /v1/chat/completions HTTP/1.1" 200 OK
29 INFO: 127.0.0.1:42878 - "POST /v1/chat/completions HTTP/1.1" 200 OK
30 INFO: 127.0.0.1:42894 - "POST /v1/chat/completions HTTP/1.1" 200 OK
31 INFO: 127.0.0.1:42902 - "POST /v1/chat/completions HTTP/1.1" 200 OK
32 INFO: 127.0.0.1:42916 - "POST /v1/chat/completions HTTP/1.1" 200 OK
33 INFO: 127.0.0.1:42924 - "POST /v1/chat/completions HTTP/1.1" 200 OK
34 INFO: 127.0.0.1:42938 - "POST /v1/chat/completions HTTP/1.1" 200 OK
35 INFO: 127.0.0.1:42948 - "POST /v1/chat/completions HTTP/1.1" 200 OK
36 INFO: 127.0.0.1:42956 - "POST /v1/chat/completions HTTP/1.1" 200 OK
37 INFO: 127.0.0.1:42966 - "POST /v1/chat/completions HTTP/1.1" 200 OK
38 INFO: 127.0.0.1:42982 - "POST /v1/chat/completions HTTP/1.1" 200 OK
39 INFO: 127.0.0.1:42994 - "POST /v1/chat/completions HTTP/1.1" 200 OK
40 INFO: 127.0.0.1:43008 - "POST /v1/chat/completions HTTP/1.1" 200 OK
- The log file Qwen-Qwen2-5-7B-Instruct_2048_2048_individual_responses.json shows that only 20 requests were counted (while the server actually received 40), and those 20 requests were fixed at just two input lengths (4896 and 5369) instead of following a Gaussian distribution.
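The symptom can be checked by histogramming the input lengths in the per-request results file. The snippet below is a sketch: the field name `number_input_tokens` is an assumption about the llmperf result schema and should be adjusted to whatever key the JSON actually uses.

```python
import json
from collections import Counter

def input_length_histogram(path, field="number_input_tokens"):
    """Count how many recorded responses share each input length.

    `field` is an assumed key in the llmperf individual-responses JSON;
    adjust it to the actual schema.
    """
    with open(path) as f:
        responses = json.load(f)
    return Counter(r[field] for r in responses)

# Synthetic data mirroring the reported symptom: 20 entries collapsed
# onto only two distinct input lengths instead of a Gaussian spread.
sample = [{"number_input_tokens": 4896}] * 10 + \
         [{"number_input_tokens": 5369}] * 10
hist = Counter(r["number_input_tokens"] for r in sample)
```

A healthy run should produce a histogram with many distinct lengths and counts close to 1; two distinct lengths across all 20 requests points to the same prompt being reused (and, together with the doubled server count, possibly each request being sent twice).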