commit f1d6bed: In some scenarios the number of requests doubles and the input length is fixed at two values #82

@lq2030


request command:
export OPENAI_API_KEY=EMPTY && export OPENAI_API_BASE=http://127.0.0.1:26894/v1/ && python3 /workspace/llmperf/token_benchmark_ray.py --model Qwen/Qwen2.5-7B-Instruct --mean-input-tokens 2048 --stddev-input-tokens 1024 --mean-output-tokens 2048 --stddev-output-tokens 1024 --max-num-completed-requests 20 --timeout 36000 --num-concurrent-requests 20 --results-dir './' --llm-api openai --additional-sampling-params '{"ignore_eos": true}'

expect
I expect 20 concurrent requests to complete, with input lengths randomly distributed according to a Gaussian distribution (mean 2048, standard deviation 1024, as set in the command above).
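To make the expectation concrete, here is a minimal sketch of what Gaussian-distributed input lengths could look like for these parameters. This is not llmperf's actual sampling code; the function name `sample_token_lengths`, the fixed seed, and the rejection of non-positive values are assumptions made for illustration only.

```python
import random


def sample_token_lengths(mean, stddev, n, seed=0, minimum=1):
    """Draw n integer token lengths from a Gaussian, rejecting values below `minimum`.

    Hypothetical helper for illustration; llmperf may sample differently.
    """
    rng = random.Random(seed)
    lengths = []
    while len(lengths) < n:
        value = int(rng.gauss(mean, stddev))
        if value >= minimum:
            lengths.append(value)
    return lengths


# Same parameters as the command: mean 2048, stddev 1024, 20 requests.
lengths = sample_token_lengths(2048, 1024, 20)
print(sorted(lengths))
print("distinct lengths:", len(set(lengths)))
```

With a standard deviation of 1024, 20 independent draws should give many distinct lengths, so seeing only two values across all requests suggests the lengths are not being sampled per request.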

issue

  1. The server actually received 40 requests:

......
26 INFO: 127.0.0.1:42840 - "POST /v1/chat/completions HTTP/1.1" 200 OK
27 INFO: 127.0.0.1:42846 - "POST /v1/chat/completions HTTP/1.1" 200 OK
28 INFO: 127.0.0.1:42862 - "POST /v1/chat/completions HTTP/1.1" 200 OK
29 INFO: 127.0.0.1:42878 - "POST /v1/chat/completions HTTP/1.1" 200 OK
30 INFO: 127.0.0.1:42894 - "POST /v1/chat/completions HTTP/1.1" 200 OK
31 INFO: 127.0.0.1:42902 - "POST /v1/chat/completions HTTP/1.1" 200 OK
32 INFO: 127.0.0.1:42916 - "POST /v1/chat/completions HTTP/1.1" 200 OK
33 INFO: 127.0.0.1:42924 - "POST /v1/chat/completions HTTP/1.1" 200 OK
34 INFO: 127.0.0.1:42938 - "POST /v1/chat/completions HTTP/1.1" 200 OK
35 INFO: 127.0.0.1:42948 - "POST /v1/chat/completions HTTP/1.1" 200 OK
36 INFO: 127.0.0.1:42956 - "POST /v1/chat/completions HTTP/1.1" 200 OK
37 INFO: 127.0.0.1:42966 - "POST /v1/chat/completions HTTP/1.1" 200 OK
38 INFO: 127.0.0.1:42982 - "POST /v1/chat/completions HTTP/1.1" 200 OK
39 INFO: 127.0.0.1:42994 - "POST /v1/chat/completions HTTP/1.1" 200 OK
40 INFO: 127.0.0.1:43008 - "POST /v1/chat/completions HTTP/1.1" 200 OK

  2. The log file Qwen-Qwen2-5-7B-Instruct_2048_2048_individual_responses.json records only 20 requests (while the server actually received 40), and all 20 recorded requests use one of just two input lengths (4896 and 5369).
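Both observations can be checked with a short script. The sketch below counts successful `POST /v1/chat/completions` lines in server log text and tallies the input lengths found in the per-request records. The helper names are hypothetical, and the record key `number_input_tokens` is an assumption about the JSON layout that should be verified against the actual results file.

```python
import re
from collections import Counter

# Matches access-log lines like the ones shown in this issue.
POST_OK = re.compile(r'"POST /v1/chat/completions HTTP/1\.1" 200')


def count_chat_completions(log_lines):
    """Count successful chat-completion requests in an iterable of log lines."""
    return sum(1 for line in log_lines if POST_OK.search(line))


def input_length_histogram(records, key="number_input_tokens"):
    """Tally how often each input length occurs.

    `key` is an assumed field name; adjust it to match the results file.
    """
    return Counter(r[key] for r in records if key in r)


# Small in-memory examples; in practice, read the server log and
# json.load() the *_individual_responses.json file instead.
sample_log = [
    'INFO: 127.0.0.1:42994 - "POST /v1/chat/completions HTTP/1.1" 200 OK',
    'INFO: 127.0.0.1:43008 - "POST /v1/chat/completions HTTP/1.1" 200 OK',
]
sample_records = [
    {"number_input_tokens": 4896},
    {"number_input_tokens": 5369},
    {"number_input_tokens": 4896},
]

request_count = count_chat_completions(sample_log)
histogram = input_length_histogram(sample_records)
print("requests seen by server:", request_count)
print("input length counts:", dict(histogram))
```

Comparing the request count from the server log with the number of records in the results file makes the 40 versus 20 mismatch explicit, and the histogram shows directly whether the input lengths collapse to a few values.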

