Skip to content

Conversation

@pooyadavoodi
Copy link
Contributor

@pooyadavoodi pooyadavoodi commented Jan 23, 2026

Purpose

  • Adding support for more features such as tool calling to run_batch.
  • This is achieved by using init_app_state and FrontendArgs from vllm/entrypoints/openai/api_server.py.
  • The approach taken here also removes some code duplication between api_server and run_batch.
  • Due to args conflict over --port between FrontendArgs and the existing run_batch options, we improve the option names from --port and --url to --metrics-port and --metrics-url and provide a backward compatibility guarantee.

Test Plan

Adding a new test for tool calling.

Test Result

$ pytest -v tests/entrypoints/openai/test_run_batch.py
=================================================================================== test session starts ===================================================================================
platform linux -- Python 3.12.9, pytest-9.0.2, pluggy-1.6.0 -- /root/dev/vllm/.venv/bin/python3
cachedir: .pytest_cache
rootdir: /root/dev/vllm
configfile: pyproject.toml
plugins: anyio-4.12.1, asyncio-1.3.0
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 8 items

tests/entrypoints/openai/test_run_batch.py::test_empty_file PASSED                                                                                                                  [ 12%]
tests/entrypoints/openai/test_run_batch.py::test_completions PASSED                                                                                                                 [ 25%]
tests/entrypoints/openai/test_run_batch.py::test_completions_invalid_input PASSED                                                                                                   [ 37%]
tests/entrypoints/openai/test_run_batch.py::test_embeddings PASSED                                                                                                                  [ 50%]
tests/entrypoints/openai/test_run_batch.py::test_score[{"custom_id": "request-1", "method": "POST", "url": "/score", "body": {"model": "BAAI/bge-reranker-v2-m3", "queries": "What is the capital of France?", "documents": ["The capital of Brazil is Brasilia.", "The capital of France is Paris."]}}\n{"custom_id": "request-2", "method": "POST", "url": "/v1/score", "body": {"model": "BAAI/bge-reranker-v2-m3", "queries": "What is the capital of France?", "documents": ["The capital of Brazil is Brasilia.", "The capital of France is Paris."]}}] PASSED [ 62%]
tests/entrypoints/openai/test_run_batch.py::test_score[{"custom_id": "request-1", "method": "POST", "url": "/rerank", "body": {"model": "BAAI/bge-reranker-v2-m3", "query": "What is the capital of France?", "documents": ["The capital of Brazil is Brasilia.", "The capital of France is Paris."]}}\n{"custom_id": "request-2", "method": "POST", "url": "/v1/rerank", "body": {"model": "BAAI/bge-reranker-v2-m3", "query": "What is the capital of France?", "documents": ["The capital of Brazil is Brasilia.", "The capital of France is Paris."]}}\n{"custom_id": "request-2", "method": "POST", "url": "/v2/rerank", "body": {"model": "BAAI/bge-reranker-v2-m3", "query": "What is the capital of France?", "documents": ["The capital of Brazil is Brasilia.", "The capital of France is Paris."]}}] PASSED [ 75%]
tests/entrypoints/openai/test_run_batch.py::test_reasoning_parser PASSED                                                                                                            [ 87%]
tests/entrypoints/openai/test_run_batch.py::test_tool_calling PASSED                                                                                                                [100%]

==================================================================================== warnings summary =====================================================================================
<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
======================================================================== 8 passed, 2 warnings in 302.24s (0:05:02) ========================================================================

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The pull request successfully integrates init_app_state and FrontendArgs from api_server.py into run_batch.py, significantly reducing code duplication and enabling support for new features like tool calling. The changes to argument parsing for metrics, including renaming --port and --url to --metrics-port and --metrics-url respectively, are well-handled with backward compatibility. The addition of a comprehensive test case for tool calling ensures the new functionality works as expected. Overall, the changes improve modularity, maintainability, and extend the capabilities of run_batch.

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

Comment @cursor review or bugbot run to trigger another review on this PR

Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>
)
parser.add_argument(
"--url",
"--metrics-url",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can avoid this by adding a new base class of FrontendArgs, then each subclass can have different definitions of host and port

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants