Question of LLM inference speed?

Hello, I noticed that the article mentions 5,000 queries, and the total time for rewriting them using a llm (taking TPC-H as an example) is only 3.40 seconds. This suggests that the inference time for each query is approximately 0.0006 seconds. Could you share which API you used to achieve such speed, or what kind of locally deployed model was utilized? Additionally, was the measurement conducted using techniques such as batching multiple queries for submission to the model or parallel processing with multithreading?

<img width="410" alt="Image" src="https://github.com/user-attachments/assets/1686e1cd-9934-4f8c-b9eb-49a8e9e637b5" />

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Question of LLM inference speed? #8

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Question of LLM inference speed? #8

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions