Hello, I noticed that the article reports rewriting 5,000 queries with an LLM (on TPC-H) in a total of only 3.40 seconds, which works out to roughly 0.0007 seconds of inference per query. Could you share which API you used to achieve this speed, or what kind of locally deployed model was used? Also, was the measurement made using techniques such as batching multiple queries into a single request, or parallel submission with multithreading?
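For reference, one way such a low *amortized* per-query time could arise is parallel submission rather than a genuinely sub-millisecond model. Below is a minimal sketch of the idea; the `rewrite_query` function is hypothetical and simulates a 50 ms network/inference call, since the actual API and model used in the article are exactly what this issue is asking about:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def rewrite_query(sql: str) -> str:
    """Hypothetical stand-in for an LLM rewrite call; simulates 50 ms latency."""
    time.sleep(0.05)
    return sql  # a real call would return the rewritten SQL

queries = [f"SELECT * FROM lineitem WHERE l_orderkey = {i}" for i in range(100)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=32) as pool:
    rewritten = list(pool.map(rewrite_query, queries))
elapsed = time.perf_counter() - start

# With 32 in-flight requests, 100 calls of 50 ms each finish in roughly 0.2 s,
# so the amortized per-query time (~2 ms) is far below any single call's latency.
print(f"total {elapsed:.2f}s, per-query {elapsed / len(queries) * 1000:.1f} ms")
```

If the reported 3.40 s was measured this way, dividing it by 5,000 gives wall-clock throughput, not the latency of an individual model call, which would explain the apparent discrepancy.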
