-
Notifications
You must be signed in to change notification settings - Fork 25.3k
Configurable Inference timeout during Query time #131551
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Configurable Inference timeout during Query time #131551
Conversation
Pinging @elastic/search-eng (Team:SearchOrg) |
Pinging @elastic/search-relevance (Team:Search - Relevance) |
Hi @Samiul-TheSoccerFan, I've created a changelog YAML for you. |
@Mikep86 Do we need to ping ML team in the PR too? |
@Samiul-TheSoccerFan Yes, we should ping the ML team since it touches code they own |
Pinging @elastic/ml-core (Team:ML) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a comment, if you could take a look that'd be great!
@@ -73,6 +74,9 @@ public void infer( | |||
TimeValue timeout, | |||
ActionListener<InferenceServiceResults> listener | |||
) { | |||
if (timeout == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the request is external via the REST api, the timeout
field should always be provided and won't be null. If the timeout isn't provided via a query parameter, we default it to 30 seconds. It seems like we're forcing it to be null in SemanticQueryBuilder
. We were intentionally trying to avoid timeout
being null.
Can you explain the reasoning behind why we want to retrieve the timeout from the settings vs getting it from a query parameter?
If semantic text would like to retrieve it from the settings, why don't we retrieve it from the settings in that code prior to constructing the InferenceAction.Request
and pass the value in?
If we did that, we'd only need to change it in SemanticQueryBuilder
right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jonathan-buttner We actually started with the approach you're suggesting, having this setting at the index level and passing the timeout value from SemanticQueryBuilder
. But moved away from it after realizing it introduced more complexity. Switching to a cluster-level setting helped simplify the overall design.
During the query rewrite phase, we do not have access to cluster services in the current context of the SemanticQueryBuilder
. However, the inference service does have access to cluster services during inference (specifically in the infer function), which is why the logic was moved there.
The goal is not to override the passed timeout value, but to use the timeout from cluster settings (only if it is null
) so that we can support a user-defined inference timeout at query time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so that we can support a user-defined inference timeout at query time.
If the goal is to provide a user-defined inference timeout at query time wouldn't we want to leverage the timeout
parameter on the search request itself? Is there a way to plumb that through if we can't get access to the cluster service in SemanticQueryBuilder
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jonathan-buttner : The timeout
parameter at the root level of the search request is only intended to control timeouts getting results from shards, potentially allowing for partial search results if desired. Our goal with this setting is to give users control over how long to wait for inference results, especially in cases where a model deployment is required. This allows users to tailor behavior based on their tolerance for latency or urgency of the query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha, thanks for explaining that. I'm good with this approach then. There are a few comments I'll add in another review 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's an issue for this that has been open for a while #107077. Pat had a go at it then came to the same conclusion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change looks good to me, but some more tests need to be updated
@@ -279,7 +278,7 @@ protected QueryBuilder doRewrite(QueryRewriteContext queryRewriteContext) { | |||
List.of(query), | |||
TextExpansionConfigUpdate.EMPTY_UPDATE, | |||
false, | |||
InferModelAction.Request.DEFAULT_TIMEOUT_FOR_API | |||
null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this change broke the SparseVectorQueryBuilderTests
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will probably have to look in depth why this is happening. I remember running the tests locally and found no issues before creating this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a few suggestions.
@@ -179,6 +180,12 @@ public class InferencePlugin extends Plugin | |||
Setting.Property.NodeScope, | |||
Setting.Property.Dynamic | |||
); | |||
public static final Setting<TimeValue> INFERENCE_QUERY_TIMEOUT = Setting.timeSetting( | |||
"xpack.inference.query_timeout", | |||
TimeValue.timeValueSeconds(TimeUnit.SECONDS.toSeconds(10)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've typically used 30 seconds as the default. Is that too long, is that why we're using 10 here for semantic text?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default timeout for query-time inference is 10 seconds, as defined here: https://github.com/Mikep86/elasticsearch/blob/main/x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/action/InferModelAction.java
That timeout is what's used by the semantic
, sparse_vector
, and knn
query_vector_builder
.
@jonathan-buttner Do we have an issue with differing default timeouts depending on the source of the request (query vs. external REST request)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok, 10 seconds sounds good. No I don't believe we have an issue. I believe we followed the 30 second precedent from elsewhere in the stack for requests.
@@ -73,6 +74,9 @@ public void infer( | |||
TimeValue timeout, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add @Nullable
here and add a comment explaining when null
will be passed and the motivation (basically just what we discussed in the previous thread).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partial review, I think we have unhandled edge cases and potentially divergent default values to manage.
@@ -179,6 +180,12 @@ public class InferencePlugin extends Plugin | |||
Setting.Property.NodeScope, | |||
Setting.Property.Dynamic | |||
); | |||
public static final Setting<TimeValue> INFERENCE_QUERY_TIMEOUT = Setting.timeSetting( | |||
"xpack.inference.query_timeout", | |||
TimeValue.timeValueSeconds(TimeUnit.SECONDS.toSeconds(10)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default timeout for query-time inference is 10 seconds, as defined here: https://github.com/Mikep86/elasticsearch/blob/main/x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/action/InferModelAction.java
That timeout is what's used by the semantic
, sparse_vector
, and knn
query_vector_builder
.
@jonathan-buttner Do we have an issue with differing default timeouts depending on the source of the request (query vs. external REST request)?
if (timeout == null) { | ||
timeout = clusterService.getClusterSettings().get(InferencePlugin.INFERENCE_QUERY_TIMEOUT); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only want to apply this timeout if the input type is SEARCH
or INTERNAL_SEARCH
. Which brings up another edge case: If we allow timeout to be null now, we need to set default timeouts for the other input types as well.
This PR focuses on introducing user configurable
inference timeout
settings and use that as timeout during inference calls. Currently, it is hardcoded to10s
and the goal is to make it configurable.Setup
GET the default settings:
Update the inference timeout value:
GET the updated settings: