Skip to content

Fix KQL usage in in STATS .. BY #128371

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

carlosdelest
Copy link
Member

KQL can now be used in STATS .. BY

FROM airports 
| STATS c = COUNT(*) where kql("country: United States") BY scalerank
| SORT scalerank desc

This fixes the LuceneQueryEvaluator not doing a rewrite of the query to evaluate. As KQL function depends on rewriting for being executed, this meant no hits when it was being used directly as a filter in a STATS .. BY clause.

@carlosdelest carlosdelest added >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL ES|QL-ui Impacts ES|QL UI Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch and removed v9.1.0 labels May 23, 2025
@carlosdelest carlosdelest changed the title KQL can be used in STATS .. BY Fix KQL usage in in STATS .. BY May 23, 2025
@carlosdelest carlosdelest added auto-backport Automatically create backport pull requests when merged v8.19.0 labels May 23, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @carlosdelest, I've created a changelog YAML for you.

…tions-stats-using-by' into bugfix/esql-full-text-functions-stats-using-by
@@ -184,7 +184,8 @@ private class ShardState {
private final List<SegmentState> perSegmentState;

ShardState(ShardConfig config) throws IOException {
weight = config.searcher.createWeight(config.query, scoreMode(), 1.0f);
Query rewritten = config.searcher.rewrite(config.query);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Queries should be rewritten prior to actual execution in the LuceneQueryEvaluator. Some queries like KNN or KQL depend on rewriting for being executed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question, why not always rewrite the query whenever we create a ShardConfig instance?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rewriting is only needed in this case, where the query is not pushed down to Lucene. Doing the rewriting on the ShardContexts.toQuery() method is not necessary for cases when the query is pushed down, and it's clearer that way for debugging as we don't have the rewritten query being pushed down.

We could do the rewriting on the FullTextFunction.toEvaluator() method, but I think it's cleaner to have just the query and let the LuceneQueryEvaluator deal with that internal detail instead of forcing the client to do that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change fixes the issue we have with KQL.
What I don't understand is why the rewrite does not already happen when we do:

shardConfigs[i++] = new ShardConfig(shardContext.toQuery(queryBuilder()), shardContext.searcher());

which calls toQuery which should already handle the rewrite?

public ParsedQuery toQuery(QueryBuilder queryBuilder) {
reset();
try {
Query query = Rewriteable.rewrite(queryBuilder, this, true).toQuery(this);
if (query == null) {
query = Queries.newMatchNoDocsQuery("No query left after rewrite.");
}
return new ParsedQuery(query, copyNamedQueries());
} catch (QueryShardException | ParsingException e) {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. The problem lies in the rewriting target.

toQuery() rewrites the QueryBuilder. This means that in KQL it generates a CaseInsensitiveTermQuery, which is the right thing to do.

However, CaseInsensitiveTermQueryis not a query we can create aWeight` on. It needs to be rewritten using the rewrite strategy by actually rewriting the query itself.

The LuceneOperator do this implicitly when creating a Lucene slice. But, the LuceneQueryEvaluator does not do that - it creates a Weight directly on the Query itself. In case the query has not been rewritten, it fails as CaseInsensitiveTermQuery needs to be rewritten first.

I will add some comments to explain this in more detail.

@@ -841,3 +841,22 @@ r:double | author: text
4.670000076293945 | Walter Scheps
4.559999942779541 | J.R.R. Tolkien
;

testMatchInStatsWithGroupingBy
required_capability: match_function
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added tests for other functions as well that were missing

@carlosdelest carlosdelest marked this pull request as ready for review May 23, 2025 17:48
@elasticsearchmachine elasticsearchmachine removed the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label May 23, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/kibana-esql (ES|QL-ui)

@carlosdelest
Copy link
Member Author

@elasticmachine run elasticsearch-ci/rest-compatibility

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged ES|QL-ui Impacts ES|QL UI >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.19.0 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants