
Fix KQL usage in STATS .. BY #128371


Merged
@@ -184,7 +184,8 @@ private class ShardState {
private final List<SegmentState> perSegmentState;

ShardState(ShardConfig config) throws IOException {
weight = config.searcher.createWeight(config.query, scoreMode(), 1.0f);
Query rewritten = config.searcher.rewrite(config.query);
Member Author

Queries should be rewritten prior to actual execution in the LuceneQueryEvaluator. Some queries, like KNN or KQL queries, depend on rewriting to be executed.
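
As a minimal sketch (not code from this PR), the underlying Lucene pattern looks like this: IndexSearcher#search rewrites the query internally before building a Weight, but createWeight on its own does not, so a caller that builds the Weight itself has to rewrite first.

import java.io.IOException;

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreMode;
import org.apache.lucene.search.Weight;

final class QueryWeights {
    // Hypothetical helper mirroring the fix above: rewriting resolves queries that
    // cannot build a Weight directly (e.g. KNN queries, or the case-insensitive term
    // query produced by KQL) into plain executable Lucene queries.
    static Weight rewriteAndCreateWeight(IndexSearcher searcher, Query query, ScoreMode scoreMode) throws IOException {
        Query rewritten = searcher.rewrite(query);
        return searcher.createWeight(rewritten, scoreMode, 1.0f);
    }
}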

Contributor

Question: why not always rewrite the query whenever we create a ShardConfig instance?

Member Author

Rewriting is only needed in this case, where the query is not pushed down to Lucene. Doing the rewrite in ShardContext.toQuery() is unnecessary when the query is pushed down, and keeping it out of that path is clearer for debugging, since the rewritten query is not what ends up being pushed down.

We could do the rewriting in FullTextFunction.toEvaluator(), but I think it's cleaner to pass just the query and let the LuceneQueryEvaluator deal with that internal detail rather than forcing the caller to do it.

Contributor

This change fixes the issue we have with KQL.
What I don't understand is why the rewrite does not already happen when we do:

shardConfigs[i++] = new ShardConfig(shardContext.toQuery(queryBuilder()), shardContext.searcher());

which calls toQuery, which should already handle the rewrite?

public ParsedQuery toQuery(QueryBuilder queryBuilder) {
reset();
try {
Query query = Rewriteable.rewrite(queryBuilder, this, true).toQuery(this);
if (query == null) {
query = Queries.newMatchNoDocsQuery("No query left after rewrite.");
}
return new ParsedQuery(query, copyNamedQueries());
} catch (QueryShardException | ParsingException e) {

Member Author

Good question. The problem lies in the rewriting target.

toQuery() rewrites the QueryBuilder. For KQL this generates a CaseInsensitiveTermQuery, which is the right thing to do.

However, CaseInsensitiveTermQuery is not a query we can create a Weight on directly. It first needs to go through Lucene's own query rewrite, which rewrites the Query itself.

The LuceneOperator does this implicitly when creating a Lucene slice, but the LuceneQueryEvaluator does not - it creates a Weight directly on the Query. If the query has not been rewritten, this fails, because CaseInsensitiveTermQuery must be rewritten first.

I will add some comments to explain this in more detail.
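
A rough sketch of the two rewrite layers discussed here (illustrative only; it assumes SearchExecutionContext is the context whose toQuery() is quoted above, and the helper itself is hypothetical):

import java.io.IOException;

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreMode;
import org.apache.lucene.search.Weight;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.SearchExecutionContext;

final class TwoRewriteLayers {
    static Weight toWeight(SearchExecutionContext context, QueryBuilder builder, IndexSearcher searcher, ScoreMode scoreMode)
        throws IOException {
        // Layer 1: QueryBuilder rewrite, done inside toQuery(). For kql(...) this yields a
        // Lucene Query such as CaseInsensitiveTermQuery.
        Query fromBuilder = context.toQuery(builder).query();
        // Layer 2: Lucene Query rewrite. LuceneOperator gets this implicitly when slicing;
        // LuceneQueryEvaluator now does it explicitly, because createWeight fails on a
        // query that still needs rewriting.
        Query executable = searcher.rewrite(fromBuilder);
        return searcher.createWeight(executable, scoreMode, 1.0f);
    }
}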

weight = config.searcher.createWeight(rewritten, scoreMode(), 1.0f);
searcher = config.searcher;
perSegmentState = new ArrayList<>(Collections.nCopies(searcher.getLeafContexts().size(), null));
}
@@ -268,3 +268,22 @@ r:double | author: text
4.670000076293945 | Walter Scheps
4.559999942779541 | J.R.R. Tolkien
;

testKqlInStatsWithGroupingBy
required_capability: kql_function
required_capability: lucene_query_evaluator_query_rewrite
FROM airports
| STATS c = COUNT(*) where kql("country: United States") BY scalerank
| SORT scalerank desc
;

c: long | scalerank: long
0 | 9
44 | 8
10 | 7
28 | 6
10 | 5
12 | 4
10 | 3
15 | 2
;
@@ -841,3 +841,22 @@ r:double | author: text
4.670000076293945 | Walter Scheps
4.559999942779541 | J.R.R. Tolkien
;

testMatchInStatsWithGroupingBy
required_capability: match_function
Member Author

Added tests for the other functions that were missing as well.

required_capability: full_text_functions_in_stats_where
FROM airports
| STATS c = COUNT(*) where match(country, "United States") BY scalerank
| SORT scalerank desc
;

c: long | scalerank: long
0 | 9
44 | 8
10 | 7
28 | 6
10 | 5
12 | 4
10 | 3
15 | 2
;
@@ -299,3 +299,22 @@ r:double | author: text
4.670000076293945 | Walter Scheps
4.559999942779541 | J.R.R. Tolkien
;

testQstrInStatsWithGroupingBy
required_capability: qstr_function
required_capability: full_text_functions_in_stats_where
FROM airports
| STATS c = COUNT(*) where qstr("country: \"United States\"") BY scalerank
| SORT scalerank desc
;

c: long | scalerank: long
0 | 9
44 | 8
10 | 7
28 | 6
10 | 5
12 | 4
10 | 3
15 | 2
;
@@ -7,6 +7,7 @@

package org.elasticsearch.xpack.esql.plugin;

import org.elasticsearch.ElasticsearchException;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.support.WriteRequest;
import org.elasticsearch.common.settings.Settings;
@@ -16,12 +17,14 @@
import org.elasticsearch.xpack.esql.VerificationException;
import org.elasticsearch.xpack.esql.action.AbstractEsqlIntegTestCase;
import org.elasticsearch.xpack.kql.KqlPlugin;
import org.hamcrest.Matchers;
import org.junit.Before;

import java.util.Collection;
import java.util.List;

import static org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertAcked;
import static org.elasticsearch.xpack.esql.EsqlTestUtils.getValuesList;
import static org.hamcrest.CoreMatchers.containsString;

public class KqlFunctionIT extends AbstractEsqlIntegTestCase {
@@ -91,6 +94,42 @@ public void testInvalidKqlQueryLexicalError() {
assertThat(error.getRootCause().getMessage(), containsString("line 1:1: extraneous input ':' "));
}

public void testKqlhWithStats() {
var errorQuery = """
FROM test
| STATS c = count(*) BY kql("content: fox")
""";

var error = expectThrows(ElasticsearchException.class, () -> run(errorQuery));
assertThat(error.getMessage(), containsString("[KQL] function is only supported in WHERE and STATS commands"));

var query = """
FROM test
| STATS c = count(*) WHERE kql("content: fox"), d = count(*) WHERE kql("content: dog")
""";

try (var resp = run(query)) {
assertColumnNames(resp.columns(), List.of("c", "d"));
assertColumnTypes(resp.columns(), List.of("long", "long"));
assertValues(resp.values(), List.of(List.of(4L, 4L)));
}

query = """
FROM test METADATA _score
| WHERE kql("content: fox")
| STATS m = max(_score), n = min(_score)
""";

try (var resp = run(query)) {
assertColumnNames(resp.columns(), List.of("m", "n"));
assertColumnTypes(resp.columns(), List.of("double", "double"));
List<List<Object>> valuesList = getValuesList(resp.values());
assertEquals(1, valuesList.size());
assertThat((double) valuesList.get(0).get(0), Matchers.lessThan(1.0));
assertThat((double) valuesList.get(0).get(1), Matchers.greaterThan(0.0));
}
}

private void createAndPopulateIndex() {
var indexName = "test";
var client = client().admin().indices();
@@ -1117,7 +1117,13 @@ public enum Cap {
/**
* Allow lookup join on mixed numeric fields, among byte, short, int, long, half_float, scaled_float, float and double.
*/
LOOKUP_JOIN_ON_MIXED_NUMERIC_FIELDS;
LOOKUP_JOIN_ON_MIXED_NUMERIC_FIELDS,

/**
* {@link org.elasticsearch.compute.lucene.LuceneQueryEvaluator} rewrites the query before executing it in Lucene. This
* provides support for KQL in a STATS ... BY command that uses a KQL query as a filter, for example.
*/
LUCENE_QUERY_EVALUATOR_QUERY_REWRITE;

private final boolean enabled;

@@ -10,6 +10,7 @@
import com.carrotsearch.randomizedtesting.annotations.ParametersFactory;

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.util.BytesRef;
import org.elasticsearch.Build;
import org.elasticsearch.common.network.NetworkAddress;
import org.elasticsearch.common.settings.Settings;
@@ -2039,6 +2040,33 @@ public void testMatchFunctionStatisWithNonPushableCondition() {
assertNull(esQuery.query());
}

public void testMatchFunctionWithStatsBy() {
String query = """
from test
| stats count(*) where match(job_positions, "Data Scientist") by gender
""";
var analyzer = makeAnalyzer("mapping-default.json");
var plannerOptimizer = new TestPlannerOptimizer(config, analyzer);
var plan = plannerOptimizer.plan(query);

var limit = as(plan, LimitExec.class);
var agg = as(limit.child(), AggregateExec.class);
var grouping = as(agg.groupings().get(0), FieldAttribute.class);
assertEquals("gender", grouping.name());
var aggregateAlias = as(agg.aggregates().get(0), Alias.class);
assertEquals("count(*) where match(job_positions, \"Data Scientist\")", aggregateAlias.name());
var count = as(aggregateAlias.child(), Count.class);
var countFilter = as(count.filter(), Match.class);
assertEquals("Data Scientist", ((BytesRef) ((Literal) countFilter.query()).value()).utf8ToString());
var aggregateFieldAttr = as(agg.aggregates().get(1), FieldAttribute.class);
assertEquals("gender", aggregateFieldAttr.name());
var exchange = as(agg.child(), ExchangeExec.class);
var aggExec = as(exchange.child(), AggregateExec.class);
var fieldExtract = as(aggExec.child(), FieldExtractExec.class);
var esQuery = as(fieldExtract.child(), EsQueryExec.class);
assertNull(esQuery.query());
}

private QueryBuilder wrapWithSingleQuery(String query, QueryBuilder inner, String fieldName, Source source) {
return FilterTests.singleValueQuery(query, inner, fieldName, source);
}