[Feature] PPL Highlight Support#5234
Conversation
PR Code Analyzer ❗AI-powered 'Code-Diff-Analyzer' found issues on commit 5891541.
The table above displays the top 10 most important findings. Pull Requests Author(s): Please update your Pull Request according to the report above. Repository Maintainer(s): You can Thanks. |
PR Reviewer Guide 🔍(Review updated until commit 08d3c20)Here are some key observations to aid the review process:
|
PR Code Suggestions ✨Latest suggestions up to 08d3c20 Explore these optional code suggestions:
Previous suggestionsSuggestions up to commit d289025
Suggestions up to commit e2ef1b2
Suggestions up to commit 1bd30fa
Suggestions up to commit 1b07536
Suggestions up to commit 87bb284
|
PR Code Analyzer ❗AI-powered 'Code-Diff-Analyzer' found issues on commit f0dacc0.
The table above displays the top 10 most important findings. Pull Requests Author(s): Please update your Pull Request according to the report above. Repository Maintainer(s): You can Thanks. |
|
Persistent review updated to latest commit f0dacc0 |
|
Persistent review updated to latest commit ab37b5b |
|
Persistent review updated to latest commit a7c44d8 |
|
Persistent review updated to latest commit 51356cd |
|
Persistent review updated to latest commit b2d6afe |
|
Persistent review updated to latest commit 7bb526a |
PR Code Analyzer ❗AI-powered 'Code-Diff-Analyzer' found issues on commit e3b9c75.
The table above displays the top 10 most important findings. Pull Requests Author(s): Please update your Pull Request according to the report above. Repository Maintainer(s): You can Thanks. |
|
Persistent review updated to latest commit e3b9c75 |
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
…command Signed-off-by: Jialiang Liang <jiallian@amazon.com>
e3b9c75 to
3cfbe9a
Compare
|
Persistent review updated to latest commit 3cfbe9a |
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
PR Code Analyzer ❗AI-powered 'Code-Diff-Analyzer' found issues on commit 87bb284.
The table above displays the top 10 most important findings. Pull Requests Author(s): Please update your Pull Request according to the report above. Repository Maintainer(s): You can Thanks. |
|
Highlight is a scan hint, not a relational operator. Consider dropping cc: @dai-chen |
| "lastname.keyword": ["@opensearch-dashboards-highlighted-field@Holmes@/opensearch-dashboards-highlighted-field@"] | ||
| }, | ||
| { | ||
| "address": ["880 @opensearch-dashboards-highlighted-field@Holmes@/opensearch-dashboards-highlighted-field@ Lane"] |
There was a problem hiding this comment.
Datarows does not include address, should address been highlighted?
There was a problem hiding this comment.
Yes, this is intentional and aligns with OpenSearch DSL behavior. In OpenSearch's _search endpoint, the highlight response is independent of _source filtering — even if you use "_source": ["firstname", "lastname"], the highlight section will still return fragments for any field that matched the query, including address. Our PPL highlight follows the same semantics: "fields": {"*": {}} highlights all fields that matched the search term, regardless of which fields are projected in the schema/datarows. Added a note in the docs to clarify this.
Verified against OpenSearch DSL — _source filtered but highlight still includes address
curl -X POST "localhost:9200/accounts/_search" \
-H "Content-Type: application/json" \
-d '{
"query": {"query_string": {"query": "Holmes"}},
"_source": ["firstname", "lastname"],
"highlight": {
"fields": {"*": {}},
"pre_tags": ["<b>"],
"post_tags": ["</b>"]
}
}'{
"hits": {
"hits": [
{
"_id": "578",
"_source": { "firstname": "Holmes", "lastname": "Mcknight" },
"highlight": {
"firstname": ["<b>Holmes</b>"],
"firstname.keyword": ["<b>Holmes</b>"]
}
},
{
"_id": "828",
"_source": { "firstname": "Blanche", "lastname": "Holmes" },
"highlight": {
"lastname": ["<b>Holmes</b>"],
"lastname.keyword": ["<b>Holmes</b>"]
}
},
{
"_id": "1",
"_source": { "firstname": "Amber", "lastname": "Duke" },
"highlight": {
"address": ["880 <b>Holmes</b> Lane"]
}
}
]
}
}_source is filtered to only firstname/lastname, but highlight still returns address for account 1.
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteHighlightIT.java
Show resolved
Hide resolved
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
|
Persistent review updated to latest commit 1b07536 |
Done. Dropped |
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
|
Persistent review updated to latest commit 1bd30fa |
| public void execute( | ||
| UnresolvedPlan plan, | ||
| QueryType queryType, | ||
| HighlightConfig highlightConfig, |
There was a problem hiding this comment.
nit: If a third request-level parameter is added (timeout, routing, preference), that's the time to refactor to QueryOptions — the pattern exists (ExecutionContext, PlanContext), and it's a straightforward migration. No need to speculate now.
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
PR Code Analyzer ❗AI-powered 'Code-Diff-Analyzer' found issues on commit e2ef1b2.
The table above displays the top 10 most important findings. Pull Requests Author(s): Please update your Pull Request according to the report above. Repository Maintainer(s): You can Thanks. |
|
Persistent review updated to latest commit e2ef1b2 |
|
Transforming some offline conversation with @penghuo and @dai-chen : Detailscurl -sS -X POST "localhost:9200/_plugins/_ppl" \
-H "Content-Type: application/json" \
-d '{
"query": "source=accounts \"Holmes\"",
"highlight": {
"fields": {
"address": {"fragment_size": 200},
"firstname": {"number_of_fragments": 3},
"lastname": {}
},
"pre_tags": ["<b>"],
"post_tags": ["</b>"]
}
}' | python3 -m json.tool
{
"schema": [
{
"name": "account_number",
"type": "bigint"
},
{
"name": "firstname",
"type": "string"
},
{
"name": "address",
"type": "string"
},
{
"name": "balance",
"type": "bigint"
},
{
"name": "gender",
"type": "string"
},
{
"name": "city",
"type": "string"
},
{
"name": "employer",
"type": "string"
},
{
"name": "state",
"type": "string"
},
{
"name": "age",
"type": "bigint"
},
{
"name": "email",
"type": "string"
},
{
"name": "lastname",
"type": "string"
}
],
"datarows": [
[
578,
"Holmes",
"969 Metropolitan Avenue",
34259,
"M",
"Aguila",
"Cubicide",
"PA",
37,
"holmesmcknight@cubicide.com",
"Mcknight"
],
[
828,
"Blanche",
"605 Stryker Court",
44890,
"F",
"Loomis",
"Motovate",
"KS",
33,
"blancheholmes@motovate.com",
"Holmes"
],
[
1,
"Amber",
"880 Holmes Lane",
39225,
"M",
"Brogan",
"Pyrami",
"IL",
32,
"amberduke@pyrami.com",
"Duke"
]
],
"highlights": [
{
"firstname": [
"<b>Holmes</b>"
]
},
{
"lastname": [
"<b>Holmes</b>"
]
},
{
"address": [
"880 <b>Holmes</b> Lane"
]
}
],
"total": 3,
"size": 3
}We are proposing - instead of doing highlight in a separate response formatter to isolate everything out in json body, we should keep it in the original data rows / schemas, and let the frontend handle the transformation for render: Details curl -sS -X POST "localhost:9200/_plugins/_ppl" \
-H "Content-Type: application/json" \
-d '{
"query": "source=accounts \"Holmes\"",
"highlight": {
"fields": {
"address": {"fragment_size": 200},
"firstname": {"number_of_fragments": 3},
"lastname": {}
},
"pre_tags": ["<b>"],
"post_tags": ["</b>"]
}
}' | python3 -m json.tool
{
"schema": [
{
"name": "account_number",
"type": "bigint"
},
{
"name": "firstname",
"type": "string"
},
{
"name": "address",
"type": "string"
},
{
"name": "balance",
"type": "bigint"
},
{
"name": "gender",
"type": "string"
},
{
"name": "city",
"type": "string"
},
{
"name": "employer",
"type": "string"
},
{
"name": "state",
"type": "string"
},
{
"name": "age",
"type": "bigint"
},
{
"name": "email",
"type": "string"
},
{
"name": "lastname",
"type": "string"
},
{
"name": "_highlight",
"type": "struct"
}
],
"datarows": [
[
578,
"Holmes",
"969 Metropolitan Avenue",
34259,
"M",
"Aguila",
"Cubicide",
"PA",
37,
"holmesmcknight@cubicide.com",
"Mcknight",
{
"firstname": [
"<b>Holmes</b>"
]
}
],
[
828,
"Blanche",
"605 Stryker Court",
44890,
"F",
"Loomis",
"Motovate",
"KS",
33,
"blancheholmes@motovate.com",
"Holmes",
{
"lastname": [
"<b>Holmes</b>"
]
}
],
[
1,
"Amber",
"880 Holmes Lane",
39225,
"M",
"Brogan",
"Pyrami",
"IL",
32,
"amberduke@pyrami.com",
"Duke",
{
"address": [
"880 <b>Holmes</b> Lane"
]
}
]
],
"total": 3,
"size": 3
} |
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
|
Persistent review updated to latest commit d289025 |
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
|
Persistent review updated to latest commit 08d3c20 |
Description
This PR replaces the request-level
ThreadLocalapproach from #5141. Instead of passing highlight config through thread-local state, we thread aHighlightConfigrecord through the AST and Calcite plan — following the same pattern asfetch_size(#5109). The highlight config flows from the API request into the AST viaStatementBuilderContext, becomes aLogicalHighlightCalcite node, and gets pushed down to the OpenSearch scan via an optimizer rule. This avoids the thread-safety concerns and implicit coupling ofThreadLocaland keeps the data flow explicit and traceable through the plan.Updated design according to review comment
highlightAPI parameter (not as a PPL command)["*"]) and the rich OpenSearch Dashboards object format withpre_tags,post_tags,fields, andfragment_sizeHighlightConfigrecord to carry highlight options through the full pipeline:PPLQueryRequest→AstStatementBuilder→HighlightAST node →CalciteRelNodeVisitor→LogicalHighlight→HighlightIndexScanRulepushdown → OpenSearchHighlightBuilderhighlightsarray parallel todatarows, containing per-field highlighted fragmentsRelated Issues
Sample Request / Response
Details
Check List
--signoffor-s.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.