
[Feature] PPL Highlight Support#5234

Merged
RyanL1997 merged 25 commits into opensearch-project:main from RyanL1997:ppl-highlight-cmd
Mar 19, 2026

Conversation


@RyanL1997 (Collaborator) commented Mar 14, 2026

Description

Preview of the updated documentation at my fork:
https://github.com/RyanL1997/sql/blob/ppl-highlight-cmd/docs/user/ppl/interfaces/endpoint.md#highlight

This PR replaces the request-level ThreadLocal approach from #5141. Instead of passing highlight config through thread-local state, we thread a HighlightConfig record through the AST and Calcite plan — following the same pattern as fetch_size (#5109). The highlight config flows from the API request into the AST via StatementBuilderContext, becomes a LogicalHighlight Calcite node, and gets pushed down to the OpenSearch scan via an optimizer rule. This avoids the thread-safety concerns and implicit coupling of ThreadLocal and keeps the data flow explicit and traceable through the plan.
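The explicit-threading approach can be sketched as a small immutable record passed down the pipeline instead of ThreadLocal state. This is a hypothetical sketch based on the description above — the field names and the simple-array construction path are assumptions, not the PR's actual code:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical shape of the HighlightConfig record described in the PR text.
// Carrying it as an explicit value through the plan keeps the data flow
// traceable, unlike hidden ThreadLocal state.
record HighlightConfig(
    Map<String, Map<String, Object>> fields, // per-field options, e.g. {"*": {}}
    List<String> preTags,
    List<String> postTags,
    Integer fragmentSize) {

  // Simple array format ("highlight": ["*"]): each name gets default options.
  static HighlightConfig ofFieldNames(List<String> fieldNames) {
    Map<String, Map<String, Object>> fieldMap = new LinkedHashMap<>();
    for (String name : fieldNames) {
      fieldMap.put(name, Map.of());
    }
    return new HighlightConfig(fieldMap, null, null, null);
  }
}
```

A record is a good fit here because the config never changes after the request is parsed; each pipeline stage only reads it.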

Updated design according to review comment

Done. Dropped LogicalHighlight, HighlightIndexScanRule, and the Highlight AST node (3 classes deleted). Highlight config now threads through Query → QueryPlan → QueryService → CalcitePlanContext, and pushDownHighlight() is called eagerly in visitRelation() before alias wrapping — same consume-once pattern as other scan hints. Introduced a HighlightPushDown interface in core to bridge the module boundary with opensearch (since CalciteRelNodeVisitor can't reference CalciteLogicalIndexScan directly). Config is cleared immediately after pushdown so it's applied exactly once.
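The consume-once pattern described above can be sketched in miniature. The interface name HighlightPushDown matches the PR; the context class and everything else here are simplified stand-ins, not the actual implementation:

```java
import java.util.Map;

// Bridge interface so core code can push highlight config into a scan
// without referencing the concrete OpenSearch scan class directly.
interface HighlightPushDown {
  Object pushDownHighlight(Map<String, Object> config);
}

// Stand-in for the plan context that owns the pending highlight config.
class PlanContext {
  private Map<String, Object> highlightConfig;

  PlanContext(Map<String, Object> config) {
    this.highlightConfig = config;
  }

  // Called while visiting a Relation node: apply the config to the first
  // capable scan, then clear it so it is applied exactly once.
  Object consumeHighlight(Object scan) {
    if (highlightConfig != null && scan instanceof HighlightPushDown pushable) {
      Object newScan = pushable.pushDownHighlight(highlightConfig);
      highlightConfig = null; // consumed; later relations see no config
      return newScan;
    }
    return scan;
  }
}
```

Clearing the field inside the same guarded branch is what makes the hint single-use: a second relation in the same query sees a null config and is returned unchanged.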

  • Add PPL search result highlighting via the highlight API parameter (not as a PPL command)
  • Support both the simple array format (["*"]) and the rich OpenSearch Dashboards object format with pre_tags, post_tags, fields, and fragment_size
  • Introduce HighlightConfig record to carry highlight options through the full pipeline: PPLQueryRequest → AstStatementBuilder → Highlight AST node → CalciteRelNodeVisitor → LogicalHighlight → HighlightIndexScanRule pushdown → OpenSearch HighlightBuilder
  • Response includes a top-level highlights array parallel to datarows, containing per-field highlighted fragments
  • Add endpoint documentation with examples and expected output
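The two accepted shapes of the highlight parameter can be told apart by whether the JSON value is an array or an object. This is a hypothetical dispatch sketch to illustrate the distinction — the real parser uses a JSON library, not string inspection:

```java
// Hypothetical dispatcher for the two highlight formats described above:
//   simple array -> "highlight": ["*"]
//   rich object  -> "highlight": {"fields": {...}, "pre_tags": [...], ...}
class HighlightFormat {
  static String detect(String jsonValue) {
    String trimmed = jsonValue.trim();
    if (trimmed.startsWith("[")) {
      return "simple-array"; // list of field names with default options
    }
    if (trimmed.startsWith("{")) {
      return "rich-object"; // OSD-style object with tags and fragment_size
    }
    throw new IllegalArgumentException("highlight must be a JSON array or object");
  }
}
```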

Related Issues

Sample Request / Response

curl -s -X POST "localhost:9200/_plugins/_ppl" -H 'Content-Type: application/json' -d '{"query":"search source=accounts \"Holmes\"","highlight":{"fields":{"*":{}},"pre_tags":["@opensearch-dashboards-highlighted-field@"],"post_tags":["@/opensearch-dashboards-highlighted-field@"],"fragment_size":2147483647}}' | jq
{
  "schema": [
    {
      "name": "account_number",
      "type": "bigint"
    },
    {
      "name": "firstname",
      "type": "string"
    },
    {
      "name": "address",
      "type": "string"
    },
    {
      "name": "balance",
      "type": "bigint"
    },
    {
      "name": "gender",
      "type": "string"
    },
    {
      "name": "city",
      "type": "string"
    },
    {
      "name": "employer",
      "type": "string"
    },
    {
      "name": "state",
      "type": "string"
    },
    {
      "name": "age",
      "type": "bigint"
    },
    {
      "name": "email",
      "type": "string"
    },
    {
      "name": "lastname",
      "type": "string"
    }
  ],
  "datarows": [
    [
      578,
      "Holmes",
      "969 Metropolitan Avenue",
      34259,
      "M",
      "Aguila",
      "Cubicide",
      "PA",
      37,
      "holmesmcknight@cubicide.com",
      "Mcknight"
    ],
    [
      828,
      "Blanche",
      "605 Stryker Court",
      44890,
      "F",
      "Loomis",
      "Motovate",
      "KS",
      33,
      "blancheholmes@motovate.com",
      "Holmes"
    ],
    [
      1,
      "Amber",
      "880 Holmes Lane",
      39225,
      "M",
      "Brogan",
      "Pyrami",
      "IL",
      32,
      "amberduke@pyrami.com",
      "Duke"
    ]
  ],
  "highlights": [
    {
      "firstname": [
        "@opensearch-dashboards-highlighted-field@Holmes@/opensearch-dashboards-highlighted-field@"
      ],
      "firstname.keyword": [
        "@opensearch-dashboards-highlighted-field@Holmes@/opensearch-dashboards-highlighted-field@"
      ]
    },
    {
      "lastname.keyword": [
        "@opensearch-dashboards-highlighted-field@Holmes@/opensearch-dashboards-highlighted-field@"
      ],
      "lastname": [
        "@opensearch-dashboards-highlighted-field@Holmes@/opensearch-dashboards-highlighted-field@"
      ]
    },
    {
      "address": [
        "880 @opensearch-dashboards-highlighted-field@Holmes@/opensearch-dashboards-highlighted-field@ Lane"
      ]
    }
  ],
  "total": 3,
  "size": 3
}

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 5891541.

Path | Line | Severity | Description
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java | 516 | low | User-supplied highlight terms are interpolated directly into a Lucene query_string expression with only double-quote wrapping and no further sanitization. A crafted term containing backslash escapes or unbalanced quotes could malform or expand the generated query. Impact is limited to highlighting behavior (not document access control), so this is likely an oversight rather than malicious intent.

The table above displays the top 10 most important findings.

Total: 1 | Critical: 0 | High: 0 | Medium: 0 | Low: 1
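The low-severity finding above suggests escaping before quoting. A minimal sketch of that idea follows — this is not the PR's code, and a Lucene helper such as QueryParser.escape covers the full metacharacter set, where this only neutralizes the two characters the finding calls out:

```java
// Minimal sketch: escape backslashes and double quotes in a user-supplied
// term before wrapping it in quotes for a query_string expression, so a
// crafted term cannot break out of the quoted literal.
class QueryStringEscaper {
  static String quoteTerm(String term) {
    // Escape backslashes first so the quote escapes are not double-escaped.
    String escaped = term.replace("\\", "\\\\").replace("\"", "\\\"");
    return "\"" + escaped + "\"";
  }
}
```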


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.


github-actions bot commented Mar 14, 2026

PR Reviewer Guide 🔍

(Review updated until commit 08d3c20)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: HighlightConfig data model and AST/request parsing

Relevant files:

  • core/src/main/java/org/opensearch/sql/ast/tree/HighlightConfig.java
  • core/src/main/java/org/opensearch/sql/ast/statement/Query.java
  • ppl/src/main/java/org/opensearch/sql/ppl/domain/PPLQueryRequest.java
  • ppl/src/main/java/org/opensearch/sql/ppl/parser/AstStatementBuilder.java
  • core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
  • ppl/src/test/java/org/opensearch/sql/ppl/domain/PPLQueryRequestTest.java
  • ppl/src/test/java/org/opensearch/sql/ppl/parser/AstStatementBuilderTest.java

Sub-PR theme: Highlight pushdown into Calcite scan and OpenSearch request builder

Relevant files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
  • core/src/main/java/org/opensearch/sql/calcite/plan/HighlightPushDown.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/OpenSearchIndexEnumerator.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownType.java
  • core/src/main/java/org/opensearch/sql/expression/HighlightExpression.java
  • opensearch/src/test/java/org/opensearch/sql/opensearch/storage/scan/CalciteIndexScanCostTest.java

Sub-PR theme: Highlight config threading through execution pipeline and response formatting

Relevant files:

  • core/src/main/java/org/opensearch/sql/executor/QueryService.java
  • core/src/main/java/org/opensearch/sql/executor/execution/QueryPlan.java
  • core/src/main/java/org/opensearch/sql/executor/execution/QueryPlanFactory.java
  • ppl/src/main/java/org/opensearch/sql/ppl/PPLService.java
  • protocol/src/main/java/org/opensearch/sql/protocol/response/QueryResult.java
  • protocol/src/main/java/org/opensearch/sql/protocol/response/format/SimpleJsonResponseFormatter.java
  • ppl/src/test/java/org/opensearch/sql/ppl/PPLServiceTest.java
  • core/src/test/java/org/opensearch/sql/executor/execution/QueryPlanTest.java
  • protocol/src/test/java/org/opensearch/sql/protocol/response/QueryResultTest.java
  • protocol/src/test/java/org/opensearch/sql/protocol/response/format/SimpleJsonResponseFormatterTest.java

⚡ Recommended focus areas for review

Iterator Change

The iterator() method was refactored to use tuple.entrySet().stream() instead of Map::values. This changes iteration order to be based on the tuple's entry set order rather than the schema column order. If the tuple map's key order doesn't match the schema column order, data rows will be misaligned with schema columns. The old approach used convertExprValuesToValues which iterated over Map::values — the new approach iterates over entrySet() which may produce a different ordering for some map implementations.

return exprValues.stream()
    .map(ExprValueUtils::getTupleValue)
    .map(
        tuple ->
            tuple.entrySet().stream().map(e -> e.getValue().value()).toArray(Object[]::new))
    .iterator();
Highlight Consumed Early

The highlight config is consumed (set to null) immediately after the first Relation node is visited. If a query involves multiple relations (e.g., JOINs or subqueries), only the first scan will receive the highlight config. This may be intentional for single-table PPL queries, but should be validated for multi-table scenarios.

if (context.getHighlightConfig() != null && scan instanceof HighlightPushDown) {
  RelNode newScan = ((HighlightPushDown) scan).pushDownHighlight(context.getHighlightConfig());
  context.relBuilder.build(); // pop old scan
  context.relBuilder.push(newScan);
  scan = newScan;
  context.setHighlightConfig(null); // consumed
}
Highlight Condition

The highlight config is only set on the Query statement when fields() is non-null and non-empty. However, a HighlightConfig constructed with a non-empty field list via the simple array constructor will always have non-empty fields. The condition may silently drop a valid highlight config if fields() returns empty for some edge case (e.g., a config constructed with an empty list). Consider whether this guard is necessary or if it should throw instead.

if (context.getHighlightConfig() != null
    && context.getHighlightConfig().fields() != null
    && !context.getHighlightConfig().fields().isEmpty()) {
  query.setHighlightConfig(context.getHighlightConfig());
}
Fragment Size Validation

The fragment_size validation only applies to the rich object format. In the per-field options parsed by parsePerFieldOptions, a negative fragment_size is not validated and would be passed through to OpenSearch without error. Consider adding validation in parsePerFieldOptions as well.

if (fieldObj.has("fragment_size")) {
  opts.put("fragment_size", fieldObj.getInt("fragment_size"));
}
Project Filter

In pushDownProject, the _highlight field is filtered out from the projected field names sent to OpenSearch via pushDownProjectStream. However, the _highlight field is still included in the newSchema passed to the scan. This means the schema has _highlight but the source builder does not request it — which is correct since highlights come from the highlight builder, not _source. This implicit coupling should be documented or made more explicit to avoid future regressions.

  action =
      (OSRequestBuilderAction)
          requestBuilder ->
              requestBuilder.pushDownProjectStream(
                  newSchema.getFieldNames().stream()
                      .filter(f -> !HighlightExpression.HIGHLIGHT_FIELD.equals(f)));
}


github-actions bot commented Mar 14, 2026

PR Code Suggestions ✨

Latest suggestions up to 08d3c20

Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Align row values with schema column order by name

The new iterator implementation iterates over the tuple's entry set directly, which
relies on the tuple's map iteration order matching the schema column order. If the
ExprTupleValue map order ever diverges from the schema column order (e.g., when
_highlight is appended to the schema but the tuple was built without it), the column
values will be misaligned with the schema. The previous implementation using
Map::values had the same issue, but the change makes it more explicit. Consider
iterating over schema columns and looking up each value by name to guarantee
alignment.

protocol/src/main/java/org/opensearch/sql/protocol/response/QueryResult.java [84-89]

+    List<String> columnNames = schema.getColumns().stream()
+        .map(col -> col.getAlias() != null ? col.getAlias() : col.getName())
+        .collect(java.util.stream.Collectors.toList());
     return exprValues.stream()
         .map(ExprValueUtils::getTupleValue)
-        .map(
-            tuple ->
-                tuple.entrySet().stream().map(e -> e.getValue().value()).toArray(Object[]::new))
+        .map(tuple -> columnNames.stream()
+            .map(name -> {
+                ExprValue v = tuple.get(name);
+                return v != null ? v.value() : null;
+            })
+            .toArray(Object[]::new))
         .iterator();
Suggestion importance[1-10]: 6


Why: The suggestion identifies a real potential issue: if the ExprTupleValue map order diverges from the schema column order (especially with the new _highlight column), values could be misaligned. Iterating by schema column names guarantees correct alignment and is a meaningful correctness improvement.

Low
Prevent integer overflow when parsing fragment_size

The validation rejects fragment_size of 0 and negative values, but the OSD object
format commonly sends fragment_size: 2147483647 (Integer.MAX_VALUE) to disable
fragmentation. However, the real issue is that Integer.MAX_VALUE is a valid value,
but the check fragmentSize <= 0 is correct. The problem is that
obj.getInt("fragment_size") can throw a JSONException if the value overflows an int
(e.g., values larger than Integer.MAX_VALUE). Consider using obj.getLong and then
range-checking before casting to int to avoid silent overflow or exceptions.

ppl/src/main/java/org/opensearch/sql/ppl/domain/PPLQueryRequest.java [169-174]

     if (obj.has("fragment_size")) {
-      fragmentSize = obj.getInt("fragment_size");
-      if (fragmentSize <= 0) {
-        throw new IllegalArgumentException("highlight fragment_size must be a positive integer");
+      long fragmentSizeLong = obj.getLong("fragment_size");
+      if (fragmentSizeLong <= 0 || fragmentSizeLong > Integer.MAX_VALUE) {
+        throw new IllegalArgumentException("highlight fragment_size must be a positive integer not exceeding " + Integer.MAX_VALUE);
       }
+      fragmentSize = (int) fragmentSizeLong;
     }
Suggestion importance[1-10]: 5


Why: The suggestion is valid - using getLong instead of getInt prevents potential overflow for values near Integer.MAX_VALUE. However, since OSD commonly sends exactly Integer.MAX_VALUE (2147483647), which fits in an int, this is a low-probability edge case. The improved code is accurate and addresses a real but minor robustness concern.

Low
General
Avoid silently dropping highlight config with empty fields

The condition skips setting the highlight config when fields is null or empty, but a
HighlightConfig constructed via the simple array constructor new
HighlightConfig(List.of("*")) will have a non-null, non-empty fields map. However,
if someone constructs a HighlightConfig with an empty fields map (e.g., object
format with no fields key), the highlight config will be silently dropped without
any error or warning. This could lead to confusing behavior where a user provides a
highlight config but gets no highlighting. The guard should either propagate the
config regardless or throw an error for empty fields.

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstStatementBuilder.java [41-45]

-    if (context.getHighlightConfig() != null
-        && context.getHighlightConfig().fields() != null
-        && !context.getHighlightConfig().fields().isEmpty()) {
+    if (context.getHighlightConfig() != null) {
       query.setHighlightConfig(context.getHighlightConfig());
     }
Suggestion importance[1-10]: 5


Why: The suggestion correctly identifies that a HighlightConfig with empty fields (e.g., object format with no fields key) would be silently dropped. Removing the extra guards and always propagating a non-null config is a reasonable improvement, though the practical impact depends on whether empty-fields configs are a real use case.

Low
Guard unsafe casts in per-field options application

The @SuppressWarnings("unchecked") annotation suppresses all unchecked cast warnings
for the entire method, but the casts to List and Boolean are unsafe if the map
values are not of the expected types. Since the values come from JSON parsing in
parsePerFieldOptions, a ClassCastException at runtime is possible if the JSON
contains unexpected types (e.g., a number where a boolean is expected). Each cast
should be individually guarded or the annotation scope should be narrowed.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java [532-563]

-    @SuppressWarnings(
-        "unchecked") // values are trusted types from PPLQueryRequest.parsePerFieldOptions
-  private static void applyPerFieldOptions(
-      HighlightBuilder.Field field, Map<String, Object> options) {
-    ...
-    if (options.containsKey("pre_tags")) {
-      field.preTags(((List<String>) options.get("pre_tags")).toArray(new String[0]));
+    private static void applyPerFieldOptions(
+        HighlightBuilder.Field field, Map<String, Object> options) {
+      if (options == null || options.isEmpty()) {
+        return;
+      }
+      if (options.containsKey("fragment_size")) {
+        field.fragmentSize(((Number) options.get("fragment_size")).intValue());
+      }
+      if (options.containsKey("number_of_fragments")) {
+        field.numOfFragments(((Number) options.get("number_of_fragments")).intValue());
+      }
+      if (options.containsKey("type")) {
+        field.highlighterType((String) options.get("type"));
+      }
+      if (options.containsKey("pre_tags")) {
+        Object val = options.get("pre_tags");
+        if (val instanceof List<?> list) {
+          field.preTags(list.stream().map(Object::toString).toArray(String[]::new));
+        }
+      }
+      if (options.containsKey("post_tags")) {
+        Object val = options.get("post_tags");
+        if (val instanceof List<?> list) {
+          field.postTags(list.stream().map(Object::toString).toArray(String[]::new));
+        }
+      }
+      if (options.containsKey("require_field_match")) {
+        Object val = options.get("require_field_match");
+        if (val instanceof Boolean b) {
+          field.requireFieldMatch(b);
+        }
+      }
+      if (options.containsKey("no_match_size")) {
+        field.noMatchSize(((Number) options.get("no_match_size")).intValue());
+      }
+      if (options.containsKey("order")) {
+        field.order((String) options.get("order"));
+      }
     }
-    if (options.containsKey("post_tags")) {
-      field.postTags(((List<String>) options.get("post_tags")).toArray(new String[0]));
-    }
-    if (options.containsKey("require_field_match")) {
-      field.requireFieldMatch((Boolean) options.get("require_field_match"));
-    }
-  </code>
Suggestion importance[1-10]: 3


Why: The suggestion is valid in principle - the @SuppressWarnings("unchecked") suppresses all warnings for the method. However, since the values are populated by parsePerFieldOptions which explicitly uses typed JSON getters, the actual risk of ClassCastException is low. The existing code is also not accurately represented in the existing_code snippet (it contains ... and </code> artifacts), making the suggestion less precise.

Low

Previous suggestions

Suggestions up to commit d289025
Category | Suggestion | Impact
Possible issue
Fix incorrect highlight data access path in test

The test accesses result.getJSONArray("highlights"), but based on the rest of the PR
the highlight data is returned as a _highlight column inside datarows, not as a
top-level highlights array. This will cause a JSONException at runtime when the key
is not found. The test should instead look up the _highlight column index from
schema and read it from datarows.

integ-test/src/test/java/org/opensearch/sql/security/CalciteCrossClusterSearchIT.java [490-495]

-var highlights = result.getJSONArray("highlights");
-var highlight = highlights.getJSONObject(0);
+JSONArray schema = result.getJSONArray("schema");
+int hlIndex = -1;
+for (int i = 0; i < schema.length(); i++) {
+  if ("_highlight".equals(schema.getJSONObject(i).getString("name"))) {
+    hlIndex = i;
+    break;
+  }
+}
+Assert.assertTrue("_highlight column should exist", hlIndex >= 0);
+JSONObject highlight = result.getJSONArray("datarows").getJSONArray(0).getJSONObject(hlIndex);
 Assert.assertTrue(
     "Highlight should contain <em>Hattie</em>",
     highlight.getJSONArray("firstname").getString(0).contains("<em>Hattie</em>"));
Suggestion importance[1-10]: 8


Why: The test accesses result.getJSONArray("highlights") which does not exist in the response format; highlight data is stored as a _highlight column inside datarows. This will cause a JSONException at runtime, making the test fail. The improved code correctly reads the _highlight column index from schema and retrieves the value from datarows.

Medium
Ensure row values align with schema column order

The new iterator implementation iterates over the tuple's entry set directly, which
relies on the tuple's map iteration order matching the schema column order. If the
ExprTupleValue map order ever diverges from the schema column order (e.g., when
highlight adds _highlight to the tuple but the schema lists it last), the column
values will be misaligned. The previous approach using Map::values had the same
issue, but the new code makes it more explicit. Consider iterating over schema
columns and looking up each value by name to guarantee alignment.

protocol/src/main/java/org/opensearch/sql/protocol/response/QueryResult.java [84-89]

+// Iterate in schema column order to guarantee alignment
+List<String> columnNames = schema.getColumns().stream()
+    .map(col -> col.getAlias() != null ? col.getAlias() : col.getName())
+    .collect(java.util.stream.Collectors.toList());
 return exprValues.stream()
     .map(ExprValueUtils::getTupleValue)
     .map(
         tuple ->
-            tuple.entrySet().stream().map(e -> e.getValue().value()).toArray(Object[]::new))
+            columnNames.stream()
+                .map(name -> {
+                    ExprValue v = tuple.get(name);
+                    return v != null ? v.value() : null;
+                })
+                .toArray(Object[]::new))
     .iterator();
Suggestion importance[1-10]: 7


Why: The new iterator iterates over the tuple's entry set order rather than the schema column order, which could cause column misalignment if the _highlight field or other fields are stored in a different order than the schema. The suggested fix of iterating by schema column names is more robust and directly addresses a real correctness risk introduced by this PR.

Medium
General
Avoid silently dropping valid highlight configs with empty fields

The condition skips setting the highlight config when fields is null or empty, but a
HighlightConfig constructed via the simple array constructor (new
HighlightConfig(List.of("*"))) will always have a non-null, non-empty fields map.
However, if someone constructs a HighlightConfig with an empty fields map (e.g.,
object format with no fields key), the config will be silently dropped. This could
cause unexpected behavior where a valid highlight config is ignored. Consider only
checking for null config.

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstStatementBuilder.java [41-45]

-if (context.getHighlightConfig() != null
-    && context.getHighlightConfig().fields() != null
-    && !context.getHighlightConfig().fields().isEmpty()) {
+if (context.getHighlightConfig() != null) {
   query.setHighlightConfig(context.getHighlightConfig());
 }
Suggestion importance[1-10]: 5


Why: The extra fields null/empty check in AstStatementBuilder could silently drop a valid HighlightConfig constructed with an empty fields map (e.g., object format with no fields key). Simplifying to only check for null config is more correct and consistent with how the config is constructed.

Low
Align validation condition with error message wording

The validation rejects fragment_size of 0 and negative values, but the test
testHighlightOsdObjectFormatAllFields passes 2147483647 (Integer.MAX_VALUE) which is
valid. However, the OSD format commonly uses Integer.MAX_VALUE to mean "no
fragmentation". The current check is correct, but the error message says "positive
integer" while 0 is also rejected — consider clarifying the message to say "must be
greater than 0" or use fragmentSize < 1 for consistency with the message.

ppl/src/main/java/org/opensearch/sql/ppl/domain/PPLQueryRequest.java [171-173]

-if (fragmentSize <= 0) {
-  throw new IllegalArgumentException("highlight fragment_size must be a positive integer");
+if (fragmentSize < 1) {
+  throw new IllegalArgumentException("highlight fragment_size must be greater than 0");
 }
Suggestion importance[1-10]: 2


Why: The functional difference between <= 0 and < 1 is identical for integers, so this is purely a cosmetic/wording improvement. The existing code is correct and the test already covers both 0 and -1 cases.

Low
Suggestions up to commit e2ef1b2
Category | Suggestion | Impact
Possible issue
Guard against non-collection highlight field values

Calling entry.getValue().collectionValue() will throw an exception if the highlight
value for a field is not a collection (e.g., it's a string or missing). This can
cause a runtime failure when processing highlight data that doesn't conform to the
expected list-of-strings structure. Add a null/type check before calling
collectionValue().

core/src/main/java/org/opensearch/sql/protocol/response/QueryResult.java [115-123]

 Map<String, Object> hlMap = new LinkedHashMap<>();
 for (Map.Entry<String, ExprValue> entry : hl.tupleValue().entrySet()) {
+  ExprValue fieldVal = entry.getValue();
+  if (fieldVal == null || fieldVal.isMissing() || fieldVal.isNull()) {
+    continue;
+  }
   hlMap.put(
       entry.getKey(),
-      entry.getValue().collectionValue().stream()
+      fieldVal.collectionValue().stream()
           .map(ExprValue::stringValue)
           .collect(Collectors.toList()));
 }
-return (Map<String, Object>) hlMap;
+return hlMap;
Suggestion importance[1-10]: 5


Why: The suggestion correctly identifies a potential runtime exception if a highlight field value is not a collection. The improved code adds null/missing/null checks before calling collectionValue(), which is a valid defensive improvement. The unchecked cast (Map<String, Object>) hlMap is also removed in the improved code, which is a minor cleanup.

Low
Guard against non-integer fragment_size values

The validation rejects fragment_size of 0 and negative values, but the OSD format
commonly sends fragment_size: 2147483647 (Integer.MAX_VALUE) to disable
fragmentation. However, the real issue is that Integer.MAX_VALUE itself is valid,
but the check <= 0 is correct. The bigger concern is that
obj.getInt("fragment_size") will throw a JSONException (not
IllegalArgumentException) if the value is not an integer (e.g., a float like 100.5),
causing an unhandled exception to propagate. Use optInt with a sentinel or wrap in a
try-catch to provide a consistent error message.

ppl/src/main/java/org/opensearch/sql/ppl/domain/PPLQueryRequest.java [170-173]

-if (fragmentSize <= 0) {
+if (obj.has("fragment_size")) {
+  Object fsRaw = obj.get("fragment_size");
+  if (!(fsRaw instanceof Integer) && !(fsRaw instanceof Long)) {
     throw new IllegalArgumentException("highlight fragment_size must be a positive integer");
+  }
+  fragmentSize = ((Number) fsRaw).intValue();
+  if (fragmentSize <= 0) {
+    throw new IllegalArgumentException("highlight fragment_size must be a positive integer");
+  }
 }
Suggestion importance[1-10]: 4


Why: The suggestion correctly identifies that obj.getInt("fragment_size") can throw a JSONException for non-integer values, but in practice the OSD format always sends integer values, and the JSONException would still surface as an error to the caller. The improvement is minor and the existing code handles the common cases correctly.

Low
General
Use consistent non-empty check for highlight config

The simple array format constructs a HighlightConfig via new HighlightConfig(List),
which calls toFieldMap. If the field list is non-null but empty, fields() will be an
empty map, and the highlight config will be silently dropped here. This means a
request with "highlight": [] would not propagate the config at all, which may be the
desired behavior, but it also means a HighlightConfig created with a non-empty field
list but whose fields() map is somehow empty would be dropped. Consider also
checking fieldNames() for the simple-array path to be consistent.

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstStatementBuilder.java [41-45]

 if (context.getHighlightConfig() != null
-    && context.getHighlightConfig().fields() != null
-    && !context.getHighlightConfig().fields().isEmpty()) {
+    && !context.getHighlightConfig().fieldNames().isEmpty()) {
   query.setHighlightConfig(context.getHighlightConfig());
 }
Suggestion importance[1-10]: 4


Why: The suggestion simplifies the null/empty check by using fieldNames() instead of checking fields() directly, which is more consistent with the HighlightConfig API. However, the behavior difference for an empty array [] is intentional (silently dropping it), so this is a minor style/consistency improvement rather than a bug fix.

Low
Prevent duplicate highlight field in projections

The _highlight field is unconditionally appended to expandedFields whenever it
exists in the schema, even if the user's explicit | fields projection already
included it. This could result in duplicate _highlight columns in the projection.
Check whether the field is already present in expandedFields before adding it.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [435-439]

 // Include _highlight in projections when the highlight column is present in the schema
 int hlIndex = currentFields.indexOf(HighlightExpression.HIGHLIGHT_FIELD);
-if (hlIndex >= 0) {
+boolean alreadyIncluded = expandedFields.stream()
+    .anyMatch(rex -> rex.equals(context.relBuilder.field(hlIndex >= 0 ? hlIndex : 0)) && hlIndex >= 0);
+if (hlIndex >= 0 && !alreadyIncluded) {
   expandedFields.add(context.relBuilder.field(hlIndex));
 }
Suggestion importance[1-10]: 3

__

Why: The concern about duplicate _highlight columns is valid in theory, but the improved_code is awkward and potentially incorrect (it evaluates context.relBuilder.field(0) when hlIndex < 0). The existing code appends _highlight only when it's in the schema, and typical user projections via | fields would not include _highlight explicitly, making duplicates unlikely in practice.

Low
Suggestions up to commit 1bd30fa
Category | Suggestion | Impact
Possible issue
Guard against null highlight field values

Calling entry.getValue().collectionValue() will throw an exception if the highlight
value for a field is not a collection (e.g., it's a string or null). This can cause
a runtime failure when processing highlight data that doesn't conform to the
expected array-of-strings structure. Add a null/type check before calling
collectionValue().

core/src/main/java/org/opensearch/sql/protocol/response/QueryResult.java [115-123]

 Map<String, Object> hlMap = new LinkedHashMap<>();
 for (Map.Entry<String, ExprValue> entry : hl.tupleValue().entrySet()) {
+  ExprValue fieldVal = entry.getValue();
+  if (fieldVal == null || fieldVal.isMissing() || fieldVal.isNull()) {
+    continue;
+  }
   hlMap.put(
       entry.getKey(),
-      entry.getValue().collectionValue().stream()
+      fieldVal.collectionValue().stream()
           .map(ExprValue::stringValue)
           .collect(Collectors.toList()));
 }
-return (Map<String, Object>) hlMap;
+return hlMap;
Suggestion importance[1-10]: 5

__

Why: The suggestion correctly identifies a potential runtime exception if a highlight field value is not a collection type. Adding null/missing/null checks before calling collectionValue() improves robustness, and the unchecked cast (Map<String, Object>) hlMap is also removed in the improved code.

Low
Guard against out-of-range fragment_size values

The <= 0 check correctly accepts a fragment_size of 2147483647 (Integer.MAX_VALUE), the
value OpenSearch Dashboards explicitly sends, as shown in the tests and documentation,
so positive values are handled. The real issue is that obj.getInt("fragment_size") will
throw a JSONException if the value exceeds Integer.MAX_VALUE (e.g., a long value), and
there's no guard against that. Consider wrapping the parse in a try-catch or using
optLong with a range check to handle out-of-range values gracefully.

ppl/src/main/java/org/opensearch/sql/ppl/domain/PPLQueryRequest.java [169-173]

-if (fragmentSize <= 0) {
-    throw new IllegalArgumentException("highlight fragment_size must be a positive integer");
+if (obj.has("fragment_size")) {
+  long rawFragmentSize = obj.optLong("fragment_size", -1L);
+  if (rawFragmentSize <= 0 || rawFragmentSize > Integer.MAX_VALUE) {
+    throw new IllegalArgumentException(
+        "highlight fragment_size must be a positive integer not exceeding " + Integer.MAX_VALUE);
+  }
+  fragmentSize = (int) rawFragmentSize;
 }
Suggestion importance[1-10]: 4

__

Why: The existing <= 0 check is correct and Integer.MAX_VALUE (2147483647) is a valid positive integer that passes the check. The obj.getInt() method in org.json does handle large integers within int range, and 2147483647 is exactly Integer.MAX_VALUE, so no overflow occurs. The suggestion adds marginal safety for truly out-of-range long values, but this is a low-probability edge case.

Low
General
Prevent duplicate highlight field in projections

The _highlight field is unconditionally appended to every explicit projection when
it exists in the schema, even when the user has explicitly excluded it or when it's
already included in expandedFields via expandProjectFields. This could result in
duplicate _highlight entries in the projection list, potentially causing plan errors
or unexpected behavior.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [435-439]

 // Include _highlight in projections when the highlight column is present in the schema
+// and not already included by the user's explicit projection
 int hlIndex = currentFields.indexOf(HighlightExpression.HIGHLIGHT_FIELD);
-if (hlIndex >= 0) {
+boolean alreadyIncluded = expandedFields.stream()
+    .anyMatch(rex -> rex.toString().contains(HighlightExpression.HIGHLIGHT_FIELD));
+if (hlIndex >= 0 && !alreadyIncluded) {
   expandedFields.add(context.relBuilder.field(hlIndex));
 }
Suggestion importance[1-10]: 5

__

Why: The concern about duplicate _highlight entries is valid — if expandProjectFields already includes _highlight (e.g., user explicitly projects it), appending it again could cause plan errors. However, the improved_code uses a string-based check (rex.toString().contains(...)) which is fragile; a more robust approach would compare field indices or names directly.

Low
Unify highlight config non-empty check

The highlight config is only set on the query when the object-format fields map is
non-empty. However, when the simple array format is used (e.g., ["*"]),
HighlightConfig is constructed via new HighlightConfig(List) which calls toFieldMap,
so fields() will be non-null and non-empty — this works. But if someone passes an
object format with an empty fields object {}, the highlight config will be silently
dropped. This may be intentional, but it's worth verifying the fieldNames() method
is used consistently for the simple-array path.

core/src/main/java/org/opensearch/sql/ppl/parser/AstStatementBuilder.java [41-45]

 if (context.getHighlightConfig() != null
-    && context.getHighlightConfig().fields() != null
-    && !context.getHighlightConfig().fields().isEmpty()) {
+    && !context.getHighlightConfig().fieldNames().isEmpty()) {
   query.setHighlightConfig(context.getHighlightConfig());
 }
Suggestion importance[1-10]: 4

__

Why: The suggestion simplifies the condition by using fieldNames() instead of checking fields() != null && !fields().isEmpty(). This is a minor readability improvement, but the behavior difference (silently dropping empty object-format configs) is a valid concern worth addressing.

Low
Suggestions up to commit 1b07536
Category | Suggestion | Impact
Possible issue
Avoid adding highlight field to exclusion projections

The _highlight field is unconditionally appended to every non-wildcard projection
when it exists in the schema. This means that even when the user explicitly projects
specific fields (e.g., | fields firstname, age), the _highlight field is silently
added to expandedFields. This could cause issues if expandedFields is later used for
exclusion (projectExcept) since _highlight would be excluded rather than included.
The highlight field should only be added when the projection is not an exclusion
projection.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [435-439]

 // Include _highlight in projections when the highlight column is present in the schema
 int hlIndex = currentFields.indexOf(HighlightExpression.HIGHLIGHT_FIELD);
-if (hlIndex >= 0) {
+if (hlIndex >= 0 && !node.isExcluded()) {
   expandedFields.add(context.relBuilder.field(hlIndex));
 }
Suggestion importance[1-10]: 7

__

Why: This is a valid bug: when node.isExcluded() is true, expandedFields is passed to projectExcept, which would exclude _highlight rather than include it. Adding the !node.isExcluded() guard prevents this incorrect behavior.

Medium
Guard against non-collection highlight field values

The code calls entry.getValue().collectionValue() without checking whether the value
is actually a collection type. If a highlight field value is not a collection (e.g.,
it's a string or null), this will throw a runtime exception. Add a null/type check
before calling collectionValue().

protocol/src/main/java/org/opensearch/sql/protocol/response/QueryResult.java [115-123]

 Map<String, Object> hlMap = new LinkedHashMap<>();
 for (Map.Entry<String, ExprValue> entry : hl.tupleValue().entrySet()) {
+  ExprValue fieldVal = entry.getValue();
+  if (fieldVal == null || fieldVal.isMissing() || fieldVal.isNull()) {
+    continue;
+  }
   hlMap.put(
       entry.getKey(),
-      entry.getValue().collectionValue().stream()
+      fieldVal.collectionValue().stream()
           .map(ExprValue::stringValue)
           .collect(Collectors.toList()));
 }
 return (Map<String, Object>) hlMap;
Suggestion importance[1-10]: 5

__

Why: Calling collectionValue() on a non-collection ExprValue could throw a runtime exception. The suggested null/type check adds defensive handling, though in practice highlight values from OpenSearch are always arrays.

Low
General
Avoid silently dropping highlight config with empty fields

The highlight config is only applied when the fields map is non-null and non-empty.
However, a HighlightConfig constructed via the simple array constructor (new
HighlightConfig(List.of("*"))) will have a non-null, non-empty fields map, so that
case is handled. But if someone constructs a HighlightConfig with an empty fields
map (e.g., object format with no fields key), the config will be silently dropped
without any error or warning. This silent discard could be confusing. Consider
applying the config whenever it is non-null, or throwing a validation error earlier
in PPLQueryRequest.getHighlightConfig() when fields are empty.

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstStatementBuilder.java [41-45]

-if (context.getHighlightConfig() != null
-    && context.getHighlightConfig().fields() != null
-    && !context.getHighlightConfig().fields().isEmpty()) {
+if (context.getHighlightConfig() != null) {
   query.setHighlightConfig(context.getHighlightConfig());
 }
Suggestion importance[1-10]: 4

__

Why: The current guard silently drops a HighlightConfig with an empty fields map, which could be confusing. Simplifying to just a null check is cleaner, though the empty-fields case is already rejected earlier in PPLQueryRequest.getHighlightConfig() since fieldsObj with no keys would produce an empty map that gets passed through.

Low
Validate fragment_size upper bound to prevent overflow

The validation rejects fragment_size values of 0 or negative via the <= 0 check. The
OpenSearch Dashboards integration explicitly sends fragment_size: 2147483647
(Integer.MAX_VALUE) to disable fragmentation, and that value passes the check since
Integer.MAX_VALUE is positive. Per the OpenSearch docs, fragment_size: 0 is invalid, so
<= 0 is the correct lower bound; consider also validating an upper bound to prevent
overflow issues when the value is passed to the builder.

ppl/src/main/java/org/opensearch/sql/ppl/domain/PPLQueryRequest.java [169-174]

+if (obj.has("fragment_size")) {
+  fragmentSize = obj.getInt("fragment_size");
+  if (fragmentSize <= 0) {
+    throw new IllegalArgumentException("highlight fragment_size must be a positive integer");
+  }
+}
 
-
Suggestion importance[1-10]: 2

__

Why: The existing_code and improved_code are identical, meaning no actual change is proposed. The suggestion only discusses whether the validation is correct without providing a concrete fix.

Low
Suggestions up to commit 87bb284
Category | Suggestion | Impact
Possible issue
Avoid duplicate highlight column in scan schema

pushDownHighlight unconditionally adds the _highlight field to the schema even if it
already exists (e.g., when the rule fires a second time or the field was already
added by LogicalHighlight.create). This can result in duplicate _highlight columns
in the row type, causing index-out-of-bounds or incorrect field resolution
downstream.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java [89-95]

 public RelNode pushDownHighlight(HighlightConfig highlightConfig) {
   RelDataTypeFactory.Builder schemaBuilder = getCluster().getTypeFactory().builder();
   schemaBuilder.addAll(getRowType().getFieldList());
-  schemaBuilder.add(
-      HighlightExpression.HIGHLIGHT_FIELD,
-      getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY));
+  if (!getRowType().getFieldNames().contains(HighlightExpression.HIGHLIGHT_FIELD)) {
+    schemaBuilder.add(
+        HighlightExpression.HIGHLIGHT_FIELD,
+        getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY));
+  }
Suggestion importance[1-10]: 7

__

Why: This is a real correctness issue — pushDownHighlight unconditionally adds _highlight without checking if it already exists, which could cause duplicate columns. The LogicalHighlight.create already guards against this, but pushDownHighlight does not, making it inconsistent and potentially buggy.

Medium
Guard against unexpected highlight value types

entry.getValue().collectionValue() will throw an exception if the highlight value
for a field is not a collection (e.g., it is a plain string or missing). This can
cause the entire response serialization to fail for rows where highlight data has an
unexpected shape. A null/missing check or a try-catch should guard this call.

protocol/src/main/java/org/opensearch/sql/protocol/response/QueryResult.java [115-123]

 Map<String, Object> hlMap = new LinkedHashMap<>();
 for (Map.Entry<String, ExprValue> entry : hl.tupleValue().entrySet()) {
+  ExprValue fieldVal = entry.getValue();
+  if (fieldVal == null || fieldVal.isMissing() || fieldVal.isNull()) {
+    continue;
+  }
   hlMap.put(
       entry.getKey(),
-      entry.getValue().collectionValue().stream()
+      fieldVal.collectionValue().stream()
           .map(ExprValue::stringValue)
           .collect(Collectors.toList()));
 }
-return (Map<String, Object>) hlMap;
+return hlMap;
Suggestion importance[1-10]: 6

__

Why: Calling collectionValue() on a non-collection ExprValue could throw an exception and break response serialization. Adding a null/missing check is a defensive improvement that prevents potential runtime failures for unexpected highlight data shapes.

Low
Prevent highlight config leaking across queries

The visitHighlight method sets the highlight config on the context and then visits
children, but the config is consumed and cleared inside visitRelation. If the query
has no Relation node (e.g., a Values node or subquery), the highlight config will
remain set on the context and could leak into subsequent queries sharing the same
context. The config should be cleared after visitChildren if it was not consumed.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [3202-3207]

 @Override
 public RelNode visitHighlight(Highlight node, CalcitePlanContext context) {
   context.setHighlightConfig(node.getHighlightConfig());
   visitChildren(node, context);
+  // Clear in case visitRelation was not reached (e.g. non-scan source)
+  context.setHighlightConfig(null);
   return context.relBuilder.peek();
 }
Suggestion importance[1-10]: 5

__

Why: The concern is valid — if visitRelation is not reached, the highlightConfig stays set on the context. However, the config is already cleared inside visitRelation with context.setHighlightConfig(null), and the CalcitePlanContext is typically per-query, reducing the actual risk. The fix is reasonable but the impact is limited.

Low
General
Return null for highlight config with no fields

When the fields object is absent or empty in the OSD object format, fields will be
an empty list, and the HighlightConfig will be created with no fields. This will
silently produce a no-op highlight (the check in AstStatementBuilder skips injection
when fields are empty), but the caller receives a non-null config with empty fields
rather than null, which may be confusing. More critically, if fields is empty,
applyHighlightPushDown returns early without configuring the highlight builder, so
pre/post tags and fragment_size from the OSD object are also silently dropped.
Consider returning null when fields are empty.

ppl/src/main/java/org/opensearch/sql/ppl/domain/PPLQueryRequest.java [149-154]

 JSONObject fieldsObj = obj.optJSONObject("fields");
 if (fieldsObj != null) {
   for (String key : fieldsObj.keySet()) {
     fields.add(key);
   }
 }
 
+if (fields.isEmpty()) {
+  return null;
+}
+
Suggestion importance[1-10]: 5

__

Why: When the OSD object format has no fields, applyHighlightPushDown returns early and silently drops pre_tags, post_tags, and fragment_size. Returning null instead of an empty-fields config makes the behavior explicit and consistent with the AstStatementBuilder check.

Low

@RyanL1997 RyanL1997 added PPL Piped processing language v3.6.0 Issues targeting release v3.6.0 feature labels Mar 14, 2026
@github-actions
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit f0dacc0.

Path | Line | Severity | Description
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java | 519 | low | User-supplied highlight terms are wrapped in double quotes but not escaped before being passed to queryStringQuery(). A term containing a literal '"' character could partially break out of the phrase context and alter the Lucene query structure. Because PPL users are already authenticated and can issue arbitrary search queries, the practical exploitability is limited, but proper escaping of the term string (e.g., via QueryParser.escape) would be safer.

The table above displays the top 10 most important findings.

Total: 1 | Critical: 0 | High: 0 | Medium: 0 | Low: 1


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Contributor

Persistent review updated to latest commit f0dacc0

@github-actions
Contributor

Persistent review updated to latest commit ab37b5b

@github-actions
Contributor

Persistent review updated to latest commit a7c44d8

@github-actions
Contributor

Persistent review updated to latest commit 51356cd

@github-actions
Contributor

Persistent review updated to latest commit b2d6afe

@github-actions
Contributor

Persistent review updated to latest commit 7bb526a

@github-actions
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit e3b9c75.

Path | Line | Severity | Description
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java | 514 | medium | User-supplied field names from the API request's 'highlight' parameter are directly interpolated into a Lucene query string with only double-quote wrapping ('"' + term + '"'). While this helps with phrase matching, it does not sanitize Lucene special characters or escape sequences, creating a potential query string injection vector. A crafted field value like '* OR sensitive_field:*' could alter query semantics. This appears to be an unintentional coding oversight rather than malicious intent, as the pattern is consistent with the rest of the feature and the terms are at least wrapped in quotes.

The table above displays the top 10 most important findings.

Total: 1 | Critical: 0 | High: 0 | Medium: 1 | Low: 0


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.
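For reference, the escaping the analyzer suggests can be approximated as below. This is a sketch: `escapeLucene` is a hypothetical helper modeled on Lucene's `QueryParser.escape`, which backslash-escapes the characters Lucene treats as query syntax so a user-supplied term stays inert inside the quoted phrase.

```java
public class LuceneEscape {
    // Characters Lucene treats as query syntax; escaping them keeps
    // user-supplied highlight terms from altering the query structure.
    private static final String SPECIALS = "+-&|!(){}[]^\"~*?:\\/";

    public static String escapeLucene(String term) {
        StringBuilder sb = new StringBuilder(term.length());
        for (char c : term.toCharArray()) {
            if (SPECIALS.indexOf(c) >= 0) {
                sb.append('\\');  // prefix each special character
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A term trying to break out of the quoted phrase is neutralized.
        System.out.println(escapeLucene("\" OR sensitive_field:*"));
        // prints: \" OR sensitive_field\:\*
    }
}
```

In the production code this would run on the term before the `'"' + term + '"'` wrapping, so the quotes added by the scan remain the only unescaped quotes in the query string.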

@github-actions
Contributor

Persistent review updated to latest commit e3b9c75

Signed-off-by: Jialiang Liang <jiallian@amazon.com>
…command

Signed-off-by: Jialiang Liang <jiallian@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit 3cfbe9a

Signed-off-by: Jialiang Liang <jiallian@amazon.com>
@github-actions
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 87bb284.

Path | Line | Severity | Description
integ-test/src/test/java/org/opensearch/sql/ppl/PPLIntegTestCase.java | 172 | low | Test helper `buildRequestWithHighlight` uses String.format to embed the `query` parameter directly into a JSON string without escaping. A query containing double-quotes or backslashes could produce malformed or injected JSON. This is test code and the inputs are controlled, so the risk is minimal, but it represents a JSON injection pattern.
integ-test/src/test/java/org/opensearch/sql/security/CalciteCrossClusterSearchIT.java | 499 | low | The private `executeQueryWithHighlight` helper likewise uses String.format to inline both `query` and `highlightJson` directly into a JSON literal without escaping. Same JSON-injection concern as above; confined to test/integration code with controlled inputs.

The table above displays the top 10 most important findings.

Total: 2 | Critical: 0 | High: 0 | Medium: 0 | Low: 2


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.
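One way to address the test-helper findings, assuming the helpers simply String.format the request body: escape the query before interpolation. `escapeJson` and the `buildRequestWithHighlight` shown here are illustrative stand-ins, not the actual helpers in PPLIntegTestCase.

```java
public class JsonEscapeDemo {
    // Minimal JSON string escaping: backslashes, double quotes, and the
    // control characters most likely to appear in PPL query strings.
    public static String escapeJson(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n");  break;
                case '\t': sb.append("\\t");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Hypothetical safe variant of the test helper: escape before formatting.
    public static String buildRequestWithHighlight(String query, String highlightJson) {
        return String.format(
            "{\"query\": \"%s\", \"highlight\": %s}", escapeJson(query), highlightJson);
    }

    public static void main(String[] args) {
        // PPL full-text queries routinely contain embedded double quotes;
        // escaping keeps the resulting request body valid JSON.
        System.out.println(
            buildRequestWithHighlight("source=accounts \"Holmes\"", "[\"*\"]"));
        // prints: {"query": "source=accounts \"Holmes\"", "highlight": ["*"]}
    }
}
```

A JSON library (e.g., building the body with an object mapper) would be the more robust fix, but even this character-level escaping closes the injection pattern the analyzer flags.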

@opensearch-project opensearch-project deleted a comment from github-actions bot Mar 17, 2026
@opensearch-project opensearch-project deleted a comment from github-actions bot Mar 17, 2026
@opensearch-project opensearch-project deleted a comment from github-actions bot Mar 17, 2026
@opensearch-project opensearch-project deleted a comment from github-actions bot Mar 17, 2026
@opensearch-project opensearch-project deleted a comment from github-actions bot Mar 17, 2026
@opensearch-project opensearch-project deleted a comment from github-actions bot Mar 17, 2026
@penghuo
Collaborator

penghuo commented Mar 17, 2026

Highlight is a scan hint, not a relational operator. Consider dropping LogicalHighlight, HighlightIndexScanRule, and the Highlight AST node. Instead, pass HighlightConfig through CalcitePlanContext and call pushDownHighlight() eagerly in visitRelation() — same pattern as existing PushDownContext operations. Simpler, fewer classes, same result.

cc: @dai-chen

"lastname.keyword": ["@opensearch-dashboards-highlighted-field@Holmes@/opensearch-dashboards-highlighted-field@"]
},
{
"address": ["880 @opensearch-dashboards-highlighted-field@Holmes@/opensearch-dashboards-highlighted-field@ Lane"]
Collaborator

Datarows does not include address; should address have been highlighted?

Collaborator Author

Yes, this is intentional and aligns with OpenSearch DSL behavior. In OpenSearch's _search endpoint, the highlight response is independent of _source filtering — even if you use "_source": ["firstname", "lastname"], the highlight section will still return fragments for any field that matched the query, including address. Our PPL highlight follows the same semantics: "fields": {"*": {}} highlights all fields that matched the search term, regardless of which fields are projected in the schema/datarows. Added a note in the docs to clarify this.

Verified against OpenSearch DSL — _source filtered but highlight still includes address
curl -X POST "localhost:9200/accounts/_search" \
  -H "Content-Type: application/json" \
  -d '{
    "query": {"query_string": {"query": "Holmes"}},
    "_source": ["firstname", "lastname"],
    "highlight": {
      "fields": {"*": {}},
      "pre_tags": ["<b>"],
      "post_tags": ["</b>"]
    }
  }'
{
  "hits": {
    "hits": [
      {
        "_id": "578",
        "_source": { "firstname": "Holmes", "lastname": "Mcknight" },
        "highlight": {
          "firstname": ["<b>Holmes</b>"],
          "firstname.keyword": ["<b>Holmes</b>"]
        }
      },
      {
        "_id": "828",
        "_source": { "firstname": "Blanche", "lastname": "Holmes" },
        "highlight": {
          "lastname": ["<b>Holmes</b>"],
          "lastname.keyword": ["<b>Holmes</b>"]
        }
      },
      {
        "_id": "1",
        "_source": { "firstname": "Amber", "lastname": "Duke" },
        "highlight": {
          "address": ["880 <b>Holmes</b> Lane"]
        }
      }
    ]
  }
}

_source is filtered to only firstname/lastname, but highlight still returns address for account 1.

Signed-off-by: Jialiang Liang <jiallian@amazon.com>
@github-actions
Copy link
Contributor

Persistent review updated to latest commit 1b07536

@RyanL1997
Collaborator Author

Highlight is a scan hint, not a relational operator. Consider dropping LogicalHighlight, HighlightIndexScanRule, and the Highlight AST node. Instead, pass HighlightConfig through CalcitePlanContext and call pushDownHighlight() eagerly in visitRelation() — same pattern as existing PushDownContext operations. Simpler, fewer classes, same result.

Done. Dropped LogicalHighlight, HighlightIndexScanRule, and the Highlight AST node (3 classes deleted). Highlight config now threads through Query → QueryPlan → QueryService → CalcitePlanContext, and pushDownHighlight() is called eagerly in visitRelation() before alias wrapping — same consume-once pattern as other scan hints. Introduced a HighlightPushDown interface in core to bridge the module boundary with opensearch (since CalciteRelNodeVisitor can't reference CalciteLogicalIndexScan directly). Config is cleared immediately after pushdown so it's applied exactly once.

cc @penghuo @dai-chen
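The consume-once pattern described above can be sketched as follows. This is an illustrative stand-in, not the actual CalcitePlanContext/CalciteRelNodeVisitor code: `PlanContext`, `consumeHighlightConfig`, and the simplified `visitRelation` are names invented for the sketch, and HighlightConfig is reduced to a plain Object.

```java
import java.util.Optional;

// Stand-in for CalcitePlanContext: the scan hint is set before the plan is
// visited, applied at the first scan, and cleared so it cannot be applied
// twice (e.g., to a second relation in a join).
class PlanContext {
    private Object highlightConfig; // stands in for HighlightConfig

    public void setHighlightConfig(Object config) { this.highlightConfig = config; }

    // Return the config and clear it in one step (consume-once).
    public Optional<Object> consumeHighlightConfig() {
        Optional<Object> config = Optional.ofNullable(highlightConfig);
        highlightConfig = null;
        return config;
    }
}

public class ConsumeOnceDemo {
    static int pushDownCount = 0;

    // Stand-in for visitRelation(): push the hint down only if still present.
    public static void visitRelation(PlanContext ctx) {
        ctx.consumeHighlightConfig().ifPresent(cfg -> pushDownCount++);
    }

    public static void main(String[] args) {
        PlanContext ctx = new PlanContext();
        ctx.setHighlightConfig(new Object());
        visitRelation(ctx); // first scan: hint applied
        visitRelation(ctx); // second scan: no-op, already consumed
        System.out.println("pushdown applied " + pushDownCount + " time(s)");
        // prints: pushdown applied 1 time(s)
    }
}
```

Clearing inside the consume call is what makes the "applied exactly once" guarantee local to the context rather than depending on every visitor call site remembering to reset it.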

Signed-off-by: Jialiang Liang <jiallian@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit 1bd30fa

penghuo
penghuo previously approved these changes Mar 19, 2026
public void execute(
UnresolvedPlan plan,
QueryType queryType,
HighlightConfig highlightConfig,
Collaborator

nit: If a third request-level parameter is added (timeout, routing, preference), that's the time to refactor to QueryOptions — the pattern exists (ExecutionContext, PlanContext), and it's a straightforward migration. No need to speculate now.

Signed-off-by: Jialiang Liang <jiallian@amazon.com>
@github-actions
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit e2ef1b2.

Path | Line | Severity | Description
integ-test/src/test/java/org/opensearch/sql/ppl/PPLIntegTestCase.java | 165 | low | Test helper uses String.format to construct JSON with user-supplied query and highlightJson values without escaping. If a query string contains double-quote characters, this could produce malformed or injected JSON. Isolated to test code with controlled inputs so impact is minimal, but the pattern is unsafe if reused in production.
ppl/src/main/java/org/opensearch/sql/ppl/domain/PPLQueryRequest.java | 162 | low | validateFieldName only checks for null/empty; it does not restrict special characters (e.g., script injection tokens) in field names before passing them to the OpenSearch HighlightBuilder. While OpenSearch itself is expected to handle this safely, the lack of any allowlist or character-level validation is an anomaly worth noting in a security context.

The table above displays the top 10 most important findings.

Total: 2 | Critical: 0 | High: 0 | Medium: 0 | Low: 2


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.
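A character-level allowlist along the lines the analyzer suggests could look like this. The regex and the `validateFieldName` signature here are assumptions for illustration, not the actual PPLQueryRequest code; the allowed character set would need to match whatever field-name conventions the plugin actually supports.

```java
import java.util.regex.Pattern;

public class FieldNameCheck {
    // Allow letters, digits, underscores, wildcard segments, and dots for
    // sub-fields like lastname.keyword; reject everything else (spaces,
    // colons, quotes) before the name reaches the HighlightBuilder.
    private static final Pattern FIELD_NAME =
        Pattern.compile("[A-Za-z0-9_*]+(\\.[A-Za-z0-9_*]+)*");

    public static void validateFieldName(String name) {
        if (name == null || name.isEmpty() || !FIELD_NAME.matcher(name).matches()) {
            throw new IllegalArgumentException("invalid highlight field name: " + name);
        }
    }

    public static void main(String[] args) {
        validateFieldName("*");                // wildcard: highlight all fields
        validateFieldName("lastname.keyword"); // sub-field access
        try {
            validateFieldName("* OR sensitive_field:*"); // injection-style name
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

An allowlist like this complements, rather than replaces, escaping at the query-building layer: validation rejects obviously malformed names early, while escaping guards whatever still reaches the Lucene query string.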

@github-actions
Contributor

Persistent review updated to latest commit e2ef1b2

@RyanL1997
Collaborator Author

RyanL1997 commented Mar 19, 2026

Summarizing an offline conversation with @penghuo and @dai-chen:
For the current response in commit e2ef1b2:

Details
curl -sS -X POST "localhost:9200/_plugins/_ppl" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "source=accounts \"Holmes\"",
    "highlight": {
      "fields": {
        "address": {"fragment_size": 200},
        "firstname": {"number_of_fragments": 3},
        "lastname": {}
      },
      "pre_tags": ["<b>"],
      "post_tags": ["</b>"]
    }
  }' | python3 -m json.tool
{
    "schema": [
        {
            "name": "account_number",
            "type": "bigint"
        },
        {
            "name": "firstname",
            "type": "string"
        },
        {
            "name": "address",
            "type": "string"
        },
        {
            "name": "balance",
            "type": "bigint"
        },
        {
            "name": "gender",
            "type": "string"
        },
        {
            "name": "city",
            "type": "string"
        },
        {
            "name": "employer",
            "type": "string"
        },
        {
            "name": "state",
            "type": "string"
        },
        {
            "name": "age",
            "type": "bigint"
        },
        {
            "name": "email",
            "type": "string"
        },
        {
            "name": "lastname",
            "type": "string"
        }
    ],
    "datarows": [
        [
            578,
            "Holmes",
            "969 Metropolitan Avenue",
            34259,
            "M",
            "Aguila",
            "Cubicide",
            "PA",
            37,
            "holmesmcknight@cubicide.com",
            "Mcknight"
        ],
        [
            828,
            "Blanche",
            "605 Stryker Court",
            44890,
            "F",
            "Loomis",
            "Motovate",
            "KS",
            33,
            "blancheholmes@motovate.com",
            "Holmes"
        ],
        [
            1,
            "Amber",
            "880 Holmes Lane",
            39225,
            "M",
            "Brogan",
            "Pyrami",
            "IL",
            32,
            "amberduke@pyrami.com",
            "Duke"
        ]
    ],
    "highlights": [
        {
            "firstname": [
                "<b>Holmes</b>"
            ]
        },
        {
            "lastname": [
                "<b>Holmes</b>"
            ]
        },
        {
            "address": [
                "880 <b>Holmes</b> Lane"
            ]
        }
    ],
    "total": 3,
    "size": 3
}
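In this format the client has to correlate the top-level `highlights` array with `datarows` by position. A minimal client-side sketch of that correlation, using a trimmed-down version of the response above (the `response` dict and field subset are illustrative, not part of the API):

```python
# Illustrative sketch: merge the positional "highlights" array back into
# the rows of a PPL response. The response below is a trimmed copy of the
# example in this thread (three columns only).
response = {
    "schema": [{"name": "firstname"}, {"name": "lastname"}, {"name": "address"}],
    "datarows": [
        ["Holmes", "Mcknight", "969 Metropolitan Avenue"],
        ["Blanche", "Holmes", "605 Stryker Court"],
        ["Amber", "Duke", "880 Holmes Lane"],
    ],
    "highlights": [
        {"firstname": ["<b>Holmes</b>"]},
        {"lastname": ["<b>Holmes</b>"]},
        {"address": ["880 <b>Holmes</b> Lane"]},
    ],
}

columns = [col["name"] for col in response["schema"]]
merged = []
# "highlights" is aligned with "datarows" by index, so zip pairs each row
# with its highlight map.
for row, highlight in zip(response["datarows"], response["highlights"]):
    record = dict(zip(columns, row))
    # Replace the plain value with its first highlighted fragment where present.
    for field, fragments in highlight.items():
        record[field] = fragments[0]
    merged.append(record)

print(merged[2]["address"])  # 880 <b>Holmes</b> Lane
```

The implicit positional coupling between the two arrays is the weakness of this shape: a client that sorts, filters, or paginates rows must carry the `highlights` entries along by hand.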

We propose that, instead of isolating highlights into a separate top-level JSON section via a dedicated response formatter, we keep them inline with the original schema and data rows as a `_highlight` struct column, and let the frontend handle the transformation for rendering:

curl -sS -X POST "localhost:9200/_plugins/_ppl" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "source=accounts \"Holmes\"",
    "highlight": {
      "fields": {
        "address": {"fragment_size": 200},
        "firstname": {"number_of_fragments": 3},
        "lastname": {}
      },
      "pre_tags": ["<b>"],
      "post_tags": ["</b>"]
    }
  }' | python3 -m json.tool
{
    "schema": [
        {
            "name": "account_number",
            "type": "bigint"
        },
        {
            "name": "firstname",
            "type": "string"
        },
        {
            "name": "address",
            "type": "string"
        },
        {
            "name": "balance",
            "type": "bigint"
        },
        {
            "name": "gender",
            "type": "string"
        },
        {
            "name": "city",
            "type": "string"
        },
        {
            "name": "employer",
            "type": "string"
        },
        {
            "name": "state",
            "type": "string"
        },
        {
            "name": "age",
            "type": "bigint"
        },
        {
            "name": "email",
            "type": "string"
        },
        {
            "name": "lastname",
            "type": "string"
        },
        {
            "name": "_highlight",
            "type": "struct"
        }
    ],
    "datarows": [
        [
            578,
            "Holmes",
            "969 Metropolitan Avenue",
            34259,
            "M",
            "Aguila",
            "Cubicide",
            "PA",
            37,
            "holmesmcknight@cubicide.com",
            "Mcknight",
            {
                "firstname": [
                    "<b>Holmes</b>"
                ]
            }
        ],
        [
            828,
            "Blanche",
            "605 Stryker Court",
            44890,
            "F",
            "Loomis",
            "Motovate",
            "KS",
            33,
            "blancheholmes@motovate.com",
            "Holmes",
            {
                "lastname": [
                    "<b>Holmes</b>"
                ]
            }
        ],
        [
            1,
            "Amber",
            "880 Holmes Lane",
            39225,
            "M",
            "Brogan",
            "Pyrami",
            "IL",
            32,
            "amberduke@pyrami.com",
            "Duke",
            {
                "address": [
                    "880 <b>Holmes</b> Lane"
                ]
            }
        ]
    ],
    "total": 3,
    "size": 3
}
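With the proposed shape, each row carries its own `_highlight` struct, so no positional correlation is needed and rows can be reordered or filtered freely. A minimal rendering sketch under that assumption (the `render_row` helper and the trimmed four-column row are illustrative, not part of the API):

```python
# Illustrative sketch: render a row from the proposed response shape, where
# the last column is a "_highlight" struct keyed by field name. The schema
# and row below are a trimmed copy of the example in this thread.
schema = ["firstname", "lastname", "address", "_highlight"]
datarows = [
    ["Amber", "Duke", "880 Holmes Lane", {"address": ["880 <b>Holmes</b> Lane"]}],
]

def render_row(row):
    record = dict(zip(schema, row))
    highlight = record.pop("_highlight") or {}
    # Prefer the first highlighted fragment when the field has one,
    # otherwise fall back to the plain value.
    return {field: highlight.get(field, [value])[0]
            for field, value in record.items()}

print(render_row(datarows[0])["address"])  # 880 <b>Holmes</b> Lane
```

Because `_highlight` is just another struct column in the schema, existing clients that ignore unknown columns keep working, and highlight-aware frontends can consume it per row.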

Signed-off-by: Jialiang Liang <jiallian@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit d289025

Signed-off-by: Jialiang Liang <jiallian@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit 08d3c20

@RyanL1997 RyanL1997 merged commit 94649fb into opensearch-project:main Mar 19, 2026
38 checks passed

Labels

feature · PPL (Piped processing language) · v3.6.0 (issues targeting release v3.6.0)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE RFC] PPL Search Result Highlighting
[FEATURE] Supporting Query Highlight Feature into PPL API

3 participants