Skip to content

[Feature] PPL Search Result Highlight Support in Explore#11547

Merged
mengweieric merged 6 commits intoopensearch-project:mainfrom
RyanL1997:ppl-syntax-highlighting
Mar 20, 2026
Merged

[Feature] PPL Search Result Highlight Support in Explore#11547
mengweieric merged 6 commits intoopensearch-project:mainfrom
RyanL1997:ppl-syntax-highlighting

Conversation

@RyanL1997
Copy link
Copy Markdown
Contributor

@RyanL1997 RyanL1997 commented Mar 18, 2026

Description

When user run a full-text search in PPL, matching terms are now automatically highlighted in the results — the same highlighting experience you get with DQL and Lucene queries, now available with PPL.

Issues Resolved

Demo

ppl-h-no-aud.mp4

Sample Request/Response

Details
#request payload
{
    "query": {
        "query": "source=`accounts` \"Holmes\" OR \"Bond\"",
        "language": "PPL",
        "dataset": {
            "id": "d9a7e550-23c3-11f1-b43a-8f40ac7555a0::a4c07906-82f1-4977-8808-32df0ad5043e",
            "title": "accounts",
            "type": "INDEXES",
            "schemaMappings": {},
            "dataSource": {
                "id": "d9a7e550-23c3-11f1-b43a-8f40ac7555a0",
                "title": "local",
                "type": "OpenSearch",
                "version": "3.6.0-SNAPSHOT"
            }
        },
        "format": "jdbc"
    },
    "highlight": {
        "pre_tags": [
            "@opensearch-dashboards-highlighted-field@"
        ],
        "post_tags": [
            "@/opensearch-dashboards-highlighted-field@"
        ],
        "fields": {
            "*": {}
        },
        "fragment_size": 2147483647
    }
}

#response
{
    "type": "data_frame",
    "body": {
        "name": "d9a7e550-23c3-11f1-b43a-8f40ac7555a0::a4c07906-82f1-4977-8808-32df0ad5043e",
        "schema": [
            {
                "name": "account_number",
                "type": "bigint",
                "values": []
            },
            {
                "name": "firstname",
                "type": "string",
                "values": []
            },
            {
                "name": "address",
                "type": "string",
                "values": []
            },
            {
                "name": "balance",
                "type": "bigint",
                "values": []
            },
            {
                "name": "gender",
                "type": "string",
                "values": []
            },
            {
                "name": "city",
                "type": "string",
                "values": []
            },
            {
                "name": "employer",
                "type": "string",
                "values": []
            },
            {
                "name": "state",
                "type": "string",
                "values": []
            },
            {
                "name": "age",
                "type": "bigint",
                "values": []
            },
            {
                "name": "email",
                "type": "string",
                "values": []
            },
            {
                "name": "lastname",
                "type": "string",
                "values": []
            }
        ],
        "meta": {
            "highlights": [
                {
                    "firstname": [
                        "@opensearch-dashboards-highlighted-field@Holmes@/opensearch-dashboards-highlighted-field@"
                    ],
                    "firstname.keyword": [
                        "@opensearch-dashboards-highlighted-field@Holmes@/opensearch-dashboards-highlighted-field@"
                    ]
                },
                {
                    "lastname.keyword": [
                        "@opensearch-dashboards-highlighted-field@Bond@/opensearch-dashboards-highlighted-field@"
                    ],
                    "lastname": [
                        "@opensearch-dashboards-highlighted-field@Bond@/opensearch-dashboards-highlighted-field@"
                    ]
                },
                {
                    "lastname.keyword": [
                        "@opensearch-dashboards-highlighted-field@Holmes@/opensearch-dashboards-highlighted-field@"
                    ],
                    "lastname": [
                        "@opensearch-dashboards-highlighted-field@Holmes@/opensearch-dashboards-highlighted-field@"
                    ]
                },
                {
                    "address": [
                        "880 @opensearch-dashboards-highlighted-field@Holmes@/opensearch-dashboards-highlighted-field@ Lane"
                    ]
                },
                {
                    "address": [
                        "461 @opensearch-dashboards-highlighted-field@Bond@/opensearch-dashboards-highlighted-field@ Street"
                    ]
                }
            ]
        },
        "fields": [
            {
                "name": "account_number",
                "type": "bigint",
                "values": [
                    578,
                    6,
                    828,
                    1,
                    679
                ]
            },
            {
                "name": "firstname",
                "type": "string",
                "values": [
                    "Holmes",
                    "Hattie",
                    "Blanche",
                    "Amber",
                    "Henrietta"
                ]
            },
            {
                "name": "address",
                "type": "string",
                "values": [
                    "969 Metropolitan Avenue",
                    "671 Bristol Street",
                    "605 Stryker Court",
                    "880 Holmes Lane",
                    "461 Bond Street"
                ]
            },
            {
                "name": "balance",
                "type": "bigint",
                "values": [
                    34259,
                    5686,
                    44890,
                    39225,
                    20149
                ]
            },
            {
                "name": "gender",
                "type": "string",
                "values": [
                    "M",
                    "M",
                    "F",
                    "M",
                    "M"
                ]
            },
            {
                "name": "city",
                "type": "string",
                "values": [
                    "Aguila",
                    "Dante",
                    "Loomis",
                    "Brogan",
                    "Richville"
                ]
            },
            {
                "name": "employer",
                "type": "string",
                "values": [
                    "Cubicide",
                    "Netagy",
                    "Motovate",
                    "Pyrami",
                    "Geekol"
                ]
            },
            {
                "name": "state",
                "type": "string",
                "values": [
                    "PA",
                    "TN",
                    "KS",
                    "IL",
                    "WA"
                ]
            },
            {
                "name": "age",
                "type": "bigint",
                "values": [
                    37,
                    36,
                    33,
                    32,
                    33
                ]
            },
            {
                "name": "email",
                "type": "string",
                "values": [
                    "holmesmcknight@cubicide.com",
                    "hattiebond@netagy.com",
                    "blancheholmes@motovate.com",
                    "amberduke@pyrami.com",
                    "henriettabonner@geekol.com"
                ]
            },
            {
                "name": "lastname",
                "type": "string",
                "values": [
                    "Mcknight",
                    "Bond",
                    "Holmes",
                    "Duke",
                    "Bonner"
                ]
            }
        ],
        "size": 5
    }
}

Check List

  • All tests pass
    • yarn test:jest
    • yarn test:jest_integration
  • New functionality includes testing.
  • New functionality has been documented.
  • Update CHANGELOG.md
  • Commits are signed per the DCO using --signoff

@github-actions
Copy link
Copy Markdown
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit f1a2106.

PathLineSeverityDescription
src/plugins/explore/public/components/data_table/table_cell/source_field_table_cell.tsx55mediumIntroduces dangerouslySetInnerHTML to render server-supplied data as raw HTML. The simultaneous change from formatHit(row, 'text') to formatHit(row) (removing the safe 'text' format) means HTML-formatted content from the server is now rendered in the DOM. While dompurify.sanitize() is applied, this pattern opens an XSS attack surface that did not exist before. Any dompurify bypass or misconfiguration would directly enable stored/reflected XSS from search result data.
src/plugins/query_enhancements/server/routes/index.ts92lowThe highlight field is validated as schema.object({}, { unknowns: 'allow' }), permitting arbitrary nested keys and values to pass through schema validation without restriction. While this follows the same pattern used for timeRange and options, it means the highlight payload forwarded to the backend search engine (OpenSearch) is essentially unvalidated. An attacker with access to the API could inject unexpected highlight parameters to manipulate backend query behavior.

The table above displays the top 10 most important findings.

Total: 2 | Critical: 0 | High: 0 | Medium: 1 | Low: 1


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 18, 2026

PR Reviewer Guide 🔍

(Review updated until commit 09d30a6)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Add shared highlight parsing utilities (parseHighlightedValue, getDisplayValue)

Relevant files:

  • src/plugins/data/common/field_formats/utils/highlight/highlight_display.tsx
  • src/plugins/data/common/field_formats/utils/highlight/highlight_display.test.tsx
  • src/plugins/data/common/field_formats/utils/highlight/index.ts
  • src/plugins/data/common/field_formats/utils/index.ts
  • src/plugins/data/common/field_formats/index.ts

Sub-PR theme: PPL backend highlight extraction and propagation through data frames

Relevant files:

  • src/plugins/query_enhancements/common/utils.ts
  • src/plugins/query_enhancements/server/routes/index.ts
  • src/plugins/query_enhancements/server/search/ppl_search_strategy.ts
  • src/plugins/query_enhancements/server/search/ppl_search_strategy.test.ts
  • src/plugins/query_enhancements/server/utils/facet.ts
  • src/plugins/data/common/data_frames/utils.ts
  • src/plugins/data/common/data_frames/utils.test.ts

Sub-PR theme: Apply highlight rendering in Explore and AgentTraces table cells

Relevant files:

  • src/plugins/explore/public/components/data_table/table_cell/source_field_table_cell.tsx
  • src/plugins/explore/public/components/data_table/table_cell/source_field_table_cell.test.tsx
  • src/plugins/agent_traces/public/components/data_table/table_cell/source_field_table_cell.tsx
  • src/plugins/agent_traces/public/components/data_table/table_cell/source_field_table_cell.test.tsx

⚡ Recommended focus areas for review

Fragment Key Collision

In parseHighlightedValue, the key prop for <mark> elements uses match.index (the character position in the string). If the same highlight tag appears at the same position across multiple renders or if the regex is reused, this could cause subtle React reconciliation issues. Consider using a sequential index instead.

parts.push(<mark key={match.index}>{match[1]}</mark>);
Fragment Joining

In getDisplayValue, multiple highlight fragments are joined with a space (join(' ')). This may not be semantically correct — highlight fragments from OpenSearch are independent snippets, not necessarily meant to be concatenated. Joining them could produce misleading or garbled display text, especially when fragments come from different parts of a long field value.

return parseHighlightedValue(highlight[fieldName].join(' '));
Mutation of Response

The code mutates rawResponse.data.schema and rawResponse.data.datarows in-place using splice. This could cause unexpected side effects if the response object is referenced elsewhere. Consider creating new arrays instead of mutating the originals.

rawResponse.data.schema.splice(hlIndex, 1);
rawResponse.data.datarows?.forEach((row: any) => row.splice(hlIndex, 1));
Highlight Condition

isPPLSearchQuery is used to gate highlight requests, but the highlight feature should also consider whether the query actually contains full-text search terms. Sending a highlight request for every PPL query (even non-search ones like aggregations) may add unnecessary overhead on the OpenSearch side.

const highlight = isPPLSearchQuery(query) ? getHighlightRequest(query.query, true) : undefined;

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 18, 2026

PR Code Suggestions ✨

Latest suggestions up to 09d30a6

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Escape special regex characters in highlight tags

The pre and post tag strings used in the regex may contain special regex characters
(like @, /, (). These should be escaped before being used in a RegExp constructor to
avoid unintended regex behavior or errors.

src/plugins/data/common/field_formats/utils/highlight/highlight_display.tsx [9]

-const highlightRegex = new RegExp(`${highlightTags.pre}(.*?)${highlightTags.post}`, 'g');
+const escapeRegex = (s: string) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
+const highlightRegex = new RegExp(`${escapeRegex(highlightTags.pre)}(.*?)${escapeRegex(highlightTags.post)}`, 'g');
Suggestion importance[1-10]: 6

__

Why: The highlightTags.pre and highlightTags.post values (@opensearch-dashboards-highlighted-field@ and @/opensearch-dashboards-highlighted-field@) contain / and @ characters. While these aren't special regex metacharacters that would cause errors, the suggestion is technically sound as a defensive practice. The impact is low since the current tags don't contain problematic regex characters.

Low
General
Avoid mutating raw response data in-place

Array.prototype.findIndex returns -1 when not found, never undefined, so the hlIndex
!== undefined check is redundant but harmless. More critically, mutating
rawResponse.data.schema and rawResponse.data.datarows in-place with splice could
cause issues if the raw response object is referenced elsewhere. Consider working on
a copy or using filter/map instead.

src/plugins/query_enhancements/server/search/ppl_search_strategy.ts [55-61]

-const hlIndex = rawResponse.data.schema?.findIndex((s: any) => s.name === '_highlight');
+const hlIndex = rawResponse.data.schema?.findIndex((s: any) => s.name === '_highlight') ?? -1;
 let highlights: any[] | undefined;
-if (hlIndex !== undefined && hlIndex >= 0) {
+if (hlIndex >= 0) {
   highlights = rawResponse.data.datarows?.map((row: any) => row[hlIndex]) ?? [];
-  rawResponse.data.schema.splice(hlIndex, 1);
-  rawResponse.data.datarows?.forEach((row: any) => row.splice(hlIndex, 1));
+  rawResponse.data.schema = rawResponse.data.schema.filter((_: any, i: number) => i !== hlIndex);
+  rawResponse.data.datarows = rawResponse.data.datarows?.map((row: any) =>
+    row.filter((_: any, i: number) => i !== hlIndex)
+  );
 }
Suggestion importance[1-10]: 5

__

Why: Mutating rawResponse.data.schema and rawResponse.data.datarows in-place with splice could cause issues if the raw response is referenced elsewhere. Using filter/map to create new arrays is safer, though the redundant hlIndex !== undefined check fix is minor since findIndex never returns undefined.

Low
Use meaningful separator for multiple highlight fragments

Joining multiple highlight fragments with a space (' ') may produce incorrect
display when fragments are meant to be shown separately (e.g., each fragment is a
separate sentence excerpt). Consider joining with an ellipsis or a separator that
better represents discontinuous fragments, similar to how OpenSearch typically
renders highlight snippets.

src/plugins/data/common/field_formats/utils/highlight/highlight_display.tsx [44-53]

 export const getDisplayValue = (
   fieldName: string,
   formattedValue: string,
   highlight?: Record<string, string[]>
 ): React.ReactNode => {
   if (highlight && highlight[fieldName] && highlight[fieldName].length > 0) {
-    return parseHighlightedValue(highlight[fieldName].join(' '));
+    return parseHighlightedValue(highlight[fieldName].join(' ... '));
   }
   return formattedValue;
 };
Suggestion importance[1-10]: 4

__

Why: Joining highlight fragments with ' ... ' instead of ' ' better represents discontinuous text excerpts, which is a common pattern in search result highlighting. This is a minor UX improvement but doesn't affect correctness.

Low

Previous suggestions

Suggestions up to commit 9e90c55
CategorySuggestion                                                                                                                                    Impact
Possible issue
Escape special regex characters in highlight tags

The pre and post tag strings used in the regex may contain special regex characters
(like @, /, (, )) that could cause unintended regex behavior. These characters
should be escaped before being used in a RegExp constructor to ensure the pattern
matches literally.

src/plugins/data/common/field_formats/utils/highlight/highlight_display.tsx [9]

-const highlightRegex = new RegExp(`${highlightTags.pre}(.*?)${highlightTags.post}`, 'g');
+const escapeRegex = (s: string) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
+const highlightRegex = new RegExp(`${escapeRegex(highlightTags.pre)}(.*?)${escapeRegex(highlightTags.post)}`, 'g');
Suggestion importance[1-10]: 6

__

Why: The highlightTags.pre and highlightTags.post values (@opensearch-dashboards-highlighted-field@ and @/opensearch-dashboards-highlighted-field@) contain / and @ characters. While @ is not a special regex character, / is not either in a RegExp constructor context. However, the suggestion is valid as a defensive measure since the tags could theoretically change. The impact is low given the current tag values are safe.

Low
General
Avoid mutating raw response data in-place

Mutating rawResponse.data.schema and rawResponse.data.datarows in-place with splice
is a side effect that could affect other code that references the same response
object. Consider working with copies or filtering instead of mutating the original
arrays.

src/plugins/query_enhancements/server/search/ppl_search_strategy.ts [58-60]

 highlights = rawResponse.data.datarows?.map((row: any) => row[hlIndex]) ?? [];
-rawResponse.data.schema.splice(hlIndex, 1);
-rawResponse.data.datarows?.forEach((row: any) => row.splice(hlIndex, 1));
+rawResponse.data.schema = rawResponse.data.schema.filter((_: any, i: number) => i !== hlIndex);
+rawResponse.data.datarows = rawResponse.data.datarows?.map((row: any) =>
+  row.filter((_: any, i: number) => i !== hlIndex)
+);
Suggestion importance[1-10]: 5

__

Why: Mutating rawResponse.data.schema and rawResponse.data.datarows with splice is a side effect that could affect other code referencing the same object. Using filter instead creates new arrays and avoids potential bugs, improving code safety and maintainability.

Low
Render highlight fragments separately instead of joining

Joining multiple highlight fragments with a space (' ') may produce incorrect
display when fragments are meant to be shown separately or when the original text
doesn't have spaces between them. Consider rendering each fragment separately,
similar to how parseHighlightedValue handles individual strings, to preserve correct
context.

src/plugins/data/common/field_formats/utils/highlight/highlight_display.tsx [44-53]

 export const getDisplayValue = (
   fieldName: string,
   formattedValue: string,
   highlight?: Record<string, string[]>
 ): React.ReactNode => {
   if (highlight && highlight[fieldName] && highlight[fieldName].length > 0) {
-    return parseHighlightedValue(highlight[fieldName].join(' '));
+    const fragments = highlight[fieldName];
+    if (fragments.length === 1) {
+      return parseHighlightedValue(fragments[0]);
+    }
+    return fragments.map((fragment, i) => (
+      <React.Fragment key={i}>
+        {i > 0 && ' '}
+        {parseHighlightedValue(fragment)}
+      </React.Fragment>
+    ));
   }
   return formattedValue;
 };
Suggestion importance[1-10]: 4

__

Why: Joining multiple highlight fragments with a space may produce incorrect display when fragments should be shown separately. The improved code renders each fragment independently, which is more semantically correct, though the practical impact depends on how the backend returns highlight fragments.

Low
Suggestions up to commit a3db11f
CategorySuggestion                                                                                                                                    Impact
Security
Restrict allowed HTML tags in sanitization

The dompurify.sanitize call uses default options, which may still allow certain HTML
attributes like style or event handlers depending on the DOMPurify version and
environment. Consider explicitly allowing only safe tags (e.g., , , ) to minimize
XSS risk when rendering highlight markup.

src/plugins/explore/public/components/data_table/table_cell/source_field_table_cell.tsx [59-61]

 dangerouslySetInnerHTML={{
-  __html: dompurify.sanitize(formattedRow[rawKeys[index]]),
+  __html: dompurify.sanitize(formattedRow[rawKeys[index]], {
+    ALLOWED_TAGS: ['em', 'mark', 'strong', 'b', 'i'],
+    ALLOWED_ATTR: [],
+  }),
 }}
Suggestion importance[1-10]: 7

__

Why: This is a valid security improvement that restricts dompurify.sanitize to only allow specific safe tags like em, mark, strong, reducing XSS attack surface when rendering highlight markup. The default DOMPurify configuration may allow more tags/attributes than needed for highlight rendering.

Medium
Possible issue
Fix highlight extraction guard condition

Array.findIndex returns -1 when not found, never undefined, so the hlIndex !==
undefined check is always true and the guard is effectively just hlIndex >= 0. This
is not a bug per se, but mutating rawResponse.data.schema and
rawResponse.data.datarows in-place with splice while hlIndex could be 0 (first
column) is correct, however if datarows is undefined the map call will return []
instead of undefined, causing highlights to be set to an empty array and an empty
highlights key to be added to dataFrame.meta. Guard against this case.

src/plugins/query_enhancements/server/search/ppl_search_strategy.ts [55-61]

-const hlIndex = rawResponse.data.schema?.findIndex((s: any) => s.name === '_highlight');
+const hlIndex = rawResponse.data.schema?.findIndex((s: any) => s.name === '_highlight') ?? -1;
 let highlights: any[] | undefined;
-if (hlIndex !== undefined && hlIndex >= 0) {
-  highlights = rawResponse.data.datarows?.map((row: any) => row[hlIndex]) ?? [];
+if (hlIndex >= 0 && rawResponse.data.datarows?.length > 0) {
+  highlights = rawResponse.data.datarows.map((row: any) => row[hlIndex]);
   rawResponse.data.schema.splice(hlIndex, 1);
-  rawResponse.data.datarows?.forEach((row: any) => row.splice(hlIndex, 1));
+  rawResponse.data.datarows.forEach((row: any) => row.splice(hlIndex, 1));
 }
Suggestion importance[1-10]: 5

__

Why: The suggestion correctly identifies that findIndex never returns undefined, making the hlIndex !== undefined check redundant. The additional guard for datarows?.length > 0 prevents setting an empty highlights array in dataFrame.meta, which is a minor but valid improvement to avoid unnecessary metadata pollution.

Low
General
Use explicit null check for highlight data

If highlightData[index] is an empty object {}, the condition highlightData?.[index]
is truthy, so an empty highlight will be attached. More importantly, if
highlightData has fewer entries than data.size, accessing out-of-bounds indices
returns undefined, which is falsy and correctly skipped — this is fine. However,
using != null is more explicit and safer than a truthy check for objects.

src/plugins/data/common/data_frames/utils.ts [127]

-...(highlightData?.[index] && { highlight: highlightData[index] }),
+...(highlightData?.[index] != null && { highlight: highlightData[index] }),
Suggestion importance[1-10]: 3

__

Why: The suggestion to use != null instead of a truthy check is marginally more explicit, but the practical difference is minimal since highlight data entries are objects (never 0, false, or empty string). The improvement is cosmetic rather than functionally significant.

Low
Suggestions up to commit 984c08e
CategorySuggestion                                                                                                                                    Impact
Security
Restrict allowed HTML tags in sanitization

The dompurify.sanitize call uses default options, which may still allow certain HTML
attributes like style or event handlers depending on the DOMPurify version and
environment. Consider explicitly restricting allowed tags to only safe ones like
and that are expected for highlighting, to minimize XSS risk.

src/plugins/explore/public/components/data_table/table_cell/source_field_table_cell.tsx [59-61]

 dangerouslySetInnerHTML={{
-  __html: dompurify.sanitize(formattedRow[rawKeys[index]]),
+  __html: dompurify.sanitize(formattedRow[rawKeys[index]], {
+    ALLOWED_TAGS: ['mark', 'em', 'strong', 'b', 'i'],
+    ALLOWED_ATTR: [],
+  }),
 }}
Suggestion importance[1-10]: 7

__

Why: This is a valid security improvement that restricts dompurify.sanitize to only allow expected highlight tags like <mark> and <em>, reducing XSS attack surface. The improved_code accurately reflects the change and is directly applicable to the existing_code.

Medium
Possible issue
Validate highlight data is an array before indexing

If highlightData has fewer entries than the number of rows (i.e.,
highlightData.length < data.size), accessing highlightData[index] for out-of-bounds
indices will return undefined, which is handled by the conditional spread. However,
if highlightData is a sparse or non-array structure, this could silently fail.
Consider adding a check that highlightData is an array before using index-based
access.

src/plugins/data/common/data_frames/utils.ts [127]

-...(highlightData?.[index] && { highlight: highlightData[index] }),
+...(Array.isArray(highlightData) && highlightData[index] && { highlight: highlightData[index] }),
Suggestion importance[1-10]: 4

__

Why: Adding Array.isArray(highlightData) is a defensive check that prevents potential runtime errors if highlightData is not an array. The improvement is minor but valid since the type of meta.highlights may not be strictly enforced at runtime.

Low
General
Avoid hardcoding highlight enablement flag

The getHighlightRequest is called with true as a hardcoded second argument, but it's
unclear if highlights should always be enabled regardless of user settings or
feature flags. If there's a user preference or configuration for enabling
highlights, it should be checked here rather than always passing true.

src/plugins/query_enhancements/common/utils.ts [74]

-const highlight = isPPLSearchQuery(query) ? getHighlightRequest(query.query, true) : undefined;
+const highlight = isPPLSearchQuery(query) ? getHighlightRequest(query.query, context.uiSettings?.highlightEnabled ?? true) : undefined;
Suggestion importance[1-10]: 2

__

Why: The suggestion proposes using context.uiSettings?.highlightEnabled but context in this scope is EnhancedFetchContext which likely doesn't have a uiSettings property with highlightEnabled. The improved_code references a property that may not exist, making this suggestion potentially incorrect and speculative.

Low
Suggestions up to commit d1d1168
CategorySuggestion                                                                                                                                    Impact
Possible issue
Ensure string coercion before sanitization

The dompurify.sanitize call may receive a non-string value (e.g., number, object)
since formattedRow values can be of any type. You should convert the value to a
string before sanitizing to avoid unexpected behavior or errors. Use
String(formattedRow[rawKeys[index]]) or similar coercion.

src/plugins/explore/public/components/data_table/table_cell/source_field_table_cell.tsx [59-61]

 dangerouslySetInnerHTML={{
-  __html: dompurify.sanitize(formattedRow[rawKeys[index]]),
+  __html: dompurify.sanitize(String(formattedRow[rawKeys[index]] ?? '')),
 }}
Suggestion importance[1-10]: 6

__

Why: formattedRow values can be non-string types (numbers, objects), and passing them directly to dompurify.sanitize could cause unexpected behavior. Converting to string first with String(value ?? '') is a reasonable defensive measure.

Low
Guard against non-string query before highlight request

query.query may be an object or undefined for some query types, but
getHighlightRequest likely expects a string. If query.query is not a string, this
could cause a runtime error or incorrect highlight request. Add a type guard to
ensure query.query is a string before passing it.

src/plugins/query_enhancements/common/utils.ts [74]

-const highlight = isPPLSearchQuery(query) ? getHighlightRequest(query.query, true) : undefined;
+const highlight =
+  isPPLSearchQuery(query) && typeof query.query === 'string'
+    ? getHighlightRequest(query.query, true)
+    : undefined;
Suggestion importance[1-10]: 4

__

Why: While query.query could theoretically be non-string, PPL queries are typically strings, and getHighlightRequest is called only when isPPLSearchQuery is true. The type guard adds safety but may be overly cautious in practice.

Low
General
Use explicit undefined check for highlight data

If highlightData[index] is an empty object {}, the condition highlightData?.[index]
is truthy, but the highlight would be meaningless. More importantly, if
highlightData has fewer entries than data.size, accessing out-of-bounds indices
returns undefined, which is already handled. However, if highlightData[index] is 0
or another falsy value, it would be incorrectly skipped. Use a more explicit check
with !== undefined to be safe.

src/plugins/data/common/data_frames/utils.ts [127]

-...(highlightData?.[index] && { highlight: highlightData[index] }),
+...(highlightData?.[index] !== undefined && { highlight: highlightData[index] }),
Suggestion importance[1-10]: 3

__

Why: The suggestion to use !== undefined is technically more precise, but in practice highlightData[index] will always be an object (never 0 or other falsy primitives), making this a minor defensive improvement with low practical impact.

Low
Suggestions up to commit f1a2106
CategorySuggestion                                                                                                                                    Impact
Possible issue
Ensure string conversion before sanitization

The formattedRow[rawKeys[index]] value may not be a string, and passing a non-string
to dompurify.sanitize could produce unexpected results. Ensure the value is
converted to a string before sanitizing. Also, dompurify.sanitize returns a string
but TypeScript may complain without an explicit cast.

src/plugins/explore/public/components/data_table/table_cell/source_field_table_cell.tsx [59-61]

 dangerouslySetInnerHTML={{
-  __html: dompurify.sanitize(formattedRow[rawKeys[index]]),
+  __html: dompurify.sanitize(String(formattedRow[rawKeys[index]] ?? '')),
 }}
Suggestion importance[1-10]: 5

__

Why: The formattedRow values could be non-string types, and passing them directly to dompurify.sanitize may produce unexpected results. Converting to string first is a reasonable defensive measure, though dompurify generally handles various input types.

Low
General
Use explicit null check for highlight data

Using highlightData?.[index] as a truthy check will skip adding the highlight if the
value is 0, false, or an empty array/object. Since highlight data is typically an
object or array, use an explicit != null check to be safe and avoid accidentally
omitting valid highlight entries.

src/plugins/data/common/data_frames/utils.ts [127]

-...(highlightData?.[index] && { highlight: highlightData[index] }),
+...(highlightData?.[index] != null && { highlight: highlightData[index] }),
Suggestion importance[1-10]: 4

__

Why: The suggestion to use != null instead of a truthy check is valid for robustness, but highlight data is typically an object/array which would be truthy anyway, making this a minor edge case improvement.

Low
Guard against undefined query before highlight request

query.query may be undefined or not a string when isPPLSearchQuery returns true,
which could cause getHighlightRequest to behave unexpectedly or throw. Add a guard
to ensure query.query is a non-empty string before calling getHighlightRequest.

src/plugins/query_enhancements/common/utils.ts [74]

-const highlight = isPPLSearchQuery(query) ? getHighlightRequest(query.query, true) : undefined;
+const highlight =
+  isPPLSearchQuery(query) && query.query
+    ? getHighlightRequest(query.query, true)
+    : undefined;
Suggestion importance[1-10]: 4

__

Why: Adding a guard for query.query being falsy before calling getHighlightRequest is a reasonable defensive check, though isPPLSearchQuery likely already ensures a valid query structure exists.

Low

@github-actions
Copy link
Copy Markdown
Contributor

❌ Empty Changelog Section

The Changelog section in your PR description is empty. Please add a valid changelog entry or entries. If you did add a changelog entry, check to make sure that it was not accidentally included inside the comment block in the Changelog section.

@RyanL1997 RyanL1997 added enhancement New feature or request explore labels Mar 18, 2026
@codecov
Copy link
Copy Markdown

codecov bot commented Mar 18, 2026

Codecov Report

❌ Patch coverage is 87.87879% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.49%. Comparing base (a204184) to head (09d30a6).
⚠️ Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
src/plugins/query_enhancements/common/utils.ts 0.00% 2 Missing ⚠️
..._enhancements/server/search/ppl_search_strategy.ts 85.71% 0 Missing and 1 partial ⚠️
...c/plugins/query_enhancements/server/utils/facet.ts 50.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main   #11547   +/-   ##
=======================================
  Coverage   61.48%   61.49%           
=======================================
  Files        4988     4989    +1     
  Lines      136636   136668   +32     
  Branches    23630    23641   +11     
=======================================
+ Hits        84009    84038   +29     
- Misses      46576    46578    +2     
- Partials     6051     6052    +1     
Flag Coverage Δ
Linux_1 25.05% <14.28%> (-0.01%) ⬇️
Linux_2 39.40% <14.28%> (-0.01%) ⬇️
Linux_3 26.62% <63.63%> (+0.01%) ⬆️
Linux_4 29.50% <61.90%> (+0.01%) ⬆️
Linux_5 39.01% <100.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 18, 2026

✅ All unit and integration tests passing

🔗 Workflow run · commit 09d30a6fb1ab4e5885283bae4aaf53ded7b87c76

@github-actions
Copy link
Copy Markdown
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit d1d1168.

PathLineSeverityDescription
src/plugins/explore/public/components/data_table/table_cell/source_field_table_cell.tsx55lowdangerouslySetInnerHTML is used to render search highlight markup. dompurify.sanitize() is applied, which is appropriate mitigation. The change also removes the 'text' format hint from formatHit(), meaning the function now returns HTML instead of plain text — this is intentional and consistent with the highlighting feature, but the eslint-disable comment acknowledges the risk is accepted. Not malicious, but increases XSS attack surface if dompurify is ever misconfigured or bypassed.
src/plugins/query_enhancements/server/routes/index.ts92lowThe new 'highlight' route parameter uses schema.object({}, { unknowns: 'allow' }), permitting arbitrary key-value pairs to pass through to the backend OpenSearch query without field-level validation. This is consistent with how 'timeRange' and 'options' are handled in the same schema, so it appears to be an accepted pattern rather than a deliberate bypass, but it allows a client to inject arbitrary highlight configuration into backend queries.

The table above displays the top 10 most important findings.

Total: 2 | Critical: 0 | High: 0 | Medium: 0 | Low: 2


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit d1d1168

@github-actions
Copy link
Copy Markdown
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 984c08e.

PathLineSeverityDescription
src/plugins/explore/public/components/data_table/table_cell/source_field_table_cell.tsx55mediumSwitches from safe text rendering to dangerouslySetInnerHTML to support highlight markers. DOMPurify sanitization is applied, which is a proper mitigation, but this expands the XSS attack surface. The security of this change depends entirely on DOMPurify's sanitization being complete and correct. The accompanying change removing the 'text' mode from formatHit (line 34) means the source of the HTML is now the formatted hit output rather than plain text, making the sanitization boundary critical.
src/plugins/query_enhancements/server/routes/index.ts92lowHighlight schema is defined with unknowns: 'allow', permitting arbitrary unknown fields to be included in the highlight configuration object and forwarded to the backend OpenSearch/PPL query. While this follows an existing pattern in the codebase for other fields (timeRange, options), it means arbitrary user-controlled data can be injected into the backend query body under the highlight key without schema validation.
src/plugins/explore/public/components/data_table/table_cell/source_field_table_cell.tsx34lowformatHit call changes from formatHit(row, 'text') to formatHit(row) without the 'text' mode argument. This is intentional to allow HTML highlight markup to pass through, but it silently changes the output format of the entire row rendering pipeline, not just highlighted fields. Any field value that previously relied on the 'text' mode for escaping or normalization will now return HTML-formatted output, relying solely on DOMPurify for safety.

The table above displays the top 10 most important findings.

Total: 3 | Critical: 0 | High: 0 | Medium: 1 | Low: 2


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 984c08e

@github-actions
Copy link
Copy Markdown
Contributor

❌ Invalid Changelog Heading

The '## Changelog' heading in your PR description is either missing or malformed. Please make sure that your PR description includes a '## Changelog' heading with proper spelling, capitalization, spacing, and Markdown syntax.

10 similar comments
@github-actions
Copy link
Copy Markdown
Contributor

❌ Invalid Changelog Heading

The '## Changelog' heading in your PR description is either missing or malformed. Please make sure that your PR description includes a '## Changelog' heading with proper spelling, capitalization, spacing, and Markdown syntax.

@github-actions
Copy link
Copy Markdown
Contributor

❌ Invalid Changelog Heading

The '## Changelog' heading in your PR description is either missing or malformed. Please make sure that your PR description includes a '## Changelog' heading with proper spelling, capitalization, spacing, and Markdown syntax.

@github-actions
Copy link
Copy Markdown
Contributor

❌ Invalid Changelog Heading

The '## Changelog' heading in your PR description is either missing or malformed. Please make sure that your PR description includes a '## Changelog' heading with proper spelling, capitalization, spacing, and Markdown syntax.

@github-actions
Copy link
Copy Markdown
Contributor

❌ Invalid Changelog Heading

The '## Changelog' heading in your PR description is either missing or malformed. Please make sure that your PR description includes a '## Changelog' heading with proper spelling, capitalization, spacing, and Markdown syntax.

@github-actions
Copy link
Copy Markdown
Contributor

❌ Invalid Changelog Heading

The '## Changelog' heading in your PR description is either missing or malformed. Please make sure that your PR description includes a '## Changelog' heading with proper spelling, capitalization, spacing, and Markdown syntax.

@github-actions
Copy link
Copy Markdown
Contributor

❌ Invalid Changelog Heading

The '## Changelog' heading in your PR description is either missing or malformed. Please make sure that your PR description includes a '## Changelog' heading with proper spelling, capitalization, spacing, and Markdown syntax.

@github-actions
Copy link
Copy Markdown
Contributor

❌ Invalid Changelog Heading

The '## Changelog' heading in your PR description is either missing or malformed. Please make sure that your PR description includes a '## Changelog' heading with proper spelling, capitalization, spacing, and Markdown syntax.

@github-actions
Copy link
Copy Markdown
Contributor

❌ Invalid Changelog Heading

The '## Changelog' heading in your PR description is either missing or malformed. Please make sure that your PR description includes a '## Changelog' heading with proper spelling, capitalization, spacing, and Markdown syntax.

@github-actions
Copy link
Copy Markdown
Contributor

❌ Invalid Changelog Heading

The '## Changelog' heading in your PR description is either missing or malformed. Please make sure that your PR description includes a '## Changelog' heading with proper spelling, capitalization, spacing, and Markdown syntax.

@github-actions
Copy link
Copy Markdown
Contributor

❌ Invalid Changelog Heading

The '## Changelog' heading in your PR description is either missing or malformed. Please make sure that your PR description includes a '## Changelog' heading with proper spelling, capitalization, spacing, and Markdown syntax.

@github-actions
Copy link
Copy Markdown
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit a3db11f.

PathLineSeverityDescription
src/plugins/explore/public/components/data_table/table_cell/source_field_table_cell.tsx55mediumIntroduces dangerouslySetInnerHTML for rendering search highlight content. DOMPurify sanitization is applied, which is the correct mitigation, but this widens the XSS attack surface compared to the previous plain-text rendering. If DOMPurify has a bypass or the highlight data originates from untrusted sources beyond the search backend, HTML injection is possible. The change is contextually justified by the highlight feature but represents a meaningful increase in XSS risk.
src/plugins/explore/public/components/data_table/table_cell/source_field_table_cell.tsx35lowThe call to dataset.formatHit is changed from formatHit(row, 'text') to formatHit(row), removing the explicit 'text' format specifier. This alters the output from guaranteed plain text to potentially HTML-formatted strings, which is relied upon downstream by dangerouslySetInnerHTML. If formatHit can produce HTML from user-controlled field values beyond highlight tags, DOMPurify becomes the sole XSS defense.
src/plugins/query_enhancements/server/routes/index.ts92lowThe highlight field in the route schema is declared as schema.object({}, { unknowns: 'allow' }), permitting arbitrary unknown properties to pass through without validation. This is consistent with the pattern used for timeRange and options in the same route, but it means highlight payloads with arbitrary structure are forwarded to the backend search endpoint without schema enforcement.

The table above displays the top 10 most important findings.

Total: 3 | Critical: 0 | High: 0 | Medium: 1 | Low: 2


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit a3db11f

@RyanL1997 RyanL1997 marked this pull request as ready for review March 19, 2026 21:37
dangerouslySetInnerHTML={{
__html: dompurify.sanitize(formattedRow[rawKeys[index]]),
}}
/>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this needed?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 //My thoughts on solution without dangerouslySetInnerHTML. Open to either way though

Looking at the current implementation in src/plugins/explore/public/components/data_table/table_cell/source_field_table_cell.tsx:35-60, the PR changes:

  • const formattedRow = dataset.formatHit(row, 'text');
  • const formattedRow = dataset.formatHit(row); // defaults to 'html'

This means the formatted values now contain HTML with tags (converted by getHighlightHtml() in src/plugins/data/common/field_formats/utils/highlight/highlight_html.ts).

The issue: The codebase already has a utility (getHighlightHtml()) that converts custom tags to HTML tags, but you could skip that step entirely and convert the custom tags directly to React components.


Recommended Implementation

Here's what I suggest for this PR:

import { highlightTags } from 'src/plugins/data/common/field_formats/utils/highlight/highlight_tags';

// Add this utility function
const parseHighlightedValue = (value: string): React.ReactNode => {
if (typeof value !== 'string') return value;

const regex = new RegExp(
  `${escapeRegExp(highlightTags.pre)}(.*?)${escapeRegExp(highlightTags.post)}`,
  'g'
);

const parts: React.ReactNode[] = [];
let lastIndex = 0;
let match;

while ((match = regex.exec(value)) !== null) {
  // Add text before the match
  if (match.index > lastIndex) {
    parts.push(value.substring(lastIndex, match.index));
  }
  // Add highlighted text
  parts.push(<mark key={match.index}>{match[1]}</mark>);
  lastIndex = regex.lastIndex;
}

// Add remaining text
if (lastIndex < value.length) {
  parts.push(value.substring(lastIndex));
}

return parts.length > 0 ? parts : value;

};

// Then in the component:
const formattedRow = dataset.formatHit(row, 'text'); // Keep as text!

{parseHighlightedValue(formattedRow[rawKeys[index]])}

Benefits:

  • Eliminates XSS risk entirely
  • No dangerouslySetInnerHTML
  • No DOMPurify dependency needed
  • More maintainable and testable
  • Consistent with React best practices

Copy link
Copy Markdown
Contributor Author

@RyanL1997 RyanL1997 Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question - related to #11390:

This was originally removed in PR #11390 as a performance optimization — that PR switched to formatHit(row, 'text')\ with plain text rendering since highlighting wasn't on the roadmap at the time. We're reverting that specific change because highlight rendering requires the HTML format path. We keep dompurify.sanitize() to prevent XSS from any untrusted content in the formatted values.

Copy link
Copy Markdown
Contributor Author

@RyanL1997 RyanL1997 Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, lets keep the momentum of what #11390 did for performance optimization.

We've adopted the React-based approach — removed dangerouslySetInnerHTML and DOMPurify entirely. We keep formatHit(row, 'text') and since the text format path doesn't include highlight data (that only happens in the html path via getHighlightHtml), we read fragments directly from row.highlight and parse the custom tags into React <mark> elements. The parsing utilities (parseHighlightedValue / getDisplayValue) are centralized in data/common/field_formats/utils/highlight/highlight_display.tsx alongside the existing highlight helpers, with dedicated unit tests.

cc @joshuali925 @TackAdam

TackAdam
TackAdam previously approved these changes Mar 19, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 9e90c55

Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
@RyanL1997 RyanL1997 force-pushed the ppl-syntax-highlighting branch from 9e90c55 to 09d30a6 Compare March 19, 2026 23:39
@github-actions
Copy link
Copy Markdown
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 09d30a6.

PathLineSeverityDescription
src/plugins/query_enhancements/server/routes/index.ts92lowThe highlight schema uses unknowns: 'allow', permitting arbitrary properties to be forwarded to the OpenSearch backend. This is consistent with other parameters in the codebase and matches a legitimate PPL highlight feature, but it expands the surface for unexpected OpenSearch query options to be injected via the highlight body field.

The table above displays the top 10 most important findings.

Total: 1 | Critical: 0 | High: 0 | Medium: 0 | Low: 1


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 09d30a6

@mengweieric mengweieric merged commit 7782a80 into opensearch-project:main Mar 20, 2026
79 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants