[Feature] PPL Search Result Highlight Support in Explore#11547
[Feature] PPL Search Result Highlight Support in Explore#11547mengweieric merged 6 commits intoopensearch-project:mainfrom
Conversation
PR Code Analyzer ❗AI-powered 'Code-Diff-Analyzer' found issues on commit f1a2106.
The table above displays the top 10 most important findings. Pull Requests Author(s): Please update your Pull Request according to the report above. Repository Maintainer(s): You can Thanks. |
PR Reviewer Guide 🔍(Review updated until commit 09d30a6)Here are some key observations to aid the review process:
|
PR Code Suggestions ✨Latest suggestions up to 09d30a6 Explore these optional code suggestions:
Previous suggestionsSuggestions up to commit 9e90c55
Suggestions up to commit a3db11f
Suggestions up to commit 984c08e
Suggestions up to commit d1d1168
Suggestions up to commit f1a2106
|
❌ Empty Changelog SectionThe Changelog section in your PR description is empty. Please add a valid changelog entry or entries. If you did add a changelog entry, check to make sure that it was not accidentally included inside the comment block in the Changelog section. |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #11547 +/- ##
=======================================
Coverage 61.48% 61.49%
=======================================
Files 4988 4989 +1
Lines 136636 136668 +32
Branches 23630 23641 +11
=======================================
+ Hits 84009 84038 +29
- Misses 46576 46578 +2
- Partials 6051 6052 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
✅ All unit and integration tests passing
|
PR Code Analyzer ❗AI-powered 'Code-Diff-Analyzer' found issues on commit d1d1168.
The table above displays the top 10 most important findings. Pull Requests Author(s): Please update your Pull Request according to the report above. Repository Maintainer(s): You can Thanks. |
|
Persistent review updated to latest commit d1d1168 |
PR Code Analyzer ❗AI-powered 'Code-Diff-Analyzer' found issues on commit 984c08e.
The table above displays the top 10 most important findings. Pull Requests Author(s): Please update your Pull Request according to the report above. Repository Maintainer(s): You can Thanks. |
|
Persistent review updated to latest commit 984c08e |
❌ Invalid Changelog HeadingThe '## Changelog' heading in your PR description is either missing or malformed. Please make sure that your PR description includes a '## Changelog' heading with proper spelling, capitalization, spacing, and Markdown syntax. |
10 similar comments
❌ Invalid Changelog HeadingThe '## Changelog' heading in your PR description is either missing or malformed. Please make sure that your PR description includes a '## Changelog' heading with proper spelling, capitalization, spacing, and Markdown syntax. |
❌ Invalid Changelog HeadingThe '## Changelog' heading in your PR description is either missing or malformed. Please make sure that your PR description includes a '## Changelog' heading with proper spelling, capitalization, spacing, and Markdown syntax. |
❌ Invalid Changelog HeadingThe '## Changelog' heading in your PR description is either missing or malformed. Please make sure that your PR description includes a '## Changelog' heading with proper spelling, capitalization, spacing, and Markdown syntax. |
❌ Invalid Changelog HeadingThe '## Changelog' heading in your PR description is either missing or malformed. Please make sure that your PR description includes a '## Changelog' heading with proper spelling, capitalization, spacing, and Markdown syntax. |
❌ Invalid Changelog HeadingThe '## Changelog' heading in your PR description is either missing or malformed. Please make sure that your PR description includes a '## Changelog' heading with proper spelling, capitalization, spacing, and Markdown syntax. |
❌ Invalid Changelog HeadingThe '## Changelog' heading in your PR description is either missing or malformed. Please make sure that your PR description includes a '## Changelog' heading with proper spelling, capitalization, spacing, and Markdown syntax. |
❌ Invalid Changelog HeadingThe '## Changelog' heading in your PR description is either missing or malformed. Please make sure that your PR description includes a '## Changelog' heading with proper spelling, capitalization, spacing, and Markdown syntax. |
❌ Invalid Changelog HeadingThe '## Changelog' heading in your PR description is either missing or malformed. Please make sure that your PR description includes a '## Changelog' heading with proper spelling, capitalization, spacing, and Markdown syntax. |
❌ Invalid Changelog HeadingThe '## Changelog' heading in your PR description is either missing or malformed. Please make sure that your PR description includes a '## Changelog' heading with proper spelling, capitalization, spacing, and Markdown syntax. |
❌ Invalid Changelog HeadingThe '## Changelog' heading in your PR description is either missing or malformed. Please make sure that your PR description includes a '## Changelog' heading with proper spelling, capitalization, spacing, and Markdown syntax. |
PR Code Analyzer ❗AI-powered 'Code-Diff-Analyzer' found issues on commit a3db11f.
The table above displays the top 10 most important findings. Pull Requests Author(s): Please update your Pull Request according to the report above. Repository Maintainer(s): You can Thanks. |
|
Persistent review updated to latest commit a3db11f |
| dangerouslySetInnerHTML={{ | ||
| __html: dompurify.sanitize(formattedRow[rawKeys[index]]), | ||
| }} | ||
| /> |
There was a problem hiding this comment.
+1 //My thoughts on solution without dangerouslySetInnerHTML. Open to either way though
Looking at the current implementation in src/plugins/explore/public/components/data_table/table_cell/source_field_table_cell.tsx:35-60, the PR changes:
- const formattedRow = dataset.formatHit(row, 'text');
- const formattedRow = dataset.formatHit(row); // defaults to 'html'
This means the formatted values now contain HTML with tags (converted by getHighlightHtml() in src/plugins/data/common/field_formats/utils/highlight/highlight_html.ts).
The issue: The codebase already has a utility (getHighlightHtml()) that converts custom tags to HTML tags, but you could skip that step entirely and convert the custom tags directly to React components.
Recommended Implementation
Here's what I suggest for this PR:
import { highlightTags } from 'src/plugins/data/common/field_formats/utils/highlight/highlight_tags';
// Add this utility function
const parseHighlightedValue = (value: string): React.ReactNode => {
if (typeof value !== 'string') return value;
const regex = new RegExp(
`${escapeRegExp(highlightTags.pre)}(.*?)${escapeRegExp(highlightTags.post)}`,
'g'
);
const parts: React.ReactNode[] = [];
let lastIndex = 0;
let match;
while ((match = regex.exec(value)) !== null) {
// Add text before the match
if (match.index > lastIndex) {
parts.push(value.substring(lastIndex, match.index));
}
// Add highlighted text
parts.push(<mark key={match.index}>{match[1]}</mark>);
lastIndex = regex.lastIndex;
}
// Add remaining text
if (lastIndex < value.length) {
parts.push(value.substring(lastIndex));
}
return parts.length > 0 ? parts : value;
};
// Then in the component:
const formattedRow = dataset.formatHit(row, 'text'); // Keep as text!
Benefits:
- Eliminates XSS risk entirely
- No dangerouslySetInnerHTML
- No DOMPurify dependency needed
- More maintainable and testable
- Consistent with React best practices
There was a problem hiding this comment.
Good question - related to #11390:
This was originally removed in PR #11390 as a performance optimization — that PR switched to formatHit(row, 'text')\ with plain text rendering since highlighting wasn't on the roadmap at the time. We're reverting that specific change because highlight rendering requires the HTML format path. We keep dompurify.sanitize() to prevent XSS from any untrusted content in the formatted values.
There was a problem hiding this comment.
Actually, lets keep the momentum of what #11390 did for performance optimization.
We've adopted the React-based approach — removed dangerouslySetInnerHTML and DOMPurify entirely. We keep formatHit(row, 'text') and since the text format path doesn't include highlight data (that only happens in the html path via getHighlightHtml), we read fragments directly from row.highlight and parse the custom tags into React <mark> elements. The parsing utilities (parseHighlightedValue / getDisplayValue) are centralized in data/common/field_formats/utils/highlight/highlight_display.tsx alongside the existing highlight helpers, with dedicated unit tests.
|
Persistent review updated to latest commit 9e90c55 |
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
9e90c55 to
09d30a6
Compare
PR Code Analyzer ❗AI-powered 'Code-Diff-Analyzer' found issues on commit 09d30a6.
The table above displays the top 10 most important findings. Pull Requests Author(s): Please update your Pull Request according to the report above. Repository Maintainer(s): You can Thanks. |
|
Persistent review updated to latest commit 09d30a6 |
Description
When user run a full-text search in PPL, matching terms are now automatically highlighted in the results — the same highlighting experience you get with DQL and Lucene queries, now available with PPL.
Issues Resolved
Demo
ppl-h-no-aud.mp4
Sample Request/Response
Details
Check List
yarn test:jestyarn test:jest_integration