
Uplift upstream CL: add URL-id filter to vector search #36845

Merged
darkdh merged 1 commit into master from tab-link-history-embeddings
Jun 11, 2026


@darkdh darkdh commented May 29, 2026

Member

Resolves brave/brave-browser#55935

Mirrors https://chromium-review.googlesource.com/c/chromium/src/+/7902082
(merged as 95bcdbb98451). Threads a vector<URLID> url_id_filter from
HistoryEmbeddingsSearch::Search through SearchParams into
VectorDatabase::MakeUrlDataIterator. When the filter is non-empty, SqlDatabase
fetches only matching rows via 'WHERE passages.url_id IN (?,...)' in a single
statement; VectorDatabaseInMemory honors it inline.

Existing upstream callers (omnibox provider, WebUI handler, AI data
service) pass {} to preserve current behavior.

The patches can be dropped once the local Chromium baseline includes the
merged CL.

@github-actions

Contributor

Chromium major version is behind target branch (148.0.7778.167 vs 149.0.7827.23). Please rebase.

@github-actions github-actions Bot added the chromium-version-mismatch label (The Chromium version on the PR branch does not match the version on the target branch) May 29, 2026
@github-actions

github-actions Bot commented May 29, 2026

Contributor

@darkdh darkdh force-pushed the tab-link-history-embeddings branch from 0565eb3 to f23b23d May 29, 2026 17:53
@github-actions github-actions Bot removed the chromium-version-mismatch label (The Chromium version on the PR branch does not match the version on the target branch) May 29, 2026
@darkdh darkdh force-pushed the tab-link-history-embeddings branch from f23b23d to 3f53753 May 29, 2026 20:40
@darkdh darkdh requested a review from bridiver June 2, 2026 23:22
@darkdh darkdh marked this pull request as ready for review June 2, 2026 23:22
@darkdh darkdh requested review from a team as code owners June 2, 2026 23:22
@darkdh darkdh force-pushed the tab-link-history-embeddings branch from 3f53753 to 5e87039 June 2, 2026 23:37
Comment thread patches/components-history_embeddings-content-history_embeddings_service.cc.patch Outdated
@darkdh darkdh force-pushed the tab-link-history-embeddings branch 2 times, most recently from 72cc3ea to 2b5463b June 10, 2026 19:59
@darkdh darkdh changed the title from "Add SQL-level URL-id filtering to HistoryEmbeddings Search" to "Uplift upstream CL: add URL-id filter to vector search" Jun 10, 2026
@brave-builds

Collaborator

Warning

You have got a presubmit warning. Please address it if possible.

Patch should not add or remove empty lines at hunk boundaries

Items:

patches/components-history_embeddings-core-sql_database.cc.patch:29 (-empty line)
patches/components-history_embeddings-core-sql_database.cc.patch:38 (+empty line)
patches/components-history_embeddings-core-vector_database.cc.patch:52 (-empty line)
patches/components-history_embeddings-core-vector_database.h.patch:9 (+empty line)

@darkdh darkdh force-pushed the tab-link-history-embeddings branch from 2b5463b to a71d746 June 10, 2026 21:20
@github-actions

Contributor

[puLL-Merge] - brave/brave-core@36845

Description

Upstream Chromium added a new url_id_filter parameter (std::vector<history::URLID>) to the HistoryEmbeddingsSearch::Search() interface and related storage/iterator methods. This PR adapts Brave's code and patches to pass the new parameter (always empty {} at all current call sites), maintaining API compatibility. The filter allows restricting embedding searches to specific URL IDs when non-empty.
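
The filter semantics described above (an empty filter means no restriction, a non-empty one keeps only the listed URL IDs) can be illustrated with a minimal in-memory sketch. This is not the upstream code: `Row`, `UrlId`, and `FilterRows` are hypothetical stand-ins, and the visit time is modeled as a plain integer.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for history::URLID; the real type lives in Chromium.
using UrlId = std::int64_t;

// Hypothetical row: one stored entry with its URL id and visit time.
struct Row {
  UrlId url_id;
  std::int64_t visit_time;  // Illustrative integer timestamp.
};

// Keeps rows that pass an optional lower time bound and an optional
// URL-id filter. An empty url_id_filter disables id filtering entirely,
// which is why call sites passing {} see no change in behavior.
std::vector<Row> FilterRows(const std::vector<Row>& rows,
                            std::optional<std::int64_t> time_range_start,
                            const std::vector<UrlId>& url_id_filter) {
  std::vector<Row> kept;
  for (const Row& row : rows) {
    if (time_range_start && row.visit_time < *time_range_start) {
      continue;  // Older than the requested time range.
    }
    if (!url_id_filter.empty() &&
        std::find(url_id_filter.begin(), url_id_filter.end(), row.url_id) ==
            url_id_filter.end()) {
      continue;  // Filter is active and this URL id is not in it.
    }
    kept.push_back(row);
  }
  return kept;
}
```

Both checks sit in one loop here, so the time bound and the id filter compose naturally; a row must satisfy every active condition to be kept.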

Possible Issues

  • GetUniqueStatement replaces GetCachedStatement in sql_database.cc patch: dynamically-built SQL strings can't use cached statements, but this means a new statement is prepared on every MakeUrlDataIterator call. For large url_id_filter vectors, this generates long SQL strings with many ? placeholders — potential perf concern with very large filter sets (hundreds/thousands of IDs).
  • Linear scan in VectorDatabaseInMemory: std::ranges::contains on url_id_filter_ is O(n) per row. For large filters, converting to base::flat_set or std::unordered_set would be more efficient.
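
To make the second point concrete, here is a rough sketch of the set-based alternative, using std::unordered_set from the standard library rather than base::flat_set so that it compiles outside Chromium. The `UrlIdFilter` class and `UrlId` alias are hypothetical names for illustration, not part of the patch.

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for history::URLID.
using UrlId = std::int64_t;

// Copies the filter vector into a hash set once, so each per-row check is
// O(1) on average instead of a linear scan over the vector.
class UrlIdFilter {
 public:
  explicit UrlIdFilter(const std::vector<UrlId>& ids)
      : ids_(ids.begin(), ids.end()) {}

  // An empty filter matches every row, mirroring the "{} means no
  // restriction" convention used by the existing callers.
  bool Matches(UrlId id) const {
    return ids_.empty() || ids_.count(id) > 0;
  }

 private:
  std::unordered_set<UrlId> ids_;
};
```

For small filters the linear scan may well be faster in practice, so whether the conversion pays off depends on how large the filter vectors get.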

Security Hotspots

  • SQL injection surface in sql_database.cc patch: The dynamic SQL construction builds the IN (?,?,...) clause using url_id_filter.size(). While only ? placeholders are appended (values bound separately), any future modification that interpolates values directly would be dangerous. Current implementation is safe.
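
The placeholder-only construction can be sketched as follows. This is an assumption-laden illustration rather than the patched code: `BuildPassagesQuery` is a hypothetical helper, and the `SELECT *` column list is a simplification. Only the `WHERE TRUE`, `visit_time >= ?`, and `passages.url_id IN (?,...)` fragments come from the PR description.

```cpp
#include <cstddef>
#include <string>

// Builds the query text from the shape of the request only. The filter's
// size decides how many '?' placeholders are emitted; no id value is ever
// written into the string, so the values must be bound separately.
std::string BuildPassagesQuery(bool has_time_bound, std::size_t filter_size) {
  // "WHERE TRUE" lets every optional condition be appended with " AND".
  std::string sql = "SELECT * FROM passages WHERE TRUE";
  if (has_time_bound) {
    sql += " AND visit_time >= ?";
  }
  if (filter_size > 0) {
    sql += " AND passages.url_id IN (";
    for (std::size_t i = 0; i < filter_size; ++i) {
      if (i > 0) {
        sql += ",";
      }
      sql += "?";
    }
    sql += ")";
  }
  return sql;
}
```

Because the helper takes only a count, there is no code path through which a URL id could end up in the SQL text, which is the property the note above asks future changes to preserve.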

Changes

  • browser/ai_chat/tools/history_search_tool.cc: Added /*url_id_filter=*/{} argument to Search() call.
  • chromium_src/.../history_embeddings_service_unittest.cc: Added /*url_id_filter=*/{} to test Search() call.
  • patches/...history_embeddings_search.h.patch: Added url_id_filter param to abstract Search() interface + doc comment.
  • patches/...vector_database.h.patch: Added url_id_filter field to SearchParams, updated MakeUrlDataIterator signature.
  • patches/...vector_database.cc.patch: FindNearest passes url_id_filter to iterator. VectorDatabaseInMemory::MakeUrlDataIterator filters rows by url_id_filter. Refactored time-range filtering to unified loop.
  • patches/...sql_database.h.patch: Updated MakeUrlDataIterator signature.
  • patches/...sql_database.cc.patch: Replaced two static SQL strings with dynamically-built query adding WHERE TRUE + optional AND visit_time >= ? + optional AND url_id IN (?,...). Switched from GetCachedStatement to GetUniqueStatement.
  • patches/...history_embeddings_service.h.patch / .cc.patch: Added url_id_filter param to HistoryEmbeddingsService::Search(), moves it into search_params.
  • patches/...history_embeddings_service_unittest.cc.patch: 30+ test call sites updated with new param.
  • patches/...history_embeddings_service_browsertest.cc.patch: 13 browser test call sites updated.
  • patches/...ai_data_keyed_service.cc.patch, ...history_embeddings_handler.cc.patch, ...history_embeddings_provider.cc.patch, ...history_embeddings_provider_unittest.cc.patch, ...sql_database_unittest.cc.patch: All callers updated with {}.
sequenceDiagram
    participant Caller as Caller (AI Chat / Omnibox / WebUI)
    participant Service as HistoryEmbeddingsService
    participant Storage as SqlDatabase / VectorDatabaseInMemory
    participant Iterator as UrlDataIterator

    Caller->>Service: Search(query, time_range, count, skip_answering, url_id_filter, callback)
    Service->>Service: Populate SearchParams.url_id_filter
    Service->>Storage: FindNearest(query_embedding, search_params)
    Storage->>Storage: MakeUrlDataIterator(time_range_start, url_id_filter)
    Storage->>Iterator: Create (build SQL with optional WHERE clauses / in-memory filter)
    loop For each row
        Iterator-->>Storage: Next() → UrlData (filtered by time + url_id)
        Storage->>Storage: Score embedding similarity
    end
    Storage-->>Service: SearchInfo (scored results)
    Service-->>Caller: callback(SearchResult)

@darkdh darkdh force-pushed the tab-link-history-embeddings branch from a71d746 to 189dd13 June 10, 2026 21:43
@darkdh darkdh enabled auto-merge (squash) June 10, 2026 23:55
@darkdh darkdh disabled auto-merge June 11, 2026 00:09
@darkdh darkdh enabled auto-merge (squash) June 11, 2026 00:10
@darkdh darkdh disabled auto-merge June 11, 2026 00:10
@darkdh darkdh merged commit f7804b0 into master Jun 11, 2026
19 checks passed
@darkdh darkdh deleted the tab-link-history-embeddings branch June 11, 2026 00:48
@brave-builds brave-builds added this to the 1.93.x - Nightly milestone Jun 11, 2026
@brave-builds

Collaborator

Released in v1.93.55

bridiverbot pushed a commit to bridiverbot/brave-core that referenced this pull request Jun 11, 2026
Development

Successfully merging this pull request may close these issues.

Expand upstream HistoryEmbeddingsSearch

4 participants