changed urls to phishing-links-ACTIVE.txt #17532

maxkpower · 2025-11-21T00:29:29Z

🎟️ Tracking

https://bitwarden.atlassian.net/browse/PM-28548

📔 Objective

Migrates phishing detection from domain-based to URL-based matching.

Detection Logic:

Changed from hostname-only matching to full URL matching with prefix support
Added support for subpath matching (e.g., /login matches /login/oauth)
Implemented case-insensitive URL normalization
Handles trailing slashes and query parameters correctly

📸 Screenshots

⏰ Reminders before review

Contributor guidelines followed
All formatters and local linters executed and passed
Written new unit and / or integration tests where applicable
Protected functional changes with optionality (feature flags)
Used internationalization (i18n) for all UI strings
CI builds passed
Communicated to DevOps any deployment requirements
Updated any necessary documentation (Confluence, contributing docs) or informed the documentation team

🦮 Reviewer guidelines

👍 (:+1:) or similar for great changes
📝 (:memo:) or ℹ️ (:information_source:) for notes or general info
❓ (:question:) for questions
🤔 (:thinking:) or 💭 (:thought_balloon:) for more open inquiry that's not quite a confirmed issue and could potentially benefit from discussion
🎨 (:art:) for suggestions / improvements
❌ (:x:) or ⚠️ (:warning:) for more significant problems or concerns needing attention
🌱 (:seedling:) or ♻️ (:recycle:) for future improvements or indications of technical debt
⛏ (:pick:) for minor or nitpick changes

claude · 2025-11-21T00:31:08Z

Claude finished @maxkpower's task —— View job

Code Review Complete

Summary

This PR migrates phishing detection from domain-based to URL-based matching. The implementation adds valuable functionality for path-specific phishing detection, but there are several concerns that should be addressed before merging.

Findings

Finding 1: Performance issue - Linear search on every page navigation could cause noticeable delays with large databases (apps/browser/src/dirt/phishing-detection/services/phishing-data.service.ts:147)

Finding 2: Query parameter bypass vulnerability - URLs with query params in database can be bypassed (apps/browser/src/dirt/phishing-detection/services/phishing-data.service.ts:155)

Finding 3: Session-based ignore behavior changed without documentation - Security trade-off not mentioned in PR description (apps/browser/src/dirt/phishing-detection/services/phishing-detection.service.ts:76)

Finding 4: Test link protocol mismatch - QA test link changed from domain to HTTP-only URL (apps/browser/src/dirt/phishing-detection/services/phishing-data.service.ts:63)

Finding 5: Redundant trailing slash handling adds unnecessary complexity (apps/browser/src/dirt/phishing-detection/services/phishing-data.service.ts:160)

Finding 6: Technical debt - Field named domains stores URLs, creating maintainability issues (apps/browser/src/dirt/phishing-detection/services/phishing-data.service.ts:24)

Finding 7: Missing test coverage for edge cases - Query params, fragments, ports, protocols, and path case sensitivity not tested (apps/browser/src/dirt/phishing-detection/services/phishing-data.service.spec.ts:123)

PR Description Feedback

💭 The PR description could be improved by documenting:

The behavioral change to session-based hostname ignoring (previously warned on each visit)
How the system handles protocol differences (HTTP vs HTTPS)
Performance considerations for the linear search implementation
Migration strategy for existing cached domain data

claude · 2025-11-21T00:32:13Z

apps/browser/src/dirt/phishing-detection/services/phishing-data.service.ts

+
+      // Prefix match: Check if the current URL starts with the phishing link
+      // This handles cases where the user visits a subpath of a known phishing URL
+      if (normalizedUrl.startsWith(link + "/") || normalizedUrl.startsWith(link + "?")) {


⚠️ Potential bypass with query parameters

The prefix matching logic has a gap. Consider this scenario:

Database contains: http://phish.com/login?param=malicious
User visits: http://phish.com/login

The current logic checks normalizedUrl.startsWith(link + "?") which would fail here since:

normalizedUrl = "http://phish.com/login"

link = "http://phish.com/login?param=malicious"

Check: "http://phish.com/login".startsWith("http://phish.com/login?param=malicious" + "?") → false

Impact: If the phishing database contains URLs with query parameters, users can bypass detection by visiting the same path without those parameters.

Suggested fix:

// Parse both URLs to compare base path separately from query params const normalizedUrlObj = new URL(normalizedUrl); const linkObj = new URL(link); // Compare without query params first const normalizedWithoutQuery = normalizedUrlObj.origin + normalizedUrlObj.pathname; const linkWithoutQuery = linkObj.origin + linkObj.pathname; if (normalizedWithoutQuery === linkWithoutQuery) { return true; }

Or consider if the database should only contain paths without query parameters.

claude · 2025-11-21T00:32:26Z

apps/browser/src/dirt/phishing-detection/services/phishing-data.service.ts

+    // - "http://phish.com/login/oauth" (subpath match)
+    // - "http://phish.com/login?param=value" (with query params)
+
+    for (const link of links) {


❌ Performance issue: O(n) lookup on every navigation

This linear search through all phishing links happens on every single page navigation. With the phishing database containing thousands of URLs, this could cause noticeable delays.

Current complexity: O(n) where n = number of phishing links
Expected database size: Potentially 10,000+ URLs

Impact:

Slows down every page load in the browser

Scales poorly as database grows

Multiple string operations per link (prefix checks, trailing slash handling)

Recommended solution:
Use a more efficient data structure. Consider:

Trie (Prefix Tree): Optimal for prefix matching, O(m) lookup where m = URL length

Hash map by hostname: Pre-group by hostname for O(1) hostname lookup, then check paths

Pre-computed normalized Set: At minimum, remove the trailing slash logic from the hot path

Example optimization using hostname grouping:

private _linksByHostname$ = this._cachedState.state$.pipe( map((state) => { const map = new Map<string, Set<string>>(); for (const link of state?.domains ?? []) { const url = new URL(link); const hostname = url.hostname.toLowerCase(); if (!map.has(hostname)) { map.set(hostname, new Set()); } map.get(hostname)!.add(url.pathname.toLowerCase().replace(/\/$/, "")); } return map; }) );

This would reduce the search space significantly when checking URLs.

claude · 2025-11-21T00:32:34Z

apps/browser/src/dirt/phishing-detection/services/phishing-data.service.ts

+      }
+
+      // Also check if the link in database has a trailing slash
+      if (link.endsWith("/")) {


💭 Redundant trailing slash handling

The code handles trailing slashes in three different places:

Line 60: Normalization when loading data

Line 139: Normalization of input URL

Lines 160-168: Additional trailing slash checks during matching

Question: Is the third check (lines 160-168) necessary given that both the database links and input URL are already normalized to remove trailing slashes?

If both link and normalizedUrl have trailing slashes removed at lines 60 and 139, when would link.endsWith("/") be true at line 160?

Consider removing this redundant check or clarifying why it's needed despite prior normalization.

claude · 2025-11-21T00:32:43Z

apps/browser/src/dirt/phishing-detection/services/phishing-data.service.ts

+            .map((line) => line.toLowerCase().replace(/\/$/, "")) // Normalize URLs
+            .concat(
+              this._testLinks,
+              "http://phishing.testcategory.com", // Included for QA to test in prod


⚠️ Test link missing protocol

The test link "http://phishing.testcategory.com" is added here, but it's being concatenated with already-normalized URLs that have protocols.

However, the old test link was "phishing.testcategory.com" (line 63 in the previous version). Now it's "http://phishing.testcategory.com".

Question: Does QA expect to test with http:// or https:// protocol? If they use https://phishing.testcategory.com, it won't match since only the http:// version is in the set.

Consider adding both protocols or documenting which protocol QA should use.

claude · 2025-11-21T00:35:36Z

apps/browser/src/dirt/phishing-detection/services/phishing-data.service.spec.ts

+      const url = new URL("HTTP://PHISH.COM");
      const result = await service.isPhishingDomain(url);
      expect(result).toBe(true);
    });


🎨 Missing test cases for comprehensive URL matching

The test suite doesn't cover several important edge cases for the new URL-based matching logic:

Missing scenarios:

Query parameter handling: Does http://phish.com/login?param=value match http://phish.com/login?

Fragment/hash handling: Does http://phish.com/login#section match http://phish.com/login?

Port numbers: Does http://phish.com:8080/login match http://phish.com/login?

Protocol differences: Does https://phish.com/login match http://phish.com/login?

Case sensitivity in paths: Does http://phish.com/LOGIN match http://phish.com/login?

Similar prefixes (false positive check): Ensure http://phish.com/log doesn't match http://phish.com/login

Consider adding tests for these scenarios to ensure the matching logic behaves as expected.

claude · 2025-11-21T00:35:45Z

apps/browser/src/dirt/phishing-detection/services/phishing-detection.service.ts

      concatMap(async (message) => {
        const url = new URL(message.url);
+        // Store the hostname so we ignore all URLs on this domain for the session
        this._ignoredHostnames.add(url.hostname);


💭 Session-based ignore behavior change

The behavior changed from "block again on next visit" to "allow all pages on this domain for the session."

Previous behavior: User clicks "continue" → navigates to phishing site → if they visit again later in the same session, they'd be warned again
New behavior: User clicks "continue" once → all future visits to that hostname are allowed for the entire session

Questions:

Is this intentional? The comment was updated but the actual behavioral change isn't mentioned in the PR description.

Should this be more granular (per-URL) instead of per-hostname? For example, if a user allows http://phish.com/safe-page, should http://phish.com/login also be allowed?

This seems like a security trade-off that should be explicitly acknowledged in the PR description.

claude · 2025-11-21T00:35:54Z

apps/browser/src/dirt/phishing-detection/services/phishing-data.service.ts


 export type PhishingData = {
-  domains: string[];
+  domains: string[]; // Note: Despite the name, this now stores full phishing URLs/links


♻️ Technical debt: Misleading field name

The field is named domains but now stores full URLs. This creates confusion and technical debt.

Why this matters:

Code readers will expect domain strings like "phish.com", not URLs like "http://phish.com/login"

The type definition doesn't reflect the actual data structure

Future maintainers might introduce bugs by treating these as domains

Recommendation:
While backward compatibility is important for the storage key name, consider:

Renaming the TypeScript field to urls or links in the type definition

Using a migration/transformation layer when reading/writing to maintain storage compatibility

Updating the KeyDefinition comment to explain the mismatch

This would make the code more maintainable without breaking cached data.

github-actions · 2025-11-21T00:38:30Z

Checkmarx One – Scan Summary & Details – 3a8a78ef-67df-4a6c-b4e4-2ef59966e88e

Great job! No new security vulnerabilities introduced in this pull request

changed urls to phishing-links-ACTIVE.txt

53cd455

maxkpower requested a review from a team as a code owner November 21, 2025 00:29

maxkpower requested a review from willmartian November 21, 2025 00:29

maxkpower enabled auto-merge (squash) November 21, 2025 00:29

claude bot reviewed Nov 21, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

changed urls to phishing-links-ACTIVE.txt #17532

changed urls to phishing-links-ACTIVE.txt #17532

maxkpower commented Nov 21, 2025

Uh oh!

claude bot commented Nov 21, 2025 •

edited

Loading

Uh oh!

claude bot Nov 21, 2025

Uh oh!

claude bot Nov 21, 2025

Uh oh!

claude bot Nov 21, 2025

Uh oh!

claude bot Nov 21, 2025

Uh oh!

claude bot Nov 21, 2025

Uh oh!

claude bot Nov 21, 2025

Uh oh!

claude bot Nov 21, 2025

Uh oh!

github-actions bot commented Nov 21, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

changed urls to phishing-links-ACTIVE.txt #17532

Are you sure you want to change the base?

changed urls to phishing-links-ACTIVE.txt #17532

Conversation

maxkpower commented Nov 21, 2025

🎟️ Tracking

📔 Objective

📸 Screenshots

⏰ Reminders before review

🦮 Reviewer guidelines

Uh oh!

claude bot commented Nov 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review Complete

Summary

Findings

PR Description Feedback

Uh oh!

claude bot Nov 21, 2025

Choose a reason for hiding this comment

Uh oh!

claude bot Nov 21, 2025

Choose a reason for hiding this comment

Uh oh!

claude bot Nov 21, 2025

Choose a reason for hiding this comment

Uh oh!

claude bot Nov 21, 2025

Choose a reason for hiding this comment

Uh oh!

claude bot Nov 21, 2025

Choose a reason for hiding this comment

Uh oh!

claude bot Nov 21, 2025

Choose a reason for hiding this comment

Uh oh!

claude bot Nov 21, 2025

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Nov 21, 2025

Great job! No new security vulnerabilities introduced in this pull request

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

claude bot commented Nov 21, 2025 •

edited

Loading