-
Notifications
You must be signed in to change notification settings - Fork 1.5k
changed urls to phishing-links-ACTIVE.txt #17532
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Claude finished @maxkpower's task —— View job Code Review CompleteSummaryThis PR migrates phishing detection from domain-based to URL-based matching. The implementation adds valuable functionality for path-specific phishing detection, but there are several concerns that should be addressed before merging. FindingsFinding 1: Performance issue - Linear search on every page navigation could cause noticeable delays with large databases (apps/browser/src/dirt/phishing-detection/services/phishing-data.service.ts:147) Finding 2: Query parameter bypass vulnerability - URLs with query params in database can be bypassed (apps/browser/src/dirt/phishing-detection/services/phishing-data.service.ts:155) Finding 3: Session-based ignore behavior changed without documentation - Security trade-off not mentioned in PR description (apps/browser/src/dirt/phishing-detection/services/phishing-detection.service.ts:76) Finding 4: Test link protocol mismatch - QA test link changed from domain to HTTP-only URL (apps/browser/src/dirt/phishing-detection/services/phishing-data.service.ts:63) Finding 5: Redundant trailing slash handling adds unnecessary complexity (apps/browser/src/dirt/phishing-detection/services/phishing-data.service.ts:160) Finding 6: Technical debt - Field named Finding 7: Missing test coverage for edge cases - Query params, fragments, ports, protocols, and path case sensitivity not tested (apps/browser/src/dirt/phishing-detection/services/phishing-data.service.spec.ts:123) PR Description Feedback💭 The PR description could be improved by documenting:
|
|
|
||
| // Prefix match: Check if the current URL starts with the phishing link | ||
| // This handles cases where the user visits a subpath of a known phishing URL | ||
| if (normalizedUrl.startsWith(link + "/") || normalizedUrl.startsWith(link + "?")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The prefix matching logic has a gap. Consider this scenario:
Database contains: http://phish.com/login?param=malicious
User visits: http://phish.com/login
The current logic checks normalizedUrl.startsWith(link + "?") which would fail here since:
normalizedUrl="http://phish.com/login"link="http://phish.com/login?param=malicious"- Check:
"http://phish.com/login".startsWith("http://phish.com/login?param=malicious" + "?")→ false
Impact: If the phishing database contains URLs with query parameters, users can bypass detection by visiting the same path without those parameters.
Suggested fix:
// Parse both URLs to compare base path separately from query params
const normalizedUrlObj = new URL(normalizedUrl);
const linkObj = new URL(link);
// Compare without query params first
const normalizedWithoutQuery = normalizedUrlObj.origin + normalizedUrlObj.pathname;
const linkWithoutQuery = linkObj.origin + linkObj.pathname;
if (normalizedWithoutQuery === linkWithoutQuery) {
return true;
}Or consider if the database should only contain paths without query parameters.
| // - "http://phish.com/login/oauth" (subpath match) | ||
| // - "http://phish.com/login?param=value" (with query params) | ||
|
|
||
| for (const link of links) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌ Performance issue: O(n) lookup on every navigation
This linear search through all phishing links happens on every single page navigation. With the phishing database containing thousands of URLs, this could cause noticeable delays.
Current complexity: O(n) where n = number of phishing links
Expected database size: Potentially 10,000+ URLs
Impact:
- Slows down every page load in the browser
- Scales poorly as database grows
- Multiple string operations per link (prefix checks, trailing slash handling)
Recommended solution:
Use a more efficient data structure. Consider:
- Trie (Prefix Tree): Optimal for prefix matching, O(m) lookup where m = URL length
- Hash map by hostname: Pre-group by hostname for O(1) hostname lookup, then check paths
- Pre-computed normalized Set: At minimum, remove the trailing slash logic from the hot path
Example optimization using hostname grouping:
private _linksByHostname$ = this._cachedState.state$.pipe(
map((state) => {
const map = new Map<string, Set<string>>();
for (const link of state?.domains ?? []) {
const url = new URL(link);
const hostname = url.hostname.toLowerCase();
if (!map.has(hostname)) {
map.set(hostname, new Set());
}
map.get(hostname)!.add(url.pathname.toLowerCase().replace(/\/$/, ""));
}
return map;
})
);This would reduce the search space significantly when checking URLs.
| } | ||
|
|
||
| // Also check if the link in database has a trailing slash | ||
| if (link.endsWith("/")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💭 Redundant trailing slash handling
The code handles trailing slashes in three different places:
- Line 60: Normalization when loading data
- Line 139: Normalization of input URL
- Lines 160-168: Additional trailing slash checks during matching
Question: Is the third check (lines 160-168) necessary given that both the database links and input URL are already normalized to remove trailing slashes?
If both link and normalizedUrl have trailing slashes removed at lines 60 and 139, when would link.endsWith("/") be true at line 160?
Consider removing this redundant check or clarifying why it's needed despite prior normalization.
| .map((line) => line.toLowerCase().replace(/\/$/, "")) // Normalize URLs | ||
| .concat( | ||
| this._testLinks, | ||
| "http://phishing.testcategory.com", // Included for QA to test in prod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test link "http://phishing.testcategory.com" is added here, but it's being concatenated with already-normalized URLs that have protocols.
However, the old test link was "phishing.testcategory.com" (line 63 in the previous version). Now it's "http://phishing.testcategory.com".
Question: Does QA expect to test with http:// or https:// protocol? If they use https://phishing.testcategory.com, it won't match since only the http:// version is in the set.
Consider adding both protocols or documenting which protocol QA should use.
| const url = new URL("HTTP://PHISH.COM"); | ||
| const result = await service.isPhishingDomain(url); | ||
| expect(result).toBe(true); | ||
| }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎨 Missing test cases for comprehensive URL matching
The test suite doesn't cover several important edge cases for the new URL-based matching logic:
Missing scenarios:
- Query parameter handling: Does
http://phish.com/login?param=valuematchhttp://phish.com/login? - Fragment/hash handling: Does
http://phish.com/login#sectionmatchhttp://phish.com/login? - Port numbers: Does
http://phish.com:8080/loginmatchhttp://phish.com/login? - Protocol differences: Does
https://phish.com/loginmatchhttp://phish.com/login? - Case sensitivity in paths: Does
http://phish.com/LOGINmatchhttp://phish.com/login? - Similar prefixes (false positive check): Ensure
http://phish.com/logdoesn't matchhttp://phish.com/login
Consider adding tests for these scenarios to ensure the matching logic behaves as expected.
| concatMap(async (message) => { | ||
| const url = new URL(message.url); | ||
| // Store the hostname so we ignore all URLs on this domain for the session | ||
| this._ignoredHostnames.add(url.hostname); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💭 Session-based ignore behavior change
The behavior changed from "block again on next visit" to "allow all pages on this domain for the session."
Previous behavior: User clicks "continue" → navigates to phishing site → if they visit again later in the same session, they'd be warned again
New behavior: User clicks "continue" once → all future visits to that hostname are allowed for the entire session
Questions:
- Is this intentional? The comment was updated but the actual behavioral change isn't mentioned in the PR description.
- Should this be more granular (per-URL) instead of per-hostname? For example, if a user allows
http://phish.com/safe-page, shouldhttp://phish.com/loginalso be allowed?
This seems like a security trade-off that should be explicitly acknowledged in the PR description.
|
|
||
| export type PhishingData = { | ||
| domains: string[]; | ||
| domains: string[]; // Note: Despite the name, this now stores full phishing URLs/links |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
♻️ Technical debt: Misleading field name
The field is named domains but now stores full URLs. This creates confusion and technical debt.
Why this matters:
- Code readers will expect domain strings like
"phish.com", not URLs like"http://phish.com/login" - The type definition doesn't reflect the actual data structure
- Future maintainers might introduce bugs by treating these as domains
Recommendation:
While backward compatibility is important for the storage key name, consider:
- Renaming the TypeScript field to
urlsorlinksin the type definition - Using a migration/transformation layer when reading/writing to maintain storage compatibility
- Updating the KeyDefinition comment to explain the mismatch
This would make the code more maintainable without breaking cached data.
|
Great job! No new security vulnerabilities introduced in this pull request |

🎟️ Tracking
https://bitwarden.atlassian.net/browse/PM-28548
📔 Objective
Migrates phishing detection from domain-based to URL-based matching.
Detection Logic:
/loginmatches/login/oauth)📸 Screenshots
⏰ Reminders before review
🦮 Reviewer guidelines
:+1:) or similar for great changes:memo:) or ℹ️ (:information_source:) for notes or general info:question:) for questions:thinking:) or 💭 (:thought_balloon:) for more open inquiry that's not quite a confirmed issue and could potentially benefit from discussion:art:) for suggestions / improvements:x:) or:warning:) for more significant problems or concerns needing attention:seedling:) or ♻️ (:recycle:) for future improvements or indications of technical debt:pick:) for minor or nitpick changes