Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -45,14 +45,14 @@ describe("PhishingDataService", () => {
platformUtilsService,
);

fetchChecksumSpy = jest.spyOn(service as any, "fetchPhishingDomainsChecksum");
fetchDomainsSpy = jest.spyOn(service as any, "fetchPhishingDomains");
fetchChecksumSpy = jest.spyOn(service as any, "fetchPhishingLinksChecksum");
fetchDomainsSpy = jest.spyOn(service as any, "fetchPhishingLinks");
});

describe("isPhishingDomains", () => {
it("should detect a phishing domain", async () => {
it("should detect a phishing link with exact URL match", async () => {
setMockState({
domains: ["phish.com", "badguy.net"],
domains: ["http://phish.com", "http://badguy.net"],
timestamp: Date.now(),
checksum: "abc123",
applicationVersion: "1.0.0",
Expand All @@ -62,9 +62,33 @@ describe("PhishingDataService", () => {
expect(result).toBe(true);
});

it("should detect a phishing link with path prefix match", async () => {
setMockState({
domains: ["http://phish.com/login", "http://badguy.net"],
timestamp: Date.now(),
checksum: "abc123",
applicationVersion: "1.0.0",
});
const url = new URL("http://phish.com/login");
const result = await service.isPhishingDomain(url);
expect(result).toBe(true);
});

it("should detect a phishing link when visiting a subpath", async () => {
setMockState({
domains: ["http://phish.com/login", "http://badguy.net"],
timestamp: Date.now(),
checksum: "abc123",
applicationVersion: "1.0.0",
});
const url = new URL("http://phish.com/login/oauth");
const result = await service.isPhishingDomain(url);
expect(result).toBe(true);
});

it("should not detect a safe domain", async () => {
setMockState({
domains: ["phish.com", "badguy.net"],
domains: ["http://phish.com", "http://badguy.net"],
timestamp: Date.now(),
checksum: "abc123",
applicationVersion: "1.0.0",
Expand All @@ -74,14 +98,26 @@ describe("PhishingDataService", () => {
expect(result).toBe(false);
});

it("should match against root domain", async () => {
it("should handle trailing slashes correctly", async () => {
setMockState({
domains: ["phish.com", "badguy.net"],
domains: ["http://phish.com/"],
timestamp: Date.now(),
checksum: "abc123",
applicationVersion: "1.0.0",
});
const url = new URL("http://phish.com/about");
const url = new URL("http://phish.com");
const result = await service.isPhishingDomain(url);
expect(result).toBe(true);
});

it("should match case-insensitively", async () => {
setMockState({
domains: ["http://phish.com"],
timestamp: Date.now(),
checksum: "abc123",
applicationVersion: "1.0.0",
});
const url = new URL("HTTP://PHISH.COM");
const result = await service.isPhishingDomain(url);
expect(result).toBe(true);
});
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

๐ŸŽจ Missing test cases for comprehensive URL matching

The test suite doesn't cover several important edge cases for the new URL-based matching logic:

Missing scenarios:

  1. Query parameter handling: Does http://phish.com/login?param=value match http://phish.com/login?
  2. Fragment/hash handling: Does http://phish.com/login#section match http://phish.com/login?
  3. Port numbers: Does http://phish.com:8080/login match http://phish.com/login?
  4. Protocol differences: Does https://phish.com/login match http://phish.com/login?
  5. Case sensitivity in paths: Does http://phish.com/LOGIN match http://phish.com/login?
  6. Similar prefixes (false positive check): Ensure http://phish.com/log doesn't match http://phish.com/login

Consider adding tests for these scenarios to ensure the matching logic behaves as expected.

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -21,50 +21,52 @@ import { LogService } from "@bitwarden/logging";
import { GlobalStateProvider, KeyDefinition, PHISHING_DETECTION_DISK } from "@bitwarden/state";

export type PhishingData = {
domains: string[];
domains: string[]; // Note: Despite the name, this now stores full phishing URLs/links
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

โ™ป๏ธ Technical debt: Misleading field name

The field is named domains but now stores full URLs. This creates confusion and technical debt.

Why this matters:

  • Code readers will expect domain strings like "phish.com", not URLs like "http://phish.com/login"
  • The type definition doesn't reflect the actual data structure
  • Future maintainers might introduce bugs by treating these as domains

Recommendation:
While backward compatibility is important for the storage key name, consider:

  1. Renaming the TypeScript field to urls or links in the type definition
  2. Using a migration/transformation layer when reading/writing to maintain storage compatibility
  3. Updating the KeyDefinition comment to explain the mismatch

This would make the code more maintainable without breaking cached data.

timestamp: number;
checksum: string;

/**
* We store the application version to refetch the entire dataset on a new client release.
* This counteracts daily appends updates not removing inactive or false positive domains.
* This counteracts daily appends updates not removing inactive or false positive links.
*/
applicationVersion: string;
};

export const PHISHING_DOMAINS_KEY = new KeyDefinition<PhishingData>(
PHISHING_DETECTION_DISK,
"phishingDomains",
"phishingDomains", // Key name kept for backward compatibility with existing cached data
{
deserializer: (value: PhishingData) =>
value ?? { domains: [], timestamp: 0, checksum: "", applicationVersion: "" },
},
);

/** Coordinates fetching, caching, and patching of known phishing domains */
/** Coordinates fetching, caching, and patching of known phishing links */
export class PhishingDataService {
private static readonly RemotePhishingDatabaseUrl =
"https://raw.githubusercontent.com/Phishing-Database/Phishing.Database/master/phishing-domains-ACTIVE.txt";
"https://raw.githubusercontent.com/Phishing-Database/Phishing.Database/master/phishing-links-ACTIVE.txt";
private static readonly RemotePhishingDatabaseChecksumUrl =
"https://raw.githubusercontent.com/Phishing-Database/checksums/refs/heads/master/phishing-domains-ACTIVE.txt.md5";
"https://raw.githubusercontent.com/Phishing-Database/checksums/refs/heads/master/phishing-links-ACTIVE.txt.md5";
private static readonly RemotePhishingDatabaseTodayUrl =
"https://raw.githubusercontent.com/Phishing-Database/Phishing.Database/refs/heads/master/phishing-domains-NEW-today.txt";
"https://raw.githubusercontent.com/Phishing-Database/Phishing.Database/refs/heads/master/phishing-links-NEW-today.txt";

private _testDomains = this.getTestDomains();
private _testLinks = this.getTestLinks();
private _cachedState = this.globalStateProvider.get(PHISHING_DOMAINS_KEY);
private _domains$ = this._cachedState.state$.pipe(
private _links$ = this._cachedState.state$.pipe(
map(
(state) =>
new Set(
(state?.domains?.filter((line) => line.trim().length > 0) ?? []).concat(
this._testDomains,
"phishing.testcategory.com", // Included for QA to test in prod
),
(state?.domains?.filter((line) => line.trim().length > 0) ?? [])
.map((line) => line.toLowerCase().replace(/\/$/, "")) // Normalize URLs
.concat(
this._testLinks,
"http://phishing.testcategory.com", // Included for QA to test in prod
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

โš ๏ธ Test link missing protocol

The test link "http://phishing.testcategory.com" is added here, but it's being concatenated with already-normalized URLs that have protocols.

However, the old test link was "phishing.testcategory.com" (line 63 in the previous version). Now it's "http://phishing.testcategory.com".

Question: Does QA expect to test with http:// or https:// protocol? If they use https://phishing.testcategory.com, it won't match since only the http:// version is in the set.

Consider adding both protocols or documenting which protocol QA should use.

),
),
),
);

// How often are new domains added to the remote?
// How often are new links added to the remote?
readonly UPDATE_INTERVAL_DURATION = 24 * 60 * 60 * 1000; // 24 hours

private _triggerUpdate$ = new Subject<void>();
Expand Down Expand Up @@ -125,17 +127,47 @@ export class PhishingDataService {
}

/**
* Checks if the given URL is a known phishing domain
* Checks if the given URL is a known phishing link
*
* @param url The URL to check
* @returns True if the URL is a known phishing domain, false otherwise
* @returns True if the URL is a known phishing link, false otherwise
*/
async isPhishingDomain(url: URL): Promise<boolean> {
const domains = await firstValueFrom(this._domains$);
const result = domains.has(url.hostname);
if (result) {
return true;
const links = await firstValueFrom(this._links$);

// Normalize the URL for comparison (remove trailing slash, convert to lowercase)
const normalizedUrl = url.href.toLowerCase().replace(/\/$/, "");

// Strategy: Check for prefix matches to catch URLs with additional path segments
// For example, if database has "http://phish.com/login", we want to match:
// - "http://phish.com/login" (exact match)
// - "http://phish.com/login/oauth" (subpath match)
// - "http://phish.com/login?param=value" (with query params)

for (const link of links) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

โŒ Performance issue: O(n) lookup on every navigation

This linear search through all phishing links happens on every single page navigation. With the phishing database containing thousands of URLs, this could cause noticeable delays.

Current complexity: O(n) where n = number of phishing links
Expected database size: Potentially 10,000+ URLs

Impact:

  • Slows down every page load in the browser
  • Scales poorly as database grows
  • Multiple string operations per link (prefix checks, trailing slash handling)

Recommended solution:
Use a more efficient data structure. Consider:

  1. Trie (Prefix Tree): Optimal for prefix matching, O(m) lookup where m = URL length
  2. Hash map by hostname: Pre-group by hostname for O(1) hostname lookup, then check paths
  3. Pre-computed normalized Set: At minimum, remove the trailing slash logic from the hot path

Example optimization using hostname grouping:

private _linksByHostname$ = this._cachedState.state$.pipe(
  map((state) => {
    const map = new Map<string, Set<string>>();
    for (const link of state?.domains ?? []) {
      const url = new URL(link);
      const hostname = url.hostname.toLowerCase();
      if (!map.has(hostname)) {
        map.set(hostname, new Set());
      }
      map.get(hostname)!.add(url.pathname.toLowerCase().replace(/\/$/, ""));
    }
    return map;
  })
);

This would reduce the search space significantly when checking URLs.

// Exact match (handles trailing slash differences)
if (link === normalizedUrl || link === normalizedUrl + "/") {
return true;
}

// Prefix match: Check if the current URL starts with the phishing link
// This handles cases where the user visits a subpath of a known phishing URL
if (normalizedUrl.startsWith(link + "/") || normalizedUrl.startsWith(link + "?")) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

โš ๏ธ Potential bypass with query parameters

The prefix matching logic has a gap. Consider this scenario:

Database contains: http://phish.com/login?param=malicious
User visits: http://phish.com/login

The current logic checks normalizedUrl.startsWith(link + "?") which would fail here since:

  • normalizedUrl = "http://phish.com/login"
  • link = "http://phish.com/login?param=malicious"
  • Check: "http://phish.com/login".startsWith("http://phish.com/login?param=malicious" + "?") โ†’ false

Impact: If the phishing database contains URLs with query parameters, users can bypass detection by visiting the same path without those parameters.

Suggested fix:

// Parse both URLs to compare base path separately from query params
const normalizedUrlObj = new URL(normalizedUrl);
const linkObj = new URL(link);

// Compare without query params first
const normalizedWithoutQuery = normalizedUrlObj.origin + normalizedUrlObj.pathname;
const linkWithoutQuery = linkObj.origin + linkObj.pathname;

if (normalizedWithoutQuery === linkWithoutQuery) {
  return true;
}

Or consider if the database should only contain paths without query parameters.

return true;
}

// Also check if the link in database has a trailing slash
if (link.endsWith("/")) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

๐Ÿ’ญ Redundant trailing slash handling

The code handles trailing slashes in three different places:

  1. Line 60: Normalization when loading data
  2. Line 139: Normalization of input URL
  3. Lines 160-168: Additional trailing slash checks during matching

Question: Is the third check (lines 160-168) necessary given that both the database links and input URL are already normalized to remove trailing slashes?

If both link and normalizedUrl have trailing slashes removed at lines 60 and 139, when would link.endsWith("/") be true at line 160?

Consider removing this redundant check or clarifying why it's needed despite prior normalization.

const linkWithoutSlash = link.slice(0, -1);
if (
normalizedUrl === linkWithoutSlash ||
normalizedUrl.startsWith(linkWithoutSlash + "/")
) {
return true;
}
}
}

return false;
}

Expand All @@ -148,7 +180,7 @@ export class PhishingDataService {
const applicationVersion = await this.platformUtilsService.getApplicationVersion();

// If checksum matches, return existing data with new timestamp & version
const remoteChecksum = await this.fetchPhishingDomainsChecksum();
const remoteChecksum = await this.fetchPhishingLinksChecksum();
if (remoteChecksum && prev.checksum === remoteChecksum) {
this.logService.info(
`[PhishingDataService] Remote checksum matches local checksum, updating timestamp only.`,
Expand All @@ -157,34 +189,32 @@ export class PhishingDataService {
}
// Checksum is different, data needs to be updated.

// Approach 1: Fetch only new domains and append
// Approach 1: Fetch only new links and append
const isOneDayOldMax = prevAge <= this.UPDATE_INTERVAL_DURATION;
if (isOneDayOldMax && applicationVersion === prev.applicationVersion) {
const dailyDomains: string[] = await this.fetchPhishingDomains(
const dailyLinks: string[] = await this.fetchPhishingLinks(
PhishingDataService.RemotePhishingDatabaseTodayUrl,
);
this.logService.info(
`[PhishingDataService] ${dailyDomains.length} new phishing domains added`,
);
this.logService.info(`[PhishingDataService] ${dailyLinks.length} new phishing links added`);
return {
domains: prev.domains.concat(dailyDomains),
domains: prev.domains.concat(dailyLinks),
checksum: remoteChecksum,
timestamp,
applicationVersion,
};
}

// Approach 2: Fetch all domains
const domains = await this.fetchPhishingDomains(PhishingDataService.RemotePhishingDatabaseUrl);
// Approach 2: Fetch all links
const links = await this.fetchPhishingLinks(PhishingDataService.RemotePhishingDatabaseUrl);
return {
domains,
domains: links,
timestamp,
checksum: remoteChecksum,
applicationVersion,
};
}

private async fetchPhishingDomainsChecksum() {
private async fetchPhishingLinksChecksum() {
const response = await this.apiService.nativeFetch(
new Request(PhishingDataService.RemotePhishingDatabaseChecksumUrl),
);
Expand All @@ -194,29 +224,29 @@ export class PhishingDataService {
return response.text();
}

private async fetchPhishingDomains(url: string) {
private async fetchPhishingLinks(url: string) {
const response = await this.apiService.nativeFetch(new Request(url));

if (!response.ok) {
throw new Error(`[PhishingDataService] Failed to fetch domains: ${response.status}`);
throw new Error(`[PhishingDataService] Failed to fetch links: ${response.status}`);
}

return response.text().then((text) => text.split("\n"));
}

private getTestDomains() {
private getTestLinks() {
const flag = devFlagEnabled("testPhishingUrls");
if (!flag) {
return [];
}

const domains = devFlagValue("testPhishingUrls") as unknown[];
if (domains && domains instanceof Array) {
const links = devFlagValue("testPhishingUrls") as unknown[];
if (links && links instanceof Array) {
this.logService.debug(
"[PhishingDetectionService] Dev flag enabled for testing phishing detection. Adding test phishing domains:",
domains,
"[PhishingDetectionService] Dev flag enabled for testing phishing detection. Adding test phishing links:",
links,
);
return domains as string[];
return links as string[];
}
return [];
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -72,7 +72,11 @@ export class PhishingDetectionService {
),
concatMap(async (message) => {
const url = new URL(message.url);
// Store the hostname so we ignore all URLs on this domain for the session
this._ignoredHostnames.add(url.hostname);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

๐Ÿ’ญ Session-based ignore behavior change

The behavior changed from "block again on next visit" to "allow all pages on this domain for the session."

Previous behavior: User clicks "continue" โ†’ navigates to phishing site โ†’ if they visit again later in the same session, they'd be warned again
New behavior: User clicks "continue" once โ†’ all future visits to that hostname are allowed for the entire session

Questions:

  1. Is this intentional? The comment was updated but the actual behavioral change isn't mentioned in the PR description.
  2. Should this be more granular (per-URL) instead of per-hostname? For example, if a user allows http://phish.com/safe-page, should http://phish.com/login also be allowed?

This seems like a security trade-off that should be explicitly acknowledged in the PR description.

logService.debug(
`[PhishingDetectionService] Added ${url.hostname} to ignored hostnames for this session`,
);
await BrowserApi.navigateTabToUrl(message.tabId, url);
}),
);
Expand All @@ -97,8 +101,10 @@ export class PhishingDetectionService {
tap((event) => logService.debug(`[PhishingDetectionService] processing event:`, event)),
concatMap(async ({ tabId, url, ignored }) => {
if (ignored) {
// The next time this host is visited, block again
this._ignoredHostnames.delete(url.hostname);
// User chose to continue to this hostname - allow all pages on this domain for the session
logService.debug(
`[PhishingDetectionService] Skipping check for ${url.hostname} (user allowed this session)`,
);
return;
}
const isPhishing = await phishingDataService.isPhishingDomain(url);
Expand Down
Loading