Description
What's needed and why?
The blacklist download job runs every hour and performs a full HTTP download of every configured URL on each run, even when the remote list has not changed. For deployments with many configured lists (community lists + custom URLs across multiple services), this means:
- Unnecessary bandwidth consumption on every job run
- Unnecessary CPU time parsing and deduplicating large lists
- Higher load on public list providers (Tor exit nodes, mitchellkrogza, Data-Shield, etc.)
- No benefit when lists are updated less frequently than hourly
The only optimization currently in place is a 1-hour TTL: if the same URL has been downloaded within the last hour (during the same or a previous job run), the cached copy is reused. Once that hour expires, a full re-download is always triggered, regardless of whether the content has changed.
Implementation ideas (optional)
Use standard HTTP conditional request headers to let the server decide whether to send a new response body:
- If-None-Match — sent with the ETag value from the previous response; server replies 304 Not Modified if content is unchanged
- If-Modified-Since — fallback when no ETag is available; server replies 304 if the file has not been modified since that timestamp
When the server responds with 304 Not Modified, the client reuses the cached list data with zero bandwidth cost and re-saves it to bump the last_update timestamp, keeping the 1-hour TTL logic intact.
ETag and Last-Modified values are stored as comment lines in the existing per-URL cache file format (alongside the existing # Downloaded from header), requiring no schema changes or new files.
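One way the comment-line storage could be sketched (the exact comment keys and helper names here are assumptions, not the project's actual cache format):

```python
from typing import Optional, Tuple

def write_cache(path: str, url: str, body: str,
                etag: Optional[str] = None,
                last_modified: Optional[str] = None) -> None:
    """Save the list, storing validators as comment lines in the header."""
    with open(path, "w") as f:
        f.write(f"# Downloaded from {url}\n")
        if etag:
            f.write(f"# ETag: {etag}\n")
        if last_modified:
            f.write(f"# Last-Modified: {last_modified}\n")
        f.write(body)

def read_validators(path: str) -> Tuple[Optional[str], Optional[str]]:
    """Extract stored ETag / Last-Modified values from the comment header."""
    etag = last_modified = None
    with open(path) as f:
        for line in f:
            if not line.startswith("#"):
                break  # header comments end at the first data line
            if line.startswith("# ETag: "):
                etag = line[len("# ETag: "):].strip()
            elif line.startswith("# Last-Modified: "):
                last_modified = line[len("# Last-Modified: "):].strip()
    return etag, last_modified
```

Because the validators live in comment lines, older cache files without them simply yield `(None, None)` and fall back to a plain unconditional download.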
Code of Conduct
- I agree to follow this project's Code of Conduct