Releases: ArchiveBox/abx-plugins
v1.9.18: Chrome lifecycle hardening, new extractors, and install/test fixes
What's Changed
- 🌐 Chrome lifecycle hardening with delayed readiness gating, snapshot-scoped launch support, remote/external browser session controls (CHROME_CDP_URL, CHROME_IS_LOCAL, CHROME_KEEPALIVE, CHROME_ISOLATION), CDP-based download configuration, stronger startup stability checks, and safer stale-tab cleanup.
- 📄 New document extractors: opendataloader adds opendataloader-pdf extraction with OCR/hybrid fallback and searchable content.md/content.txt outputs, and liteparse adds lit-based PDF/document extraction via LlamaIndex LiteParse.
- 🔎 Search indexing now auto-discovers content across plugin outputs for .txt, .md, .html, and .htm files, so new extractor outputs are picked up without hardcoded file lists.
- 🧪 Install and test reliability improved by switching plugin tests to Binary.load_or_install(), retrying Puppeteer browsers install --install-deps via sudo when needed, repairing cache ownership after sudo installs, pinning ForumDL's macOS-sensitive deps, and tightening shebang/executable coverage.
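The search-indexing change above can be pictured as a suffix-based walk over a snapshot's output directory. This is a minimal sketch, not the plugin's actual code: the function name discover_indexable_files, the directory layout, and the sorted traversal are assumptions; only the four suffixes come from the notes.

```python
from pathlib import Path
from typing import Iterator

# Suffixes named in the release notes; everything else here is illustrative.
INDEXABLE_SUFFIXES = {".txt", ".md", ".html", ".htm"}


def discover_indexable_files(snapshot_dir: Path) -> Iterator[Path]:
    """Yield text-like output files under snapshot_dir, in sorted order.

    Hypothetical helper: it walks every plugin output folder instead of
    consulting a hardcoded per-plugin file list.
    """
    for path in sorted(snapshot_dir.rglob("*")):
        if path.is_file() and path.suffix.lower() in INDEXABLE_SUFFIXES:
            yield path
```

With a walk like this, a new extractor that writes content.md into its own output folder would be indexed without any change to the search code.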
New plugins in this release
opendataloader
liteparse
Relevant changes
- Chrome lifecycle hardening (df2bdb2)
- OpenDataLoader plugin (221cc43)
- LiteParse plugin (2b53613)
- Binary.load_or_install test cleanup (b27dbc0)
- Puppeteer sudo install recovery (d5e5223)
Full Changelog: v1.9.13...v1.9.18
v1.9.13: normalize JS node module resolution
- normalize JS hook module resolution through a shared helper that honors NODE_MODULES_DIR, NODE_MODULE_DIR, and LIB_DIR/npm/node_modules
- update chrome-, puppeteer-, and extension-dependent hooks to use the shared resolver before requiring npm packages
- emit NODE_MODULE_DIR and NODE_PATH alongside NODE_MODULES_DIR from the npm provider
- add regression tests covering NODE_MODULE_DIR alias handling, LIB_DIR fallback, and npm provider env emission
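The resolution order described above can be sketched as follows. The real helper is shared JS code; this Python rendering only illustrates the lookup, and the precedence (NODE_MODULES_DIR, then the NODE_MODULE_DIR alias, then the LIB_DIR fallback) is assumed from the order in which the notes list them. The function name is hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_node_modules_dir(env: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Return the first configured node_modules directory, or None.

    Assumed precedence: NODE_MODULES_DIR, then the NODE_MODULE_DIR alias,
    then LIB_DIR/npm/node_modules as the fallback.
    """
    env = os.environ if env is None else env
    for key in ("NODE_MODULES_DIR", "NODE_MODULE_DIR"):
        value = env.get(key, "").strip()
        if value:
            return Path(value)
    lib_dir = env.get("LIB_DIR", "").strip()
    if lib_dir:
        return Path(lib_dir) / "npm" / "node_modules"
    return None
```

A hook would then resolve packages such as puppeteer-core relative to the returned directory before requiring them.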
v1.9.12: SingleFile NODE_MODULES_DIR regression coverage
- adds a focused regression test for singlefile_extension_save.js honoring NODE_MODULES_DIR when resolving puppeteer-core
- keeps the SingleFile browser-crash fix covered without requiring a live Chromium session in test
v1.9.5: Align version with abx-pkg and improve test diagnostics
What's Changed
Version alignment
- Bumped version to 1.9.5 to align abx-plugins with the abx-pkg>=1.9.5 dependency, establishing a unified versioning scheme across the ArchiveBox plugin ecosystem.
Test improvements
- Improved Chrome runtime fixture error reporting: the require_chrome_runtime fixture now logs errors via logging.error() before calling pytest.fail(), and sets pytrace=False for cleaner test output when Chrome prerequisites are unavailable.
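The log-then-fail pattern described above might look roughly like this. Only the fixture name, logging.error(), pytest.fail(), and pytrace=False come from the notes; the lookup helper, the CHROME_BINARY variable, the binary names searched on PATH, and the message text are assumptions made for this sketch.

```python
import logging
import os
import shutil
from typing import Optional

import pytest


def find_chrome_binary() -> Optional[str]:
    """Hypothetical lookup: an explicit CHROME_BINARY path, else a PATH search."""
    explicit = os.environ.get("CHROME_BINARY")
    if explicit and os.path.isfile(explicit):
        return explicit
    return shutil.which("chromium") or shutil.which("google-chrome")


def fail_without_chrome(chrome_binary: Optional[str]) -> str:
    """Log the problem, then fail the test without a Python traceback."""
    if not chrome_binary:
        message = "Chrome runtime unavailable: no Chromium/Chrome binary found"
        logging.error(message)               # shows up in the captured log output
        pytest.fail(message, pytrace=False)  # short failure report, no traceback
    return chrome_binary


@pytest.fixture
def require_chrome_runtime() -> str:
    """Fixture body assumed for illustration; only the name is from the notes."""
    return fail_without_chrome(find_chrome_binary())
```

Logging before failing means the reason still appears in log-based CI output, while pytrace=False keeps the failure report to a single message.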
Dependencies
- Updated abx-pkg minimum version from >=0.7.0 to >=1.9.5
Full Changelog: v0.10.2...v1.9.5
v0.10.2: Add Claude sandbox hint for Puppeteer installs
This patch release improves Puppeteer browser install failures inside Claude sandboxes.
- Detects the getaddrinfo EAI_AGAIN storage.googleapis.com failure mode during puppeteer browsers install.
- Prints a targeted hint explaining that @puppeteer/browsers respects NO_PROXY, which can bypass the sandbox egress proxy for Google download hosts.
- Shows a concrete NO_PROXY/no_proxy override users can apply before retrying.
- Adds regression coverage for the new diagnostic path.
Suggested retry environment:
NO_PROXY="localhost,127.0.0.1,169.254.169.254,metadata.google.internal,.svc.cluster.local,.local"
no_proxy="$NO_PROXY"
Verification:
uv run pytest abx_plugins/plugins/puppeteer/tests/test_puppeteer.py -q
Relevant change:
- Add Claude sandbox hint for Puppeteer browser downloads and test coverage
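The diagnostic path in this release can be sketched as a simple match on the installer's stderr. The function name and the hint wording are hypothetical; the error signature and the suggested NO_PROXY value are taken from the notes above.

```python
from typing import Optional

# Suggested value quoted from the release notes.
SUGGESTED_NO_PROXY = (
    "localhost,127.0.0.1,169.254.169.254,"
    "metadata.google.internal,.svc.cluster.local,.local"
)


def puppeteer_proxy_hint(stderr: str) -> Optional[str]:
    """Return a retry hint if stderr shows the DNS failure described above."""
    if "EAI_AGAIN" in stderr and "storage.googleapis.com" in stderr:
        return (
            "Puppeteer could not resolve storage.googleapis.com. "
            "@puppeteer/browsers respects NO_PROXY; try retrying with:\n"
            f'  NO_PROXY="{SUGGESTED_NO_PROXY}"\n'
            '  no_proxy="$NO_PROXY"'
        )
    return None
```

Any other install failure returns None, so unrelated errors would not trigger the hint.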
v0.10.1: Fix Puppeteer/Chrome install ordering during setup
This patch release fixes a setup-time race between the Puppeteer and Chrome install hooks.
- puppeteer/on_Crawl__60_puppeteer_install now runs in the foreground instead of as a .finite.bg hook, so its Binary(puppeteer, npm) side effects are applied immediately and in order.
- chrome/on_Crawl__70_chrome_install.finite.bg.py can now safely emit Binary(chromium, puppeteer) after Puppeteer is already available to install Chromium.
- Chrome test helpers and Puppeteer regression coverage were updated to lock in the new foreground hook path and prevent the race from creeping back in.
User-visible impact:
- archivebox init --setup and abx-dl plugins --install puppeteer no longer intermittently hit Chromium installs before the Puppeteer npm package exists.
- Chrome-based plugins now get a deterministic Puppeteer -> Chromium setup sequence.
Verification:
uv run pytest abx_plugins/plugins/puppeteer/tests/test_puppeteer.py -q
uv run abx_plugins/plugins/puppeteer/on_Crawl__60_puppeteer_install.py
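To illustrate why the foreground change yields a deterministic sequence, here is a sketch of ordering hooks by the numeric prefix in their filenames. The filename pattern, the Hook tuple, and the idea that a runner waits on non-background hooks before moving on are assumptions inferred from the names in these notes, not the actual runner.

```python
import re
from typing import List, NamedTuple

# Assumed filename shape: on_<Event>__<NN>_<name>[.finite.bg|.daemon.bg][.<ext>]
HOOK_RE = re.compile(
    r"^on_(?P<event>[A-Za-z]+)__(?P<order>\d+)_(?P<name>[^.]+)(?P<suffix>.*)$"
)


class Hook(NamedTuple):
    order: int
    name: str
    background: bool


def plan_hooks(filenames: List[str]) -> List[Hook]:
    """Parse hook filenames and return them sorted by numeric prefix."""
    hooks = []
    for filename in filenames:
        match = HOOK_RE.match(filename)
        if match is None:
            continue  # not a hook file
        hooks.append(Hook(
            order=int(match["order"]),
            name=match["name"],
            # A "bg" segment in the suffix marks a background hook.
            background="bg" in match["suffix"].split("."),
        ))
    return sorted(hooks)
```

Under these assumptions the Puppeteer hook at 60 is planned as a blocking step, so the Chrome hook at 70 only starts once the npm package is in place.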
Relevant commits:
v0.10.0: Claude plugins, shared base utilities, and the live plugin gallery
- 🤖 Added a new Claude automation family with claudecode, claudecodeextract, claudecodecleanup, and claudechrome for archive-time AI-driven browsing, extraction, and cleanup
- 🧰 Introduced the new base plugin with shared Python and JS helpers, centralized test utilities, Pydantic-based config loading, and stricter LIB_DIR permissions so plugins stop reimplementing the same plumbing
- 🌐 Published a live plugin gallery on GitHub Pages, added icons and metadata for the marketplace UI, and renamed hooks to clearer *.finite.bg.* and *.daemon.bg.* forms to make execution intent obvious
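As a rough illustration of how the renamed hook forms make execution intent readable from the filename alone, the sketch below classifies a hook with glob patterns. The function name, the "foreground" default, and the meanings given in the comments are assumptions; only the two patterns come from the notes.

```python
from fnmatch import fnmatch


def hook_kind(filename: str) -> str:
    """Classify a hook by the naming forms mentioned in the release notes."""
    if fnmatch(filename, "*.daemon.bg.*"):
        return "daemon"      # assumed: long-running background process
    if fnmatch(filename, "*.finite.bg.*"):
        return "finite"      # assumed: background task expected to exit
    return "foreground"      # assumed: runs inline before the next hook
```

For example, on_Crawl__70_chrome_install.finite.bg.py would classify as "finite" under this scheme.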
New plugin families in this release
base
claudecode
claudecodeextract
claudecodecleanup
claudechrome
Relevant changes
- Claude Code plugins (a0cdd23)
- Shared base utilities (d6d8848)
- Claude Chrome integration (6843417)
- Plugin gallery and hook renames (490505f)
Full Changelog: v0.9.2...v0.10.0
v0.9.2: Defuddle, Trafilatura, aligned hook contracts, and CI stabilization
- ✍️ Added two new readability-focused extractors: defuddle for parsing saved local HTML and trafilatura with configurable output formats and real install and snapshot coverage
- 🧪 Hardened the plugin test matrix with more deterministic ytdlp, seo, wget, gallerydl, and papersdl coverage, plus live integration fixes for Chrome races, timeout probing, env handling, and extension setup
- ⚙️ Tightened runtime contracts and CI by aligning hook and test expectations, stabilizing headers, redirects, and ssl, and moving parallel test runs on main toward plugin-managed Chromium instead of flaky ad hoc downloads
# New extraction plugins in this release
defuddle
trafilatura
# Trafilatura output modes
txt | md | json | xml
Relevant changes
- Add Defuddle plugin (5591c4e)
- Add Trafilatura plugin (ed1d5ec)
- Align hook contracts and tests (15be667)
- Stabilize live plugin integration tests (0b093a1)
- Inline checkout in parallel CI (29d1546)
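The four Trafilatura output modes listed above lend themselves to a small validation step. This sketch is purely illustrative: the TRAFILATURA_OUTPUT_FORMAT variable, the txt default, and the content.<ext> naming are hypothetical and are not confirmed by these notes.

```python
import os
from typing import Mapping, Optional

# The four modes listed in the release notes.
OUTPUT_MODES = ("txt", "md", "json", "xml")


def trafilatura_output_name(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the output filename for the configured mode.

    TRAFILATURA_OUTPUT_FORMAT, the "txt" default, and the "content.<ext>"
    naming are hypothetical, chosen only for this sketch.
    """
    env = os.environ if env is None else env
    mode = env.get("TRAFILATURA_OUTPUT_FORMAT", "txt").strip().lower()
    if mode not in OUTPUT_MODES:
        expected = ", ".join(OUTPUT_MODES)
        raise ValueError(f"unsupported mode {mode!r}; expected one of {expected}")
    return f"content.{mode}"
```

Rejecting unknown modes up front keeps a typo in configuration from silently producing no output.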
Full Changelog: v0.9.1...v0.9.2
v0.9.1: Chrome and test hardening with Python 3.11 packaging fixes
- 🧪 Hardened the plugin test suite with shared pytest fixture cleanup, more deterministic coverage for dns, gallerydl, papersdl, wget, ytdlp, and captcha flows, plus fixes for the new parallel test workflow
- 🌐 Kept tightening the Chrome-heavy plugins by continuing chrome_utils.js deduping, improving SingleFile save behavior, fixing Puppeteer and Favicon edge cases, and removing brittle wrapper code around forumdl
- 🐍 Turned the first big package cut into a usable release by correcting the Python floor to >=3.11, bumping the abx-pkg requirement forward, and tagging the repo as v0.9.1
# Packaging/runtime changes in this release
version = "0.9.1"
requires-python = ">=3.11"
abx-pkg >= 0.6.3
Relevant changes
- Broad fix pass across tests, providers, and docs (e743369)
- More Chrome util deduping and test cleanup (57b4c74)
- Pytest fixture consolidation (5cb0866)
- Favicon stdlib fallback and parallel CI fix (80bebe0)
- Release version bump (55415ca)
Full Changelog: v0.9.0...v0.9.1
v0.9.0: Standalone packaging reset, parallel CI, and test bootstrap
- 📦 Reworked the repo into a real standalone package release, jumping from the initial scaffold to 0.9.0 with updated project metadata, repo URLs, dependency declarations, and distributable test settings
- 🧪 Bootstrapped a much more maintainable test layout by adding package-local tests/__init__.py files across the plugin tree, a shared top-level conftest.py, and the first parallel CI workflow
- ⚙️ Tightened hook execution details across the suite by making install and snapshot scripts executable, removing vendored npm lockfiles from the repo, and consolidating Chrome test utilities and plugin path helpers
# New release infrastructure added in this cut
.github/workflows/test-parallel.yml
conftest.py
uv.lock
abx_plugins/plugins/path_utils.py
Relevant changes
Full Changelog: v0.1.0...v0.9.0