Commit aaa2b40

committed
lots of changes
1 parent bdb7d23 commit aaa2b40

219 files changed, +3088 -2579 lines changed

Lines changed: 105 additions & 0 deletions
@@ -0,0 +1,105 @@
+name: Parallel Tests
+
+on:
+  pull_request:
+    branches: [dev, main, master]
+  push:
+    branches: [dev]
+
+env:
+  PYTHONIOENCODING: utf-8
+  PYTHONLEGACYWINDOWSSTDIO: utf-8
+  USE_COLOR: False
+
+jobs:
+  discover-tests:
+    name: Discover test files
+    runs-on: ubuntu-22.04
+    outputs:
+      test-files: ${{ steps.set-matrix.outputs.test-files }}
+
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          submodules: true
+          fetch-depth: 1
+
+      - name: Discover test files
+        id: set-matrix
+        run: |
+          plugin_tests=$(find abx_plugins/plugins -path "*/tests/test_*.py" -type f | sort)
+
+          json_array="["
+          first=true
+          for test_file in $plugin_tests; do
+            if [ "$first" = true ]; then
+              first=false
+            else
+              json_array+=","
+            fi
+
+            plugin=$(echo $test_file | sed 's|abx_plugins/plugins/\([^/]*\)/.*|\1|')
+            test_name=$(basename $test_file .py | sed 's/^test_//')
+            name="plugin/$plugin/$test_name"
+
+            json_array+="{\"path\":\"$test_file\",\"name\":\"$name\"}"
+          done
+          json_array+="]"
+
+          echo "test-files=$json_array" >> $GITHUB_OUTPUT
+          echo "Found $(echo $plugin_tests | wc -w) test files"
+          echo "$json_array" | jq '.'
+
+  run-tests:
+    name: ${{ matrix.test.name }}
+    runs-on: ubuntu-22.04
+    needs: discover-tests
+
+    strategy:
+      fail-fast: false
+      matrix:
+        test: ${{ fromJson(needs.discover-tests.outputs.test-files) }}
+        python: ["3.13"]
+
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          submodules: true
+          fetch-depth: 1
+
+      - name: Set up Python ${{ matrix.python }}
+        uses: actions/setup-python@v4
+        with:
+          python-version: ${{ matrix.python }}
+          architecture: x64
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v4
+        with:
+          version: "latest"
+
+      - name: Set up Node JS
+        uses: actions/setup-node@v4
+        with:
+          node-version: 22
+
+      - name: Cache uv
+        uses: actions/cache@v3
+        with:
+          path: ~/.cache/uv
+          key: ${{ runner.os }}-${{ matrix.python }}-uv-${{ hashFiles('pyproject.toml') }}
+          restore-keys: |
+            ${{ runner.os }}-${{ matrix.python }}-uv-
+
+      - uses: awalsh128/cache-apt-pkgs-action@latest
+        with:
+          packages: git ripgrep build-essential python3-dev python3-setuptools libssl-dev libldap2-dev libsasl2-dev zlib1g-dev libatomic1 python3-minimal gnupg2 curl wget python3-ldap python3-msgpack python3-mutagen python3-regex python3-pycryptodome procps
+          version: 1.1
+
+      - name: Install dependencies with uv
+        run: |
+          uv pip install -e ".[dev]"
+
+      - name: Run test - ${{ matrix.test.name }}
+        run: |
+          uv run pytest -xvs "${{ matrix.test.path }}" --basetemp=tests/out
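
Note on the discovery step: the shell loop builds the matrix JSON by string concatenation, which does not escape quotes or backslashes in file names. The sketch below shows the same `path`/`name` structure built in Python, where `json.dumps` handles escaping. It is an illustration only and not part of this commit; `discover_matrix` and `to_github_output` are hypothetical names, and the glob is assumed to approximate the `find ... -path "*/tests/test_*.py"` call.

```python
import json
from pathlib import Path


def discover_matrix(plugins_root: Path) -> list[dict[str, str]]:
    """Return one {"path", "name"} entry per plugin test file (hypothetical helper)."""
    # Roughly mirrors: find <root> -path "*/tests/test_*.py" -type f | sort
    test_files = sorted(
        (p for p in plugins_root.rglob("tests/test_*.py") if p.is_file()),
        key=lambda p: p.as_posix(),
    )
    entries = []
    for test_file in test_files:
        # The first path component under the plugins root is the plugin name.
        plugin = test_file.relative_to(plugins_root).parts[0]
        # Same idea as: basename "$test_file" .py | sed 's/^test_//'
        test_name = test_file.stem.removeprefix("test_")
        entries.append({
            "path": test_file.as_posix(),
            "name": f"plugin/{plugin}/{test_name}",
        })
    return entries


def to_github_output(entries: list[dict[str, str]]) -> str:
    """Format the single-line value that would be appended to $GITHUB_OUTPUT."""
    # json.dumps escapes quotes and backslashes that raw concatenation would not.
    return "test-files=" + json.dumps(entries, separators=(",", ":"))
```

Calling `to_github_output(discover_matrix(Path("abx_plugins/plugins")))` from the repository root would yield a line in the same shape the workflow writes, ready for `fromJson` in the `run-tests` matrix.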

.gitignore

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+.DS_Store
+
+data/
+*.sqlite3*
+__pycache__/
+*.pyc

README.md

Lines changed: 68 additions & 0 deletions
@@ -16,3 +16,71 @@ plugins_dir = get_plugins_dir()
 
 Tools like `abx-dl` and ArchiveBox can discover plugins from this package
 without symlinks or environment-variable tricks.
+
+## Plugin Contract
+
+### Directory layout
+
+Each plugin lives under `plugins/<name>/` and may include:
+
+- `config.json` (optional) - config schema
+- `binaries.jsonl` (optional) - binary manifests
+- `on_*` hook scripts (required to do work)
+
+Hooks run with:
+
+- **SNAP_DIR** = base snapshot directory (default: `.`)
+- **CRAWL_DIR** = base crawl directory (default: `.`)
+- **Snapshot hook output** = `SNAP_DIR/<plugin>/...`
+- **Crawl hook output** = `CRAWL_DIR/<plugin>/...`
+- **Other plugin outputs** can be read via `../<other-plugin>/...` from your own output dir
+
+### Key environment variables
+
+- `SNAP_DIR` - base snapshot directory (default: `.`)
+- `CRAWL_DIR` - base crawl directory (default: `.`)
+- `LIB_DIR` - binaries/tools root (default: `~/.config/abx/lib`)
+- `PERSONAS_DIR` - persona profiles root (default: `~/.config/abx/personas`)
+- `ACTIVE_PERSONA` - persona name (default: `Default`)
+
+### Event JSONL interface (bbus-style, no dependency)
+
+Hooks emit JSONL events to stdout. They do **not** need to import `bbus`.
+The event envelope matches the bbus style so higher layers can stream/replay.
+
+Minimal envelope:
+
+```json
+{
+  "event_id": "uuidv7",
+  "event_type": "SnapshotCreated",
+  "event_created_at": "2026-02-01T20:10:22Z",
+  "event_parent_id": "uuidv7-or-null",
+  "event_schema": "abx.events.v1",
+  "event_path": "abx-plugins",
+  "data": { "...": "event-specific fields" }
+}
+```
+
+Conventions:
+
+- Active verb names are **requests** (e.g. `BinaryInstall`, `ProcessLaunch`).
+- Past tense names are **facts** (e.g. `BinaryInstalled`, `ProcessExited`).
+- Plugins can emit additional fields inside `data` without coordination.
+
+Common event types emitted by hooks:
+
+- `ArchiveResultCreated` (status + output files)
+- `Binary` records (dependency detection/install)
+- `ProcessStarted` / `ProcessExited`
+
+Higher-level tools (abx-dl / ArchiveBox) can:
+
+- Parse these events from stdout
+- Persist or project them (SQLite/JSONL/Django) without plugins knowing
+
+Legacy note:
+
+Some hooks still emit a lightweight JSONL record with a top-level `type` field
+(e.g., `{"type": "ArchiveResult", ...}`). Runtimes should accept those and
+optionally translate them into the event envelope above.
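
Note on the event interface: the README text above says runtimes parse JSONL events from hook stdout and should also accept legacy records with a top-level `type` field. A minimal sketch of such a consumer follows. It is not code from this commit: `parse_hook_output` is a hypothetical name, and the legacy translation (copying `type` into `event_type`, moving the remaining keys under `data`, and leaving the id and timestamp fields empty) is an assumption, since the README does not specify the mapping.

```python
import json
from typing import Any, Iterator


def parse_hook_output(stdout: str) -> Iterator[dict[str, Any]]:
    """Yield event envelopes from a hook's stdout, skipping non-JSON lines."""
    for raw_line in stdout.splitlines():
        line = raw_line.strip()
        # Hooks may also print plain log text; only JSON objects are events.
        if not line.startswith("{"):
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue

        if "event_type" in record:
            # Already in the envelope shape described in the README.
            yield record
        elif "type" in record:
            # Legacy record. ASSUMPTION: this mapping is illustrative only;
            # the README leaves the translation to the runtime.
            data = {k: v for k, v in record.items() if k != "type"}
            yield {
                "event_id": None,
                "event_type": record["type"],
                "event_created_at": None,
                "event_parent_id": None,
                "event_schema": "abx.events.v1",
                "event_path": "abx-plugins",
                "data": data,
            }
```

A runtime could feed `subprocess.run(..., capture_output=True, text=True).stdout` into this generator and then persist or project the resulting dicts however it likes.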

abx_plugins/plugins/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+"""Plugin suite root package for abx-plugins."""

abx_plugins/plugins/accessibility/__init__.py

Whitespace-only changes.

abx_plugins/plugins/accessibility/on_Snapshot__39_accessibility.js

Lines changed: 7 additions & 1 deletion
@@ -23,7 +23,13 @@ const puppeteer = require('puppeteer-core');
 
 // Extractor metadata
 const PLUGIN_NAME = 'accessibility';
-const OUTPUT_DIR = '.';
+const PLUGIN_DIR = path.basename(__dirname);
+const SNAP_DIR = path.resolve((process.env.SNAP_DIR || '.').trim());
+const OUTPUT_DIR = path.join(SNAP_DIR, PLUGIN_DIR);
+if (!fs.existsSync(OUTPUT_DIR)) {
+    fs.mkdirSync(OUTPUT_DIR, { recursive: true });
+}
+process.chdir(OUTPUT_DIR);
 const OUTPUT_FILE = 'accessibility.json';
 const CHROME_SESSION_DIR = '../chrome';
 const CHROME_SESSION_REQUIRED_ERROR = 'No Chrome session found (chrome plugin must run first)';
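
Note on the output directory change: the hook now writes to `SNAP_DIR/<plugin>/` instead of the current directory, which is why sibling outputs such as `../chrome` still resolve after the `chdir`. The same resolution can be sketched in Python for hooks or tests written in that language. This is an illustration under assumptions, not code from the commit: `resolve_output_dir` is a hypothetical helper, and unlike the JavaScript hook it leaves the working directory unchanged.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_output_dir(
    plugin_dir_name: str,
    environ: Optional[Mapping[str, str]] = None,
) -> Path:
    """Return SNAP_DIR/<plugin>, creating it if needed (hypothetical helper)."""
    env = os.environ if environ is None else environ
    # Same fallback as the hook: an unset or empty SNAP_DIR means ".".
    snap_dir = Path((env.get("SNAP_DIR") or ".").strip()).resolve()
    output_dir = snap_dir / plugin_dir_name
    output_dir.mkdir(parents=True, exist_ok=True)
    return output_dir
```

With `SNAP_DIR=/data/snap`, `resolve_output_dir("accessibility")` would return `/data/snap/accessibility`, and the Chrome session directory sits next to it at `/data/snap/chrome`.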

abx_plugins/plugins/accessibility/tests/__init__.py

Whitespace-only changes.

abx_plugins/plugins/accessibility/tests/test_accessibility.py

Lines changed: 36 additions & 38 deletions
@@ -8,20 +8,17 @@
 import json
 import shutil
 import subprocess
-import sys
 import tempfile
 from pathlib import Path
 
 import pytest
-from django.test import TestCase
 
-# Import chrome test helpers
-sys.path.insert(0, str(Path(__file__).parent.parent.parent / 'chrome' / 'tests'))
-from chrome_test_helpers import (
+from abx_plugins.plugins.chrome.tests.chrome_test_helpers import (
     chrome_session,
     get_test_env,
     get_plugin_dir,
     get_hook_script,
+    chrome_test_url,
 )
 
 
@@ -38,29 +35,31 @@ def chrome_available() -> bool:
 ACCESSIBILITY_HOOK = get_hook_script(PLUGIN_DIR, 'on_Snapshot__*_accessibility.*')
 
 
-class TestAccessibilityPlugin(TestCase):
+class TestAccessibilityPlugin:
     """Test the accessibility plugin."""
 
     def test_accessibility_hook_exists(self):
         """Accessibility hook script should exist."""
-        self.assertIsNotNone(ACCESSIBILITY_HOOK, "Accessibility hook not found in plugin directory")
-        self.assertTrue(ACCESSIBILITY_HOOK.exists(), f"Hook not found: {ACCESSIBILITY_HOOK}")
+        assert ACCESSIBILITY_HOOK is not None, "Accessibility hook not found in plugin directory"
+        assert ACCESSIBILITY_HOOK.exists(), f"Hook not found: {ACCESSIBILITY_HOOK}"
 
 
-class TestAccessibilityWithChrome(TestCase):
+class TestAccessibilityWithChrome:
     """Integration tests for accessibility plugin with Chrome."""
 
-    def setUp(self):
+    def setup_method(self, _method=None):
         """Set up test environment."""
         self.temp_dir = Path(tempfile.mkdtemp())
+        self.snap_dir = self.temp_dir / 'snap'
+        self.snap_dir.mkdir(parents=True, exist_ok=True)
 
-    def tearDown(self):
+    def teardown_method(self, _method=None):
         """Clean up."""
         shutil.rmtree(self.temp_dir, ignore_errors=True)
 
-    def test_accessibility_extracts_page_outline(self):
+    def test_accessibility_extracts_page_outline(self, chrome_test_url):
         """Accessibility hook should extract headings and accessibility tree."""
-        test_url = 'https://example.com'
+        test_url = chrome_test_url
         snapshot_id = 'test-accessibility-snapshot'
 
         try:
@@ -85,7 +84,7 @@ def test_accessibility_extracts_page_outline(self):
                 )
 
                 # Check for output file
-                accessibility_output = snapshot_chrome_dir / 'accessibility.json'
+                accessibility_output = Path(env['SNAP_DIR']) / 'accessibility' / 'accessibility.json'
 
                 accessibility_data = None
 
@@ -98,25 +97,25 @@ def test_accessibility_extracts_page_outline(self):
                             pass
 
                 # Verify hook ran successfully
-                self.assertEqual(result.returncode, 0, f"Hook failed: {result.stderr}")
-                self.assertNotIn('Traceback', result.stderr)
+                assert result.returncode == 0, f"Hook failed: {result.stderr}"
+                assert 'Traceback' not in result.stderr
 
                 # example.com has headings, so we should get accessibility data
-                self.assertIsNotNone(accessibility_data, "No accessibility data was generated")
+                assert accessibility_data is not None, "No accessibility data was generated"
 
                 # Verify we got page outline data
-                self.assertIn('headings', accessibility_data, f"Missing headings: {accessibility_data}")
-                self.assertIn('url', accessibility_data, f"Missing url: {accessibility_data}")
+                assert 'headings' in accessibility_data, f"Missing headings: {accessibility_data}"
+                assert 'url' in accessibility_data, f"Missing url: {accessibility_data}"
 
         except RuntimeError:
             raise
 
-    def test_accessibility_disabled_skips(self):
+    def test_accessibility_disabled_skips(self, chrome_test_url):
         """Test that ACCESSIBILITY_ENABLED=False skips without error."""
-        test_url = 'https://example.com'
+        test_url = chrome_test_url
         snapshot_id = 'test-disabled'
 
-        env = get_test_env()
+        env = get_test_env() | {'SNAP_DIR': str(self.snap_dir)}
         env['ACCESSIBILITY_ENABLED'] = 'False'
 
         result = subprocess.run(
@@ -129,11 +128,11 @@ def test_accessibility_disabled_skips(self):
         )
 
         # Should exit 0 even when disabled
-        self.assertEqual(result.returncode, 0, f"Should succeed when disabled: {result.stderr}")
+        assert result.returncode == 0, f"Should succeed when disabled: {result.stderr}"
 
         # Should NOT create output file when disabled
-        accessibility_output = self.temp_dir / 'accessibility.json'
-        self.assertFalse(accessibility_output.exists(), "Should not create file when disabled")
+        accessibility_output = self.snap_dir / 'accessibility' / 'accessibility.json'
+        assert not accessibility_output.exists(), "Should not create file when disabled"
 
     def test_accessibility_missing_url_argument(self):
         """Test that missing --url argument causes error."""
@@ -145,31 +144,31 @@ def test_accessibility_missing_url_argument(self):
             capture_output=True,
             text=True,
             timeout=30,
-            env=get_test_env()
+            env=get_test_env() | {'SNAP_DIR': str(self.snap_dir)}
         )
 
         # Should fail with non-zero exit code
-        self.assertNotEqual(result.returncode, 0, "Should fail when URL missing")
+        assert result.returncode != 0, "Should fail when URL missing"
 
-    def test_accessibility_missing_snapshot_id_argument(self):
+    def test_accessibility_missing_snapshot_id_argument(self, chrome_test_url):
         """Test that missing --snapshot-id argument causes error."""
-        test_url = 'https://example.com'
+        test_url = chrome_test_url
 
         result = subprocess.run(
             ['node', str(ACCESSIBILITY_HOOK), f'--url={test_url}'],
             cwd=str(self.temp_dir),
             capture_output=True,
             text=True,
             timeout=30,
-            env=get_test_env()
+            env=get_test_env() | {'SNAP_DIR': str(self.snap_dir)}
         )
 
         # Should fail with non-zero exit code
-        self.assertNotEqual(result.returncode, 0, "Should fail when snapshot-id missing")
+        assert result.returncode != 0, "Should fail when snapshot-id missing"
 
-    def test_accessibility_with_no_chrome_session(self):
+    def test_accessibility_with_no_chrome_session(self, chrome_test_url):
         """Test that hook fails gracefully when no Chrome session exists."""
-        test_url = 'https://example.com'
+        test_url = chrome_test_url
         snapshot_id = 'test-no-chrome'
 
         result = subprocess.run(
@@ -182,13 +181,12 @@ def test_accessibility_with_no_chrome_session(self):
         )
 
         # Should fail when no Chrome session
-        self.assertNotEqual(result.returncode, 0, "Should fail when no Chrome session exists")
+        assert result.returncode != 0, "Should fail when no Chrome session exists"
         # Error should mention CDP or Chrome
         err_lower = result.stderr.lower()
-        self.assertTrue(
-            any(x in err_lower for x in ['chrome', 'cdp', 'cannot find', 'puppeteer']),
-            f"Should mention Chrome/CDP in error: {result.stderr}"
-        )
+        assert any(
+            x in err_lower for x in ['chrome', 'cdp', 'cannot find', 'puppeteer']
+        ), f"Should mention Chrome/CDP in error: {result.stderr}"
 
 
 if __name__ == '__main__':

abx_plugins/plugins/apt/__init__.py

Whitespace-only changes.
