Commit 38cbd62 (parent 673910c)
docs: update AGENTS.md with long-running task rules and current state
1 file changed: AGENTS.md (+125, -0)
# AGENTS.md — DQE Development & Benchmarking Guide

## DQE Architecture (Quick Mental Model)

```
SQL query → TransportTrinoSqlAction (coordinator)
  → PlanFragmenter (splits into shard plans)
  → TransportShardExecuteAction (per-shard dispatch)
      ├── FusedScanAggregate (scalar aggs: SUM, AVG, COUNT)
      ├── FusedGroupByAggregate (GROUP BY: varchar/numeric keys, flat/hash paths)
      └── Fast paths (COUNT DISTINCT: HashSet, bitset, ordinal)
  → Coordinator merges shard results → returns to client
```
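The merge step at the bottom of the diagram is worth spelling out: for AVG, the coordinator must combine per-shard `(sum, count)` partials, never average the per-shard averages. A minimal illustrative sketch (the real coordinator logic lives in `TransportTrinoSqlAction.java` in Java; the data here is fabricated):

```shell
# Hypothetical per-shard partial results for AVG(x), as "sum count" pairs.
# Illustrates the merge rule only — not the actual DQE coordinator code.
partials=("100 4" "50 1" "300 5")

total_sum=0
total_count=0
for p in "${partials[@]}"; do
  read -r s c <<< "$p"
  total_sum=$(( total_sum + s ))
  total_count=$(( total_count + c ))
done

# Final AVG = merged sum / merged count
echo "AVG = $(( total_sum / total_count ))"
```

Averaging the shard averages (25, 50, 60) would give 45 only by coincidence of this data shape in general it is wrong whenever shard counts differ, which is why partials are merged.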
### Key Files

| File | Lines | Purpose |
|------|-------|---------|
| `TransportShardExecuteAction.java` | ~2200 | Shard-level dispatch: routes queries to fast paths |
| `FusedGroupByAggregate.java` | ~12700 | GROUP BY execution: varchar/numeric keys, flat arrays, collectors |
| `FusedScanAggregate.java` | ~1800 | Scalar aggregation: SUM, AVG, COUNT, flat array path |
| `TransportTrinoSqlAction.java` | ~4200 | Coordinator: plan optimization, shard fan-out, result merge |
23+
24+
### Dispatch Priority (TransportShardExecuteAction.executePlan)
25+
26+
1. Scalar agg → `FusedScanAggregate.canFuse()` → flat array path
27+
2. Bare single-column scan → COUNT(DISTINCT) fast paths
28+
3. 2-key COUNT(DISTINCT) → HashSet paths (numeric/varchar)
29+
4. Expression GROUP BY → ordinal-cached path
30+
5. Generic GROUP BY → `FusedGroupByAggregate.canFuse()` → fused path
31+
6. Fallback → generic pipeline
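The priority order above is first-match-wins. A hedged sketch of the routing shape (illustrative only — the real dispatch is Java logic in `executePlan`, and the query-shape tags below are invented labels, not DQE identifiers):

```shell
# First-match routing in the same priority order as executePlan.
# Tags (scalar_agg, bare_scan, ...) are invented for this sketch.
route() {
  case "$1" in
    scalar_agg)       echo "FusedScanAggregate flat-array path" ;;
    bare_scan)        echo "COUNT(DISTINCT) fast path" ;;
    two_key_distinct) echo "HashSet path" ;;
    expr_group_by)    echo "ordinal-cached path" ;;
    group_by)         echo "FusedGroupByAggregate fused path" ;;
    *)                echo "generic pipeline" ;;   # step 6 fallback
  esac
}

route scalar_agg    # → FusedScanAggregate flat-array path
route unknown_shape # → generic pipeline
```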

### Filtered vs Unfiltered Queries

- **MatchAllDocsQuery** (no WHERE): tight `for (doc = 0; doc < maxDoc; doc++)` loop, sequential DV access
- **Filtered** (WHERE clause): Collector-based `collect(int doc)` with virtual dispatch overhead
- **Selective filter optimization**: bitset pre-collection for filters matching <50% of docs

## Dev Iteration Loop

1. Code change (edit Java files)
2. Compile: `./gradlew :dqe:compileJava`
3. Reload plugin — see Long-Running Task Rules
4. Correctness gate — MUST be >= 38/43. If regression, STOP and fix.
5. Benchmark target queries — see Long-Running Task Rules

All steps 3-5 MUST follow the async execution pattern in Long-Running Task Rules.
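The step-4 gate can be enforced mechanically once the correctness log exists. A sketch, assuming the log's summary line looks like `Summary: 38/43 passed` (the exact summary format is an assumption — adjust the grep to the real output; the sample log here is fabricated, point `LOG` at `/tmp/correctness.log` in practice):

```shell
# Parse "Summary: N/43" from a correctness log and enforce the >= 38 gate.
LOG=$(mktemp)
printf 'running...\nSummary: 38/43 passed\n' > "$LOG"   # fabricated sample

passed=$(grep -oE 'Summary: [0-9]+' "$LOG" | grep -oE '[0-9]+')
if [ "${passed:-0}" -ge 38 ]; then
  echo "gate OK ($passed/43)"
else
  echo "REGRESSION ($passed/43) — STOP and fix"
fi
```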

## Long-Running Task Rules

Any command that may take longer than 2 minutes MUST be run asynchronously. This includes: benchmarks, plugin reload, correctness tests, compilation, and full benchmark suites.

### Async Execution Pattern

1. **NEVER run long-running commands synchronously** — always background and poll.
2. **Launch in a subshell** so the parent shell returns immediately:
   ```bash
   nohup bash -c 'cd /local/home/penghuo/oss/os-sql/benchmarks/clickbench && bash run/run_all.sh reload-plugin > /tmp/reload.log 2>&1' &>/dev/null &
   echo "launched"
   ```
   **CRITICAL**: Plain `nohup cmd &` or `(cmd &)` does NOT work — the shell hangs waiting for the background process. You MUST use `nohup bash -c '...' &>/dev/null &`.
3. **Poll for completion** — check the output tail for success/failure:
   ```bash
   tail -5 /tmp/reload.log
   ```
4. **Poll interval**: every 10-30s for benchmarks, every 30-60s for builds.
5. **Analyze each poll result** — if ERROR/FAILURE appears in output, stop and diagnose immediately.
6. **Monitoring IS the task** — never launch a long-running command and then do something else.
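The whole launch-and-poll pattern can be exercised end to end with a stand-in task. A self-contained sketch, using `sleep` plus a fabricated `BUILD SUCCESSFUL` marker in place of a real benchmark:

```shell
# Stand-in long task: emits progress, then a completion marker.
LOG=$(mktemp)
nohup bash -c "sleep 1; echo 'step done' >> $LOG; echo 'BUILD SUCCESSFUL' >> $LOG" &>/dev/null &
echo "launched"

# Poll until a completion or error marker appears (bounded, not forever).
for _ in $(seq 1 30); do
  if grep -qE 'BUILD SUCCESSFUL|BUILD FAILED' "$LOG" 2>/dev/null; then break; fi
  sleep 1
done
tail -2 "$LOG"
```

Note the `$LOG` path is expanded by the parent shell (double quotes) before the subshell launches, so both processes agree on the log file.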
69+
70+
### Common Long-Running Commands
71+
72+
| Command | Est. Time | Output File | Completion Marker | Error Marker |
73+
|---------|-----------|-------------|-------------------|--------------|
74+
| `./gradlew :dqe:compileJava` | ~5s | `/tmp/compile.log` | `BUILD SUCCESSFUL` | `BUILD FAILED` |
75+
| `run_all.sh reload-plugin` | 2-3 min | `/tmp/reload.log` | `reloaded successfully` | `FAILED` or `Error` |
76+
| `run_all.sh correctness` | ~2 min | `/tmp/correctness.log` | `Summary:` | `Error` |
77+
| `run_opensearch.sh --query N` | ~1 min | `/tmp/bench-qN.log` | `Results written` | `Error` or `failed` |
78+
| `run_opensearch.sh` (full suite) | 5-15 min | `/tmp/bench-full.log` | `Results written` | `Error` or `failed` |
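Since every row of the table is a (log, completion marker, error marker) triple, one polling helper covers them all. A sketch (`check_log` is an invented helper name; the marker strings come from the table, and the sample log below is fabricated):

```shell
# check_log <log> <ok_regex> <err_regex>
# Returns 0 = done, 1 = failed, 2 = still running.
check_log() {
  local log=$1 ok=$2 err=$3
  if grep -qE "$err" "$log" 2>/dev/null; then return 1; fi
  if grep -qE "$ok" "$log" 2>/dev/null; then return 0; fi
  return 2
}

# Example against a fabricated compile log:
LOG=$(mktemp)
echo "BUILD SUCCESSFUL in 5s" > "$LOG"
check_log "$LOG" 'BUILD SUCCESSFUL' 'BUILD FAILED'; echo "status=$?"   # status=0
```

Checking the error marker first matters: a failed run's log may contain both partial-progress text and the failure line.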

### Multi-Query Benchmark with Monitoring

```bash
# Benchmark multiple queries sequentially, monitoring each
for Q in 31 32 38 41; do
  LOG=/tmp/bench-q${Q}.log
  nohup bash -c "cd /local/home/penghuo/oss/os-sql/benchmarks/clickbench && bash run/run_opensearch.sh --warmup 1 --num-tries 3 --query $Q --output-dir /tmp/q${Q} > $LOG 2>&1" &>/dev/null &
  PID=$!
  while kill -0 $PID 2>/dev/null; do sleep 3; tail -1 $LOG 2>/dev/null; done
  echo "=== Q${Q} ==="
  grep -E "Q[0-9]+ run" $LOG
done
```
93+
94+
### Kill All Benchmarks
95+
96+
```bash
97+
pkill -f "run_opensearch.sh"; pkill -f "run_all.sh"
98+
```

## Query Numbering (CRITICAL)

| Context | Indexing | "Q17" means |
|---------|----------|-------------|
| `--query N` in scripts | 1-based | `--query 18` for Q17 |
| `queries_trino.sql` line | 1-based | line 18 for Q17 |
| JSON `result[N]` | 0-based | `result[17]` for Q17 |

**Mnemonic**: scripts and SQL are 1-based, JSON is 0-based.
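The mapping reduces to off-by-one arithmetic, which is worth encoding rather than doing in your head mid-benchmark. A sketch (the helper names are invented):

```shell
# "QN" → 1-based index for --query / queries_trino.sql line,
# and 0-based index for JSON result[N].
script_index() { echo $(( ${1#Q} + 1 )); }  # strip leading "Q", add 1
json_index()   { echo $(( ${1#Q} )); }      # strip leading "Q"

script_index Q17   # → 18
json_index Q17     # → 17
```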

## Pitfalls

- **NEVER** run `reload-plugin` while a benchmark is running
- Benchmark on 100M (`hits`), correctness on 1M (`hits_1m`)
- Use ClickHouse-Parquet baseline, NOT native MergeTree
- Baseline file: `benchmarks/clickbench/results/performance/clickhouse_parquet_official/c6a.4xlarge.json`
- OpenSearch endpoint: `http://localhost:9200`, DQE: `POST /_plugins/_trino_sql`

## Current State (2026-03-26)

- Correctness: 29/43 on 1M
- Within 2x of CH-Parquet: 19/43 on r5.4xlarge (was 16/43 on m5.8xlarge before optimization)
- Hybrid bitset/collector optimization deployed (selective filters use bitset, broad use Collector)
- Bitset path: `Weight.count()` estimates selectivity; <50% of docs → bitset, else → Collector
- Big wins: Q18 (0.01x), Q39 (1.1x), Q41 (0.28x), Q42 (1.02x), Q43 (0.70x)
- Target: >= 32/43 within 2x
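The bitset/Collector decision above amounts to a single selectivity threshold. An illustrative sketch of the rule (threshold and path names per the bullets above; this is shell for illustration, not the Java implementation, and `choose_path` is an invented name):

```shell
# choose_path <estimated_matching_docs> <max_doc>
# Mirrors the rule: filters matching <50% of docs take the bitset path.
choose_path() {
  local count=$1 max_doc=$2
  if [ $(( count * 2 )) -lt "$max_doc" ]; then
    echo "bitset"
  else
    echo "collector"
  fi
}

choose_path 10 100   # selective filter → bitset
choose_path 80 100   # broad filter → collector
```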
