Commit 8d7bf8c

Merge pull request #20997 from serathius/robustness-documentation
Update robustness and antithesis documentation.
2 parents 99252c1 + f7da7ff commit 8d7bf8c

2 files changed: +69 -2 lines changed

tests/antithesis/README.md

Lines changed: 53 additions & 1 deletion
@@ -1,4 +1,56 @@
-This directory enables integration of Antithesis with etcd. There are 4 containers running in this system: 3 that make up an etcd cluster (etcd0, etcd1, etcd2) and one that "[makes the system go](https://antithesis.com/docs/getting_started/basic_test_hookup/)" (client).
+# etcd Antithesis tests
+
+This document describes the etcd test integration with [Antithesis].
+Antithesis provides a testing platform that allows you to explore edge cases, race conditions, and rare
+bugs that are difficult or impossible to reproduce in a normal environment.
+
+[Antithesis]: https://antithesis.com/
+
+## Robustness vs Antithesis tests
+
+[Antithesis] runs the robustness tests inside its
+[deterministic simulation testing](https://antithesis.com/resources/deterministic_simulation_testing/)
+environment with [fault injection](https://antithesis.com/docs/environment/fault_injection/).
+
+For more details on robustness tests, see the [robustness directory](../robustness).
+
+## Antithesis Setup
+
+The setup consists of a 3-node etcd cluster and a client container, orchestrated
+via [Docker Compose](https://antithesis.com/docs/getting_started/setup/).
+
+For the etcd Antithesis test suite, the etcd server is built with the following patches:
+
+* **Critical code locations**: We replace etcd `gofail` comments (which signify
+  code locations important for failure injection in robustness tests) with
+  Antithesis `assert.Reachable`. This guides Antithesis to explore the
+  execution space around these points.
+* **Assertions**: We change etcd `verify` package assertions to Antithesis
+  `assert.Always`, encouraging the platform to try to break those assertions.
+* **Instrumentation**: The etcd binary is instrumented using
+  `antithesis-go-instrumentor` to enable coverage tracking and feedback for
+  the Antithesis platform.
+
+The Antithesis etcd tests configure the
+[Test Composer](https://antithesis.com/docs/test_templates/test_composer_reference/)
+in the following way:
+
+* **`entrypoint`**:
+  * Waits for all etcd nodes to be healthy.
+  * Emits the `setup_complete` message to Antithesis to start the testing phase.
+* **`singleton_driver_traffic`**:
+  * Generates robustness test traffic against the cluster while faults are injected.
+  * Runs as a [Singleton Driver Command], meaning it is the only one generating traffic.
+  * All generated traffic is saved as an operation history and stored on a shared volume.
+* **`finally_validation`**:
+  * Runs as a [Finally Command], meaning it is the last to run, with failure injection disabled.
+  * Reads the history of operations and validates them using the robustness test validation logic.
+  * Results of robustness tests are reported as Antithesis `assert.Always` assertions.
+  * Similar to robustness tests, it emits a visualization of the operations
+    history to an HTML file that is uploaded to the Antithesis platform.
+
+[Singleton Driver Command]: https://antithesis.com/docs/test_templates/test_composer_reference/#singleton-driver
+[Finally Command]: https://antithesis.com/docs/test_templates/test_composer_reference/#finally-command
 
# Running tests with docker compose
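As a rough illustration of the server patches listed in the antithesis README diff above, the Go sketch below shows what a patched code site might look like. Everything named here is a hypothetical stand-in written for this example: `Probe`, its `reachable` and `always` methods, `saveSnapshot`, the `beforeSaveSnapshot` failpoint name, and the invariant itself. The helpers only imitate the roles of Antithesis `assert.Reachable` and `assert.Always`; they are not the Antithesis SDK and not etcd code.

```go
package main

import "fmt"

// Probe is a hypothetical stand-in that records assertion activity. It only
// imitates the roles of Antithesis `assert.Reachable` and `assert.Always`.
type Probe struct {
	Reached    map[string]int
	Violations []string
}

// NewProbe returns a Probe with its counters initialized.
func NewProbe() *Probe { return &Probe{Reached: map[string]int{}} }

// reachable notes that a critical code location was executed.
func (p *Probe) reachable(name string) { p.Reached[name]++ }

// always records a violation whenever the condition does not hold.
func (p *Probe) always(cond bool, msg string) {
	if !cond {
		p.Violations = append(p.Violations, msg)
	}
}

// saveSnapshot is a made-up function standing in for server code.
func saveSnapshot(p *Probe, index, applied uint64) {
	// Upstream form, a gofail comment that is inert in normal builds:
	//   // gofail: var beforeSaveSnapshot struct{}
	// Patched form for the Antithesis build (assumed shape):
	p.reachable("beforeSaveSnapshot")

	// A verify-style invariant rewritten as an always-assertion.
	// The invariant is invented for this example.
	p.always(index <= applied,
		fmt.Sprintf("snapshot index %d is ahead of applied index %d", index, applied))
}

func main() {
	p := NewProbe()
	saveSnapshot(p, 5, 10)
	saveSnapshot(p, 12, 10) // breaks the invented invariant
	fmt.Printf("reached %d time(s), %d violation(s)\n",
		p.Reached["beforeSaveSnapshot"], len(p.Violations))
}
```

The point of the sketch is the division of labor: the reachability marker tells the platform which locations are worth steering toward, while the always-assertion gives it a property to try to falsify.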

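The Test Composer flow described in the diff above (traffic saved as an operation history on a shared volume, then checked by a finally command) can be sketched roughly as follows. This is a minimal sketch under stated assumptions: the `Operation` record, the JSON-lines file layout, the `Recorder` type, and the single per-client revision check are all invented here for illustration, and are much simpler than the actual robustness validation logic.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// Recorder is a hypothetical stand-in for reporting results as
// always-style assertions; it is not the Antithesis SDK.
type Recorder struct {
	Failures []string
}

// Always records a failure whenever the condition does not hold.
func (r *Recorder) Always(cond bool, msg string) {
	if !cond {
		r.Failures = append(r.Failures, msg)
	}
}

// Operation is an assumed, simplified history record.
type Operation struct {
	ClientID int    `json:"client_id"`
	Key      string `json:"key"`
	Revision int64  `json:"revision"`
}

// WriteHistory stores operations as JSON lines, as a traffic driver might
// do on a shared volume.
func WriteHistory(path string, ops []Operation) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	enc := json.NewEncoder(f)
	for _, op := range ops {
		if err := enc.Encode(op); err != nil {
			f.Close()
			return err
		}
	}
	return f.Close()
}

// ReadHistory loads the operations back, as a validation step might.
func ReadHistory(path string) ([]Operation, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	var ops []Operation
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		var op Operation
		if err := json.Unmarshal(sc.Bytes(), &op); err != nil {
			return nil, err
		}
		ops = append(ops, op)
	}
	return ops, sc.Err()
}

// Validate checks one simplified property: revisions observed by a single
// client never decrease. Each check is reported through the recorder.
func Validate(ops []Operation, rec *Recorder) {
	last := map[int]int64{}
	for _, op := range ops {
		prev, seen := last[op.ClientID]
		rec.Always(!seen || op.Revision >= prev,
			fmt.Sprintf("client %d: revision went from %d to %d", op.ClientID, prev, op.Revision))
		last[op.ClientID] = op.Revision
	}
}

func main() {
	dir, err := os.MkdirTemp("", "history") // plays the role of the shared volume
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "history.jsonl")

	ops := []Operation{
		{ClientID: 1, Key: "a", Revision: 2},
		{ClientID: 2, Key: "a", Revision: 3},
		{ClientID: 1, Key: "b", Revision: 1}, // revision moved backwards
	}
	if err := WriteHistory(path, ops); err != nil {
		panic(err)
	}
	read, err := ReadHistory(path)
	if err != nil {
		panic(err)
	}
	rec := &Recorder{}
	Validate(read, rec)
	fmt.Println(len(read), "operations,", len(rec.Failures), "failed assertions")
}
```

Keeping the write and validate steps in separate functions mirrors the split between the driver and the finally command: validation only needs the stored history, so it can run after fault injection has stopped.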
tests/robustness/README.md

Lines changed: 16 additions & 1 deletion
@@ -6,6 +6,16 @@ The purpose of these tests is to rigorously validate that etcd maintains its [KV
 [KV API guarantees]: https://etcd.io/docs/v3.6/learning/api_guarantees/#kv-apis
 [watch API guarantees]: https://etcd.io/docs/v3.6/learning/api_guarantees/#watch-apis
 
+## Robustness vs Antithesis tests
+
+[Antithesis] runs the robustness tests inside its
+[deterministic simulation testing](https://antithesis.com/resources/deterministic_simulation_testing/)
+environment with [fault injection](https://antithesis.com/docs/environment/fault_injection/).
+
+For more details on Antithesis integration, see the [antithesis directory](../antithesis).
+
+[Antithesis]: https://antithesis.com/
+
 ## Robustness track record
 
 | Correctness / Consistency issue | Report | Introduced in | Discovered by | Reproducible by robustness test | Command |
@@ -20,9 +30,12 @@ The purpose of these tests is to rigorously validate that etcd maintains its [KV
 | Watch events lost during stream starvation [#17529] | Mar 2024 | v3.4 or earlier | User | Yes, after covering of slow watch | `make test-robustness-issue17529` |
 | Revision decreasing caused by crash during compaction [#17780] | Apr 2024 | v3.4 or earlier | Robustness | Yes, after covering compaction | |
 | Watch dropping an event when compacting on delete [#18089] | May 2024 | v3.4 or earlier | Robustness | Yes, after covering of compaction | `make test-robustness-issue18089` |
+| Panic when two snapshots are received in a short period [#18055] | May 2024 | v3.4 or earlier | Robustness | Yes, via Antithesis | |
 | Inconsistency when reading compacted revision in TXN [#18667] | Oct 2024 | v3.4 or earlier | User | | |
 | Missing delete event on watch opened on same revision as compaction [#19179] | Jan 2025 | v3.4 or earlier | Robustness | Yes, after covering of compaction | `make test-robustness-issue19179` |
-| Watch on future revision returns old events or notifications [#20221] | Jun 2025 | v3.4 or earlier | Robustness | Yes, after covering connection to multiple members| |
+| Watch on future revision returns notifications [#20221] | Jun 2025 | v3.4 or earlier | Robustness | Yes, after covering connection to multiple members| |
+| Watch on future revision returns old events [#20221] | Jun 2025 | v3.4 or earlier | Antithesis | Yes, after covering connection to multiple members| |
+| Panic from db page expected to be 5 [#20271] | Jul 2025 | v3.4 or earlier | Antithesis | Yes, via Antithesis | |
 
 [#13766]: https://github.com/etcd-io/etcd/issues/13766
 [#14370]: https://github.com/etcd-io/etcd/issues/14370
@@ -37,6 +50,8 @@ The purpose of these tests is to rigorously validate that etcd maintains its [KV
 [#18667]: https://github.com/etcd-io/etcd/issues/18667
 [#19179]: https://github.com/etcd-io/etcd/issues/19179
 [#20221]: https://github.com/etcd-io/etcd/issues/20221
+[#18055]: https://github.com/etcd-io/etcd/issues/18055
+[#20271]: https://github.com/etcd-io/etcd/issues/20271
 
 ## How Robustness Tests Work
 