Proving Disruption in Numbers: Load Testing During Chaos Injection

Part 13 of the Tumult series. ← Part 12: The Full Span Waterfall
Chaos engineering without load is a rehearsal without an audience. You can kill a database connection, pause a container, inject latency — but if nothing is using the system when the fault hits, you have no evidence of impact. The experiment passes, the journal says “completed,” and you have learned nothing about how your system behaves under real conditions.
Tumult now runs load tests concurrently with chaos injection. The load generator hammers your system while faults are active. The disruption is measured in numbers — latency spikes, error rates, throughput drops — captured in the same journal, queryable in the same DuckDB store, visible in the same OTel trace.
The Architecture
resilience.experiment (root trace)
├── resilience.hypothesis.before
├── resilience.load (background — k6 running continuously)
│   └── TRACEPARENT propagated to k6 for trace correlation
├── resilience.action: pause-postgres (foreground chaos)
├── resilience.hypothesis.after
└── load_result: {latency_p95: 157ms, error_rate: 0.003, requests: 300}
The load test runs as a background process. The chaos method runs in the foreground. Both share the same parent OTel trace. When you open SigNoz, the load span and the chaos span appear in parallel on the waterfall.
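The background/foreground split can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Tumult's implementation: `build_traceparent`, `run_with_background_load`, and the stand-in load command are hypothetical names. The only standardized piece is the `TRACEPARENT` value, which follows the W3C Trace Context layout `version-traceid-spanid-flags`.

```python
import os
import re
import secrets
import subprocess
import sys


def build_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Format a W3C Trace Context value: version-traceid-spanid-flags."""
    if not re.fullmatch(r"[0-9a-f]{32}", trace_id):
        raise ValueError("trace_id must be 32 lowercase hex characters")
    if not re.fullmatch(r"[0-9a-f]{16}", span_id):
        raise ValueError("span_id must be 16 lowercase hex characters")
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def run_with_background_load(load_cmd, chaos_action, traceparent):
    """Start the load generator in the background, run the fault in the
    foreground, then wait for the load generator and return its output."""
    # The child process inherits the trace context through its environment.
    env = dict(os.environ, TRACEPARENT=traceparent)
    load = subprocess.Popen(load_cmd, env=env, stdout=subprocess.PIPE, text=True)
    try:
        chaos_result = chaos_action()  # foreground: blocks until the fault ends
    finally:
        load_output, _ = load.communicate()  # background: collect output afterwards
    return chaos_result, load_output


if __name__ == "__main__":
    tp = build_traceparent(secrets.token_hex(16), secrets.token_hex(8))
    # Stand-in for k6: a child process that echoes the inherited trace context.
    fake_load = [sys.executable, "-c", "import os; print(os.environ['TRACEPARENT'])"]
    result, output = run_with_background_load(fake_load, lambda: "paused", tp)
    print(result, output.strip() == tp)
```

The point of the shape is that the load process starts before the fault and is only collected after it, so the fault window always falls inside the load window.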
A Real Example
Here is a PostgreSQL experiment. k6 hammers the database with real INSERT and SELECT queries using the xk6-sql driver. While k6 is running, Pumba pauses the PostgreSQL container for 5 seconds.
title: PostgreSQL under k6 load — container pause disruption
load:
  tool: k6
  script: examples/k6/pg-load.js
  vus: 5
  duration_s: 20.0
method[3]:
  - name: connection-count-before
    activity_type: probe
    ...
  - name: pause-postgres-5s
    activity_type: action
    ...
  - name: connection-count-after
    activity_type: probe
    ...
Run it:
tumult run examples/pg-load-chaos.toon
Or use CLI flags to add load to any existing experiment:
tumult run experiment.toon --load k6 --load-script load.js --load-vus 50 --load-duration 30s
The Evidence
The journal captures the load result alongside the method results:
load_result:
  tool: k6
  duration_s: 10.5
  vus: 5
  latency_p50_ms: 101.0
  latency_p95_ms: 157.0
  error_rate: 0.003
  total_requests: 300
  thresholds_met: true
Compare this to a baseline run without chaos:
| Metric | Baseline (no chaos) | Under chaos (5s pause) | Impact |
|---|---|---|---|
| p95 latency | 97ms | 157ms | +62% |
| Max latency | 151ms | 5,130ms | 34x |
| Avg query time | 18ms | 47ms | 2.6x |
| Error rate | 0% | 0.3% | Disrupted |
| Recovery | — | 100% | Full |
The max latency of 5,130ms is the direct fingerprint of the 5-second container pause. That number exists because k6 was running real queries against PostgreSQL when the container froze. Without load, the experiment would have reported “completed” with no evidence of impact.
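The Impact column follows directly from the two measurement columns. A small sketch of the arithmetic, using only the values in the table (the helper names are illustrative):

```python
def percent_change(baseline: float, observed: float) -> float:
    """Relative change from baseline, in percent."""
    return (observed - baseline) / baseline * 100


def multiplier(baseline: float, observed: float) -> float:
    """How many times larger the observed value is than the baseline."""
    return observed / baseline


# Values copied from the comparison table (milliseconds).
p95 = percent_change(97, 157)
max_latency = multiplier(151, 5130)
avg_query = multiplier(18, 47)

print(f"p95 latency: {p95:+.0f}%")         # p95 latency: +62%
print(f"max latency: {max_latency:.0f}x")  # max latency: 34x
print(f"avg query:   {avg_query:.1f}x")    # avg query:   2.6x
```

The same arithmetic explains the fingerprint: subtract the 5,000 ms pause from the 5,130 ms maximum and 130 ms remains, which sits inside the baseline maximum of 151 ms.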
SQL Analytics
Load results flow into DuckDB alongside experiment and activity data:
tumult analyze --query "
SELECT e.title, l.tool, l.vus, l.latency_p95_ms, l.error_rate
FROM experiments e
JOIN load_results l ON e.experiment_id = l.experiment_id
ORDER BY l.latency_p95_ms DESC
"
Or use the default summary:
tumult analyze
Experiment: PostgreSQL under k6 load — container pause disruption
Status: PASS (10687ms)
Timeline:
├─ pg-responds (probe) (hypothesis before) 115ms
├─ connection-count-before (probe) 3065ms → 6
├─ pause-postgres-5s (action) 5354ms
├─ connection-count-after (probe) 92ms → 6
└─ pg-responds (probe) (hypothesis after) 55ms
Load Test (k6):
VUs: 5 Duration: 10.5s Requests: 300
Latency: p50=101ms p95=157ms
Throughput: 29 req/s Error rate: 0.003
Thresholds: PASS
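To see what the join in the analytics query returns, here is a self-contained sketch. It makes several assumptions: in-memory SQLite stands in for the DuckDB store, the two-table schema is inferred only from the columns the query names, and the experiment IDs, the baseline title, and the baseline VU count are hypothetical. The latency and error-rate values come from the comparison table.

```python
import sqlite3

# Assumption: SQLite stands in for DuckDB; the schema is inferred from the query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE experiments (experiment_id TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE load_results (
        experiment_id TEXT, tool TEXT, vus INTEGER,
        latency_p95_ms REAL, error_rate REAL
    );
""")

# Hypothetical IDs; the second title is a placeholder for a baseline run.
# \u2014 is the dash character used in the experiment title.
conn.executemany("INSERT INTO experiments VALUES (?, ?)", [
    ("exp-chaos", "PostgreSQL under k6 load \u2014 container pause disruption"),
    ("exp-baseline", "baseline without chaos (hypothetical title)"),
])
conn.executemany("INSERT INTO load_results VALUES (?, ?, ?, ?, ?)", [
    ("exp-chaos", "k6", 5, 157.0, 0.003),
    ("exp-baseline", "k6", 5, 97.0, 0.0),
])

# Same query text as the tumult analyze example.
rows = conn.execute("""
    SELECT e.title, l.tool, l.vus, l.latency_p95_ms, l.error_rate
    FROM experiments e
    JOIN load_results l ON e.experiment_id = l.experiment_id
    ORDER BY l.latency_p95_ms DESC
""").fetchall()

for row in rows:
    print(row)
```

Because the results are ordered by p95 latency descending, the most disrupted run is listed first, which is usually the one worth investigating.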
OTel Span Attributes
The resilience.load span carries the full result as attributes:
resilience.load.tool: k6
resilience.load.vus: 5
resilience.load.throughput_rps: 29.0
resilience.load.latency_p50_ms: 101.0
resilience.load.latency_p95_ms: 157.0
resilience.load.error_rate: 0.003
resilience.load.total_requests: 300
resilience.load.thresholds_met: true
resilience.load.duration_s: 10.5
These flow to SigNoz, Jaeger, or any OTLP backend — queryable alongside the experiment trace.
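The attribute names mirror the journal's `load_result` fields under a `resilience.load.` prefix. A minimal sketch of that mapping follows; the function name is hypothetical, and the throughput formula (total requests divided by duration, rounded to whole requests per second) is an assumption that happens to reproduce the 29.0 shown above.

```python
def load_span_attributes(load_result: dict) -> dict:
    """Flatten a load_result mapping into resilience.load.* span attributes."""
    attrs = {f"resilience.load.{key}": value for key, value in load_result.items()}
    # Assumption: throughput = total requests / duration, rounded to whole req/s.
    attrs["resilience.load.throughput_rps"] = float(
        round(load_result["total_requests"] / load_result["duration_s"])
    )
    return attrs


# Values copied from the journal's load_result.
load_result = {
    "tool": "k6",
    "duration_s": 10.5,
    "vus": 5,
    "latency_p50_ms": 101.0,
    "latency_p95_ms": 157.0,
    "error_rate": 0.003,
    "total_requests": 300,
    "thresholds_met": True,
}

for name, value in sorted(load_span_attributes(load_result).items()):
    print(f"{name}: {value}")
```

With the OpenTelemetry Python API, a dictionary like this could be attached to a span in one call via `span.set_attributes(attrs)`, since every value is a string, number, or boolean.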
When Load Matters
Some experiments need load to produce meaningful results:
- Connection pool exhaustion — killing idle connections has no effect without active traffic
- Network latency injection — p95 impact is only measurable with concurrent requests
- CPU/memory stress — degradation manifests as increased response times under load
- Failover testing — client-side impact (retries, timeouts) only visible with active sessions
Other experiments are meaningful without load:
- Pod deletion — Kubernetes scheduler behavior is independent of traffic
- Node drain — pod rescheduling happens regardless of load
- Data integrity checks — corruption detection doesn’t need concurrent writes
Tumult makes load optional. Add it when the experiment question is “how does this affect users?” Leave it off when the question is “does the infrastructure mechanism work?”
Try It
git clone https://github.com/mwigge/tumult.git && cd tumult && ./install.sh
make up-targets
tumult run examples/pg-under-load.toon
tumult analyze
The numbers tell the story.
Try Tumult at tumult.rs
Next in the series: Part 14 — GameDay Is Here →