Baseline Guide
Tumult’s baseline engine replaces static thresholds with data-driven tolerance derivation. Instead of guessing “latency must be < 500ms”, the engine measures the system and derives “latency should stay within 2 standard deviations of the measured 45ms mean.”
Baseline Methods
Static
Fixed thresholds — compatible with traditional chaos tools.
baseline:
method: static
tolerance_lower: 0
tolerance_upper: 500
Mean ± Nσ (Mean Standard Deviation)
Derives bounds from the arithmetic mean plus/minus N standard deviations. Best for normally distributed metrics like throughput and connection counts.
baseline:
method: mean_stddev
duration_s: 120
interval_s: 2
sigma: 2.0
With σ=2, approximately 95% of normal values fall within the bounds.
Percentile
Uses a percentile value with a safety multiplier. Best for latency metrics which are typically right-skewed.
baseline:
method: percentile
duration_s: 120
interval_s: 2
percentile: 95
multiplier: 1.2
The threshold is p95 × 1.2, giving 20% headroom above the observed 95th percentile.
IQR (Interquartile Range)
Robust to outliers. Uses Q1 - 1.5×IQR to Q3 + 1.5×IQR. Best for noisy environments.
baseline:
method: iqr
duration_s: 120
interval_s: 2
Baseline Phases
Warmup
The first N seconds of baseline collection are discarded. This accounts for system settling time (cold caches, connection pool initialization).
baseline:
warmup_s: 15
duration_s: 120
Anomaly Detection
Before deriving thresholds, the engine checks if the baseline data itself is anomalous:
- High variance: coefficient of variation > 0.5 (50%)
- Extreme range: max - min > 10× the median
- Insufficient samples: fewer than the minimum required
If an anomaly is detected, the experiment can either:
- Abort with a warning (default)
- Continue with a flag in the journal
The anomaly check emits a anomaly.detected span event on the baseline.acquire span, visible in your OTel backend. The span status is set to ERROR when an anomaly is found.
Recovery Detection
After fault removal, the engine scans post-fault samples to find the recovery point — the first index where all subsequent samples are within tolerance. This gives the MTTR (Mean Time to Recovery).
Compliance Ratio
The proportion of post-fault samples within tolerance bounds. A ratio of 1.0 means full recovery; 0.5 means half the post-fault samples breached the baseline.
Streaming Acquisition (AcquisitionStream)
For advanced use cases (custom probes, incremental data feeds), tumult-baseline exposes AcquisitionStream — a streaming interface that accepts samples one at a time rather than collecting a full dataset upfront:
use tumult_baseline::{AcquisitionStream, BaselineConfig};
let config = BaselineConfig {
method: BaselineMethod::MeanStddev,
sigma: 2.0,
..Default::default()
};
let mut stream = AcquisitionStream::new("api-latency", config);
// Feed samples incrementally (e.g., from a live probe loop)
stream.push(45.2);
stream.push(47.8);
stream.push(43.1);
// Derive tolerance bounds at any point
if let Some(result) = stream.derive() {
println!("lower={}, upper={}", result.lower, result.upper);
}
This is used internally by the runner for live baseline capture and is also available to custom integrations.
Choosing a Method
| Metric Type | Recommended Method | Why |
|---|---|---|
| Latency (p50, p95, p99) | Percentile | Latency distributions are skewed |
| Throughput (req/s) | Mean ± Nσ | Throughput is approximately normal |
| Error rate | Mean ± Nσ with σ=2 | Error rates are bounded 0-1 |
| Connection count | Mean ± Nσ | Counts are approximately normal |
| Binary health check | Static (0 or 1) | No distribution to measure |
| Noisy metrics | IQR | Robust to outliers |
Property-Based Testing
The statistical functions in tumult-baseline are covered by property-based tests using proptest. These tests generate arbitrary datasets and verify mathematical invariants (e.g., mean is always between min and max, stddev is non-negative) rather than relying on fixed examples. Run them with:
cargo test -p tumult-baseline