Helix ships a JSON benchmark harness (`python -m benchmarks.api_benchmarks`) and continuously publishes the results to CI. The goals are:
- Gate regressions against a stored baseline at `.bench/baseline.json`.
- Maintain an append-only history in `docs/data/bench/history.csv` (populated automatically on `main`).

Run the harness locally with:

```bash
python -m benchmarks.api_benchmarks \
  --repeat 5 \
  --warmup 1 \
  --limit 0 \
  --out bench-results/api.json \
  --summary-md bench-results/api.md
```
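For a quick sanity check of a local run, both artifacts can be inspected directly. A minimal sketch, assuming the run above completed (`python -m json.tool` is just one way to pretty-print the payload):

```bash
# Pretty-print the full JSON payload written by --out.
python -m json.tool bench-results/api.json | less
# The Markdown summary written by --summary-md is readable as-is.
cat bench-results/api.md
```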
Useful knobs:

- `--limit N` keeps only the first N nucleotides/aminos (`0` = entire dataset). Use this for quick inner-loop runs or to mimic CI’s 10k-sample sweep.
- Point the harness at your own datasets with `HELIX_BENCH_DNA_FASTA=/abs/path/...` and `HELIX_BENCH_PROTEIN_FASTA=/abs/path/...`. The harness records those paths in the JSON payload so dashboards can compare apples-to-apples.
- Pass `--baseline path/to/baseline.json` to compute Δ% vs. a stored run (see the sketch after this list). `scripts/bench_check.py baseline current --threshold 5` is what CI uses to gate regressions.
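Putting those pieces together, a local baseline-and-compare loop might look like the following. This is a sketch built only from the flags documented above; the exact exit-code behaviour of `bench_check.py` on a regression is an assumption based on its use as a CI gate:

```bash
# Capture a baseline once (same flags as a normal run, different --out).
python -m benchmarks.api_benchmarks --repeat 5 --warmup 1 \
  --out .bench/baseline.json

# Later runs report Δ% against the stored baseline.
python -m benchmarks.api_benchmarks --repeat 5 --warmup 1 \
  --baseline .bench/baseline.json \
  --out bench-results/api.json

# CI-style gate: assumed to exit non-zero if any metric regresses by >5%.
python scripts/bench_check.py .bench/baseline.json bench-results/api.json --threshold 5
```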
Trigger a manual heavy sweep from the Actions → CI → Run workflow button (or from the `gh` CLI, sketched below):

- Set `bench_heavy` to `true` (this bumps repeats to 10 and disables the 10k sampling limit).
- Optionally set `dna_fasta` / `protein_fasta`. On hosted runners you typically leave these blank; on self-hosted boxes you can point at a mounted volume or fetcher script.
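The same dispatch can be driven from the GitHub CLI. A sketch, assuming the workflow is named `CI`, that it declares these `workflow_dispatch` inputs, and hypothetical dataset paths:

```bash
# Equivalent to Actions → CI → Run workflow. The FASTA paths are
# placeholders for wherever a self-hosted runner mounts its datasets.
gh workflow run CI \
  -f bench_heavy=true \
  -f dna_fasta=/mnt/datasets/dna.fa \
  -f protein_fasta=/mnt/datasets/protein.fa
```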
Each run publishes:

- `benchmarks/out/bench-<SHA>.json` — the full schema payload.
- `benchmarks/out/bench-<SHA>.md` — a Markdown table appended to the CI summary.
- `docs/data/bench/history.csv` (main branch only) — an append-only log that powers the chart below.

The gallery below visualizes every `*.mean_s` column recorded in `docs/data/bench/history.csv` and summarizes the latest run.
CSV source: `docs/data/bench/history.csv`. Commit history contains the raw JSON artifacts under `benchmarks/out/` (uploaded by CI) if you need to recompute metrics offline.
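For ad-hoc offline inspection, the history log is plain CSV, so standard shell tools suffice. A minimal sketch, assuming the `*.mean_s` column naming described above:

```bash
# List every *.mean_s column recorded in the history log ...
head -1 docs/data/bench/history.csv | tr ',' '\n' | grep 'mean_s$'
# ... and print the most recently appended run.
tail -1 docs/data/bench/history.csv
```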