AI Agent Testing

Turn policies into repeatable agent tests.

Define test suites in YAML or the UI, run them in CI or on a schedule, score with rubrics and thresholds, confirm tool side-effects, and export audit-ready evidence.
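
For a feel of the shape, here is a minimal suite sketch. The exact schema isn't shown on this page, so every field name below (scenarios, assertions, min_score, and so on) is illustrative rather than canonical:

```yaml
# Hypothetical suite definition -- field names are illustrative,
# not a documented schema.
name: refund-policy-suite
scenarios:
  - id: refund-request
    prompt: "I want a refund for order 1234."
    assertions:
      - type: keyword            # deterministic check, no LLM involved
        contains: "order"
      - type: rubric             # LLM-as-judge score, gated by a threshold
        rubric: helpful-and-compliant
        min_score: 0.8
```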

What Runs does

Quality scoring
LLM-as-judge with customizable rubrics and numeric thresholds for pass/fail gating.
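A rubric definition might pair judge criteria with a numeric pass threshold, along these hypothetical lines:

```yaml
# Illustrative rubric config -- names, judge model, and scale are assumptions.
rubrics:
  helpful-and-compliant:
    judge_model: gpt-4o          # assumed: you choose the judge model
    criteria:
      - "Answers the user's question directly"
      - "Never promises a refund outside policy"
    scale: 0.0-1.0
    threshold: 0.8               # runs scoring below this fail the gate
```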
Policy checks
Regex, keywords, JSONPath, and schema validation on every response — no LLM required.
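Deterministic checks could be declared per scenario, something like the following (field names assumed):

```yaml
# Illustrative deterministic checks -- evaluated without an LLM.
assertions:
  - type: regex
    pattern: "\\bticket #\\d+\\b"          # must cite a ticket number
  - type: keyword
    absent: ["guarantee", "legal advice"]  # forbidden phrases
  - type: jsonpath
    path: "$.action"
    equals: "escalate"
  - type: schema
    json_schema: ./schemas/agent_response.json
```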
Latency metrics
Track avg, p50, p95, and max response times per scenario and environment.
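If you want latency to gate a run rather than only appear in reports, a per-scenario budget might look like this; the field names are assumptions that mirror the reported stats:

```yaml
# Hypothetical latency budget -- metric names mirror the reported stats.
latency:
  p95_ms: 2000     # fail the scenario if p95 exceeds 2 seconds
  max_ms: 5000
```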
Side-effect confirmations
HTTP assertions verify your agent actually took the expected actions.
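A confirmation might replay an HTTP request after the conversation and assert on the result, roughly like so (URL and field names are illustrative):

```yaml
# Illustrative side-effect check: after the conversation, verify the
# agent actually created the support ticket it claimed to.
confirmations:
  - request:
      method: GET
      url: "https://api.example.com/tickets?order=1234"
      headers:
        Authorization: "Bearer {{ env.API_TOKEN }}"
    expect:
      status: 200
      jsonpath:
        "$.items[0].status": "open"
```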
CI gating
Standard exit codes let you block merges and deploys when tests fail.
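Because a failing run exits nonzero, wiring this into CI needs nothing product-specific. A GitHub Actions sketch, where the runs CLI name and its flags are assumptions:

```yaml
# Hypothetical GitHub Actions workflow -- the `runs` CLI name and flags
# are assumptions; any nonzero exit code fails the job and blocks the merge.
on: [pull_request]
jobs:
  agent-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run agent test suite
        run: runs test ./suites/refund-policy.yaml --env staging
        env:
          API_TOKEN: ${{ secrets.API_TOKEN }}
```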
Evidence export
Download transcripts, scores, and logs as audit-ready evidence packs.

How it works

1. Define your suite
Write scenarios in YAML or the visual editor. Add personas, variables, and assertions.

2. Point at your agent
Configure your endpoint URL, headers, and environment variables. Bring your own secrets; see the configuration sketch after these steps.

3. Run and score
Execute via the CLI, a CI integration, or a schedule. Every run is scored against your rubrics and policy checks.

4. Review and gate
View results in the dashboard, compare across runs, and export evidence for compliance.
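
Step 2's endpoint configuration might look like the sketch below. Field names are illustrative; the point is that secrets are resolved from your environment at run time rather than stored in the suite:

```yaml
# Hypothetical target configuration -- your agent is any HTTP endpoint.
target:
  url: "https://agents.internal.example.com/chat"
  headers:
    Authorization: "Bearer {{ env.AGENT_API_KEY }}"   # resolved at run time
  environments:
    staging:
      url: "https://staging.agents.internal.example.com/chat"
```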

Key features

YAML + visual editor
Define suites in code or the UI — version-controlled and reviewable.
Bring your own endpoints
Test any HTTP-based agent. Your secrets stay in your infrastructure.
Run comparison
Diff results across commits, environments, and time — spot regressions instantly.
Policy packs
Org-wide compliance templates that every suite must pass.
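A pack might layer organization-wide rules over whatever each suite already asserts, along these illustrative lines:

```yaml
# Illustrative org-wide policy pack applied on top of every suite.
policy_pack: pii-and-tone
rules:
  - type: regex
    absent: "\\b\\d{3}-\\d{2}-\\d{4}\\b"   # no SSN-shaped strings in output
  - type: keyword
    absent: ["as an AI", "I cannot verify"]
```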
Personas and scenarios
Test with diverse user profiles and multi-turn conversation flows.
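A persona and a multi-turn flow could be combined roughly like this (all names hypothetical):

```yaml
# Hypothetical persona plus multi-turn conversation flow.
personas:
  frustrated-customer:
    traits: "terse, impatient, two failed deliveries already"
scenarios:
  - id: escalation-flow
    persona: frustrated-customer
    turns:
      - user: "This is the third time my package is lost."
      - expect:
          rubric: de-escalation
          min_score: 0.7
      - user: "Just cancel everything."
```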
Versioned suites
Approval workflows for suite changes — no accidental test drift.

Ready to get started?

See pricing or talk to our team about your use case.