AI Agent Testing

Turn policies into repeatable agent tests.

Define test suites in YAML or the UI, run them in CI or on a schedule, score with rubrics and thresholds, confirm tool side-effects, and export audit-ready evidence.
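
For a feel of the shape, here is a minimal suite sketch. The exact schema isn't shown on this page, so every field name below (scenarios, assertions, min_score, and so on) is illustrative rather than canonical:

```yaml
# Hypothetical suite definition -- field names are illustrative,
# not a documented schema.
name: refund-policy-suite
scenarios:
  - id: refund-request
    prompt: "I want a refund for order 1234."
    assertions:
      - type: keyword            # deterministic check, no LLM involved
        contains: "order"
      - type: rubric             # LLM-as-judge score, gated by a threshold
        rubric: helpful-and-compliant
        min_score: 0.8
```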

What Runs does

Quality scoring
LLM-as-judge with customizable rubrics and numeric thresholds for pass/fail gating.
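A rubric definition might pair judge criteria with a numeric pass threshold, along these hypothetical lines:

```yaml
# Illustrative rubric config -- names, judge model, and scale are assumptions.
rubrics:
  helpful-and-compliant:
    judge_model: gpt-4o          # assumed: you choose the judge model
    criteria:
      - "Answers the user's question directly"
      - "Never promises a refund outside policy"
    scale: 0.0-1.0
    threshold: 0.8               # runs scoring below this fail the gate
```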
Policy checks
Regex, keywords, JSONPath, and schema validation on every response — no LLM required.
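Deterministic checks could be declared per scenario, something like the following (field names assumed):

```yaml
# Illustrative deterministic checks -- evaluated without an LLM.
assertions:
  - type: regex
    pattern: "\\bticket #\\d+\\b"          # must cite a ticket number
  - type: keyword
    absent: ["guarantee", "legal advice"]  # forbidden phrases
  - type: jsonpath
    path: "$.action"
    equals: "escalate"
  - type: schema
    json_schema: ./schemas/agent_response.json
```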
Latency metrics
Track avg, p50, p95, and max response times per scenario and environment.
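If you want latency to gate a run rather than only appear in reports, a per-scenario budget might look like this; the field names are assumptions that mirror the reported stats:

```yaml
# Hypothetical latency budget -- metric names mirror the reported stats.
latency:
  p95_ms: 2000     # fail the scenario if p95 exceeds 2 seconds
  max_ms: 5000
```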
Side-effect confirmations
HTTP assertions verify your agent actually took the expected actions.
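A confirmation might replay an HTTP request after the conversation and assert on the result, roughly like so (URL and field names are illustrative):

```yaml
# Illustrative side-effect check: after the conversation, verify the
# agent actually created the support ticket it claimed to.
confirmations:
  - request:
      method: GET
      url: "https://api.example.com/tickets?order=1234"
      headers:
        Authorization: "Bearer {{ env.API_TOKEN }}"
    expect:
      status: 200
      jsonpath:
        "$.items[0].status": "open"
```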
CI gating
Standard exit codes let you block merges and deploys when tests fail.
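Because a failing run exits nonzero, wiring this into CI needs nothing product-specific. A GitHub Actions sketch, where the runs CLI name and its flags are assumptions:

```yaml
# Hypothetical GitHub Actions workflow -- the `runs` CLI name and flags
# are assumptions; any nonzero exit code fails the job and blocks the merge.
on: [pull_request]
jobs:
  agent-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run agent test suite
        run: runs test ./suites/refund-policy.yaml --env staging
        env:
          API_TOKEN: ${{ secrets.API_TOKEN }}
```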
Evidence export
Download transcripts, scores, and logs as audit-ready evidence packs.

How it works

1. Define your suite
Write scenarios in YAML or the visual editor. Add personas, variables, and assertions.

2. Point at your agent
Configure your endpoint URL, headers, and environment variables. Bring your own secrets; see the configuration sketch after these steps.

3. Run and score
Execute via the CLI, a CI integration, or a schedule. Every run is scored against your rubrics and policy checks.

4. Review and gate
View results in the dashboard, compare across runs, and export evidence for compliance.
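
Step 2's endpoint configuration might look like the sketch below. Field names are illustrative; the point is that secrets are resolved from your environment at run time rather than stored in the suite:

```yaml
# Hypothetical target configuration -- your agent is any HTTP endpoint.
target:
  url: "https://agents.internal.example.com/chat"
  headers:
    Authorization: "Bearer {{ env.AGENT_API_KEY }}"   # resolved at run time
  environments:
    staging:
      url: "https://staging.agents.internal.example.com/chat"
```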

Key features

YAML + visual editor
Define suites in code or the UI — version-controlled and reviewable.
Bring your own endpoints
Test any HTTP-based agent. Your secrets stay in your infrastructure.
Run comparison
Diff results across commits, environments, and time — spot regressions instantly.
Policy packs
Org-wide compliance templates that every suite must pass.
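A pack might layer organization-wide rules over whatever each suite already asserts, along these illustrative lines:

```yaml
# Illustrative org-wide policy pack applied on top of every suite.
policy_pack: pii-and-tone
rules:
  - type: regex
    absent: "\\b\\d{3}-\\d{2}-\\d{4}\\b"   # no SSN-shaped strings in output
  - type: keyword
    absent: ["as an AI", "I cannot verify"]
```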
Personas and scenarios
Test with diverse user profiles and multi-turn conversation flows.
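A persona and a multi-turn flow could be combined roughly like this (all names hypothetical):

```yaml
# Hypothetical persona plus multi-turn conversation flow.
personas:
  frustrated-customer:
    traits: "terse, impatient, two failed deliveries already"
scenarios:
  - id: escalation-flow
    persona: frustrated-customer
    turns:
      - user: "This is the third time my package is lost."
      - expect:
          rubric: de-escalation
          min_score: 0.7
      - user: "Just cancel everything."
```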
Versioned suites
Approval workflows for suite changes — no accidental test drift.

Ready to get started?

See pricing or talk to our team about your use case.