Compare prompt performance across every version
Run A/B tests on your prompt variants, track token cost and quality scores side-by-side, and ship the version that actually works.
Get started: $39/mo
Cancel anytime. No usage limits on prompt tests.
- Side-by-side diff view
- Cost per token tracking
- Judge-model scoring
- CSV & JSON export
- Multi-model support
Pro Plan
$39/month
- ✓ Unlimited prompt variants
- ✓ A/B test runs with any model
- ✓ Cost & quality dashboards
- ✓ Judge-model auto-scoring
- ✓ Export reports (CSV / JSON)
- ✓ Email support
Frequently asked questions
Which AI APIs are supported?
OpenAI, Anthropic, and any OpenAI-compatible endpoint. Bring your own API keys.
How are quality scores calculated?
You define scoring rubrics per test. The app sends responses to a judge model and aggregates scores across runs.
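To make the aggregation step concrete, here is a minimal sketch of how per-run judge scores could be averaged per variant and rubric criterion. The record shape (`variant`, `scores`), the criterion names, and the use of a simple mean are assumptions for illustration, not the app's actual data model or scoring method.

```python
from statistics import mean


def aggregate_scores(runs):
    """Average judge scores per variant and per rubric criterion.

    `runs` is a list of dicts in a hypothetical shape:
    {"variant": "A", "scores": {"accuracy": 4, "tone": 3}}
    """
    by_variant = {}
    for run in runs:
        per_criterion = by_variant.setdefault(run["variant"], {})
        for criterion, score in run["scores"].items():
            per_criterion.setdefault(criterion, []).append(score)
    # Collapse each list of scores into its mean.
    return {
        variant: {criterion: mean(scores) for criterion, scores in criteria.items()}
        for variant, criteria in by_variant.items()
    }


# Illustrative data only: two runs of variant A, one run of variant B.
runs = [
    {"variant": "A", "scores": {"accuracy": 4, "tone": 3}},
    {"variant": "A", "scores": {"accuracy": 5, "tone": 4}},
    {"variant": "B", "scores": {"accuracy": 3, "tone": 5}},
]
summary = aggregate_scores(runs)
print(summary)
```

A plain mean is the simplest choice; a weighted rubric or a median would fit into the same structure.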
Can I export the results?
Yes. Every test report exports to CSV and JSON so you can pipe data into your own dashboards.
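As a rough sketch of what piping an export into your own tooling might look like, the snippet below summarizes a CSV report per variant. The column names (`variant`, `total_tokens`, `cost_usd`, `quality_score`) and the sample rows are hypothetical; check them against the headers in a real export before using this.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export content; the real column names may differ.
SAMPLE_EXPORT = """variant,total_tokens,cost_usd,quality_score
A,1200,0.024,4.5
A,1100,0.022,4.0
B,800,0.016,3.5
"""


def summarize(csv_text):
    """Total tokens and cost, and average quality, for each variant."""
    totals = defaultdict(
        lambda: {"runs": 0, "tokens": 0, "cost_usd": 0.0, "quality": 0.0}
    )
    for row in csv.DictReader(io.StringIO(csv_text)):
        t = totals[row["variant"]]
        t["runs"] += 1
        t["tokens"] += int(row["total_tokens"])
        t["cost_usd"] += float(row["cost_usd"])
        t["quality"] += float(row["quality_score"])
    return {
        variant: {
            "runs": t["runs"],
            "tokens": t["tokens"],
            "cost_usd": round(t["cost_usd"], 6),
            "avg_quality": t["quality"] / t["runs"],
        }
        for variant, t in totals.items()
    }


export_summary = summarize(SAMPLE_EXPORT)
for variant, stats in export_summary.items():
    print(variant, stats)
```

To read an exported file instead of the inline sample, pass the file's contents, for example `summarize(open("report.csv", newline="").read())`, where `report.csv` is a placeholder path.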