Stop manually testing your VAPI/Retell assistant
We turned a 1,000+ row Excel UAT sheet into a one-click evaluation harness that catches regressions, scores semantic correctness, and gives you real metrics instead of “it feels better”.
- For teams already on VAPI/Retell
- Keeps your prompts under version control
- No changes to your live call flow
Version-aware test runs
One dashboard for VAPI/Retell assistant QA in minutes
Run your entire UAT suite in minutes, see exactly what changed between versions, and decide with real metrics whether an assistant is ready to ship or needs another prompt round.
- Works with VAPI & Retell
- Safe to iterate every day
- Every run tied to a VAPI/Retell assistant ID + version hash
- Semantic scoring to check meaning, not just exact words
- Regression diffs between the last version and the new one
- Group tests by flow or intent and see coverage per group
- Export CSV/PDF reports to share with product and QA
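To make the regression diff idea concrete, here is a minimal sketch of how two runs could be compared. It assumes each run is a simple mapping from test case ID to pass/fail; the names `RunDiff` and `diff_runs` are hypothetical and not part of the product's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunDiff:
    regressions: list[str]  # passed in the previous run, fail in the current one
    fixes: list[str]        # failed in the previous run, pass in the current one
    new_cases: list[str]    # present only in the current run


def diff_runs(previous: dict[str, bool], current: dict[str, bool]) -> RunDiff:
    """Compare two runs keyed by test case ID (hypothetical data shape)."""
    regressions = sorted(
        case for case, ok in current.items()
        if not ok and previous.get(case) is True
    )
    fixes = sorted(
        case for case, ok in current.items()
        if ok and previous.get(case) is False
    )
    new_cases = sorted(case for case in current if case not in previous)
    return RunDiff(regressions, fixes, new_cases)
```

Cases that fail in both runs are deliberately left out of the diff, since they are known failures rather than changes between versions.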
Ship with less regression anxiety
See a clear pass rate across your whole UAT suite so you know if a new VAPI/Retell assistant version is actually safe to ship.
Cut QA from hours to minutes
Run hundreds of test cases in a few minutes, compare runs across versions, and stop spending half a day on manual spot-checks.
Give stakeholders real numbers
Show accuracy, regressions, and coverage over time so conversations with product and QA are driven by data, not “it feels better”.
Internal & early teams
What changed when we used it on real VAPI assistants
We built the harness for our own VAPI assistant first, then started rolling it out to a handful of teams who were stuck in the same Excel-driven QA hell.
Before this harness, every prompt change felt risky. Now we run 1,000+ test cases in a few minutes and see exactly which scenarios improved or broke before we touch production.
Bader-Eddine Qodia
Founder & CEO
Drop in manual QA time per release
We finally stopped arguing about “it feels better”. Each run gives us accuracy and regression metrics, so product, QA, and engineering look at the same numbers.
Sara K.
Product owner – pilot VAPI team
More issues caught before reaching production
The biggest win is confidence. We can tweak prompts daily, run the suite, and only ship changes when the EvalVista numbers say it’s safe.
Alex M.
QA lead – early access team
Fewer regressions slipping into live assistants
Under the hood
Built for serious assistant QA teams
From embeddings-based semantic scoring to version-locked run history and exports, the harness plugs directly into how engineering and product already work.
Semantic similarity scoring
Use embeddings to compare assistant replies to your ground truth so correct paraphrases pass even when wording changes.
GPT/BERT fallback judge
When the embedding score is not decisive, we call a model-based judge (GPT/BERT) as a fallback to decide pass or fail based on meaning.
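As an illustration of how embedding scoring and a fallback judge can fit together, here is a minimal sketch. The thresholds `pass_at` and `fail_below` are made-up values, and `judge` is a hypothetical callable standing in for the model call; none of these reflect the harness's actual settings.

```python
import math
from typing import Callable, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score_case(
    reply_vec: Sequence[float],
    truth_vec: Sequence[float],
    judge: Callable[[], bool],
    pass_at: float = 0.85,     # assumed threshold, for illustration only
    fail_below: float = 0.60,  # assumed threshold, for illustration only
) -> bool:
    """Pass or fail on similarity alone when it is clear-cut, else ask the judge."""
    similarity = cosine_similarity(reply_vec, truth_vec)
    if similarity >= pass_at:
        return True
    if similarity < fail_below:
        return False
    # Ambiguous band: defer to the fallback judge (a model call in practice).
    return judge()
```

Keeping the judge behind a callable means the expensive model call only happens for the ambiguous band between the two thresholds.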
Version-aware run history (VAPI/Retell)
Keep a full history of test runs per assistant version so you can see exactly what changed when you ship a new prompt set in VAPI/Retell.
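One way to tie a run to an assistant version is to hash the assistant's configuration. The sketch below assumes the configuration is available as a JSON-serializable dict; the function name `version_hash`, the field names in the usage note, and the 12-character truncation are all illustrative choices, not the product's actual scheme.

```python
import hashlib
import json


def version_hash(assistant_config: dict) -> str:
    """Return a short, stable fingerprint of an assistant configuration."""
    # Sorting keys makes the hash independent of key order in the dict.
    canonical = json.dumps(
        assistant_config, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
```

For example, `version_hash({"prompt": "You are a booking agent.", "model": "x"})` stays the same if the keys are reordered and changes as soon as the prompt text changes.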
Suite and flow tags
Group test cases into suites, flows, and intents to see coverage and pass rate per scenario type, not just globally.
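Per-group pass rates are simple to compute once every result carries a tag. This sketch assumes results arrive as `(group, passed)` pairs; the function name and data shape are hypothetical.

```python
from collections import defaultdict


def pass_rate_by_group(results: list[tuple[str, bool]]) -> dict[str, float]:
    """Compute the pass rate for each suite, flow, or intent tag."""
    counts: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # [passed, total]
    for group, passed in results:
        counts[group][1] += 1
        if passed:
            counts[group][0] += 1
    return {group: ok / total for group, (ok, total) in counts.items()}
```

A global pass rate can hide a weak flow; breaking it down this way shows, for instance, that a hypothetical "booking" group sits at 50% while "faq" is at 100%.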
CI & pipeline hooks
Trigger test runs from your CI/CD pipeline so every assistant change receives a score before it reaches production.
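In a pipeline, the score typically becomes a gate. The following is a minimal sketch of such a gate, assuming the pipeline already has the pass rate and regression count from a run; the function name and the 95% default are hypothetical, not product defaults.

```python
def ci_exit_code(pass_rate: float, regressions: int, min_pass_rate: float = 0.95) -> int:
    """Return 0 if the run is good enough to ship, 1 otherwise.

    The 0.95 default is an assumed threshold for illustration only.
    """
    if regressions > 0 or pass_rate < min_pass_rate:
        return 1
    return 0


# In a CI step this would typically end with something like:
#   sys.exit(ci_exit_code(pass_rate=0.97, regressions=0))
```

A non-zero exit code fails the CI job, which blocks the assistant change from moving forward.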
Exports & audit trail
Export CSV/PDF reports for stakeholders and keep an audit trail of which version produced which results.
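A CSV export with an audit trail could look roughly like the sketch below, where every row repeats the version that produced it. The column names and the `export_csv` function are assumptions made for illustration, not the harness's actual export format.

```python
import csv
import io


def export_csv(rows: list[dict], version: str) -> str:
    """Serialize run results to CSV text, stamping each row with the version."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["version", "case_id", "passed", "score"],
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(
            {
                "version": version,
                "case_id": row["case_id"],
                "passed": row["passed"],
                "score": row["score"],
            }
        )
    return buffer.getvalue()
```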
Plans & beta access
Start in staging, then scale with your assistants
Pricing is based on how many assistants you run and how many test cases you execute, not how many seats you have.
Free
Start testing immediately with our 7-day free trial. No credit card required. Full platform access included.
- 1 VAPI or Retell assistant: Connect a single assistant in staging so you can test flows safely before touching production traffic.
- Up to 500 test cases: Enough volume to replace your Excel sheet and automate your core UAT and smoke-test scenarios.
- Email support: Support by email during business hours. We’ll help you set up your first suites and runs.
Starter
Perfect for small teams getting started with automated QA. Test multiple assistants with email support.
- Up to 2 assistants (VAPI/Retell): Cover multiple assistants or environments, for example inbound, outbound, and a backup or A/B version.
- Up to 5,000 test cases per month: Run full regression suites after every prompt or KB change without worrying about hitting a hard cap.
- Overage: $5 per 1,000 extra tests. If you exceed your monthly test case limit on paid plans, we'll charge $5 per additional 1,000 test cases. Free tier users must upgrade to continue testing.
- Email support: Support by email during business hours. We’ll help you set up your first suites and runs.
Team
For growing teams who need Slack alerts, CI/CD integration, and priority support for production testing.
- Up to 5 assistants (VAPI/Retell): Cover multiple assistants or environments, for example inbound, outbound, and a backup or A/B version.
- Up to 20,000 test cases per month: Run full regression suites after every prompt or KB change without worrying about hitting a hard cap.
- Slack/Teams notifications & priority support: Get run summaries and alerts in Slack/Teams plus faster responses when something breaks.
Scale
For orgs running many assistants or very high test volume who need custom limits and SLAs.
- Unlimited assistants: No per-assistant limit, so you can roll the harness out across all your VAPI/Retell assistants and environments.
- Custom test limits, SSO, and security reviews: We tune test volume, environments, and retention to your traffic patterns and hook into your SSO/IdP.
- Dedicated support & SLAs: Named contact, escalation path, and contractual SLAs for response time, uptime, and incident handling.
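For budgeting, the paid-plan overage rule of $5 per additional 1,000 test cases can be sketched as follows. Rounding partial blocks up to a whole block of 1,000 is an assumption made here; the page does not say how partial blocks are billed.

```python
import math


def overage_charge(
    tests_run: int,
    included: int,
    block_size: int = 1000,
    price_per_block: float = 5.0,
) -> float:
    """Estimate the overage charge for a month on a paid plan.

    Assumption: any partial block of extra tests is billed as a full block.
    """
    extra = max(0, tests_run - included)
    return math.ceil(extra / block_size) * price_per_block
```

Under that assumption, running 6,200 tests on a plan that includes 5,000 would cost 2 blocks, or $10, on top of the subscription.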
Transform QA
See how every assistant version performs over time
Run the same UAT suite on each VAPI/Retell version and instantly see pass rate, regressions, and semantic accuracy before you ship.
Frequently Asked Questions
Everything you need to know before you plug in your assistant
These are the questions teams usually ask when they move from manual Excel UAT to automated evaluation. If you’re still unsure after reading, just reach out and we’ll walk you through your use case.
Yes. Our Sandbox (beta) plan is free while we’re in private beta. You can connect one VAPI or Retell assistant in staging and run up to 2,000 test cases per month to see how the harness fits your workflow.
You only need three things: a VAPI or Retell assistant, your existing UAT sheet (CSV/Excel is perfect), and an API key. We help you import test cases, define expected answers, and run your first suite — most teams see their first results in under a day.
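To give a feel for the import step, here is a minimal sketch that turns a CSV export of a UAT sheet into test cases. The column names `id`, `prompt`, and `expected` and the `UatCase` type are hypothetical; a real sheet will likely need its own column mapping.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class UatCase:
    case_id: str
    prompt: str
    expected: str  # ground-truth answer the reply is compared against


def load_cases(csv_text: str) -> list[UatCase]:
    """Parse CSV text into test cases, skipping rows without a prompt."""
    reader = csv.DictReader(io.StringIO(csv_text))
    cases: list[UatCase] = []
    for row in reader:
        prompt = (row.get("prompt") or "").strip()
        if not prompt:
            continue  # blank rows are common in hand-maintained sheets
        cases.append(
            UatCase(
                case_id=(row.get("id") or "").strip(),
                prompt=prompt,
                expected=(row.get("expected") or "").strip(),
            )
        )
    return cases
```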
Each test case has a ground-truth answer. We compare the assistant’s reply using embeddings to check if the meaning matches, not just the exact wording. When a case is ambiguous, we fall back to a model-based judge (GPT/BERT) with clear rubrics. You can always review, override, and refine the scoring rules.
We store prompts, answers, and evaluation results for each run so you can audit regressions and track progress over time. Sensitive data can be redacted or disabled in your configuration. Data retention and storage location can be adjusted on higher-tier plans to match your compliance needs.
Billing is subscription-based per workspace. You choose monthly or annual billing, and each plan includes a fixed number of assistants and test cases. If you outgrow your limits, we adjust them together — no surprise overage bills.
Start in staging
Ready to stop manually testing your assistant?
Turn your Excel UAT sheet into a one-click QA harness.
Connect your VAPI or Retell assistant, import your existing test cases, and see clear pass/fail metrics and regressions before every release.