Stop manually testing your VAPI/Retell assistant

We turned a 1,000+ row Excel UAT sheet into a one-click evaluation harness that catches regressions, scores semantic correctness, and gives you real metrics instead of “it feels better”.

EvalVista VAPI assistant QA dashboard showing pass rate, regressions and version history
EvalVista Dashboard Overview
Test cases scored visualization dashboard - EvalVista.com v2
Test runs and percentage change - EvalVista.com
Run execution stats and trends - evalvista.com
Used by teams building serious voice assistants
Version-aware test runs

One dashboard for VAPI/Retell assistant QA in minutes

Run your entire UAT suite in minutes, see exactly what changed between versions, and decide with real metrics whether an assistant is ready to ship or needs another prompt round.

  • Every run tied to a VAPI/Retell assistant ID + version hash
  • Semantic scoring to check meaning, not just exact words
  • Regression diffs between the last version and the new one
  • Group tests by flow or intent and see coverage per group
  • Export CSV/PDF reports to share with product and QA
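As a rough illustration of the regression-diff idea in the list above, here is a minimal Python sketch. It is not EvalVista's implementation: the `RunResult` shape, the field names, and the sample IDs and hashes are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    """One test run, tied to an assistant ID and a version hash (hypothetical shape)."""
    assistant_id: str
    version_hash: str
    passed: dict  # test case ID -> True (pass) / False (fail)


def pass_rate(run: RunResult) -> float:
    """Fraction of test cases that passed in this run."""
    return sum(run.passed.values()) / len(run.passed) if run.passed else 0.0


def diff_runs(previous: RunResult, current: RunResult) -> dict:
    """Compare two runs on the test cases they share."""
    shared = previous.passed.keys() & current.passed.keys()
    # A regression passed on the previous version and fails on the new one.
    regressions = sorted(c for c in shared if previous.passed[c] and not current.passed[c])
    # A fix failed on the previous version and passes on the new one.
    fixes = sorted(c for c in shared if not previous.passed[c] and current.passed[c])
    return {"regressions": regressions, "fixes": fixes}


# Made-up sample data for two versions of the same assistant.
prev = RunResult("asst_demo", "a1b2c3", {"greeting": True, "refund": True, "hours": False})
curr = RunResult("asst_demo", "d4e5f6", {"greeting": True, "refund": False, "hours": True})

report = diff_runs(prev, curr)
print(report, round(pass_rate(curr), 2))
```

Here the new version fixes `hours` but breaks `refund`, so the pass rate stays flat while the diff still flags a regression. That is why a per-case diff is more informative than the pass rate alone.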

Ship with less regression anxiety

See a clear pass rate across your whole UAT suite so you know whether a new VAPI/Retell assistant version is actually safe to ship.

Pass rate QA - evalvista.com

Cut QA from hours to minutes

Run hundreds of test cases in a few minutes, compare runs across versions, and stop spending half a day on manual spot-checks.

Regression tests per run chart - evalvista.com

Give stakeholders real numbers

Show accuracy, regressions, and coverage over time so conversations with product and QA are driven by data, not “it feels better”.

Accuracy trend over recent runs - evalvista.com
Internal & early teams

What changed when we used it on real VAPI assistants

We built the harness for our own VAPI assistant first, then started rolling it out to a handful of teams who were stuck in the same Excel-driven QA hell.

Before this harness, every prompt change felt risky. Now we run 1,000+ test cases in a few minutes and see exactly which scenarios improved or broke before we touch production.

Bader-Eddine Qodia

Founder & CEO

Drop in manual QA time per release

We finally stopped arguing about “it feels better”. Each run gives us accuracy and regression metrics, so product, QA, and engineering look at the same numbers.

Sara K.

Product owner – pilot VAPI team

More issues caught before reaching production

The biggest win is confidence. We can tweak prompts daily, run the suite, and only ship changes when the EvalVista numbers say it’s safe.

Alex M.

QA lead – early access team

Fewer regressions slipping into live assistants

Under the hood

Built for serious assistant QA teams

From embedding-based semantic scoring to version-locked run history and exports, the harness plugs directly into how engineering and product already work.

Plans & beta access

Start in staging, then scale with your assistants

Pricing is based on how many assistants you run and how many test cases you execute, not how many seats you have.

Monthly billing
Free

Start testing immediately with our 7-day free trial. No credit card required. Full platform access included.

$ 0 7-day free trial
  • 1 VAPI assistant

    Connect a single assistant in staging so you can test flows safely before touching production traffic.

  • 500 test cases

    Enough volume to replace your Excel sheet and automate your core UAT and smoke-test scenarios.

  • Email support

    Support by email during business hours — we’ll help you set up your first suites and runs.

Starter

Perfect for small teams getting started with automated QA. Test multiple assistants with email support.

$ 49 /month
  • Up to 2 assistants

    Cover multiple assistants or environments, for example inbound, outbound, and a backup or A/B version.

  • 5,000 test cases

    Run full regression suites after every prompt or KB change without worrying about hitting a hard cap.

  • Extra tests: $5 /1K

    If you exceed your monthly test case limit on paid plans, we'll charge $5 per additional 1,000 test cases. Free tier users must upgrade to continue testing.

  • Email support

    Support by email during business hours — we’ll help you set up your first suites and runs.
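To make the overage rule concrete, here is a small Python sketch of the monthly bill under the Starter numbers above ($49 base, 5,000 included test cases, $5 per additional 1,000). The function name is made up for this example, and rounding a partial block up to a full 1,000 is an assumption; the page does not say how partial blocks are billed.

```python
import math


def monthly_cost(base_price: float, included_tests: int, tests_run: int,
                 overage_per_1k: float = 5.0) -> float:
    """Estimate a monthly bill: base price plus overage blocks of 1,000 tests."""
    extra = max(0, tests_run - included_tests)
    # Assumption: any partial block of 1,000 extra tests is billed as a full block.
    blocks = math.ceil(extra / 1000)
    return base_price + blocks * overage_per_1k


# Starter plan, 7,200 test cases in one month: 2,200 extra -> 3 blocks -> 49 + 15.
print(monthly_cost(49, 5000, 7200))
```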

Team

For growing teams who need Slack alerts, CI/CD integration, and priority support for production testing.

$ 149 /month
  • Up to 5 assistants

    Cover multiple assistants or environments, for example inbound, outbound, and a backup or A/B version.

  • 20,000 test cases

    Run full regression suites after every prompt or KB change without worrying about hitting a hard cap.

  • Extra tests: $5 /1K

    If you exceed your monthly test case limit on paid plans, we'll charge $5 per additional 1,000 test cases. Free tier users must upgrade to continue testing.

  • Slack/Teams notifications & priority support

    Get run summaries and alerts in Slack/Teams plus faster responses when something breaks.

Scale

For orgs running multiple assistants or high test volumes who need custom limits, SLAs, and SSO integration.

$ Custom
  • ∞ assistants

    No per-assistant limit — roll the harness out across all your VAPI / Retell assistants and environments.

  • ∞ tests & SSO

    We tune test volume, environments, and retention to your traffic patterns and hook into your SSO/IdP.

  • Priority support & guaranteed SLA

    Named contact, escalation path, and contractual SLAs for response time, uptime, and incident handling.

Annual billing
Free

Start testing immediately with our 7-day free trial. No credit card required. Full platform access included.

$ 0 7-day free trial
  • 1 VAPI or Retell assistant

    Connect a single assistant in staging so you can test flows safely before touching production traffic.

  • Up to 500 test cases

    Enough volume to replace your Excel sheet and automate your core UAT and smoke-test scenarios.

  • Email support

    Support by email during business hours — we’ll help you set up your first suites and runs.

Starter

Perfect for small teams getting started with automated QA. Test multiple assistants with email support.

$ 459 /year
  • Up to 2 assistants (VAPI/Retell)

    Cover multiple assistants or environments, for example inbound, outbound, and a backup or A/B version.

  • Up to 5,000 test cases per month

    Run full regression suites after every prompt or KB change without worrying about hitting a hard cap.

  • Overage: $5 per 1,000 extra tests

    If you exceed your monthly test case limit on paid plans, we'll charge $5 per additional 1,000 test cases. Free tier users must upgrade to continue testing.

  • Email support

    Support by email during business hours — we’ll help you set up your first suites and runs.

Team

For growing teams who need Slack alerts, CI/CD integration, and priority support for production testing.

$ 1,399 /year
  • Up to 5 assistants (VAPI/Retell)

    Cover multiple assistants or environments, for example inbound, outbound, and a backup or A/B version.

  • Up to 20,000 test cases per month

    Run full regression suites after every prompt or KB change without worrying about hitting a hard cap.

  • Slack/Teams notifications & priority support

    Get run summaries and alerts in Slack/Teams plus faster responses when something breaks.

Scale

For orgs running many assistants or very high test volume who need custom limits and SLAs.

$ Custom
  • Unlimited assistants

    No per-assistant limit — roll the harness out across all your VAPI / Retell assistants and environments.

  • Custom test limits, SSO, and security reviews

    We tune test volume, environments, and retention to your traffic patterns and hook into your SSO/IdP.

  • Dedicated support & SLAs

    Named contact, escalation path, and contractual SLAs for response time, uptime, and incident handling.

Transform QA

See how every assistant version performs over time

Run the same UAT suite on each VAPI/Retell version and instantly see pass rate, regressions, and semantic accuracy before you ship.

Run results over time
Frequently Asked Questions

Everything you need to know before you plug in your assistant

These are the questions teams usually ask when they move from manual Excel UAT to automated evaluation. If you’re still unsure after reading, just reach out and we’ll walk you through your use case.

Is there a free plan?

Yes. Our Sandbox (beta) plan is free while we’re in private beta. You can connect one VAPI or Retell assistant in staging and run up to 2,000 test cases per month to see how the harness fits your workflow.

What do I need to get started?

You only need three things: a VAPI or Retell assistant, your existing UAT sheet (CSV/Excel is perfect), and an API key. We help you import test cases, define expected answers, and run your first suite. Most teams see their first results in under a day.
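As a rough picture of what importing a UAT sheet can look like, here is a minimal Python sketch that turns CSV rows into test cases. The column names (`id`, `group`, `prompt`, `expected`), the `UatCase` type, and the sample rows are assumptions for illustration, not EvalVista's actual import format.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class UatCase:
    """One row of a UAT sheet (hypothetical column layout)."""
    case_id: str
    group: str
    prompt: str
    expected: str


def load_cases(csv_text: str) -> list:
    """Parse CSV text into test cases, skipping rows without a prompt or expected answer."""
    cases = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        prompt = (row.get("prompt") or "").strip()
        expected = (row.get("expected") or "").strip()
        if not prompt or not expected:
            continue  # incomplete rows cannot be scored
        group = (row.get("group") or "").strip() or "ungrouped"
        cases.append(UatCase(row["id"].strip(), group, prompt, expected))
    return cases


# Made-up sample sheet; the third row has no prompt and is skipped.
SAMPLE = """id,group,prompt,expected
T1,billing,How do I get a refund?,Explain the refund policy
T2,hours,When are you open?,State the opening hours
T3,hours,,Missing prompt so this row is skipped
"""

cases = load_cases(SAMPLE)
print(len(cases), [c.case_id for c in cases])
```

An Excel sheet would first need exporting to CSV (or reading with a spreadsheet library) before a loader like this applies.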

How does the scoring work?

Each test case has a ground-truth answer. We compare the assistant’s reply against it using embeddings to check whether the meaning matches, not just the exact wording. When a case is ambiguous, we fall back to a model-based judge (GPT or BERT models) with clear rubrics. You can always review, override, and refine the scoring rules.
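One common way to compare embeddings is cosine similarity with a threshold, plus an in-between band that is routed to a judge. The Python sketch below shows that general pattern only. The thresholds, function names, and toy two-dimensional vectors are assumptions; real embeddings would come from an embedding model, and the actual scoring rules may differ.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score_case(expected_vec, actual_vec, pass_at: float = 0.85, fail_below: float = 0.60) -> str:
    """Return 'pass', 'fail', or 'needs_judge' for the ambiguous middle band.

    The two thresholds are illustrative values, not tuned numbers.
    """
    similarity = cosine_similarity(expected_vec, actual_vec)
    if similarity >= pass_at:
        return "pass"
    if similarity < fail_below:
        return "fail"
    return "needs_judge"  # hand off to a model-based judge with a rubric


# Toy vectors standing in for embeddings of the expected and actual answers.
print(score_case([1.0, 0.0], [1.0, 0.0]))  # identical direction
print(score_case([1.0, 0.0], [0.0, 1.0]))  # orthogonal
print(score_case([1.0, 0.0], [1.0, 1.0]))  # similarity of about 0.71
```

Keeping a middle band rather than a single cutoff means borderline answers get a second look instead of being forced into pass or fail.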

What data do you store?

We store prompts, answers, and evaluation results for each run so you can audit regressions and track progress over time. Sensitive data can be redacted or disabled in your configuration. Data retention and storage location can be adjusted on higher-tier plans to match your compliance needs.

How does billing work?

Billing is subscription-based per workspace. You choose monthly or annual billing, and each plan includes a fixed number of assistants and test cases. If you outgrow your limits, we adjust them together, so there are no surprise overage bills.

Start in staging

Ready to stop manually testing your assistant?
Turn your Excel UAT sheet into a one-click QA harness.

Connect your VAPI or Retell assistant, import your existing test cases, and see clear pass/fail metrics and regressions before every release.