Stop manually testing your VAPI/Retell assistant

We turned a 1,000+ row Excel UAT sheet into a one-click evaluation harness that catches regressions, scores semantic correctness, and gives you real metrics instead of “it feels better”.

Watch 7‑min Loom demo

Join free beta

Used by teams building serious voice assistants

Version-aware test runs

One dashboard for VAPI/Retell assistant QA in minutes

Run your entire UAT suite in minutes, see exactly what changed between versions, and decide with real metrics whether an assistant is ready to ship or needs another prompt round.

See a sample test run

Every run tied to a VAPI/Retell assistant ID + version hash
Semantic scoring to check meaning, not just exact words
Regression diffs between the last version and the new one
Group tests by flow or intent and see coverage per group
Export CSV/PDF reports to share with product and QA

Ship with less regression anxiety

See a clear pass rate across your whole UAT suite so you know whether a new VAPI/Retell assistant version is actually safe to ship.

Cut QA from hours to minutes

Run hundreds of test cases in a few minutes, compare runs across versions, and stop spending half a day on manual spot-checks.

Give stakeholders real numbers

Show accuracy, regressions, and coverage over time so conversations with product and QA are driven by data, not “it feels better”.

Internal & early teams

What changed when we used it on real VAPI assistants

We built the harness for our own VAPI assistant first, then started rolling it out to a handful of teams who were stuck in the same Excel-driven QA hell.

Before this harness, every prompt change felt risky. Now we run 1,000+ test cases in a few minutes and see exactly which scenarios improved or broke before we touch production.

Bader-Eddine Qodia

Founder & CEO

Drop in manual QA time per release

We finally stopped arguing about “it feels better”. Each run gives us accuracy and regression metrics, so product, QA, and engineering look at the same numbers.

Sara K.

Product owner – pilot VAPI team

More issues caught before reaching production

The biggest win is confidence. We can tweak prompts daily, run the suite, and only ship changes when the EvalVista numbers say it’s safe.

Alex M.

QA lead – early access team

Fewer regressions slipping into live assistants

Under the hood

Built for serious assistant QA teams

From embedding-based semantic scoring to version-locked run history and exports, the harness plugs directly into how engineering and product already work.

Semantic similarity scoring

Use embeddings to compare assistant replies to your ground truth so correct paraphrases pass even when wording changes.

Learn more
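As a rough illustration of the idea above, here is a minimal sketch of embedding-based scoring. It assumes the reply and the ground truth have already been turned into vectors by some embedding model. The function names and the 0.85 threshold are hypothetical and chosen for the example; they are not the harness's actual API or defaults.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as no match
    return dot / (norm_a * norm_b)


def semantic_pass(reply_vec: Sequence[float],
                  truth_vec: Sequence[float],
                  threshold: float = 0.85) -> bool:
    """Pass when the reply is close enough in meaning to the ground truth.

    The threshold is an illustrative assumption, not a product default.
    """
    return cosine_similarity(reply_vec, truth_vec) >= threshold
```

A paraphrase whose vector points in nearly the same direction as the ground truth passes, while an unrelated reply does not.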

GPT/BERT fallback judge

When the embedding score is not decisive, a GPT or BERT model acts as a fallback judge and decides pass or fail based on meaning.

Learn more
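One way to picture "not decisive" is a score band: clear passes and clear fails are settled by the similarity score alone, and only the middle band goes to the judge. The sketch below assumes that design. The band limits (0.85 and 0.60) and the `judge` callable are hypothetical placeholders, not the harness's real configuration.

```python
from typing import Callable, Tuple


def decide(score: float,
           judge: Callable[[], bool],
           pass_at: float = 0.85,
           fail_below: float = 0.60) -> Tuple[bool, str]:
    """Return (verdict, decided_by) for one test case.

    score      -- similarity between reply and ground truth, in [0, 1]
    judge      -- hypothetical callable that asks a model for a pass/fail verdict
    pass_at    -- assumed score at or above which the case passes outright
    fail_below -- assumed score below which the case fails outright
    """
    if score >= pass_at:
        return True, "embedding"
    if score < fail_below:
        return False, "embedding"
    # Ambiguous band: only here do we pay for a model call.
    return judge(), "judge"
```

Keeping the judge behind a callable means it is invoked only for ambiguous cases, which keeps cost and latency down on large suites.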

Version-aware run history (VAPI/Retell)

Keep a full history of test runs per assistant version so you can see exactly what changed when you ship a new prompt set in VAPI/Retell.

Learn more
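A regression diff between two versions can be sketched as a comparison of per-test verdicts. The example assumes each run is available as a mapping from test case ID to pass/fail; that shape and the function name are assumptions made for illustration.

```python
from typing import Dict, List


def diff_runs(previous: Dict[str, bool],
              current: Dict[str, bool]) -> Dict[str, List[str]]:
    """Compare two runs keyed by test case ID (True = pass)."""
    shared = previous.keys() & current.keys()
    return {
        # passed before, fails now
        "regressed": sorted(t for t in shared if previous[t] and not current[t]),
        # failed before, passes now
        "fixed": sorted(t for t in shared if not previous[t] and current[t]),
        # test cases that exist in only one of the two runs
        "added": sorted(current.keys() - previous.keys()),
        "removed": sorted(previous.keys() - current.keys()),
    }
```

Reporting added and removed cases separately avoids counting a changed suite as a regression.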

Suite and flow tags

Group test cases into suites, flows, and intents to see coverage and pass rate per scenario type, not just globally.

Learn more
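Per-group pass rate is simple arithmetic once every result carries its tag. A minimal sketch, assuming each result is a (group, passed) pair; the data shape and function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def pass_rate_by_group(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Return the fraction of passing test cases for each suite, flow, or intent."""
    totals: Dict[str, int] = defaultdict(int)
    passes: Dict[str, int] = defaultdict(int)
    for group, passed in results:
        totals[group] += 1
        passes[group] += int(passed)
    return {group: passes[group] / totals[group] for group in totals}
```

A global pass rate can hide a weak flow; splitting by group shows, for example, that one flow sits at 50% while another is at 100%.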

CI & pipeline hooks

Trigger test runs from your CI/CD pipeline so every assistant change receives a score before it reaches production.

Learn more
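In a pipeline, a score usually becomes a gate: the job fails when the pass rate drops below a chosen bar. The sketch below shows only that last step and assumes the pass and total counts have already been fetched from a run. The function name and the 95% bar are illustrative assumptions, not a documented integration.

```python
def gate_exit_code(passed: int, total: int, min_pass_rate: float = 0.95) -> int:
    """Return a process exit code: 0 lets the pipeline continue, 1 blocks it.

    min_pass_rate is an assumed bar for illustration; pick one per project.
    """
    if total <= 0:
        return 1  # an empty run proves nothing, so block by default
    return 0 if passed / total >= min_pass_rate else 1
```

A CI step could pass the result to `sys.exit(...)` so that a failing gate stops the deployment job.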

Exports & audit trail

Export CSV/PDF reports for stakeholders and keep an audit trail of which version produced which results.

Learn more

Plans & beta access

Start in staging, then scale with your assistants

Pricing is based on how many assistants you run and how many test cases you execute, not how many seats you have.

Monthly
Annually (Save 20%)

Free

Start testing immediately with our 7-day free trial. No credit card required. Full platform access included.

$0 (7-day free trial)

1 VAPI or Retell assistant

Connect a single assistant in staging so you can test flows safely before touching production traffic.
500 test cases

Enough volume to replace your Excel sheet and automate your core UAT and smoke-test scenarios.
Standard support

Support by email during business hours — we’ll help you set up your first suites and runs.

Join for free

Starter

Perfect for small teams getting started with automated QA. Test multiple assistants with email support.

$49/month

Up to 2 assistants

Cover multiple assistants or environments, for example inbound, outbound, and a backup or A/B version.
5,000 test cases

Run full regression suites after every prompt or KB change without worrying about hitting a hard cap.
Extra tests: $5/1K

If you exceed your monthly test case limit on paid plans, we'll charge $5 per additional 1,000 test cases. Free tier users must upgrade to continue testing.
Email support

Support by email during business hours — we’ll help you set up your first suites and runs.

Request onboarding

Team

For growing teams who need Slack alerts, CI/CD integration, and priority support for production testing.

$149/month

Up to 5 assistants

Cover multiple assistants or environments, for example inbound, outbound, and a backup or A/B version.
20,000 test cases

Run full regression suites after every prompt or KB change without worrying about hitting a hard cap.
Extra tests: $5/1K

If you exceed your monthly test case limit on paid plans, we'll charge $5 per additional 1,000 test cases. Free tier users must upgrade to continue testing.
Slack/Teams notifications & priority support

Get run summaries and alerts in Slack/Teams plus faster responses when something breaks.

Request onboarding

Scale

For orgs running multiple assistants or high test volumes who need custom limits, SLAs, and SSO integration.

Custom

∞ assistants

No per-assistant limit: roll the harness out across all your VAPI/Retell assistants and environments.
∞ tests & SSO

We tune test volume, environments, and retention to your traffic patterns and hook into your SSO/IdP.
Priority support & guaranteed SLA

Named contact, escalation path, and contractual SLAs for response time, uptime, and incident handling.

Talk to sales

Free

Start testing immediately with our 7-day free trial. No credit card required. Full platform access included.

$0 (7-day free trial)

1 VAPI or Retell assistant

Connect a single assistant in staging so you can test flows safely before touching production traffic.
Up to 500 test cases

Enough volume to replace your Excel sheet and automate your core UAT and smoke-test scenarios.
Email support

Support by email during business hours — we’ll help you set up your first suites and runs.

Join free beta

Starter

Perfect for small teams getting started with automated QA. Test multiple assistants with email support.

$459/year

Up to 2 assistants (VAPI/Retell)

Cover multiple assistants or environments, for example inbound, outbound, and a backup or A/B version.
Up to 5,000 test cases per month

Run full regression suites after every prompt or KB change without worrying about hitting a hard cap.
Overage: $5 per 1,000 extra tests

If you exceed your monthly test case limit on paid plans, we'll charge $5 per additional 1,000 test cases. Free tier users must upgrade to continue testing.
Email support

Support by email during business hours — we’ll help you set up your first suites and runs.

Request onboarding

Team

For growing teams who need Slack alerts, CI/CD integration, and priority support for production testing.

$1,399/year

Up to 5 assistants (VAPI/Retell)

Cover multiple assistants or environments, for example inbound, outbound, and a backup or A/B version.
Up to 20,000 test cases per month

Run full regression suites after every prompt or KB change without worrying about hitting a hard cap.
Slack/Teams notifications & priority support

Get run summaries and alerts in Slack/Teams plus faster responses when something breaks.

Request onboarding

Scale

For orgs running many assistants or very high test volume who need custom limits and SLAs.

Custom

Unlimited assistants

No per-assistant limit: roll the harness out across all your VAPI/Retell assistants and environments.
Custom test limits, SSO, and security reviews

We tune test volume, environments, and retention to your traffic patterns and hook into your SSO/IdP.
Dedicated support & SLAs

Named contact, escalation path, and contractual SLAs for response time, uptime, and incident handling.

Talk to sales

Transform QA

See how every assistant version performs over time

Run the same UAT suite on each VAPI/Retell version and instantly see pass rate, regressions, and semantic accuracy before you ship.

See dashboard

Frequently Asked Questions

Everything you need to know before you plug in your assistant

These are the questions teams usually ask when they move from manual Excel UAT to automated evaluation. If you’re still unsure after reading, just reach out and we’ll walk you through your use case.

1. Is there a free beta or sandbox?

Yes. Our Free plan costs nothing while we’re in private beta. You can connect one VAPI or Retell assistant in staging and run up to 500 test cases to see how the harness fits your workflow.

2. What do I need to get started?

You only need three things: a VAPI or Retell assistant, your existing UAT sheet (CSV/Excel is perfect), and an API key. We help you import test cases, define expected answers, and run your first suite — most teams see their first results in under a day.

3. How do you decide if an answer is a pass or a fail?

Each test case has a ground-truth answer. We compare the assistant’s reply to it using embeddings to check whether the meaning matches, not just the exact wording. When a case is ambiguous, we fall back to a GPT or BERT model acting as a judge with a clear rubric. You can always review, override, and refine the scoring rules.

4. What data do you store and for how long?

We store prompts, answers, and evaluation results for each run so you can audit regressions and track progress over time. Sensitive data can be redacted or disabled in your configuration. Data retention and storage location can be adjusted on higher-tier plans to match your compliance needs.

5. How does billing work?

Billing is subscription-based per workspace. You choose monthly or annual billing, and each plan includes a fixed number of assistants and test cases. If you exceed your monthly test case limit on a paid plan, extra tests are billed at $5 per additional 1,000 test cases. Free tier users must upgrade to continue testing.

Start in staging

Ready to stop manually testing your assistant?
Turn your Excel UAT sheet into a one-click QA harness.

Connect your VAPI or Retell assistant, import your existing test cases, and see clear pass/fail metrics and regressions before every release.

Join free beta

Watch 7-min Loom demo

Try for free
