A practical guide to using Thoughtly Test Agent, Call Me, sample metadata, and call review to catch launch issues before they hit real leads.
If I had to reduce prelaunch QA to one rule, it would be this: test logic in text first, then test experience on a real call. Most launch-day failures are not mysterious model problems. They are routing collisions, weak extraction instructions, awkward timing, or transfer paths that nobody tested under pressure.
Thoughtly gives operators a practical stack for catching those failures early: Test Agent for fast text debugging, sample metadata for realistic context, response logs with node step numbers, and Call Me for real-call checks on voice, latency, interruptions, and transfer behavior. Used in order, those tools make launch week much less dramatic.
This guide shows how to use that workflow before you put a voice agent in front of real leads. The examples assume high-volume inbound conversion teams in insurance, mortgage, education enrollment, healthcare, home services, real estate, automotive, financial services, legal, and similar funnels where speed-to-lead and handoff quality matter. If you still need to tighten the flow itself, pair this guide with How to Use Outcomes and Branching for Complex Call Flows and How to Use Thoughtly Variables for Dynamic Call Personalization.
If the agent is still early in buildout, start with the core AI Voice Agents product page and Thoughtly’s Agent Builder overview. If you are testing a multi-step qualification path, it also helps to review How to Build an AI Agent That Handles Objections During Lead Calls.
Do not start by clicking around and seeing what happens. Start with the exact moments that would make a launch succeed or fail. For an insurance lead flow, that may be whether the agent identifies the company clearly, captures coverage intent, routes a high-intent caller to a producer, and stops cleanly when the caller opts out. For mortgage or education, it may be whether the agent captures urgency, eligibility, and the preferred next step without sounding robotic or repetitive.
A simple prelaunch matrix keeps testing honest:
| Test layer | What to verify | Primary Thoughtly tool |
|---|---|---|
| Conversation logic | The agent follows the correct node path for each scenario | Test Agent |
| Variable extraction | Fields capture the right value and format | Test Agent |
| Outcome routing | The right branch fires for objections, callbacks, transfers, and exits | Test Agent |
| Action behavior | Lookups, schedulers, and alerts return the expected outputs | Test Agent plus response log |
| Voice experience | Tone, pronunciation, interruption handling, and latency feel right | Call Me |
| Human handoff | Transfers, summaries, and post-call updates land cleanly | Call Me plus live-call review |
Keep the first test pack small but representative. Five strong scenarios are more useful than twenty vague ones. The goal is not to prove the agent can survive every possible sentence on day one. The goal is to confirm that the highest-volume paths work the way your team expects.
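One way to keep a small pack honest is to write it down as data and check that every row of the matrix is exercised at least once. The sketch below is only an illustration kept in your own notes or repo: the `Scenario` class, the layer keys, and the five sample scenarios are assumptions made for this example, not a Thoughtly feature or export format.

```python
from __future__ import annotations
from dataclasses import dataclass

# Layer keys mirror the rows of the prelaunch matrix; the key names are ours.
TOOL_FOR_LAYER = {
    "conversation_logic": "Test Agent",
    "variable_extraction": "Test Agent",
    "outcome_routing": "Test Agent",
    "action_behavior": "Test Agent plus response log",
    "voice_experience": "Call Me",
    "human_handoff": "Call Me plus live-call review",
}

@dataclass
class Scenario:
    name: str
    layers: list[str]     # matrix rows this scenario exercises
    expected_result: str  # what the team agrees counts as a pass

def uncovered_layers(pack: list[Scenario]) -> list[str]:
    """Return matrix layers that no scenario in the pack touches."""
    covered = {layer for scenario in pack for layer in scenario.layers}
    return [layer for layer in TOOL_FOR_LAYER if layer not in covered]

# Hypothetical five-scenario pack for an insurance lead flow.
pack = [
    Scenario("ideal fit", ["conversation_logic", "variable_extraction"], "books next step"),
    Scenario("callback request", ["outcome_routing"], "callback branch fires"),
    Scenario("opt-out", ["outcome_routing"], "call ends cleanly"),
    Scenario("high-intent transfer", ["human_handoff", "voice_experience"], "transfer to producer"),
    Scenario("lookup returns nothing", ["action_behavior"], "agent falls back gracefully"),
]

print("Layers with no scenario:", uncovered_layers(pack))
```

If the printed list is not empty, add a scenario for the missing layer before adding more variations of paths you already cover.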
Thoughtly’s Test Agent lets you talk to the agent in text, which is the fastest way to catch logic problems while building. The testing docs recommend using it first because you get instant feedback without placing a real call. Open the agent, click Test Agent, and run through representative lead messages: greeting, qualification answers, objections, callback requests, and disqualifying responses.
While you test, watch four things on every turn:
Thoughtly’s docs suggest keeping a list of 10–15 common caller phrases per branch and rerunning them after each edit. That is a good habit because outcome labels that seem clear in theory often collide in practice. A phrase like ‘I can’t talk right now’ should not drift into a generic not-interested path if your actual goal is to schedule a callback.
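A phrase list is easier to rerun when the expected branch sits next to each phrase. The sketch below assumes you type in the branch you observed in Test Agent by hand after each edit; the branch names and phrases are illustrative, and nothing here calls a Thoughtly API.

```python
# expected[branch] lists caller phrases that should land in that branch.
expected = {
    "callback": ["I can't talk right now", "call me after 5"],
    "not_interested": ["please stop calling", "I'm not interested"],
}

# observed[phrase] is the branch you saw fire in Test Agent, recorded manually.
observed = {
    "I can't talk right now": "not_interested",
    "call me after 5": "callback",
    "please stop calling": "not_interested",
    "I'm not interested": "not_interested",
}

def misroutes(expected, observed):
    """Return (phrase, expected_branch, observed_branch) for every mismatch."""
    problems = []
    for branch, phrases in expected.items():
        for phrase in phrases:
            got = observed.get(phrase, "untested")
            if got != branch:
                problems.append((phrase, branch, got))
    return problems

for phrase, want, got in misroutes(expected, observed):
    print(f"{phrase!r}: expected {want}, observed {got}")
```

Phrases you have not rerun since the last edit show up as `untested`, which makes skipped regressions visible instead of silent.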
If a branch handles open-ended questions, use Test Agent to push on Q&A depth before you worry about polish. Thoughtly’s docs explicitly call out the self-loop pattern for testing follow-up questions, and they also recommend shortening any Prompt that feels wordy because clarity beats cleverness. This is where the earlier Outcomes and branching guide becomes useful in practice.
Text testing will not tell you whether the agent sounds natural. It will tell you whether the flow is sane. That alone saves a lot of wasted live-call debugging.
A flow can look perfect in a generic test and still break once CRM or workflow data is present. Thoughtly’s testing docs support sample metadata inside Test Agent so you can validate personalization, conditional routing, and prompts that depend on caller context. Use it whenever your agent references lead source, appointment type, priority, service area, or any other upstream value.
{
  "first_name": "Jordan",
  "lead_source": "website_form",
  "appointment_type": "consultation",
  "priority": "high"
}

This is especially important for consumer lead funnels where routing depends on context before the first question is even answered. A home services agent might open differently for an emergency repair than a routine estimate. A mortgage agent may route differently for purchase versus refinance. An enrollment agent may change the next question based on the program or campus already attached to the lead.
When you test with metadata, check three things: the opener sounds natural with the injected values, the agent does not over-assume facts that are missing, and the first branch still leaves room for the caller to correct the record. If your agent uses variable names like lead_source, appointment_type, or priority, make sure the prompt treats them as context rather than absolute truth.
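To test the "missing facts" case deliberately, it helps to generate one payload per dropped field and paste each into the sample metadata box in turn. This is a minimal sketch that reuses the field names from the sample above; the `metadata_variants` helper is hypothetical and only prints JSON for you to copy.

```python
import json

# Same fields as the sample metadata shown earlier in this guide.
BASE = {
    "first_name": "Jordan",
    "lead_source": "website_form",
    "appointment_type": "consultation",
    "priority": "high",
}

def metadata_variants(base):
    """Yield (label, payload): the full record, then one payload per dropped key."""
    yield "complete", dict(base)
    for key in base:
        yield f"missing_{key}", {k: v for k, v in base.items() if k != key}

for label, payload in metadata_variants(BASE):
    print(f"--- {label}")
    print(json.dumps(payload, indent=2))
```

Run the opener once per variant and note whether the agent invents a value, skips the personalization cleanly, or asks the caller to fill the gap.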
Most broken flows trace back to one of three layers: extraction, routing, or action configuration. Thoughtly’s variable docs matter here because variables extract immediately after the caller’s latest reply and before outcome evaluation. If the extraction instructions are loose, the routing decision can be wrong even when the outcome itself is written correctly.
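The ordering matters enough to be worth seeing in miniature. The toy pipeline below is not how Thoughtly implements extraction or outcomes; both functions are invented for illustration. It only shows that when routing reads a variable filled one step earlier, a loose extraction rule produces a wrong route even though the routing rule itself is correct.

```python
import re
from typing import Optional

def extract_callback_time(reply: str, strict: bool) -> Optional[str]:
    """Toy extractor. strict=True demands a clock time; strict=False takes any number."""
    if strict:
        pattern = r"\b(\d{1,2}(?::\d{2})?\s?(?:am|pm))\b"
    else:
        pattern = r"\b(\d+)\b"
    match = re.search(pattern, reply.lower())
    return match.group(1) if match else None

def route(variables: dict) -> str:
    """Toy rule-based outcome, evaluated only after extraction has run."""
    return "schedule_callback" if variables.get("callback_time") else "ask_for_time"

reply = "I have 2 kids at home, can you try me later?"
for strict in (False, True):
    variables = {"callback_time": extract_callback_time(reply, strict)}
    print(f"strict={strict}: {variables} -> {route(variables)}")
```

The loose rule grabs the "2" from "2 kids" and books a callback with a meaningless time. The strict rule finds no clock time, so the same routing logic asks the caller for one.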
Start with variables. Make sure the source is right for the question you asked. Current speak node is better for precise answers like callback time or email because it ignores older context. Conversation history is better when the caller may have mentioned the value earlier and you want a fallback.
Then test outcomes. Prompt-based outcomes are useful when caller wording varies, but labels need to be distinct. Rule-based outcomes are better for deterministic checks like validated fields, human-transfer rules, or hard exits. If two prompt outcomes sound similar, rename them and retest with messy phrasing, negative cases, and no-input cases.
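A quick pre-check for label collisions is to compare outcome labels as plain strings before you spend time on messy-phrasing tests. The sketch below uses Python's standard `difflib`; the 0.6 threshold and the sample labels are arbitrary assumptions. Character similarity is a crude proxy: two labels can collide in meaning without sharing wording, so treat a clean result as a starting point, not proof.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similar_labels(labels, threshold=0.6):
    """Return (label_a, label_b, ratio) for label pairs that look alike as text."""
    flagged = []
    for a, b in combinations(labels, 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            flagged.append((a, b, round(ratio, 2)))
    return flagged

# Hypothetical prompt-based outcome labels on a single node.
labels = [
    "Caller is not interested",
    "Caller is not interested now",
    "Transfer",
]

for a, b, ratio in similar_labels(labels):
    print(f"Possible collision ({ratio}): {a!r} vs {b!r}")
```

Any flagged pair is a candidate for renaming, or for converting one side into a rule-based outcome if the check is deterministic.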
Only after logic is stable should you trust the connected actions. Add the action, run the same text scenario again, and confirm that the expected downstream behavior occurs. If a scheduler action, webhook, or CRM lookup is part of the path, check that the result is visible in the response log and that the next branch still makes sense when the result is empty, slow, or different than expected.
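When you record action results from the response log, sorting them into the same three buckets (empty, slow, unexpected) makes it obvious which cases you have actually exercised. This is a sketch for your own test notes: the `status` key, the 3-second threshold, and the result shapes are hypothetical and do not describe what any Thoughtly action returns.

```python
def classify_action_result(result, elapsed_s, slow_after_s=3.0):
    """Bucket a recorded action result so each bucket gets its own test pass."""
    if result is None or result == {} or result == []:
        return "empty"
    if elapsed_s > slow_after_s:
        return "slow"
    # "status" is a made-up field standing in for whatever your action should return.
    if not isinstance(result, dict) or "status" not in result:
        return "unexpected"
    return "ok"

# Results copied by hand from test runs: (payload, seconds elapsed).
recorded = [
    ({"status": "booked", "slot": "Tuesday 10:00"}, 0.8),
    ({}, 0.5),
    ({"status": "booked"}, 6.2),
    ("timeout page", 1.0),
]

for payload, elapsed in recorded:
    print(classify_action_result(payload, elapsed), payload)
```

If a bucket never appears in your notes, the branch that follows it has not been tested yet.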
This is also a good point to review any knowledge-grounded or lookup-heavy turns. If the agent is leaning on Genius for factual answers, keep the data concise and current. Thoughtly’s docs are pretty blunt about this: use Q&A-shaped source material and test extracted data in the output tab before deployment.
Once Test Agent is clean, switch to Call Me. Thoughtly places a real phone call to you from the agent, which is where voice quality and timing problems finally show themselves. The testing docs recommend running your top 5–10 scenarios here: success path, objection, no-answer path, transfer path, and any must-say compliance language.
During the call, listen for the exact issues Thoughtly flags in its docs:
| What to listen for | Why it matters | Where to tune it |
|---|---|---|
| Voice and style | The selected voice needs to match your brand and pronounce key terms correctly | Settings and Voice Selector |
| Barge-in behavior | Critical lines should not be interrupted while natural turns should stay conversational | Uninterrupted message and Presence settings |
| Endpointing and latency | The agent should not cut callers off or wait so long that the call feels broken | Settings → Presence |
| Transfer behavior | Pre-transfer messaging and handoff timing need to feel intentional | Transfer node plus call review |
| Action timing | Long mid-call actions need a short expectation-setting line | Speak node copy and action design |
If numbers, confirmation codes, or IDs are hard to understand, Thoughtly’s testing checklist specifically calls out Read numbers phonetically. If the agent talks over you, lower sensitivity or shorten utterance end in Settings → Presence. If the agent waits too long, reduce utterance end or silence timeout. These are not glamorous fixes, but they often make the difference between a demo-quality agent and a production-quality one.
This is also where you validate that pre-transfer language, voicemail copy, and post-call notifications land the way your team expects. Text chat cannot tell you whether a transfer feels abrupt or whether a disclosure sounds natural when spoken aloud.
After every live test call, review the response log. Thoughtly’s testing docs call out node step numbers as the fastest way to identify where the conversation actually went during replay. That matters because launch teams often fix the wrong thing. A bad outcome can look like a bad prompt. A weak variable instruction can look like an AI issue. The log tells you where the break started.
A practical way to debug is to categorize each failure before you edit anything:
If you hear odd spoken punctuation or markdown-like artifacts in a live call, Thoughtly’s troubleshooting guide recommends fixing that in the Advanced Prompt rather than inside individual speak nodes. If a branch keeps failing, simplify it. The docs repeatedly push teams toward a simpler skeleton: build the flow, test it, then add complexity incrementally.
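A lightweight failure log makes the "categorize before you edit" habit concrete. The sketch below assumes you copy the node step number from the response log by hand. The extraction, routing, and action layers come from this guide; the `voice` bucket, the call IDs, and the notes are assumptions added for the example.

```python
from collections import Counter

LAYERS = ("extraction", "routing", "action", "voice")

# One entry per observed problem; "step" is the node step number from the log.
failures = [
    {"call": "test-01", "step": 4, "layer": "extraction", "note": "callback time captured as '2'"},
    {"call": "test-01", "step": 5, "layer": "routing", "note": "went to schedule_callback"},
    {"call": "test-02", "step": 3, "layer": "extraction", "note": "email missing domain"},
    {"call": "test-03", "step": 7, "layer": "voice", "note": "agent talked over caller"},
]

def triage(failures):
    """Count failures per layer, rejecting labels outside the agreed set."""
    for failure in failures:
        if failure["layer"] not in LAYERS:
            raise ValueError(f"unknown layer: {failure['layer']}")
    return Counter(failure["layer"] for failure in failures)

def first_break(failures, call):
    """Return the earliest failing step for a call, which is where to start fixing."""
    steps = [f for f in failures if f["call"] == call]
    return min(steps, key=lambda f: f["step"]) if steps else None

print(triage(failures).most_common())
print(first_break(failures, "test-01"))
```

In this sample, the routing failure at step 5 of `test-01` follows an extraction failure at step 4, so the extraction instruction is the first thing to fix.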
Do not turn on a brand-new agent for every lead source at once. Launch it on one segment first: one form, one campaign, one product line, one service area, or one appointment queue. That gives you a cleaner read on what is breaking and whether the agent is actually improving coverage, speed, or handoff quality.
For the first week, review real calls daily. Sample the clean wins, not just the obvious failures. The win calls show whether the agent is reaching the right outcome efficiently or just eventually. If you already use Thoughtly analytics, tie your QA notes back to the same conversion and handoff metrics your team tracks in production.
If you want a stronger reporting layer after launch, use How to Use Thoughtly Analytics to Optimize Agent Performance and How to Integrate Thoughtly with Google Sheets for Reporting as the next step.
A good prelaunch test cycle should improve more than subjective confidence. Measure the parts of the launch that are visible in the flow and meaningful to revenue teams.
Yes. Thoughtly’s recommended workflow is to validate logic and extractions with Test Agent first, add Actions and retest text, tune Settings, and only then use Call Me for final polish. It is faster and cheaper than debugging the same logic over live calls.
Test Agent is best for outcome paths, variable extraction, action outputs, and rapid edge-case repetition. It does not reveal text-to-speech (TTS) quality, barge-in timing, background-noise behavior, or live transfer feel.
Call Me exposes the real phone experience: voice choice, pronunciation, silence timing, interruptions, transfer behavior, and whether a disclosure or pre-transfer message sounds natural when spoken aloud.
Start with five to ten high-volume scenarios that cover the main revenue and risk paths. That usually includes ideal fit, callback, objection, disqualification, human transfer, and stop-contact requests. Add edge cases once those are stable.
Use it whenever the agent depends on CRM, workflow, or campaign context. If your opener, branching, or personalization changes based on fields attached to the lead, test with those values before launch so you do not discover bad assumptions in production.