Call volume tells you the agent is working. Conversion lift tells you it is working on something that matters. This guide walks through holdout test design, metric selection, and the reporting framework for proving incremental pipeline impact from AI follow-up across voice, SMS, and email.
You deployed an AI follow-up agent. Calls are going out, texts are landing, and your CRM is filling up with notes. The C-suite question lands in your inbox on schedule: "Is this thing actually working?"
Answering that question with confidence requires more than a vibes check on call volume. You need a measurement framework that isolates the incremental impact of AI-powered follow-up on your pipeline — separating the leads who converted because of the agent from the ones who would have converted anyway.
This guide walks through the complete process: choosing the right baseline, structuring a holdout test, identifying the metrics that matter, and building the reporting layer inside Thoughtly. By the end, you will have a repeatable playbook for proving (or disproving) conversion lift from AI follow-up across voice, SMS, and email.
Call volume is an activity metric. It tells you the agent is working. Conversion lift is a business metric. It tells you the agent is working on something that matters.
Most teams measure AI follow-up the same way they measure human SDRs: dials, connects, conversations. Those numbers are necessary but insufficient. What you actually need to know is:
That delta is your conversion lift. It is the only number that translates directly into revenue attribution and ROI justification.
Before you can measure lift, you need to know what your funnel looked like before the agent showed up. Pull at least 90 days of historical data from your CRM covering the same lead source and follow-up workflow the agent now handles.
| Metric | Definition | Where to Find It |
|---|---|---|
| Lead-to-contact rate | Percentage of inbound leads who received a follow-up attempt within your SLA | CRM activity reports |
| Contact-to-conversation rate | Percentage of follow-up attempts that resulted in a live conversation | CRM call logs or dialer reports |
| Conversation-to-conversion rate | Percentage of conversations that ended in a booked appointment, quote request, or qualified handoff | Pipeline/deal stage reports |
| Speed-to-lead | Median time from form submission or lead creation to first follow-up touch | CRM timestamp delta |
| Overall lead-to-conversion rate | End-to-end percentage of inbound leads who convert to a qualified opportunity | Pipeline reports |
| Cost per conversion | Total follow-up cost (labor + tools) divided by conversions | Finance or ops reporting |
Document these numbers clearly. They become the denominator in every lift calculation you run later.
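As a rough illustration, the baseline table above reduces to a handful of ratios over raw counts. The sketch below shows one way to compute them in Python. The `BaselineCounts` fields and the `baseline_metrics` helper are hypothetical names, and the counts are assumed to come from whatever CRM export you already have.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class BaselineCounts:
    """Raw counts for one lead source over the baseline window (hypothetical shape)."""
    leads: int               # inbound leads in the window
    contacted_in_sla: int    # leads that received a follow-up attempt within SLA
    attempts: int            # total follow-up attempts
    conversations: int       # attempts that became a live conversation
    conversions: int         # conversations ending in a booking, quote, or handoff
    follow_up_cost: float    # labor + tools for the same window


def baseline_metrics(c: BaselineCounts, seconds_to_first_touch: list[float]) -> dict[str, float]:
    """Compute the baseline funnel metrics as fractions (0.11 means 11%)."""
    return {
        "lead_to_contact": c.contacted_in_sla / c.leads,
        "contact_to_conversation": c.conversations / c.attempts,
        "conversation_to_conversion": c.conversions / c.conversations,
        "lead_to_conversion": c.conversions / c.leads,
        "median_speed_to_lead_s": median(seconds_to_first_touch),
        "cost_per_conversion": c.follow_up_cost / c.conversions,
    }


# Illustrative numbers only, not real data.
counts = BaselineCounts(leads=1000, contacted_in_sla=800, attempts=1600,
                        conversations=400, conversions=110, follow_up_cost=5500.0)
metrics = baseline_metrics(counts, seconds_to_first_touch=[30.0, 90.0, 600.0])
```

The median is used for speed-to-lead, matching the table's definition, so a few very slow follow-ups do not distort the baseline.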
The most reliable way to measure conversion lift is a holdout test: split your incoming leads into two groups and compare outcomes.
Thoughtly's Automations support conditional routing that makes holdout testing straightforward:
test_group: "A" or test_group: "B" in the payload.
Aim for at least 250 leads per group over a two-to-four-week period. For high-volume funnels (insurance, mortgage, home services), you may reach statistical significance in under a week. For lower-volume verticals, extend the test window rather than shrinking the sample.
If pulling 50% of leads out of AI follow-up feels too risky, use a 90/10 or 80/20 split instead. You will need a longer test period, but you protect most of your pipeline.
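One way to produce the test_group value before a lead reaches the webhook is a deterministic hash of the lead ID, so the same lead always lands in the same group even if it is submitted twice. This is a minimal sketch under that assumption: `assign_test_group`, the salt string, and the payload shape are hypothetical and not part of Thoughtly's API.

```python
import hashlib


def assign_test_group(lead_id: str, treatment_share: float = 0.5,
                      salt: str = "holdout-v1") -> str:
    """Return "A" (AI follow-up) or "B" (holdout) for a lead, deterministically."""
    if not 0.0 < treatment_share < 1.0:
        raise ValueError("treatment_share must be strictly between 0 and 1")
    digest = hashlib.sha256(f"{salt}:{lead_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a bucket in 0..9999.
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    threshold = round(treatment_share * 10_000)
    return "A" if bucket < threshold else "B"


# Hypothetical webhook payload for an 80/20 split.
payload = {
    "lead_id": "lead-001",
    "test_group": assign_test_group("lead-001", treatment_share=0.8),
}
```

Changing the salt reshuffles every lead, which is useful when you start a new test and do not want last quarter's holdout group to be held out again.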
Not every number in your dashboard matters for a lift calculation. Focus on these five:
| Metric | Why It Matters | How Thoughtly Captures It |
|---|---|---|
| Contact rate | Measures whether the agent is actually reaching people | History page — filter by status (Completed, No Answer, Busy, Left Voicemail) |
| Conversation rate | Measures quality of contact — did a real conversation happen? | On Call Completed trigger output — check call duration and transcript length |
| Appointment/conversion rate | The primary outcome metric — did the lead take the next step? | Variables captured during the call (e.g., appointment_booked = true) written as Attributes |
| Speed-to-lead | Time from lead creation to first AI touch | Compare CRM lead creation timestamp to Thoughtly call start timestamp in History |
| Revenue per lead (if trackable) | The ultimate downstream metric | CRM deal data joined to Thoughtly contact attributes |
For each metric, calculate the value for Group A and Group B separately. The difference is your raw lift.
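The per-group calculation can be sketched as follows, assuming each lead is a record carrying its test_group and a boolean outcome flag. The field names `contacted` and `converted` and both helper functions are illustrative, not Thoughtly export columns.

```python
from collections import defaultdict


def rates_by_group(leads: list[dict], flag: str) -> dict[str, float]:
    """Share of leads in each test group for which `flag` is true."""
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for lead in leads:
        group = lead["test_group"]
        totals[group] += 1
        if lead[flag]:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


def raw_lift(leads: list[dict], flag: str) -> float:
    """Group A rate minus Group B rate, as a fraction (0.25 = 25 points)."""
    rates = rates_by_group(leads, flag)
    return rates["A"] - rates["B"]


# Tiny illustrative dataset.
sample_leads = [
    {"test_group": "A", "contacted": True, "converted": True},
    {"test_group": "A", "contacted": True, "converted": True},
    {"test_group": "A", "contacted": True, "converted": False},
    {"test_group": "A", "contacted": False, "converted": False},
    {"test_group": "B", "contacted": True, "converted": True},
    {"test_group": "B", "contacted": False, "converted": False},
    {"test_group": "B", "contacted": False, "converted": False},
    {"test_group": "B", "contacted": False, "converted": False},
]
conversion_rates = rates_by_group(sample_leads, "converted")
conversion_raw_lift = raw_lift(sample_leads, "converted")
```

The same two functions work for any boolean metric in the table, for example `raw_lift(sample_leads, "contacted")` for contact rate.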
Thoughtly provides the raw interaction data. Your job is to structure it for lift analysis.
Thoughtly's Disposition feature (available in Agent Settings and via the Add Disposition automation step) automatically tags calls with labels like "Qualified lead," "No answer," "Left voicemail," or "Request callback" based on transcript content and call outcome. These disposition labels become the primary filter for segmenting successful vs. unsuccessful follow-up attempts.
After each call, use an On Call Completed automation to write key variables as contact Attributes. Attributes persist across calls, so they form the longitudinal record you need for conversion tracking.
- last_ai_follow_up_at: ISO timestamp of the most recent AI follow-up
- ai_follow_up_outcome: the disposition or key outcome from the last call (e.g., "booked", "callback_requested", "not_interested")
- ai_follow_up_count: running total of AI follow-up touches
Use Thoughtly's History Export to pull filtered call data as a CSV. Join it with your CRM's deal or pipeline export on phone number or contact ID. This merged dataset becomes your lift analysis table.
If you use Zapier, Make, or a direct webhook, you can also push Thoughtly call outcomes directly into a Google Sheet or BI tool in real time using a Send Webhook step in your On Call Completed automation.
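The CSV join described above can be sketched with the Python standard library. Every column name here (`phone_number`, `status`, `phone`, `test_group`, `deal_stage`) is an assumption about what your exports contain, so adjust them to the real headers. The phone normalization keeps the last ten digits, which only suits US-style numbers.

```python
import csv
import io
from collections import Counter


def normalize_phone(raw: str) -> str:
    """Keep digits only, then the last 10 (assumes US-style numbers)."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    return digits[-10:]


def build_lift_table(history_csv, crm_csv) -> list[dict]:
    """Left-join CRM leads to call counts so holdout leads with no calls are kept."""
    calls_by_phone = Counter(
        normalize_phone(row["phone_number"]) for row in csv.DictReader(history_csv)
    )
    table = []
    for lead in csv.DictReader(crm_csv):
        key = normalize_phone(lead["phone"])
        table.append({
            "phone": key,
            "test_group": lead["test_group"],
            "converted": lead["deal_stage"] == "qualified",
            "ai_calls": calls_by_phone.get(key, 0),
        })
    return table


# In-memory stand-ins for the two exported files.
history_export = io.StringIO(
    "phone_number,status\n"
    "+1 (555) 010-0001,Completed\n"
    "555-010-0001,No Answer\n"
    "+15550100002,Completed\n"
)
crm_export = io.StringIO(
    "phone,test_group,deal_stage\n"
    "5550100001,A,qualified\n"
    "5550100002,A,open\n"
    "5550100003,B,qualified\n"
)
lift_table = build_lift_table(history_export, crm_export)
```

The join starts from the CRM side on purpose: holdout leads never appear in the call history, and an inner join would silently drop the control group.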
With data from both groups in hand, the lift formula is straightforward:
Conversion lift = (Group A conversion rate − Group B conversion rate) ÷ Group B conversion rate × 100
Example: if Group A (AI follow-up) converted at 18% and Group B (manual follow-up) converted at 11%, your lift is:
(18% − 11%) ÷ 11% × 100 = 63.6% conversion lift
Before presenting the number to leadership, verify:
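One check that fits here is whether the gap between the groups is larger than chance alone would explain. The sketch below computes the lift formula and a standard pooled two-proportion z-test. The function names are illustrative, and the group sizes of 1,000 are an assumption chosen to reproduce the 18% and 11% rates from the example.

```python
import math


def conversion_lift(rate_a: float, rate_b: float) -> float:
    """Relative lift of Group A over Group B, in percent."""
    return (rate_a - rate_b) / rate_b * 100


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Pooled two-proportion z-test. Returns (z, two-sided p-value)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


lift_pct = conversion_lift(0.18, 0.11)
z_score, p_value = two_proportion_z_test(conv_a=180, n_a=1000, conv_b=110, n_b=1000)
```

A small p-value (0.05 is a common threshold) suggests the difference is unlikely to be noise. The normal approximation behind this test gets unreliable when either group has only a handful of conversions, so treat the result as a sanity check rather than proof.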
Conversion lift is compelling. Revenue impact closes the deal with finance.
| Input | Value |
|---|---|
| Monthly inbound leads | 2,000 |
| Pre-AI conversion rate (baseline) | 11% |
| Post-AI conversion rate (measured) | 18% |
| Incremental conversions per month | 140 (2,000 × 7%) |
| Average revenue per conversion | $1,200 |
| Monthly incremental revenue | $168,000 |
| Monthly Thoughtly cost | Per-minute pricing varies — contact Thoughtly for a quote |
| Net ROI | Monthly incremental revenue minus Thoughtly cost |
Adjust the numbers for your vertical. Insurance and mortgage leads typically carry higher per-conversion value. Home services and education enrollment may convert at higher rates but with lower per-deal revenue. The framework stays the same.
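The arithmetic in the table can be wrapped in a small function so you can rerun it with your own inputs. This is a sketch, not a pricing model: `revenue_impact` is a hypothetical helper, and the monthly cost below is a placeholder because the source gives no price.

```python
def revenue_impact(monthly_leads: int, baseline_rate: float, measured_rate: float,
                   revenue_per_conversion: float, monthly_cost: float) -> dict[str, float]:
    """Incremental conversions, revenue, and net impact per month."""
    incremental_conversions = monthly_leads * (measured_rate - baseline_rate)
    incremental_revenue = incremental_conversions * revenue_per_conversion
    return {
        "incremental_conversions": incremental_conversions,
        "incremental_revenue": incremental_revenue,
        "net": incremental_revenue - monthly_cost,
    }


# Table inputs, plus a placeholder cost of 10,000 (an assumption, not a quote).
impact = revenue_impact(monthly_leads=2000, baseline_rate=0.11, measured_rate=0.18,
                        revenue_per_conversion=1200.0, monthly_cost=10_000.0)
```

With the table's inputs this gives roughly 140 incremental conversions and 168,000 in incremental revenue per month, before subtracting whatever the agent actually costs you.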
Once you have your initial lift measurement, here is what to track on an ongoing basis to ensure the agent continues delivering value:
| Metric | Target | Review Cadence |
|---|---|---|
| Conversion lift vs. baseline | Positive and stable over 90-day rolling window | Monthly |
| Contact rate (AI agent) | Above 45% for voice, above 85% for SMS delivery | Weekly |
| Speed-to-lead (AI agent) | Under 60 seconds for form-fill triggers | Weekly |
| Cost per incremental conversion | Lower than human-only cost per conversion | Monthly |
| Agent conversation quality score | Dispositions match expected outcome distribution | Bi-weekly |
| Attribution accuracy | Less than 5% unattributed conversions in the AI cohort | Monthly |
Review these in a standing weekly or bi-weekly ops review. Thoughtly's Analytics dashboard gives you the activity-level view; your CRM reporting gives you the pipeline-level view. The combination is where lift measurement lives.
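If you want the review to flag problems automatically, the numeric targets from the table can be encoded as thresholds. A minimal sketch, assuming you already compute these four values somewhere; the metric keys and the `failing_metrics` helper are hypothetical names.

```python
# Targets taken from the table above: ("min", x) means the value must be
# above x, ("max", x) means it must be below x.
TARGETS = {
    "voice_contact_rate": ("min", 0.45),
    "sms_delivery_rate": ("min", 0.85),
    "speed_to_lead_seconds": ("max", 60),
    "unattributed_conversion_share": ("max", 0.05),
}


def failing_metrics(observed: dict[str, float]) -> list[str]:
    """Return the names of metrics that miss their target."""
    failures = []
    for name, (kind, limit) in TARGETS.items():
        value = observed[name]
        on_target = value > limit if kind == "min" else value < limit
        if not on_target:
            failures.append(name)
    return failures


this_week = {
    "voice_contact_rate": 0.52,
    "sms_delivery_rate": 0.80,
    "speed_to_lead_seconds": 42,
    "unattributed_conversion_share": 0.05,
}
flagged = failing_metrics(this_week)
```

The comparisons are strict to match the table's wording ("above", "under", "less than"), so a value sitting exactly on the limit is flagged.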
Aim for at least 250 leads per group (treatment and control) over a two-to-four-week window. For high-volume verticals like insurance or mortgage where you process thousands of leads per month, you may see statistically meaningful results within a week. For lower-volume funnels, extend the window rather than lowering the sample size.
You can compare your pre-AI baseline metrics (from CRM historical data) to your post-AI metrics. This is called a before/after comparison. It is simpler to set up but less rigorous because it does not control for seasonal trends, changes in lead quality, or other variables that may have shifted between the two periods. A holdout test is always more defensible.
This is a multi-touch attribution challenge. Use Thoughtly's contact Attributes to tag every AI interaction, and ensure your CRM logs every human touch. When a lead converts, check the full timeline. Many teams credit the AI agent for the first-touch (it re-engaged the lead or booked the callback) and credit the human for the close. The right model depends on your sales motion, but the data needs to be captured either way.
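The first-touch and closing-touch split described above can be sketched like this, assuming you can assemble one timeline per lead from Thoughtly Attributes and CRM activity. The `actor` and `at` fields and the `credit_touches` helper are illustrative, and timestamps are assumed to be ISO 8601 strings without a timezone suffix.

```python
from datetime import datetime


def credit_touches(touches: list[dict], converted_at: str) -> dict:
    """Credit the first and last touch that happened at or before conversion."""
    cutoff = datetime.fromisoformat(converted_at)
    prior = sorted(
        (t for t in touches if datetime.fromisoformat(t["at"]) <= cutoff),
        key=lambda t: datetime.fromisoformat(t["at"]),
    )
    if not prior:
        return {"first_touch": None, "closing_touch": None}
    return {"first_touch": prior[0]["actor"], "closing_touch": prior[-1]["actor"]}


# Illustrative timeline for one lead.
timeline = [
    {"actor": "human", "at": "2024-05-02T14:30:00"},
    {"actor": "ai", "at": "2024-05-01T09:00:05"},
    {"actor": "ai", "at": "2024-05-10T10:00:00"},  # after conversion, ignored
]
credit = credit_touches(timeline, converted_at="2024-05-03T11:00:00")
```

This is only one attribution model. The point is that once every touch is logged with an actor and a timestamp, you can switch models later without re-collecting data.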
Thoughtly's Automations include a Random Outcome step that can split traffic for experiments — for example, routing 80% of leads through one agent flow and 20% through another. You can use this for A/B testing different scripts, channels, or follow-up cadences. For holdout testing specifically, use the Conditions (If/Else) step with a test_group field passed in via webhook.
Run a formal holdout test quarterly, especially if you change agent scripts, add new channels, or shift lead sources. Between formal tests, monitor the ongoing metrics (contact rate, conversion rate, speed-to-lead) weekly using Thoughtly Analytics and your CRM dashboards. If those leading indicators shift significantly, trigger an off-cycle test.