Goldman Sachs says agentic AI usage is set to soar while inference costs keep falling. That makes voice AI less like a one-time labor arbitrage bet and more like an operating asset whose economics improve after deployment.
Goldman Sachs recently published a useful forecast on agentic AI: token usage is expected to soar, inference costs are expected to keep falling, and enterprise adoption will be uneven because real workflows are harder than demos. The tempting read is that agents are going to use a lot of tokens, chips will be scarce, and some use cases are still too expensive — therefore, wait.
I think that is exactly backwards. The more interesting read is this: if agentic AI demand rises sharply while inference costs keep falling, then the companies that put the right workflows into production now will own the operating leverage later.
Not every workflow. Not every model. Not every science project with an agent sticker on it. The right workflow: narrow, measurable, repeatable, revenue-linked, and painful enough that the business already knows what failure costs.
That is why voice AI is such an important category to watch. It sits in the uncomfortable middle of the AI economy: real-time, latency-sensitive, and operationally messy. Goldman Sachs is right to flag that some real-time voice workloads can be less economically attractive today than humans because of time dependency and latency characteristics. That is the sober version of the story, but it is not the whole story.
The whole story is that once a voice agent is good enough at a specific job, the business does not need to keep paying frontier-model prices forever. The workflow can stabilize. The model can be held constant. The prompt, routing logic, compliance rules, handoff criteria, and CRM write-back can become infrastructure. Then the cost curve keeps moving underneath it. That is the part most people miss.
Goldman Sachs Research expects agentic AI to drive a 24-fold increase in token consumption by 2030, reaching 120 quadrillion tokens per month as consumers and enterprises adopt agents. In the same piece, Jim Schneider says semiconductor providers are lowering the cost per token for inference by 60% to 70% per year, driven by chip efficiency and AI data center architecture improvements.
That is a strange pair of facts to hold at the same time. Demand explodes. Unit costs fall. Capacity gets tight. Margins improve. Enterprises adopt unevenly. Some workloads look expensive today. Some become absurdly cheap tomorrow. If you are buying AI like software, this feels confusing. If you are operating AI like infrastructure, it is the whole game.
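To put the quoted 60% to 70% annual decline in perspective, here is a minimal sketch of how such a rate compounds. It assumes the rate holds constant year over year, which the Goldman Sachs figure does not promise; the function name `unit_cost_after` and the chosen horizons are illustrative only.

```python
def unit_cost_after(years: int, annual_decline: float, start_cost: float = 1.0) -> float:
    """Unit cost after `years` of a constant fractional annual decline.

    Assumption: the same decline applies every year. Real declines
    will vary by model, provider, and workload.
    """
    if not 0.0 <= annual_decline < 1.0:
        raise ValueError("annual_decline must be in [0, 1)")
    return start_cost * (1.0 - annual_decline) ** years


# Compare the low and high end of the quoted range over a few horizons.
for decline in (0.60, 0.70):
    for years in (1, 3, 5):
        remaining = unit_cost_after(years, decline)
        print(f"{decline:.0%} decline per year, {years} yr: "
              f"{remaining:.4f} of the starting cost per token")
```

Under that assumption, three years at 60% leaves about 6.4% of the starting cost per token, and three years at 70% leaves about 2.7%. A workflow that is marginal on inference cost today may look very different on the same model class a few budget cycles later.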
The first generation of enterprise AI buyers asked, “Can this replace a human task?” The better question is, “Which task becomes strategically cheaper once it is proven?”
That distinction matters because a deployed agent is not just a model call. It is a workflow asset. The business has already done the slow parts: mapping the process, integrating systems, proving the decision rules, collecting transcripts, defining the handoff, measuring outcomes, and training the organization to trust the automation.
When inference gets cheaper, the company with the workflow already in production benefits immediately. The company that waited still has to start the hard part.
A human rep has a wage. A voice agent has inference, telephony, speech-to-text, text-to-speech, orchestration, observability, QA, integration, and support costs.
So yes, if you compare one minute of human talk time to one minute of real-time AI talk time, you can make voice AI look less attractive in some cases. But that is usually the wrong denominator.
For inbound lead conversion, the cost is not “what does a minute cost?” The cost is “what happens when nobody calls fast enough?”
A mortgage lead submits a form and waits. An insurance shopper asks for a quote. A student requests program information. A homeowner needs a repair. A patient asks about an appointment. In each case, the business already paid to create demand. The leak happens after the hand raise.
That is where voice AI gets interesting. The most valuable agent is often not the one that replaces your best rep. It is the one that works the part of the funnel your best rep never touches: every after-hours form fill, every second-priority inquiry, every aged lead, every missed call, every prospect who needed a response in seconds and got one in hours.
Thoughtly is built around that specific problem: autonomous inbound lead conversion for high-consideration consumer industries. The agent calls, texts, emails, qualifies, books, routes, and writes outcomes back to the CRM. The point is not to make a human disappear. The point is to make sure every qualified hand raise gets worked while intent is still alive.
That is a much better use case than “replace all calls.” It is also a much better cost case.
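The denominator argument can be made concrete with a rough sketch that measures cost per booking instead of cost per minute. Every number below is a hypothetical placeholder, not Thoughtly data, and the `FunnelScenario` class is invented for illustration. The sketch also assumes the booking rate per worked lead is the same for both scenarios, which a real program would need to measure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunnelScenario:
    leads_bought: int         # leads the business already paid to generate
    cost_per_lead: float      # acquisition cost per lead (hypothetical)
    leads_worked: int         # leads that actually got a timely response
    minutes_per_lead: float   # average talk time per worked lead
    cost_per_minute: float    # fully loaded cost of one minute of talk time
    booking_rate: float       # bookings per worked lead

    def bookings(self) -> float:
        return self.leads_worked * self.booking_rate

    def total_cost(self) -> float:
        # Acquisition spend is sunk whether or not the lead is ever called.
        acquisition = self.leads_bought * self.cost_per_lead
        talk_time = self.leads_worked * self.minutes_per_lead * self.cost_per_minute
        return acquisition + talk_time

    def cost_per_booking(self) -> float:
        if self.bookings() == 0:
            raise ValueError("no bookings; cost per booking is undefined")
        return self.total_cost() / self.bookings()


# Hypothetical: the human team reaches 600 of 1,000 leads in time.
human_only = FunnelScenario(1000, 50.0, 600, 4.0, 0.50, 0.10)
# Hypothetical: the agent costs more per minute but works every lead.
with_agent = FunnelScenario(1000, 50.0, 1000, 4.0, 0.80, 0.10)

for label, scenario in (("human only", human_only), ("with agent", with_agent)):
    print(f"{label}: {scenario.bookings():.0f} bookings, "
          f"${scenario.cost_per_booking():.2f} per booking")
```

With these placeholder inputs the agent is 60% more expensive per minute, yet cost per booking drops from roughly $853 to $532, because the acquisition spend on unworked leads dominates the total. The conclusion depends entirely on the inputs, so the useful exercise is plugging in your own.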
A lot of AI strategy is still trapped in model-chasing mode.
New model drops. Everyone tests it. It performs better on a benchmark. Someone asks whether the entire business should switch. Costs spike. Latency changes. Edge cases change. QA starts over.
That is not how mature agent operations should work. For production workflows, the goal is not to run the newest model everywhere. The goal is to find the cheapest stable model that performs the job at the required quality level.
Once an agent can do a specific task well — say, qualify an inbound insurance shopper, schedule a healthcare consult, re-engage an aged mortgage lead, or route a hot lead to a human with context — the business has something valuable: a quantitatively proven workflow.
At that point, changing the model is not automatically progress. It is a migration decision. You keep the proven task on the proven model until there is a reason to move. You use the expensive frontier model for the next harder workflow. You graduate workflows down the cost curve as the market catches up.
This is how voice AI becomes an operating system, not a demo. The frontier model is for exploration. The stable model is for production. The cost decline is the dividend.
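One way to picture "the cheapest stable model that performs the job at the required quality level" is as a simple selection rule over evaluated candidates. This is a sketch under stated assumptions, not a description of how Thoughtly selects models: the candidate names, costs, and pass rates are made up, and `pass_rate` stands in for whatever task-specific evaluation a team actually trusts.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ModelCandidate:
    name: str             # hypothetical label, not a real model name
    cost_per_call: float  # measured average cost for this workflow
    pass_rate: float      # share of evaluation calls meeting the quality bar


def cheapest_passing(
    candidates: Iterable[ModelCandidate], required_pass_rate: float
) -> Optional[ModelCandidate]:
    """Return the lowest-cost candidate that meets the quality bar, if any."""
    eligible = [c for c in candidates if c.pass_rate >= required_pass_rate]
    return min(eligible, key=lambda c: c.cost_per_call, default=None)


candidates = [
    ModelCandidate("frontier", cost_per_call=0.12, pass_rate=0.97),
    ModelCandidate("mid-tier", cost_per_call=0.03, pass_rate=0.95),
    ModelCandidate("small", cost_per_call=0.01, pass_rate=0.88),
]

choice = cheapest_passing(candidates, required_pass_rate=0.94)
print(choice.name if choice else "no candidate meets the bar")
```

The design choice worth noting is that quality is a gate and cost is the objective, not the other way around. A newer model only displaces the incumbent if it clears the same bar at lower cost, or if the bar itself has moved.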
Prompt caching is a small but useful example of how quickly the economics change once a workflow has repetition.
OpenAI says prompt caching can reduce latency by up to 80% and input token costs by up to 90% for repeated prompt prefixes. Anthropic describes prompt caching as a way to reduce processing time and cost for repetitive tasks or prompts with consistent elements. Google’s Gemini API documentation similarly explains that cached tokens can be lower cost than repeatedly passing the same corpus at certain volumes.
That matters because production voice agents are full of repeated structure. The compliance instructions repeat. The brand voice repeats. The qualification criteria repeat. The routing rules repeat. The CRM field schema repeats. The transfer rules repeat. The call summary format repeats.
A one-off chatbot prompt has limited repetition. A production revenue workflow has a lot of it. This is one reason the “AI is too expensive” take ages badly. The people saying it are often looking at raw frontier inference, not the optimized version of a known workflow.
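A back-of-the-envelope sketch shows why a large repeated prefix matters. It assumes the whole static prefix is served from cache and uses a 90% discount, which is the upper bound quoted above and not a typical rate. It also ignores cache-write charges, cache expiry, and provider-specific minimum prefix sizes. Token counts and the price are placeholders.

```python
def input_cost(
    prefix_tokens: int,
    dynamic_tokens: int,
    price_per_million: float,
    cached_discount: float = 0.0,
) -> float:
    """Input-token cost for one request.

    prefix_tokens: static instructions repeated on every call
    dynamic_tokens: per-call content (transcript so far, lead details)
    cached_discount: fraction taken off the price of cached prefix tokens
    """
    if not 0.0 <= cached_discount <= 1.0:
        raise ValueError("cached_discount must be in [0, 1]")
    billable_prefix = prefix_tokens * (1.0 - cached_discount)
    return (billable_prefix + dynamic_tokens) * price_per_million / 1_000_000


# Hypothetical request: 4,000 tokens of compliance, brand voice, and routing
# rules, plus 500 tokens that actually change from call to call.
uncached = input_cost(4000, 500, price_per_million=1.0)
cached = input_cost(4000, 500, price_per_million=1.0, cached_discount=0.90)
savings = 1.0 - cached / uncached

print(f"uncached: ${uncached:.6f}  cached: ${cached:.6f}  savings: {savings:.0%}")
```

In this example the input bill falls by about 80%, not 90%, because the dynamic tokens are still charged at full price. The more of the prompt that is stable structure, the closer the blended saving gets to the headline discount.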
Inside Thoughtly, call token costs were roughly 100x higher in 2023 than they are today. That is not just because model providers got cheaper. It is because the stack got better: model selection improved, prompts got tighter, caching became useful, routing improved, and the platform learned which parts of the job actually required expensive intelligence.
That is the curve buyers should care about.
Not “what does the most expensive agent cost today?”
“What does this workflow cost after we prove it, constrain it, cache it, measure it, and let the infrastructure market compress the unit cost?”
McKinsey’s report on agentic AI makes a point that should be printed out and taped to every enterprise AI roadmap: broad adoption has not automatically produced broad impact because too much AI is bolted onto existing workflows instead of integrated into core processes.
That is exactly right. Voice AI does not win because it talks. It wins when it owns a job. A generic voice bot is a toy. A workflow-specific agent is a system.
The difference is whether the agent has a defined operating lane.
This is why inbound lead conversion is a better early wedge than vague “AI phone support.”
It has a clear trigger: a lead raises their hand.
It has a clear race: reach them before intent decays.
It has clear outcomes: connected, qualified, booked, transferred, nurtured, disqualified, or no answer.
It has clear data exhaust: transcripts, call summaries, dispositions, source attribution, follow-up activity, and pipeline outcomes.
It has a clear human boundary: the agent handles coverage, qualification, persistence, and context; the human steps in when judgment, persuasion, or closing matters.
That is a real operating model.
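Because the outcomes are enumerable, they can be recorded as structured data and not left as free-text notes. The sketch below is illustrative only: the `Disposition` values mirror the outcome list above, while the `CallOutcome` fields, the source names, and the `booking_rate_by_source` helper are assumptions, not Thoughtly's schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable


class Disposition(Enum):
    CONNECTED = "connected"
    QUALIFIED = "qualified"
    BOOKED = "booked"
    TRANSFERRED = "transferred"
    NURTURED = "nurtured"
    DISQUALIFIED = "disqualified"
    NO_ANSWER = "no_answer"


@dataclass(frozen=True)
class CallOutcome:
    lead_id: str
    source: str               # hypothetical attribution field
    disposition: Disposition


def booking_rate_by_source(outcomes: Iterable[CallOutcome]) -> Dict[str, float]:
    """Share of outcomes per lead source that ended in a booking."""
    totals: Dict[str, int] = defaultdict(int)
    booked: Dict[str, int] = defaultdict(int)
    for outcome in outcomes:
        totals[outcome.source] += 1
        if outcome.disposition is Disposition.BOOKED:
            booked[outcome.source] += 1
    return {source: booked[source] / totals[source] for source in totals}


sample = [
    CallOutcome("lead-1", "web_form", Disposition.BOOKED),
    CallOutcome("lead-2", "web_form", Disposition.NO_ANSWER),
    CallOutcome("lead-3", "paid_search", Disposition.QUALIFIED),
]
print(booking_rate_by_source(sample))
```

A fixed set of dispositions is what makes the later measurement possible: once every call ends in one of a handful of states, comparing lead sources or tracking lift over time becomes a grouping operation.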
If you want the practical version, Thoughtly’s guides on speed-to-lead, form-fill follow-up, and measuring conversion lift all point to the same idea: the agent has to be connected to the revenue system, not floating beside it.
Waiting for costs to fall sounds conservative. Sometimes it is. But in agentic AI, waiting also has a cost: you delay the organizational learning curve.
The first months of a voice AI program are not just about minutes and tokens. They are about discovering what actually happens in the funnel.
Which lead sources answer fastest? Which objections repeat? Which qualification fields are useful and which are theater? Which calls should transfer immediately? Which prospects prefer SMS after the first miss? Which humans take transfers well? Which CRM stages are lying?
This is the work that makes the agent better. It is also the work that makes the business better.
By the time inference costs fall again, the early adopter has more than cheaper calls. They have better scripts, cleaner CRM data, sharper routing rules, stronger QA, and proof of which workflows deserve expansion.
The late adopter gets the same lower model price but starts with none of that. This is why “we’ll wait until AI is cheaper” is not a strategy. It is a way to buy the future at a discount while forfeiting the compounding part.
The bullish case for voice AI does not mean every voice workflow should be automated now.
Some calls are too ambiguous. Some are too emotionally sensitive. Some require judgment the business has not encoded. Some depend on messy systems the agent cannot reliably access. Some are regulated in ways the team has not operationalized. Some simply do not have enough volume to justify the work.
That is fine. The best AI operators do not begin with the most impressive workflow. They begin with the most measurable one.
Start where the job is frequent, the stakes are clear, the success criteria are obvious, and the human team is already failing for reasons of coverage or speed rather than skill.
That usually means:
These are not glamorous use cases. That is the point. They are where the economics work first.
The Goldman Sachs article is useful because it refuses to make the AI economy too simple.
Yes, demand is rising. Yes, token costs are falling. Yes, capacity may be constrained. Yes, enterprise adoption is uneven. Yes, some real-time voice workloads are expensive today. Yes, the infrastructure players may see margin improvement. All of that can be true at once.
For buyers, the takeaway is not “buy AI because it is magic.” The takeaway is sharper:
If a workflow is valuable, repeatable, and measurable today, and if the model required to run it is likely to get cheaper over time, then deploying it now is not just an automation decision. It is a timing decision.
You are installing the workflow before the economics fully mature. That is what makes the next few years interesting.
The companies that win with voice AI will not be the ones that wait for a perfect model at a perfect price. They will be the ones that identify the narrow jobs where agents already perform, wire those jobs into the business, measure the results, and let the cost curve do what cost curves do.
Thoughtly’s bet is simple: inbound demand is too expensive to waste, and humans should not be the bottleneck between a hand raise and a conversation.
Call every lead. Follow up across channels. Route the right ones to humans. Keep the CRM clean. Measure the lift. Improve the agent. Then run the same proven workflow cheaper next quarter than you ran it this quarter.
That is not a chatbot story. That is operating leverage.
About the author
Torrey Leonard is the CEO and Founder of Thoughtly, where he helps consumer businesses turn inbound demand into revenue with AI voice agents. He previously led product at Affiniti Finance.