xAI Launches Grok Voice Think Fast 1.0 via API

xAI has launched grok-voice-think-fast-1.0, its new flagship voice agent model, available immediately via API. The model tops a leading voice-agent benchmark and is already powering Starlink’s phone sales and support line — with a pricing model that makes it unusually accessible to student developers.

Peter Corrigan

xAI on Tuesday announced grok-voice-think-fast-1.0, a new flagship voice agent model designed for complex, multi-step phone interactions. The model is available immediately through the xAI API and can be tested in the company’s voice playground — no waitlist required.

The announcement is notable not just for the technology itself but for the real-world proof point xAI is leading with: the model is already live, powering Starlink’s customer support and phone sales line at +1 (888) GO STARLINK. According to xAI, that deployment spans 28 tools across hundreds of sales and support workflows. The company reports a 20% sales conversion rate from inbound inquiries and says the model resolves 70% of customer support calls autonomously, without any human handoff. Those figures are self-reported, but a live production deployment is unusually concrete evidence for a model on its launch day.

What the Model Actually Does

Unlike text-to-speech services that simply read words aloud, grok-voice-think-fast-1.0 is built to act as a full voice agent: it can reason, call external tools, capture structured data, and manage multi-turn conversations, all while maintaining low response latency. xAI says it performs background reasoning in real time without slowing down the conversational response, a difficult balance because extra reasoning normally adds delay before the agent can speak.

The model’s structured data-capture capability is worth flagging specifically. Voice AI has historically been unreliable at collecting precise information such as email addresses, account numbers and street addresses, especially when it is spoken quickly or with an accent. xAI says grok-voice-think-fast-1.0 handles this kind of input naturally, the way a human agent would, accepting mid-sentence revisions and reading the normalized data back for confirmation before proceeding.
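To make the capture, revise and confirm loop concrete, here is a minimal Python sketch of the idea applied to a spoken email address. It is not xAI's implementation: the correction markers, the spoken-token map and all function names are hypothetical, and a production system would need far more robust handling.

```python
import re

# Hypothetical vocabulary for illustration only; real speech needs a much larger map.
SPOKEN_TOKENS = {"at": "@", "dot": ".", "underscore": "_", "dash": "-", "hyphen": "-"}

# Hypothetical phrases that signal the caller is revising what they just said.
CORRECTION_MARKERS = ("no sorry", "i mean", "actually")

# Deliberately loose plausibility check, not a full email validator.
EMAIL_PATTERN = re.compile(r"^[a-z0-9._+-]+@[a-z0-9-]+(\.[a-z0-9-]+)+$")


def latest_revision(transcript: str) -> str:
    """Keep only the text after the last correction marker, if any."""
    text = transcript.lower()
    cut = 0
    for marker in CORRECTION_MARKERS:
        idx = text.rfind(marker)
        if idx != -1:
            cut = max(cut, idx + len(marker))
    return text[cut:].strip(" ,.")


def normalize_spoken_email(transcript: str) -> str:
    """Turn 'jane dot doe at example dot com' into 'jane.doe@example.com'."""
    words = transcript.lower().split()
    return "".join(SPOKEN_TOKENS.get(word, word) for word in words)


def is_plausible_email(candidate: str) -> bool:
    return EMAIL_PATTERN.match(candidate) is not None


def confirmation_prompt(email: str) -> str:
    """Read the normalized value back before any tool call is made."""
    return f"I have {email}. Is that correct?"


if __name__ == "__main__":
    heard = "jane dot doe at example dot com, no sorry, jane dot dough at example dot com"
    email = normalize_spoken_email(latest_revision(heard))
    if is_plausible_email(email):
        print(confirmation_prompt(email))
```

The point of the sketch is the ordering: apply the caller's latest revision, normalize, validate, and only then read the value back for confirmation.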

The model natively supports more than 25 languages and has been tested against telephony audio conditions: background noise, heavy accents and frequent interruptions. It takes the top position on the τ-voice Bench leaderboard, a benchmark that evaluates full-duplex voice agents under realistic conversational conditions rather than clean-studio scenarios.

“This new model excels at complex, ambiguous, multi-step workflows across customer support, sales and enterprise applications. It is especially well-suited for high-stakes scenarios that demand precise data entry and high-volume tool calling to address the user’s request.” — xAI

A Crowded Market With a Clear Pricing Difference

The voice API space is already competitive. OpenAI’s Realtime API — the incumbent for voice agent orchestration — went generally available in August 2025 and recently added remote MCP server support, image inputs and SIP-based phone calling. But OpenAI prices its realtime model at roughly $32 per million audio input tokens and $64 per million audio output tokens, a structure that is technically flexible but hard to estimate in practice, especially for developers building variable-length conversations.

xAI’s pricing is simpler: a flat $0.05 per minute of connection time. That rate is the same as Vapi’s base orchestration fee, though Vapi adds provider costs on top and operates as a provider-agnostic layer across 14+ models rather than a single integrated system. ElevenLabs leads on voice expressiveness — sub-100ms latency, more than 11,000 voice options, 70+ languages — but its focus is text-to-speech quality rather than agentic task completion. xAI’s pitch is that it handles reasoning, tool-calling, data capture and natural conversation inside one vertically integrated model, battle-tested at production scale.
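The difference between the two pricing structures can be sketched in a few lines of Python. The rates below are the ones quoted in this article; the token counts in the usage example are hypothetical, because the article does not say how many audio tokens a minute of conversation consumes, and that unknown is what makes token-based pricing harder to budget.

```python
# Rates as quoted in the article (USD).
FLAT_RATE_PER_MINUTE = 0.05          # xAI: per minute of connection time
AUDIO_INPUT_PER_MILLION = 32.0       # OpenAI realtime: per million audio input tokens
AUDIO_OUTPUT_PER_MILLION = 64.0      # OpenAI realtime: per million audio output tokens


def flat_rate_cost(minutes: float, rate: float = FLAT_RATE_PER_MINUTE) -> float:
    """Cost depends only on how long the connection stays open."""
    return minutes * rate


def token_based_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost depends on how much audio each side produces, known only after the call."""
    return (
        input_tokens / 1_000_000 * AUDIO_INPUT_PER_MILLION
        + output_tokens / 1_000_000 * AUDIO_OUTPUT_PER_MILLION
    )


if __name__ == "__main__":
    # A 10-minute call at the flat rate.
    print(f"flat rate:   ${flat_rate_cost(10):.2f}")
    # Hypothetical token counts, chosen only to show the calculation.
    print(f"token based: ${token_based_cost(6_000, 3_000):.3f}")
```

With a flat rate, the only input is call duration. With token pricing, the estimate depends on two quantities that vary with how much each party talks.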

xAI also says grok-voice-think-fast-1.0 is compatible with the OpenAI Realtime API specification, which means developers already building on OpenAI’s voice stack can migrate with relatively little friction.
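As a rough illustration of what that compatibility could mean in practice, the Python sketch below builds a tool registration message in the `session.update` shape used by the OpenAI Realtime API. Whether xAI's endpoint accepts exactly this payload is an assumption resting on the compatibility claim. The `book_appointment` tool and its fields are hypothetical, and no endpoint URL or authentication is shown because the article gives none.

```python
import json

# Hypothetical tool for illustration; the name and fields are not from xAI or the article.
BOOK_APPOINTMENT_TOOL = {
    "type": "function",
    "name": "book_appointment",
    "description": "Book an appointment once the caller has confirmed the details.",
    "parameters": {
        "type": "object",
        "properties": {
            "customer_email": {"type": "string"},
            "start_time": {"type": "string", "description": "ISO 8601 start time"},
        },
        "required": ["customer_email", "start_time"],
    },
}


def build_session_update(tools: list[dict]) -> str:
    """Serialize a session.update event that registers the given tools."""
    event = {
        "type": "session.update",
        "session": {"tools": tools, "tool_choice": "auto"},
    }
    return json.dumps(event)


if __name__ == "__main__":
    # The resulting JSON string would be sent over the realtime connection.
    print(build_session_update([BOOK_APPOINTMENT_TOOL]))
```

If the compatibility claim holds, a payload like this would only need to be pointed at a different endpoint. That should be verified against xAI's published API documentation before relying on it.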

Why This Matters for Students and Indie Developers

For students building portfolio projects, the combination of flat-rate pricing and a production-ready API removes two of the biggest barriers to voice AI experimentation: unpredictable costs and unreliable performance in real conditions. At $0.05 per minute, 1,000 two-minute test calls would cost $100, so a developer can run thousands of short test calls for roughly what a single month on some enterprise voice platforms costs.

The use cases xAI highlights — appointment booking, restaurant reservations, customer support bots, phone sales — are exactly the kind of applied, business-facing demos that stand out in job applications and hackathons. A voice agent that can reliably collect a user’s address or account number over a noisy phone line, then confirm it back before triggering an API call, is genuinely useful in ways that a chatbot is not. That opens up project ideas in accessibility, global commerce, healthcare intake forms, and campus services that were previously too error-prone to build with voice.

Students already familiar with the OpenAI Realtime API spec have the lowest barrier to entry — the compatibility claim means existing code may transfer largely intact. For anyone starting fresh, xAI has published API documentation and an open voice playground.

The Bottom Line

xAI has entered the voice agent API market with a model that tops a key benchmark, runs at production scale for a major enterprise client, and is priced at a flat $0.05 per minute, which is easier to budget for than token-based pricing. The Starlink deployment numbers are self-reported and should be evaluated accordingly, but the fact that a real, high-volume deployment exists at launch is a meaningful differentiator. For students and developers looking to build voice-first products without a large infrastructure budget, grok-voice-think-fast-1.0 is worth a close look.

Source: xAI
