ElevenLabs’ new speech-to-text model Scribe is here with highest accuracy rate so far (96.7% for English)


ElevenLabs, the highly valued AI voice cloning and generation startup founded by Palantir alumni, today launched Scribe v1, a new speech-to-text model that reportedly achieves the highest accuracy across multiple languages. Users can try it here on the ElevenLabs site.

According to the company’s benchmarks, Scribe outperforms Google’s Gemini 2.0 Flash, OpenAI’s Whisper v3, and Deepgram’s Nova-3 at converting speech into text, achieving record-low error rates.

The company claims that Scribe delivers state-of-the-art transcription accuracy in 99 languages, including improved performance in previously underserved languages such as Serbian, Cantonese, and Malayalam.

As Flavio Schneider, ElevenLabs’ lead researcher, wrote on X, Scribe is the “smartest audio understanding model” released by ElevenLabs yet.

“Scribe doesn’t just transcribe — it understands audio,” Schneider continued in a threaded reply. “It can detect non-verbal events (like laughter, sound effects, music, and background noise) and analyze long audio contexts for accurate diarization, even in the most challenging environments.”

“Diarization” is the process of separating the speakers on a recording by their vocal qualities.

In fact, ElevenLabs’ documentation states Scribe can distinguish and isolate up to 32 different speakers in the same audio file.

While ElevenLabs cautions that Scribe is “best used for when high-accuracy transcription is required rather than real-time transcription,” the company also plans to introduce a low-latency version soon, expanding its use for real-time applications.

Lowest word error rates (WER)

Scribe is designed to handle real-world audio challenges with precision. According to benchmark results from FLEURS and Common Voice, it records the lowest word error rates (WER) for many languages, with reported accuracy of 98.7% for Italian and 96.7% for English, which corresponds to error rates of roughly 1.3% and 3.3%.
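For readers unfamiliar with the metric, WER is conventionally the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words, and accuracy percentages like those quoted here are usually read as 1 minus WER. The sketch below illustrates that standard definition in Python. It is not ElevenLabs' evaluation code, the function name and sample sentences are made up for illustration, and real benchmarks typically also normalize punctuation and numerals before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1        # reference word missing from output
            insertion = curr[j - 1] + 1   # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences only: one substituted word and one dropped word.
reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over lazy dog"
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.1%}, accuracy: {1 - wer:.1%}")
```

Because insertions count as errors, WER can exceed 100% on very poor output, which is one reason vendors often quote accuracy instead.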

Key features include:

  • Speaker diarization to differentiate speakers in multi-speaker recordings
  • Word-level timestamps for detailed transcription accuracy
  • Detection of non-speech events, such as laughter and background noises
  • Structured transcript output for seamless integration via API
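To make the diarization and timestamp features concrete, here is a rough Python sketch of how word-level entries carrying speaker labels could be merged into speaker turns. The `Word` class, its field names, and the sample data are assumptions made for illustration. They are not taken from ElevenLabs' documented response schema, which should be checked before writing real integration code.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical shape of one transcript entry (not the official schema).
    text: str
    start: float           # seconds from the start of the audio
    end: float
    speaker: str
    kind: str = "word"     # "word" or "audio_event" (assumed labels)


def group_by_speaker(words: list[Word]) -> list[dict]:
    """Merge consecutive entries from the same speaker into one turn."""
    turns: list[dict] = []
    for w in words:
        if turns and turns[-1]["speaker"] == w.speaker:
            turns[-1]["text"] += " " + w.text
            turns[-1]["end"] = w.end
        else:
            turns.append({"speaker": w.speaker, "start": w.start,
                          "end": w.end, "text": w.text})
    return turns


# Made-up sample: two speakers and one non-speech event.
sample = [
    Word("Thanks", 0.00, 0.35, "speaker_0"),
    Word("for", 0.36, 0.50, "speaker_0"),
    Word("joining.", 0.51, 0.95, "speaker_0"),
    Word("(laughter)", 1.00, 1.60, "speaker_1", kind="audio_event"),
    Word("Happy", 1.70, 2.00, "speaker_1"),
    Word("to", 2.01, 2.10, "speaker_1"),
    Word("help.", 2.11, 2.50, "speaker_1"),
]

turns = group_by_speaker(sample)
for turn in turns:
    print(f'[{turn["start"]:.2f}-{turn["end"]:.2f}] {turn["speaker"]}: {turn["text"]}')
```

Keeping the per-word timestamps in the input means a turn's start and end fall out of the first and last entries, so the same structure can drive subtitles or meeting minutes.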

Pricing and availability

Scribe is available now through the ElevenLabs website and API.

Pricing is set at $0.40 per hour of input audio, with a 50% discount for the next six weeks. A low-latency version for real-time applications is also in development.
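As a quick back-of-the-envelope illustration of those figures, the Python sketch below estimates cost from hours of audio. It uses only the list price and launch discount quoted above. The function name and the example volumes are illustrative, and actual billing rules such as rounding, minimums, or plan tiers are not covered here.

```python
LIST_PRICE_PER_HOUR = 0.40  # USD per hour of input audio, as quoted above
LAUNCH_DISCOUNT = 0.50      # 50% launch discount, as quoted above


def transcription_cost(audio_hours: float, discounted: bool = False) -> float:
    """Estimated cost in USD for a given number of audio hours."""
    if audio_hours < 0:
        raise ValueError("audio_hours must be non-negative")
    rate = LIST_PRICE_PER_HOUR
    if discounted:
        rate *= 1 - LAUNCH_DISCOUNT
    return round(audio_hours * rate, 2)


# Example volumes chosen for illustration only.
for hours in (10, 1_000, 25_000):
    full = transcription_cost(hours)
    promo = transcription_cost(hours, discounted=True)
    print(f"{hours:>6} h: ${full:,.2f} list, ${promo:,.2f} with launch discount")
```

At these rates, 1,000 hours of audio would come to $400 at list price, or $200 during the discount window.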

What it means for enterprises

For enterprise decision-makers, Scribe presents a tool for scalable, high-accuracy transcription, making it useful for industries relying on automated documentation, meeting transcription, and content accessibility.

The model’s ability to handle diverse languages with high precision also benefits multinational businesses, media companies, and customer support applications.

Scribe’s pricing structure makes it competitive for businesses that require high-volume transcription services, and its API-based integration allows for seamless adoption in enterprise workflows.

Additionally, the upcoming low-latency version could position Scribe as a viable option for real-time communication tools.

Launched the same day as rival Hume’s text-to-speech model Octave

Timing is everything, and ElevenLabs chose to launch Scribe on the same day that rival Hume AI unveiled Octave, an LLM-powered text-to-speech model that allows users to customize AI-generated voices with adjustable emotions.

It is designed for content creation, including audiobooks, podcasts, and video game voiceovers. Unlike standard TTS systems, Octave considers context beyond individual sentences, adjusting tone, rhythm, and cadence dynamically to sound more natural.

Hume AI positions Octave as a direct competitor to ElevenLabs’ text-to-speech offerings, highlighting that Octave’s pricing is about half the cost of ElevenLabs’ current AI voice services.

While Scribe and Octave serve different functions, their development reflects the growing competition in AI-driven audio models.

ElevenLabs is prioritizing precise, multi-language speech recognition, while Hume AI is advancing expressive AI-generated speech.

For enterprises, this means more specialized solutions for both transcription and synthetic voice applications, enabling more efficient content production, customer engagement, and accessibility tools.

Scribe is now live, and ElevenLabs is hosting a virtual event next week with the team behind its development. More details, benchmarks, and API documentation are available in the official blog post.
