What is word error rate and who is winning?

We love sharing what we know about automated speech transcription.

Word Error Rate Formula

Word error rate, often referred to as WER, is a common way to measure the performance of an automatic speech recognition (ASR) system. It is tricky to measure because the "ASR result" can have a different length than the "Voice input," so the two word sequences have to be aligned before the errors can be counted.

Here is a simple way to understand how WER is calculated:

WER = (Substitutions + Deletions + Insertions) / Number of words in the voice input

To help clarify further, here is an example of each type of error:

Deletion by ASR system:

Voice input: I surf small waves
ASR result: I surf waves

Insertion by ASR system:

Voice input: I surf waves
ASR result: I surf small waves

Substitution by ASR system:

Voice input: I surf small waves
ASR result: I surf all waves
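
The three error types above can be counted automatically by aligning the two word sequences with a word-level edit distance. The following is a minimal sketch in Python, not Sonix's implementation. The function name `wer_counts` is hypothetical, and the sketch assumes words are separated by whitespace and compared exactly, with no normalization of case or punctuation.

```python
def wer_counts(reference: str, hypothesis: str):
    """Return (substitutions, deletions, insertions, wer) at the word level.

    `reference` is the voice input (what was actually said) and
    `hypothesis` is the ASR result. Illustrative sketch only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dp[i][j] = (total_edits, subs, dels, ins) for ref[:i] versus hyp[:j]
    dp = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, i, 0)  # every reference word was deleted
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, 0, j)  # every hypothesis word was inserted

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]  # words match, no new error
                continue
            t, s, d, n = dp[i - 1][j - 1]
            substitution = (t + 1, s + 1, d, n)
            t, s, d, n = dp[i - 1][j]
            deletion = (t + 1, s, d + 1, n)
            t, s, d, n = dp[i][j - 1]
            insertion = (t + 1, s, d, n + 1)
            # Keep whichever path has the fewest total edits
            dp[i][j] = min(substitution, deletion, insertion)

    total, subs, dels, ins = dp[len(ref)][len(hyp)]
    return subs, dels, ins, total / len(ref)


print(wer_counts("I surf small waves", "I surf waves"))      # deletion example
print(wer_counts("I surf waves", "I surf small waves"))      # insertion example
print(wer_counts("I surf small waves", "I surf all waves"))  # substitution example
```

Each of the three examples contains exactly one error. Because WER divides by the number of words in the voice input, the deletion and substitution examples (four spoken words) score 25%, while the insertion example (three spoken words) scores about 33%.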

Who is winning? 🏆

Speech recognition technology has come a long way since the 1950s. Our earlier post, "A short history of speech recognition," covers some of the key events along the way. In it, we talked about how we’ve reached (or almost reached, depending on who you talk to) an inflection point in automated speech recognition.

The largest technology companies, like Google, IBM, and Microsoft, are all vying for the accuracy title. Below is a chronology of the claims made in 2017:

Mar 2017: IBM claims 5.5% word error rate
May 2017: Google claims 4.9% word error rate
Aug 2017: Microsoft claims 5.1% word error rate

We’ll continue to update this as new claims are made.
