What is word error rate?

We love sharing what we know about automated speech transcription.

Word error rate formula

Word error rate, often referred to as WER, is a way to measure the performance of an automatic speech recognition (ASR) system. It is tricky to measure because the "ASR result" can have a different length than the "Voice input," so the two cannot simply be compared word by word.

Here is a simple way to understand how WER is calculated:

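WER = (S + D + I) / N

where S is the number of substitutions, D is the number of deletions, I is the number of insertions, and N is the number of words in the voice input.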

To help clarify further, here is an example of each type of error:

Deletion by ASR system:

Voice input: I surf small waves
ASR result: I surf waves

Insertion by ASR system:

Voice input: I surf waves
ASR result: I surf small waves

Substitution by ASR system:

Voice input: I surf small waves
ASR result: I surf all waves
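
As a rough illustration, the three error types above can be counted with a word-level edit distance (Levenshtein) alignment. The Python sketch below is not Sonix's implementation. The function names `word_errors` and `wer` are hypothetical, and it assumes the standard definition WER = (S + D + I) / N, where N is the number of words in the voice input, along with simple whitespace tokenization and a non-empty voice input.

```python
def word_errors(reference, hypothesis):
    """Return (substitutions, deletions, insertions) for a cheapest word alignment.

    reference  -- the voice input (what was actually said)
    hypothesis -- the ASR result
    """
    ref, hyp = reference.split(), hypothesis.split()
    # table[i][j] holds (total_errors, S, D, I) for ref[:i] versus hyp[:j]
    table = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    table[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        table[i][0] = (i, 0, i, 0)  # every reference word was deleted
    for j in range(1, len(hyp) + 1):
        table[0][j] = (j, 0, 0, j)  # every hypothesis word was inserted
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                table[i][j] = table[i - 1][j - 1]  # words match, no new error
                continue
            total, s, d, ins = table[i - 1][j - 1]
            substitution = (total + 1, s + 1, d, ins)
            total, s, d, ins = table[i - 1][j]
            deletion = (total + 1, s, d + 1, ins)
            total, s, d, ins = table[i][j - 1]
            insertion = (total + 1, s, d, ins + 1)
            # tuples compare on total_errors first, so this keeps a cheapest path
            table[i][j] = min(substitution, deletion, insertion)
    return table[-1][-1][1:]


def wer(reference, hypothesis):
    """Word error rate, assuming a non-empty reference."""
    s, d, ins = word_errors(reference, hypothesis)
    return (s + d + ins) / len(reference.split())


print(word_errors("I surf small waves", "I surf waves"))  # (0, 1, 0): one deletion
print(word_errors("I surf waves", "I surf small waves"))  # (0, 0, 1): one insertion
print(wer("I surf small waves", "I surf all waves"))      # 0.25: one substitution in four words
```

Because insertions count as errors but do not add to N, WER can exceed 100% when the ASR result is much longer than the voice input.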

Who is winning?

Speech recognition technology has come a long way since the 1950s. Our earlier post, "A short history of speech recognition," covers some of the key events along the way and discusses how we've reached (or almost reached, depending on who you talk to) an inflection point in automated speech recognition.

The largest technology companies, including Google, IBM, and Microsoft, are all vying for the accuracy title. Below is a chronology of the claims made in 2017:

Mar 2017: IBM claims 5.5% word error rate
May 2017: Google claims 4.9% word error rate
Aug 2017: Microsoft claims 5.1% word error rate

We'll continue to update this as new claims are made.
