Voice is the next UI
While speech recognition technology has been around since the 1950s, recent advances in computing power and machine learning have made voice interfaces far more practical. The technology to support voice applications is now both relatively inexpensive and powerful.
With these advances, we have seen more voice-driven applications brought to market, including “voice-first” devices like Amazon Echo and Google Home. Unit shipments of these devices have grown rapidly and are predicted to pass 30 million in American homes by the end of this year. One could argue that the companies and products oriented to voice-based use cases will be ahead of the pack in the coming years. Just as the companies that first pursued a mobile-first or mobile-only strategy were the winners of the past decade, those that drive voice platforms could be the market leaders of the 2020s.
Automatic speech recognition (ASR) is now at a level that is roughly on par with humans. Just three years ago, average word error rates among the top providers hovered around 25%. Today, the behemoths in the space (Google, Microsoft, and IBM) are all claiming roughly 5%. Put another way, in a transcript of 100 words, only five might be incorrect. At this level of accuracy, a whole world of novel applications and use cases emerges.
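To make the word error rate figure concrete, here is a minimal sketch of how the metric is commonly computed: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the machine's output, divided by the number of words in the reference. The function name `word_error_rate` and the sample transcripts are our own illustration, and the sketch assumes simple whitespace tokenization; it is not the scoring procedure used by any of the providers mentioned above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper (hypothetical name); assumes whitespace tokenization
    and no text normalization such as lowercasing or punctuation removal.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# A 100-word reference in which the recognizer gets five words wrong.
reference_words = [f"word{i}" for i in range(100)]
hypothesis_words = list(reference_words)
for position in (3, 25, 47, 68, 90):
    hypothesis_words[position] = "oops"

wer = word_error_rate(" ".join(reference_words), " ".join(hypothesis_words))
print(f"WER: {wer:.0%}")
```

Because insertions count as errors too, this ratio can exceed 100% on a very poor transcript, which is one reason it is usually reported alongside the test set it was measured on.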
Against the backdrop of advancements in speech technology, the smartphone has reached maturity. The smartphone form factor hasn’t changed much over the last few years, and the functionality across all the major manufacturers is similar. Benedict Evans of venture capital firm Andreessen Horowitz believes smartphones are at the top of their product life cycle (see chart below). If this is true, then the question is: What is going to drive the next growth curve in tech? Is it augmented reality, virtual reality, or voice?
The beauty of voice-based systems is that they leverage an interface that almost all humans can access: language. Even small advances in the technology can theoretically have massive, far-reaching global impact.