Voice recognition technology has become increasingly ubiquitous and is now part of everyday life for many people. It’s commonly used in call centres to handle large volumes of calls and improve profitability, and it’s popular amongst drivers who prefer a safer, hands-free way to control their console. Apple’s virtual assistant Siri has also become a familiar part of the iOS landscape. Other uses for this technology include voice-activated cockpit control and, of course, digital dictation and transcription across a wide range of industries. People with visual, motor or learning difficulties such as dyslexia, who find writing harder than speaking, are also finding voice recognition software incredibly helpful.
Over the last five years, the major players in this field have been working hard to develop and customise their software, producing specialised dictation programs with industry-specific terminology for professionals such as doctors and lawyers. They have also begun to leverage mobile and cloud solutions, making dictation more convenient from locations outside the office. However, the ideal user of these products still tends to be a single speaker working in a relatively quiet environment. This limits how well voice recognition technology copes with a broader range of audio types and environments. The following are a few of the key limitations we’ve uncovered in the use of this type of software:
1. Inability to handle multi-speaker recordings
2. Problems filtering background noise
3. Misinterpretation of mumbled, slurred or heavily accented speech
4. Time and productivity costs associated with training the software, proofreading and editing output
5. Increased temptation to proofread your own work, which can let potentially critical errors slip through
Almost five years ago we posed the topical question of whether traditional transcription services could one day be replaced by voice recognition technology. Our conclusion was that although this outcome was inevitable, it wouldn’t happen for quite some time. While voice recognition technology has come a long way since 2011, there is no doubt that human input in proofreading and editing is still an essential part of any transcription process, and that traditional transcription services remain the gold standard from start to finish.