Research suggests that we remember only 10-20% of the information we receive from verbal sources, so the value of having that information transcribed is clear. Helping people capture and retrieve this information quickly with speech recognition software has been a goal of artificial intelligence giants for several decades.
Automated transcription technology keeps evolving and grows more sophisticated every year. At this year’s Mobile World Congress, an exciting new iOS and Android app called Otter.ai was announced. The Otter app transcribes audio in real time using an approach called ‘Ambient Voice Intelligence’ (‘ambient’ simply means it works in the background) and is free to download. The app is fully automated, produces text with a 2-3 second delay and recognises individual voices. One of its best features is search: once a recording is finished, the app’s machine learning automatically generates approximately ten keywords that summarise what the meeting was about, so you can start searching the full text immediately. Once you home in on a keyword, you can hit the play button to listen to the section of the audio where it occurred.
Another great feature of the app is its shareable nature. If you have a meeting and a colleague is unable to attend, you can easily send them the transcript and audio, so that they can quickly access what’s relevant to them.
In theory, Otter is a great tool that, once the technology matures, may make traditional transcription obsolete. Our question is: “How does the current release handle real-world transcription tasks?”
As with all new voice technology, we gave Otter a road test. We recorded a single speaker using slow, clear, uninterrupted speech made up of common words, with no background noise. Despite these optimal conditions, the resulting transcript contained misunderstood words and missing punctuation (which could confuse the reader). This meant that the transcript needed to be proofread and corrected, which is often a time-consuming exercise in itself. Had the audio contained background noise, mumbling, soft speech, uncommon words, accents or overlapping speech, or had the speakers been positioned further away than the recommended three feet, the result would very likely have been much less accurate. Other reviewers have found that speakers are also sometimes misidentified.
The following is an excerpt from our Otter transcript, with major corrections shown in brackets:
Answer [Otter] is a note taking app that empowers you to remember search and share your voice conversations all in one place also [Otter] can be used to capture and manage information from in person meetings, phone calls video conferences interviews sales pitches school actors [lectures] and other important conversations.
As you can see, even under ideal conditions Otter made a few significant errors, and real-world audio is very rarely recorded in a perfect environment.
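For readers who want to put a number on those errors, one common measure is word error rate (WER): the number of word substitutions, insertions and deletions needed to turn the machine transcript into the correct text, divided by the number of words in the correct text. The Python sketch below is illustrative only. It assumes the bracketed words in the excerpt above are the correct ones, ignores punctuation and capitalisation, and uses function names (`edit_distance`, `word_error_rate`) that are our own and not part of Otter.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance between two lists of words."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Word errors divided by the number of words in the reference text."""
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference text must not be empty")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


# Assumed correct text: the excerpt with the bracketed corrections applied.
REFERENCE = (
    "otter is a note taking app that empowers you to remember search and share "
    "your voice conversations all in one place otter can be used to capture and "
    "manage information from in person meetings phone calls video conferences "
    "interviews sales pitches school lectures and other important conversations"
)

# Machine output: the same excerpt with the misheard words left in place.
HYPOTHESIS = (
    "answer is a note taking app that empowers you to remember search and share "
    "your voice conversations all in one place also can be used to capture and "
    "manage information from in person meetings phone calls video conferences "
    "interviews sales pitches school actors and other important conversations"
)

print(f"Word error rate: {word_error_rate(REFERENCE, HYPOTHESIS):.1%}")
```

On this excerpt the three misheard words out of 47 give a word error rate of roughly 6%. That figure counts only the major corrections we marked, so it leaves out the missing punctuation, which also has to be fixed by hand.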
While speed and automation are certainly valuable assets in our fast-paced world, human transcriptionists still cannot be beaten for efficiency, accuracy and quality. As the old adage goes, “more haste, less speed”.
If you have audio you would like transcribed quickly and efficiently with the accuracy, care, and attention to detail that only a human transcriptionist can provide, please contact us today for more information or a quote: