Transcribing a multi-speaker audio recording can feel a lot like attending a bustling networking event. Imagine you’re at the sort of event you frequented pre-COVID. This event is happening at a popular bar, with one section cordoned off for networkers.
The networkers are a mix of people you know and have done business with, and those you are yet to meet.
You make a beeline for Ms A, an IT business development manager you’ve been meaning to call because you want to know more about a new product she mentioned the last time you two spoke. Ms A is speaking with Mr B, someone you’ve seen at these meetings a few times but don’t know well, and Ms C, whom you don’t know but who has been asking some very interesting questions about Ms A’s product, including some you were going to ask yourself.
This trio is holding an animated conversation, with Ms A doing most of the talking. However, Mr B and Ms C are interjecting frequently to ask questions and seek clarification on certain points. You follow most of what they are saying, but because you came in mid-conversation and there are many other conversations happening around the room, it is hard to hear every word.
This scenario highlights several issues, including background noise and unfamiliar voices and topics, but we’d like to address two important factors in multi-speaker transcription: over-talking and speaker identification.
When speakers interrupt or speak over the top of one another, or if one speaker is louder than the rest, the transcription exercise becomes more difficult. This issue can affect the clarity of the audio and make it hard to distinguish which speaker is talking and when, especially if each is competing to be heard. Speech tends to be faster, and its intensity can rise, when someone is battling to be heard over others, so it becomes difficult to decipher specific words.
Identifying which speaker is talking can also be problematic in multi-speaker transcription, especially when the transcriptionist is unfamiliar with the speakers’ voice patterns, or when two speakers have similar voice tones and inflexions.
A single speaker is much easier to transcribe than two people engaged in a discussion, and the problem worsens as more speakers are introduced or background noise increases.
We specialise in multi-speaker transcription, with our team trained to deal with the issues outlined above. Each transcriptionist listens to the audio recording to familiarise themselves with the content and different voices.
This process can be repeated multiple times, as needed. If specific words cannot be made out after multiple attempts, a time stamp will be recorded in the transcript so the client can refer to the audio directly and insert the correct word, based on their knowledge of the topic.
Would an automatic voice recognition product provide this level of service?