The same machine-learning technology can be used to transcribe interviews. I reviewed three
different services, ultimately using one of them throughout a current project.
My aim was to speed up my transcription process and to remove the frustration that normally
accompanies this step of the research process for me. I recorded nine interviews, ranging
from 18 to 48 minutes, each involving one participant, the primary interviewer, and myself.
Three voice-to-text services
Based on some research on popular services, the three services that I looked at were:
I found that approximately 1-2% of the participant’s text in the two machine transcriptions
needed edits, typically for similar-sounding words such as “IV” vs “IB”, or “wake” vs “wait”. I
also needed to change the speaker names manually. Error rates were higher in the interviewers'
text, but this did not matter for my purposes.
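The manual speaker renaming described above could also be scripted before import. The following is a minimal sketch, assuming the provider exports plain text in which each turn starts with a generic label followed by a colon; the labels "Speaker 1" and "Speaker 2", the replacement names, and the function `rename_speakers` are all hypothetical and not taken from any of the services reviewed.

```python
import re
from typing import Dict


def rename_speakers(transcript: str, names: Dict[str, str]) -> str:
    """Replace generic speaker labels at the start of a line.

    `names` maps an assumed provider label (e.g. "Speaker 1") to the
    name wanted in the final transcript. Labels that appear in the
    middle of a line are left untouched.
    """
    if not names:
        return transcript

    # Build one alternation of all known labels, anchored to line starts.
    labels = "|".join(re.escape(label) for label in names)
    pattern = re.compile(r"^(" + labels + r"):", re.MULTILINE)

    def swap(match: "re.Match[str]") -> str:
        # Look up the matched label and keep the trailing colon.
        return names[match.group(1)] + ":"

    return pattern.sub(swap, transcript)


if __name__ == "__main__":
    sample = "Speaker 1: Did you wake early?\nSpeaker 2: Yes, I did.\n"
    print(rename_speakers(sample, {"Speaker 1": "Interviewer",
                                   "Speaker 2": "Participant"}))
```

Anchoring the pattern to the start of each line is a deliberate choice: it avoids rewriting a label that happens to be quoted inside someone's speech.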
The low cost and almost immediate turnaround of the machine transcriptions meant that, for just
over $20 in total, I could have very workable transcriptions of all nine roughly half-hour
interviews almost ready to code within minutes.
I found that making the few edits required was far less frustrating and time-consuming than
manual transcription had been in the past. This obviously depends on my own typing speed and
keyboard dexterity, but the experience may be similar for many other researchers.
3. Recording and importing the interviews
I used Skype for Business to create the meeting requests for the interviews, to provide Voice
over IP (VoIP) services and dial-in numbers, and to record each call as an MP4 file with good
audio quality. I experimented with two alternative processes for bringing the transcripts into
MAXQDA. In the first, I used the content of an MS Word document from the transcription provider.
In the second, I used MAXQDA’s Transcripts with Timestamps function to import a SubRip
(SRT) file with timestamps from the provider.
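For readers unfamiliar with the SubRip format, the sketch below shows how the timestamps in such a file are laid out and how they could be read outside MAXQDA, for example to check a file before import. It is an illustration only, assuming the standard SRT layout of a cue number, a line of the form `HH:MM:SS,mmm --> HH:MM:SS,mmm`, and one or more lines of text, with blank lines between cues. The names `Cue` and `parse_srt` are hypothetical, and this is not how MAXQDA itself processes the file.

```python
import re
from dataclasses import dataclass
from typing import List

# Matches the timing line of an SRT cue, e.g. "00:01:02,500 --> 00:01:05,000".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*"
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Cue:
    start: float  # seconds from the beginning of the recording
    end: float
    text: str


def _to_seconds(hours: str, minutes: str, seconds: str, millis: str) -> float:
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_srt(content: str) -> List[Cue]:
    """Parse SRT text into a list of cues, skipping malformed blocks."""
    cues: List[Cue] = []
    normalized = content.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        for index, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                parts = match.groups()
                # Everything after the timing line is the spoken text.
                text = " ".join(part.strip() for part in lines[index + 1:])
                cues.append(Cue(_to_seconds(*parts[:4]), _to_seconds(*parts[4:]), text))
                break
    return cues
```

Each cue's start time, in seconds, is what lets a transcript segment be tied back to the matching point in the recording.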
My transcription environment, captured in Figure 1, shows text from the machine-learning service
pasted into MAXQDA’s “Document Browser” window. The text belongs to the selected audio file,
which has been imported into the Document System (highlighted in the “Document System” window).
At this point, I am about to edit the name of the second speaker using the Automatic Speaker
Change and Autotext functions: