

Thanks to Frank Lowney from the Digital Innovation Group at Georgia College & State University for this informative guest post. If you're interested in captioning your videos, you'll find this interesting. Lowney describes how to use the Enhanced Dictation feature in MacOS X 10.9 (Mavericks), combined with Audio Hijack and Soundflower, to turn recorded audio into a text file. This can be extremely handy for anyone who needs to create captions for a video but lacks the transcribed text.

The pressure is on to make screencasts and other online video more accessible. One important aspect of that challenge is making video accessible to people who are deaf or hard of hearing. For video content creators, this means providing a transcript or, better, subtitles, so that dialogue can be read in the same context as the video.

The problem is that many videos are created without a script that the speakers follow closely. Indeed, many important videos are created ad hoc (interviews, panel discussions, conference presentations and the like), where a script would be entirely inappropriate. Creating text from speech has therefore become essential to meeting these expectations, especially when all one has to work with is the speech in a video's audio track.

Speech to text (STT) is more difficult than text to speech (TTS), which has been in use much longer. MacOS X recently introduced Dictation (speech-to-text) as a feature usable in any application that accepts text input. This is quite an advance over having to purchase a two-hundred-dollar application to accomplish the same end. However, the first iteration of Dictation required an internet connection so that speech could be uploaded to Apple's servers and turned into text there. This introduced delays and made Dictation difficult to use for substantial bodies of text. Dictation was given a significant boost in MacOS X 10.9 (Mavericks) with the introduction of Enhanced Dictation, which enables offline use and continuous dictation with live feedback.

Still, this is a system that assumes a live speaker, and there is no obviously easy way to route speech from a recorded file through Apple's Dictation system to produce usable text. You can, in fact, route the speech in an audio file through Apple's speech-to-text subsystem and get very usable text output. It isn't intuitive or Apple-easy, but anyone can accomplish it with a bit of determination. The application at the center of this process is Audio Hijack Pro by Rogue Amoeba ($32 USD). There are two things to set up with this app.
