Back in March, YouTube gave users the ability to turn on an automated closed captioning feature, which uses speech-to-text technology to convert a video's audio track into subtitles. The feature had been in development for more than two years and had been in private beta testing since November 2009.
The auto-captioning feature draws on some of the speech-to-text algorithms found in Google's Voice Search and automatically generates video captions when a viewer requests them. The video owner can also download the auto-generated captions, correct the mistakes, and then upload the corrected version. Viewers can even choose to have those captions translated into any one of 50 different languages.
It is an amazing step in the right direction for accessibility, but in terms of execution, the beta has a long way to go before it is accurate enough to be useful.
When the video owner hasn't taken the time to correct the automated subtitles, the feature is more or less useless because the transcriptions are so inaccurate.
In my time playing with the feature, I noticed that proper nouns present some of the biggest problems for the algorithm. So tonight, just on a whim, I thought I'd test the feature's accuracy with proper nouns. Since I work at a technology-focused publication, most of the words I use in articles are the names of companies and products, so recognizing them correctly is essential.
I rattled off 41 random names, and only 14 (about 34 percent) were transcribed correctly, while the other 27 were transcribed incorrectly. Have a look:
It can do a lot of good for hearing-impaired YouTube users, but for now, only if video owners make a concerted effort to proofread and update their subtitles. If you upload content frequently, this feature can prove very useful with some conscientious tweaking.