Rich Media OCR

by Ilya Evdokimov | Mar 11, 2010 | OCR

using OCR technology to find and extract text from video frames

I often speak of unique uses of OCR, and here is yet another. OCRing video files! But why? Part of the management of rich media assets is indexing these files. Technologies such as speech recognition and optical character recognition give a greater index and search value to rich media.

By using OCR technology to find and extract text from video frames, the data can be stored as meta-data. In the simplest scenario, this is a text file that accompanies the video file. More complex environments will even tell you the minuet and second the text occurs. Because this is not a traditional use of the technology, some special consideration must take place.

First is converting and separating frames to individual images files. For the OCR to be effective it needs to work on a series of images. Although a video is only a sequence of images that repeat at a high rate of speed, it’s still somewhat of a challenge to convert video files such as MPEG to a series of images. Not only that, dealing with motion blurs that might occur in some frames will also be a problem.

The second challenge is dealing with frames that are repeats. Essentially, because there are so many similar images that are only slightly different from each other, the text on a series of frames might not change. Better OCR results will account for this and not repeat text as the frames would.

And finally dealing with the variations of fonts, and often small sizes. This requires an OCR engine with specific settings for specialized OCR, and one that is very accurate on complex low quality documents.

I expect that in the future, this technique in conjunction with speech recognition will be used in eDiscovery, content management, and robust search of rich media files.

Chris Riley – Industry Expert