Multi-Pass document recognition

by Ilya Evdokimov | Feb 15, 2010 | Uncategorized

When accuracy is the primary concern in document recognition, the best technique is to run the OCR or recognition process in multiple passes. Just as you might have a document keyed in manually two or three times, why not have an OCR engine convert it three, four, or even five times, each time with different settings?

The important thing to note about multiple-pass recognition is that you NEVER use a different engine for the same process. Reconciling results from two separate engines is self-defeating. This approach is often called voting, and it does not work because each engine reports character confidence differently: you may end up consistently picking the less accurate engine simply because it reported higher confidence than the more accurate one. Running the same engine multiple times with different settings, by contrast, gives you comparable results and is a good idea.
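To make the confidence problem concrete, here is a small Python sketch. The two engines, their readings, and their confidence values are all invented for illustration; they are not output from any real product. It only shows how picking the higher self-reported confidence can favor the less accurate engine.

```python
# Invented numbers for illustration only: (text read, self-reported confidence).
ENGINE_X = [("lnvoice", 0.95), ("T0tal", 0.93), ("Date", 0.97)]  # overconfident
ENGINE_Y = [("Invoice", 0.70), ("Total", 0.65), ("Date", 0.72)]  # cautious
TRUTH = ["Invoice", "Total", "Date"]


def pick_by_confidence(a, b):
    """Cross-engine 'voting': keep whichever reading claims higher confidence."""
    return [wa if ca >= cb else wb for (wa, ca), (wb, cb) in zip(a, b)]


def accuracy(words, truth):
    """Fraction of words that match the known-correct text."""
    return sum(w == t for w, t in zip(words, truth)) / len(truth)


voted = pick_by_confidence(ENGINE_X, ENGINE_Y)
engine_y_alone = [word for word, _ in ENGINE_Y]

print("voted:", voted, accuracy(voted, TRUTH))
print("engine Y alone:", engine_y_alone, accuracy(engine_y_alone, TRUTH))
```

In this toy case the voted result is simply engine X's output, errors included, because X always claims more confidence than Y.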

One scenario where this is used successfully is with documents that contain both machine-printed and hand-printed text. A first read is done with an OCR engine using settings A, and a second read is done with the same OCR engine using settings B. Areas where both reads produce only garbage text may indicate hand print. You can then run ICR (a hand-print engine) on those regions to pick up additional information. That makes three passes of recognition in total, and the results are combined to produce the final document.
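The routing step in that scenario might be sketched in Python as follows. This is only an illustration under assumptions: the region names, the `garbage_ratio` heuristic, and the 0.4 threshold are hypothetical, and a real system would more likely rely on whatever per-region quality information its engine exposes.

```python
def garbage_ratio(text: str) -> float:
    """Fraction of non-space characters that are neither letters nor digits."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 1.0  # an empty result is treated as unreadable
    return sum(not c.isalnum() for c in chars) / len(chars)


def regions_for_icr(pass_a: dict, pass_b: dict, threshold: float = 0.4) -> list:
    """Return regions where BOTH OCR passes produced mostly garbage.

    pass_a and pass_b map a region name to the text each pass read there.
    The threshold is an assumed value that would need tuning on real data.
    """
    return [
        region
        for region in pass_a
        if region in pass_b
        and garbage_ratio(pass_a[region]) >= threshold
        and garbage_ratio(pass_b[region]) >= threshold
    ]


# Hypothetical results from the same engine with settings A and settings B.
settings_a = {"header": "Invoice 1042", "notes": "~|/\\ _;' ^"}
settings_b = {"header": "Invoice 1O42", "notes": "|| ~~ ,.'"}

print(regions_for_icr(settings_a, settings_b))
```

Only the regions returned here would be sent to the ICR engine as the third pass; everything else keeps its OCR result.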

Ideally, you would run the same engine at least three times: the statistical chance of different settings producing the same error drops drastically, and the final output is about as good as it is going to get. Some document types lend themselves to multiple-pass recognition more than others. Sometimes the choice is determined by the environment. For example, an environment with many traditional documents mixed with invoice-like documents would benefit from two full-page reads on every page: one with standard settings, and one with special document analysis designed for documents with lines and tables.
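One simple way to combine three runs of the same engine is a per-field majority vote, sketched below in Python. The field names and values are made up, and reconciling at the field level rather than character by character is an assumption made to keep the example short; it is not a description of how any particular product merges its passes.

```python
from collections import Counter


def reconcile(passes: list) -> tuple:
    """Merge several passes of the SAME engine by per-field majority vote.

    Each pass is a dict mapping a field name to the text read for it.
    Fields without a strict majority are set to None and listed for review.
    """
    merged, needs_review = {}, []
    all_fields = sorted({field for p in passes for field in p})
    for field in all_fields:
        values = [p[field] for p in passes if field in p]
        value, count = Counter(values).most_common(1)[0]
        if count * 2 > len(passes):  # strict majority of all passes
            merged[field] = value
        else:
            merged[field] = None
            needs_review.append(field)
    return merged, needs_review


# Hypothetical output of three runs with different settings.
runs = [
    {"total": "118.50", "date": "02/15/2010", "vendor": "Acme"},
    {"total": "118.50", "date": "02/15/2010", "vendor": "Acrne"},
    {"total": "l18.50", "date": "02/15/2010", "vendor": "Acne"},
]

merged, needs_review = reconcile(runs)
print(merged)
print(needs_review)
```

Because all three passes come from one engine, their outputs can be compared directly, and fields that still disagree can be routed to a person instead of being guessed.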

While multiple-pass OCR slows down the overall process, it is still faster and more accurate than manual entry most of the time. I recommend this approach for any organization where accuracy is the primary concern.

Chris Riley – Sr. Solutions Architect