On-the-Fly OCR – Click-Entry and Rubber-Band OCR

by Ilya Evdokimov | Feb 22, 2010 | data capture

If you were to place the degrees of automation on a scale, you would have no automation at one end, then semi-automation, and then varying degrees of full automation, which depend on system accuracy. No automation is, of course, manual entry of an organization's documents. Full automation attempts to collect all data from a document automatically, using manual labor only for exceptions and quality assurance steps. The degree of automation achieved therefore depends on accuracy: the lower the accuracy, the more exceptions there will be and the fewer documents will pass through without manual attention.
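To make the link between accuracy and exceptions concrete, here is a minimal Python sketch. It assumes, purely for illustration, that every field on a document is read correctly with the same probability and independently of the other fields, so a document passes straight through only when all of its fields are correct. The function names and the figures in the example (1,000 documents with 10 fields each) are hypothetical, not measurements from any product.

```python
def straight_through_rate(field_accuracy: float, fields_per_doc: int) -> float:
    """Share of documents with no field exceptions, ASSUMING each field is
    read correctly with the same probability, independently of the others."""
    if not 0.0 <= field_accuracy <= 1.0:
        raise ValueError("field_accuracy must be between 0 and 1")
    return field_accuracy ** fields_per_doc


def exception_docs(total_docs: int, field_accuracy: float, fields_per_doc: int) -> float:
    """Expected number of documents that need manual handling."""
    return total_docs * (1.0 - straight_through_rate(field_accuracy, fields_per_doc))


# Hypothetical batch: 1,000 documents, 10 fields each.
for acc in (0.99, 0.95, 0.90):
    print(f"{acc:.0%} field accuracy -> "
          f"{exception_docs(1000, acc, 10):.0f} of 1000 documents need review")
```

Under these assumptions the loop reports about 96, 401 and 651 exception documents at 99%, 95% and 90% field accuracy, which shows how quickly exceptions grow as accuracy drops when each document carries many fields.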

Semi-automated data capture and OCR have received little attention. The primary reason is that when document automation technology was introduced, people wanted to go full force; it was a combination of poor market education and grand dreams. Semi-automation is an intermediate step in which the operator sees every image but spends far less time per image than with manual entry. It allows organizations to start using the technology with less risk, more control, and lower cost. The challenge in adopting semi-automated data capture is that it is hard to migrate away from or upgrade. Some packages offer a seamless path to full automation, but with others you are locked into the solution. Now that you know what it is, how does it work?

Semi-automated data capture is very basic. When an image is scanned, it is displayed for the operator using as much screen real estate as possible. In a click-entry solution, a full-page OCR read has already happened; in a rubber-band solution, the operator sees only the image. In both scenarios, another portion of the screen shows a field list, which the operator fills by locating the information on the page. With click-entry, because the OCR is already done, the operator highlights the word or words on the document that belong in a field and clicks; the text is then transferred to the next unpopulated field. With rubber-band OCR, the operator first draws a box (a “rubber band”) around the information for every field, then clicks a “read” button, and the text from each box is populated into its field.
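The two workflows can be sketched in Python as follows. This is a minimal simulation, not any vendor's implementation: `Word`, `FieldList`, `click_entry`, `rubber_band_read`, `fake_ocr` and the sample page are hypothetical, and `fake_ocr` merely looks up words on a hard-coded page to stand in for a real OCR engine reading the pixels inside a box.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# (left, top, right, bottom) in image pixels
Box = Tuple[int, int, int, int]


@dataclass
class Word:
    """One word with its location on the page (hypothetical structure)."""
    text: str
    box: Box


@dataclass
class FieldList:
    """The field list shown to the operator beside the image."""
    names: List[str]
    values: Dict[str, str] = field(default_factory=dict)

    def next_unpopulated(self) -> Optional[str]:
        # First field, in display order, that has no value yet.
        for name in self.names:
            if name not in self.values:
                return name
        return None


def click_entry(fields: FieldList, highlighted: List[Word]) -> Optional[str]:
    """Click-entry: the full page was OCRed in advance, so a click simply
    copies the highlighted words into the next unpopulated field."""
    target = fields.next_unpopulated()
    if target is None or not highlighted:
        return None  # every field is filled, or nothing was highlighted
    fields.values[target] = " ".join(w.text for w in highlighted)
    return target


def rubber_band_read(fields: FieldList, bands: Dict[str, Box],
                     ocr_region: Callable[[Box], str]) -> None:
    """Rubber-band OCR: the operator has drawn a box for each field; pressing
    "read" runs OCR on every box and fills the matching field."""
    for name, box in bands.items():
        fields.values[name] = ocr_region(box)


# Hypothetical page content used by the stand-in OCR function below.
PAGE = [Word("Invoice", (10, 10, 80, 30)), Word("#4471", (90, 10, 140, 30)),
        Word("ACME", (10, 50, 60, 70)), Word("Corp", (65, 50, 110, 70))]


def fake_ocr(region: Box) -> str:
    """Stand-in for an OCR engine: returns the words lying fully inside the
    region, ordered top-to-bottom, then left-to-right."""
    left, top, right, bottom = region
    inside = [w for w in PAGE
              if w.box[0] >= left and w.box[1] >= top
              and w.box[2] <= right and w.box[3] <= bottom]
    inside.sort(key=lambda w: (w.box[1], w.box[0]))
    return " ".join(w.text for w in inside)


ce = FieldList(["invoice_number", "vendor"])
click_entry(ce, [PAGE[1]])   # operator highlights "#4471" and clicks
click_entry(ce, PAGE[2:4])   # operator highlights "ACME Corp" and clicks
print(ce.values)

rb = FieldList(["invoice_number", "vendor"])
rubber_band_read(rb, {"invoice_number": (85, 5, 145, 35),
                      "vendor": (5, 45, 115, 75)}, fake_ocr)
print(rb.values)
```

Both runs end with `{'invoice_number': '#4471', 'vendor': 'ACME Corp'}`. The difference lies in when OCR happens: click-entry reads the whole page up front so each click is a cheap lookup, while rubber-band OCR reads only the regions the operator has marked, once “read” is clicked.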

Semi-automated data capture is becoming more popular with organizations that lack the budget for full automation or are wary of adopting it, and, surprisingly, with companies that adopted full automation but did not implement it well. I very much believe in full document automation, but semi-automated data capture has a necessary place in the spectrum of document automation.

Chris Riley – Sr. Solutions Architect