Re-OCR, Lessons learned

by Ilya Evdokimov | Feb 19, 2010 | OCR

To my surprise, I still receive requests from companies needing to start over on their OCR processes. These are companies that used the technology without planning and now find themselves having to repeat their OCR efforts. They fall into two categories.

The first category is companies that processed large volumes of paper and found the accuracy was not what they expected. This can be discovered shortly after the technology is integrated or long afterward. The fix can be as easy as correcting bad settings for a particular document type or as costly as purchasing new software to replace a poor choice.

For companies in category one, it’s truly a lesson-learned scenario. I will work with these companies to evaluate proper OCR settings and to test prospective engines. My hope is that the company at least scanned its documents at a high enough quality that the existing images can be reused for backlog conversion rather than requiring a re-scan, if a re-scan is even possible.

The second category is companies that discovered they were collecting too little data from their documents. This usually happens in data capture environments where a company configures capture for 3 key fields, only to find later that 2 additional fields are required for downstream processes. Depending on the severity of the gap, it’s often better to do day-forward processing with proper settings on new documents and to key in the missing fields manually for documents already processed. The reason is that the work of extracting the additional fields and reconciling old documents sometimes takes away from day-forward production and may not be worth the additional cost. Another common practice is to run the backlog documents from scratch through the new process.

The common thread in both categories is improper planning by the organization before evaluating technology. It’s important for companies to take the time to plan for capture technology. Part of this planning is looking ahead at future needs for the data. One of the best tricks for exposing the requirements is to involve ALL constituents that create, use, and benefit from the extracted data. Plan, Plan, Plan.
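
As a rough illustration of gathering requirements from every constituent, here is a minimal Python sketch that compares the fields each group says it needs against the fields the capture process is configured to extract. The group names, field names, and the `find_capture_gaps` helper are all hypothetical, made up for this example; a real project would collect the lists from the actual stakeholders.

```python
# Hypothetical stakeholder requirements; group and field names are illustrative only.
requirements = {
    "accounts_payable": {"invoice_number", "invoice_date", "total"},
    "procurement": {"invoice_number", "po_number"},
    "reporting": {"total", "vendor_name"},
}

# Fields the capture process is currently configured to extract (assumed).
configured_fields = {"invoice_number", "invoice_date", "total"}


def find_capture_gaps(requirements, configured_fields):
    """Map each required-but-unconfigured field to the groups that need it."""
    gaps = {}
    for group, fields in requirements.items():
        # Set difference: fields this group needs that capture does not extract.
        for field in fields - configured_fields:
            gaps.setdefault(field, []).append(group)
    # Sort for a stable, readable report.
    return {field: sorted(groups) for field, groups in sorted(gaps.items())}


gaps = find_capture_gaps(requirements, configured_fields)
for field, groups in gaps.items():
    print(f"{field}: needed by {', '.join(groups)}")
```

With these made-up lists, the check reports two fields that downstream groups need but capture does not collect, which is the kind of gap that is far cheaper to find before the first document is processed.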

Chris Riley – Sr. Solutions Architect