Often we are blinded by technology and forget the pain we originally adopted it to solve. When I first learned accounting, more tenured accountants would explain to me how they made journal entries on paper, not in QuickBooks. When I learned math, I was freely solving complex equations on my graphing calculator while my professor explained how long those equations would take without it. OCR is no different. OCR replaces manual data entry, which is not very accurate. If an OCR system is 85% or more accurate on a particular document type, then it is most likely more accurate than a single pass of human data entry on that same document type, and faster!
So we know there is a clear benefit to the technology: increased speed and increased accuracy. It’s when companies want to be 100% accurate that they start to groan. Before OCR, and even today, companies that need 100% accuracy from data entry perform double- or triple-blind data entry, at double or triple the labor cost. That means two separate people enter the data from the same document and the results are compared; make this three people and you will almost always be 100% accurate. You can do the same with OCR! In fact, most large service bureaus prefer that OCR technology make the first pass, and then they do one pass of manual entry, making it double-blind. I’m going to suggest going one step further.
Why not have OCR with settings geared towards numbers and OCR with settings geared towards words (our two separate data entry people) both process the same document, and then compare the results? Why not three sets of settings, maybe four? If you run the same OCR engine with different settings and compare the extraction results from each instance, you are creating automated double-blind data entry! You can replicate the trusted process for producing high accuracy with greater efficiency and lower cost.
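To make the comparison step concrete, here is a minimal sketch in Python. It assumes each OCR pass has already produced a dictionary of field names and extracted text; the function name `compare_passes`, the `min_agree` parameter, and the sample invoice fields are all hypothetical and not tied to any particular OCR engine. A value is accepted only when enough passes agree on it, and everything else is set aside as disputed.

```python
from collections import Counter


def compare_passes(passes, min_agree=None):
    """Compare field values extracted by several OCR passes of one document.

    passes:    list of dicts mapping field name -> extracted text, one per pass.
    min_agree: how many passes must return the same value for it to be
               accepted; defaults to all of them (strict blind comparison).
    Returns (accepted, disputed): accepted maps field -> agreed value,
    disputed maps field -> the list of values each pass produced.
    """
    if min_agree is None:
        min_agree = len(passes)

    accepted, disputed = {}, {}
    # Collect every field seen by any pass so a missing field counts as a mismatch.
    field_names = sorted({name for p in passes for name in p})

    for name in field_names:
        values = [p.get(name, "").strip() for p in passes]
        value, votes = Counter(values).most_common(1)[0]
        if value and votes >= min_agree:
            accepted[name] = value
        else:
            disputed[name] = values
    return accepted, disputed


# Hypothetical results from two passes over the same invoice:
# one tuned for numbers, one tuned for words.
numeric_pass = {"invoice_no": "10482", "total": "1,250.00", "vendor": "Acme 5upply"}
text_pass = {"invoice_no": "10482", "total": "1,250.00", "vendor": "Acme Supply"}

accepted, disputed = compare_passes([numeric_pass, text_pass])
print("accepted:", accepted)
print("disputed:", disputed)
```

With two passes the default behaves like double-blind entry: both must match. With three or more passes, setting `min_agree=2` turns it into a simple majority vote, which trades a little strictness for fewer disputed fields.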
I am a constant advocate of human intervention on low-confidence fields or characters, but in the above approach you are using more technology to replicate existing, very accurate processes. Never forget the original problem and you will see very quickly that OCR is a benefit.
Chris Riley – Sr. Solutions Architect
Ilya Evdokimov is a long-term practitioner and expert in leading Optical Character Recognition (OCR), Data Capture and Document Processing techniques, technologies and solutions. With over 15 years of experience spanning enterprise software implementations, mobile application development, cloud-based systems integration and desktop-level automation, Ilya Evdokimov uses thorough industry knowledge and experience to achieve high efficiency and workflow optimization in the most challenging paper-dependent and digital image capture environments.