One of the most popular questions organizations ask when purchasing data capture or OCR software is “what accuracy can you guarantee?” If you have ever asked a vendor this question, you got one of two responses: a percentage, or a long explanation of why they can’t guarantee anything. If the vendor gave you a percentage, you should probably run, because it’s the start of a bad relationship.
Why? It’s not really possible for a vendor to tell you how accurate recognition will be on your documents. Vendors can estimate accuracy from samples and give you an idea of the likely range, but because of the nature of the technology there is no way to guarantee anything. The first fact of OCR is that you can ALWAYS find a document that breaks the norms of recognition and accuracy. Because of this, it’s hard to know how such exception documents will affect the accuracy of the entire system. So let’s talk about what is reasonable.
It is reasonable to provide a sample set of documents and expect an average accuracy level, expressed as a percentage, on those samples. Because the samples are a discrete, known set of documents, their accuracy can actually be measured. It is the job of the organization to pick samples that most closely represent production. It would be wise to include bad, average, and good documents in the sample set so as to cover the full range of quality you will see.
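As a rough illustration of measuring such a sample set, the Python sketch below scores each sample document by the fraction of fields the engine extracted exactly right, then averages those scores per quality bucket and overall. The `SampleDoc` record, the exact-match field scoring, and the bucket names are hypothetical choices made for illustration, not any vendor’s method.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SampleDoc:
    # Hypothetical record: a quality bucket assigned by whoever assembled the
    # samples, the fields the engine extracted, and the verified (truth) values.
    quality: str                 # e.g. "good", "average", "bad"
    extracted: dict[str, str]
    truth: dict[str, str]


def field_accuracy(doc: SampleDoc) -> float:
    """Fraction of truth fields that the engine extracted exactly."""
    if not doc.truth:
        return 0.0
    hits = sum(1 for name, value in doc.truth.items()
               if doc.extracted.get(name) == value)
    return hits / len(doc.truth)


def average_accuracy(samples: list[SampleDoc]) -> dict[str, float]:
    """Mean per-document accuracy (as a percentage) per quality bucket and overall."""
    if not samples:
        return {}
    by_bucket: dict[str, list[float]] = defaultdict(list)
    for doc in samples:
        by_bucket[doc.quality].append(field_accuracy(doc))
    report = {q: 100 * sum(s) / len(s) for q, s in by_bucket.items()}
    all_scores = [score for scores in by_bucket.values() for score in scores]
    report["overall"] = 100 * sum(all_scores) / len(all_scores)
    return report
```

With one perfect "good" invoice, one "average" invoice that misreads one of two fields, and one "bad" invoice where nothing is read, the report is 100%, 50%, and 0% per bucket and 50% overall. Reporting the buckets separately shows how much the bad documents pull the average down, which a single overall number hides.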
What organizations often forget is that even if only 50% of the documents are automated, there is still a cost savings compared to manual entry. The industry standard for accuracy is 85%; however, this varies heavily with document type and the organization’s perception of accuracy. The ideal way to measure accuracy is to compare recognition results to truth data. If truth data is not available, the next best thing is to measure not accuracy but the level of uncertainty on the document: if a document is 5% uncertain according to the OCR engine, then it is 95% certain, and that should be your measure.
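A minimal Python sketch of the two measures described above, assuming plain-text output per document. With truth data, it scores character accuracy as one minus the Levenshtein edit distance divided by the length of the truth text (one common definition, chosen here for illustration). Without truth data, it assumes the engine reports a per-character "uncertain" flag; that input is hypothetical, since how an engine exposes its confidence varies by product.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def accuracy_vs_truth(recognized: str, truth: str) -> float:
    """Character accuracy against truth data: 1 - edit distance / truth length, floored at 0."""
    if not truth:
        return 1.0 if not recognized else 0.0
    return max(0.0, 1 - levenshtein(recognized, truth) / len(truth))


def certainty(uncertain_flags: list[bool]) -> float:
    """Fallback without truth data: share of characters the engine did NOT flag as uncertain.

    uncertain_flags is a hypothetical per-character input; an empty document
    reports 0.0 rather than dividing by zero.
    """
    if not uncertain_flags:
        return 0.0
    uncertain = sum(1 for flag in uncertain_flags if flag)
    return 1 - uncertain / len(uncertain_flags)
```

For example, reading "INVOICE 1001" as "INV0ICE 1O01" costs two substitutions out of twelve characters, about 83% character accuracy, while a 100-character document with 5 flagged characters comes out roughly 0.95 certain, matching the 5% uncertain / 95% certain reasoning above.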
Next time a vendor is faced with the question “how accurate are you?” or “what accuracy do you guarantee?”, I hope they give the proper response: “how accurate will your process allow us to be?” It’s a fair question to ask when you are not familiar with the technology, but hopefully the above gives you the right approach to measuring a solution.
Chris Riley – Sr. Solutions Architect
Ilya Evdokimov is a long-term practitioner and expert in leading Optical Character Recognition (OCR), Data Capture and Document Processing techniques, technologies and solutions. With over 15 years of experience spanning enterprise software implementations, mobile application development, cloud-based systems integration and desktop-level automation, Ilya Evdokimov uses thorough industry knowledge and experience to achieve high efficiency and workflow optimization in the most challenging paper-dependent and digital image capture environments.