Organizations seeking full-page OCR or Data Capture technology have a serious need to estimate accuracy before they deploy anything, as accuracy is a primary variable in determining the return on investment they can expect to achieve. When organizations try to understand accuracy by asking the vendor “How accurate are you?” they have gone down a path that may be hard to undo.
Accuracy is tied very closely to your document types and business process. Asking for an accuracy figure on a document similar to yours is fair, but the answer should not carry much weight. An organization’s business process dramatically impacts OCR accuracy as well. Instead of asking “How accurate are you?” you should be asking “Can I test your software on my documents?”
A properly established test bed of documents is the ideal way to evaluate the accuracy of a product. You want to know the worst case. Build a set of documents sampled from your production documents, and make sure the collection is proportional to the volume you intend to process and the number of variations. Of that set, 25% should be the “pretty” documents, 50% should be your typical documents, and 25% your worst documents. Use this sample set on all products you test. If you are able to compile truth data (100% accurate manual results from these documents), then you are even better off in your analysis.
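To make the test bed idea concrete, here is a minimal sketch of how the 25/50/25 split and a truth-data comparison might look in practice. The function names are hypothetical, and character accuracy based on edit distance is only one common way to score OCR output against truth data; it is an assumption here, not a method any particular vendor uses.

```python
def tier_quotas(total):
    """Split a sample size into 25% best, 50% typical, 25% worst documents."""
    best = total // 4
    worst = total // 4
    typical = total - best - worst  # the remainder goes to the typical tier
    return {"best": best, "typical": typical, "worst": worst}


def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def char_accuracy(truth, ocr):
    """Share of truth characters recovered (assumed metric, clamped at 0)."""
    if not truth:
        return 1.0 if not ocr else 0.0
    return max(0.0, 1.0 - edit_distance(truth, ocr) / len(truth))


def worst_case(pairs):
    """Lowest accuracy over (truth, ocr) pairs: the number to plan around."""
    return min(char_accuracy(truth, ocr) for truth, ocr in pairs)
```

Running every product you evaluate through the same pairs of truth text and OCR output, then comparing the worst-case figure rather than the average, keeps the comparison tied to your own documents.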
While I would hope no vendor answers “How accurate are you?” directly, asking it at all means that you do not yet understand the problem you are trying to solve. Today the ability to test is essential, and the vendor should grant you that right. Taking the time to test will save you much pain and time later.
Chris Riley – Sr. Solutions Architect
Ilya Evdokimov is a long-term practitioner and expert in leading Optical Character Recognition (OCR), Data Capture and Document Processing techniques, technologies and solutions. With over 15 years of experience spanning enterprise software implementations, mobile applications development, cloud-based systems integration and desktop-level automation, Ilya Evdokimov uses thorough industry knowledge and experience to achieve high efficiency and workflow optimization in the most challenging paper-dependent and digital image capture environments.