Measuring Document Automation Efficiency

by Ilya Evdokimov | Mar 16, 2010 | Accuracy

Measuring OCR Document Automation Efficiency

The two most common questions organizations ask when they are seeking document automation technology are “how fast is it?” and “how accurate is it?” Many don’t realize that the two are in opposition to each other most of the time: the more accurate a system, the slower it tends to be, and the faster it is, the less accurate. But there is one fatal mistake in most of these comparisons, and that mistake is how efficiency is calculated.

Most companies that trial data capture calculate performance on the slowest step, which is optical character recognition (OCR). Literally, companies will hit the “read” button and time the run until the read is complete. That number is then treated as the speed of the document automation system. This is incorrect.

There is no question that OCR can be a tremendous bottleneck in the entire entry process, but poor OCR can create an even greater one. Imagine an OCR engine that reads a document with 100 characters in 1 second, compared to an engine that reads the same 100 characters in 3 seconds. Your initial thought is that the first engine would be better, but consider that the first engine may be 60% accurate, leaving 40 characters to be manually entered, while the other engine is 98% accurate, leaving 2 characters to be manually entered or corrected. At an average entry speed of 1.6 characters per second, the 40 characters take an additional 25 seconds to enter, for a total entry time of 26 seconds with the faster engine. With the slower engine, it takes an additional 1.25 seconds to enter or edit the 2 wrong characters, for a total entry time of 4.25 seconds. This means that end-to-end, the slower engine is about 6 times faster in the document automation process than the faster engine.
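The arithmetic above can be reproduced with a short Python sketch. The function name, its parameters, and the simple model behind it (every misread character is re-keyed at one constant speed, and nothing else adds time) are assumptions made for illustration, not part of any OCR product.

```python
def end_to_end_seconds(chars, ocr_seconds, accuracy, entry_chars_per_second=1.6):
    """Estimate total time to capture one document: OCR read plus manual fixes.

    Assumes every misread character is re-keyed at a constant speed and that
    no other step (review, validation, export) contributes time.
    """
    # Whole characters the engine got wrong and a person must enter or correct.
    wrong_chars = round(chars * (1.0 - accuracy))
    manual_seconds = wrong_chars / entry_chars_per_second
    return ocr_seconds + manual_seconds


# The two hypothetical engines from the example above.
fast_engine = end_to_end_seconds(chars=100, ocr_seconds=1.0, accuracy=0.60)
slow_engine = end_to_end_seconds(chars=100, ocr_seconds=3.0, accuracy=0.98)

print(f"fast but 60% accurate: {fast_engine:.2f} s")   # 26.00 s
print(f"slow but 98% accurate: {slow_engine:.2f} s")   # 4.25 s
print(f"ratio: {fast_engine / slow_engine:.1f}x")      # 6.1x
```

Swapping in your own measured read times, accuracy rates, and operator entry speed gives a rough end-to-end comparison for the engines you are trialing.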

This simple calculation illustrates the folly in assuming that a slower OCR time makes for a slower overall process. Focusing on accuracy usually has the greatest benefit for an organization, unless you are improving the speed of a slower engine with hardware, or the two engines are too close to each other to see a benefit.

Chris Riley – Industry Expert