After demonstrating the power of data capture technology, I have yet to see an organization that was unimpressed, at least until the conversation turns to quality assurance. Though the technology is extremely powerful, there will always be some level of quality checking required to get 100% accurate results. Think of it this way: if you were to spill coffee on a perfectly printed document and scan it soon after (the rollers leaving a nice smudge), you likely could not read the text yourself, so how can the software? In this scenario, QA would be required for the smudged fields. The example is obvious, but it illustrates the point. The good news is that if you provide the right tools, you can use a computer to do the first pass of verification.
It is just like a human verifying a document, but much faster and less expensive. Organizations that deploy these methods can eliminate a large percentage of manual verification, but the caveat is that they must first know their documents. If, after data capture, you combine field data types with a dictionary or database lookup, you have created an electronic verifier.
A data type tells the software what structure a field should have. It can be used to confirm a field result, or to correct uncertain results based on the knowledge it contains. Take a date field, for example. After data capture, the field is recognized as 1O/13/8I. We can see two errors: an “O” instead of a “0”, and an “I” instead of a “1”. Suppose you deploy a date data type that simply says: a number from 1 to 12, followed by a “/”, followed by a number from 1 to 31, followed by a “/”, followed by two digits. The date would then automatically be corrected to 10/13/81. Some data types, such as date and time, are universal; others are specific to a document type, and an organization that knows ALL of them stands to benefit greatly.
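The date example above can be sketched in a few lines of Python. This is a minimal illustration, not any particular vendor's implementation: the confusion map (which letters the OCR engine mistakes for which digits) is an assumption and would be tuned to the engine's actual error profile.

```python
import re

# Assumed OCR confusion pairs: letters commonly misread in place of digits.
OCR_DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "I": "1", "l": "1"})

# The date data type described in the text:
# month 1-12, "/", day 1-31, "/", two digits.
DATE_PATTERN = re.compile(r"^(0?[1-9]|1[0-2])/(0?[1-9]|[12]\d|3[01])/\d{2}$")

def apply_date_type(raw: str):
    """Return a corrected MM/DD/YY date, or None if the field needs manual QA."""
    candidate = raw.translate(OCR_DIGIT_FIXES)
    return candidate if DATE_PATTERN.match(candidate) else None

print(apply_date_type("1O/13/8I"))  # -> 10/13/81
print(apply_date_type("13/40/99"))  # -> None: route this field to a human
```

A field that still fails the pattern after correction is exactly the kind that should be routed to the human QA pass.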
Dictionaries and database lookups are essentially the same idea with a slight variation. The purpose of both is to validate what was extracted via data capture against pre-existing acceptable results. The simplest example is existing customer names. If you are processing a form distributed to existing customers that contains a first name and last name, then, because you already know those customers exist, you should be able to look each one up in a database and confirm the results. If no match is found, there is likely a problem with the form. Dictionaries provide the same value but are more static; they are often used for fields such as product type or rate type that have one set of possibilities that rarely changes. The point is that organizations should look at the database and dictionary assets they already have to augment the data capture process and make it more accurate.
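Both lookup styles can be sketched as follows. The `customers` table and the rate-type value set are hypothetical stand-ins for the assets an organization already has; any real deployment would query its own database.

```python
import sqlite3

# Hypothetical customer database standing in for an existing system of record.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (first TEXT, last TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [("Jane", "Doe"), ("John", "Smith")])

# Static dictionary for a field with a small, rarely changing set of values.
RATE_TYPES = {"fixed", "variable"}

def verify_customer(first: str, last: str) -> bool:
    """Database lookup: confirm the extracted name matches a known customer."""
    row = conn.execute(
        "SELECT 1 FROM customers WHERE first = ? AND last = ?",
        (first, last)).fetchone()
    return row is not None

def verify_rate_type(value: str) -> bool:
    """Dictionary lookup: the extracted value must be one of the known options."""
    return value.lower() in RATE_TYPES

print(verify_customer("Jane", "Doe"))  # True: passes electronic verification
print(verify_customer("Jame", "Doe"))  # False: flag the field for human QA
```

The design difference is only where the acceptable values live: the dictionary is a fixed in-memory set, while the database lookup rides on data the organization already maintains.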
There will always be quality assurance steps with any technology that involves interpretation of data. Organizations that want to skip these steps either do not understand the technology, do not understand their own processes, or were misled by a vendor. Quality assurance is where the most streamlining effort should be spent, and one way to do that is by leveraging the data types, dictionaries, and databases that already exist.
Ilya Evdokimov is a long-term practitioner and expert in leading Optical Character Recognition (OCR), data capture, and document processing techniques, technologies, and solutions. With over 15 years of experience spanning enterprise software implementations, mobile application development, cloud-based systems integration, and desktop-level automation, Ilya Evdokimov applies thorough industry knowledge and experience to achieve high efficiency and workflow optimization in the most challenging paper-dependent and digital image capture environments.