Several aspects of how we talk about documents, scanning, OCR, and Data Capture cause confusion, misunderstanding, and, unfortunately, deferred adoption. One common language misconception arises when discussing documents and their structure. Every data capture and OCR implementation involves the concept of a document, even if that document is a single page that is scanned repeatedly. But the definition of a document often gets blurred among vendors, end users, resellers, and even within each of these groups.
Some people think a document is just one page in a collection of pages. Others believe a document is a record in a database that consists of several page types combined into a single record. This second view does not depend on when the scanning happens: one page can come in at a different time than another, but you do not have a document until all of the pages are there. Still others think a document is multiple pages scanned together, with a page type that marks the beginning and the end.
The confusion arises because all of these views are correct; they are simply influenced by different things. An organization can define its documents by a business process or by a scanning process. To add to the confusion, the scanning department has a concept of a document that relates to scanning, while the back office has a different concept that relates to the database. To reconcile this, let me give you a complete definition of what a document is.
A document is all the paper it takes to create a single record in a system or database. This definition combines all of the views above and generalizes them. It is important to reconcile everyone’s opinion of what a document is because document structure and the business rules around a document directly affect how you implement OCR and Data Capture and keep them accurate.
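To make the definition concrete, here is a minimal Python sketch of a document modeled as "all the paper needed for one record." It is only an illustration: the `Document` class, the `REQUIRED_PAGE_TYPES` set, the page type names, and the file names are hypothetical and do not come from any particular OCR or Data Capture product.

```python
from dataclasses import dataclass, field

# Hypothetical page types that one record needs; a real list would come
# from the organization's own business rules.
REQUIRED_PAGE_TYPES = frozenset({"application", "id_copy", "signature_page"})


@dataclass
class Document:
    """All the paper it takes to create a single record."""
    record_id: str
    # Maps a page type to the scanned images received for it so far.
    pages: dict = field(default_factory=dict)

    def add_page(self, page_type: str, image_name: str) -> None:
        # Pages may arrive in any order and at different times.
        self.pages.setdefault(page_type, []).append(image_name)

    def missing_page_types(self) -> frozenset:
        # Page types the record still needs before it is a full document.
        return REQUIRED_PAGE_TYPES - set(self.pages)

    def is_complete(self) -> bool:
        return not self.missing_page_types()


doc = Document(record_id="claim-001")
doc.add_page("application", "scan_0001.tif")
first_check = doc.is_complete()   # only one of three page types has arrived

doc.add_page("id_copy", "scan_0042.tif")          # scanned later
doc.add_page("signature_page", "scan_0107.tif")   # scanned later still
second_check = doc.is_complete()  # every required page type is now present
```

Under these assumptions, a single-page document is just the case where the required set holds one page type, and a batch scanned together is the case where every page happens to arrive at once.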
The biggest challenge with all of these language misconceptions is simply recognizing that they exist. If you know they are going to occur, you can mitigate their impact. If you are unaware of them, they can be a silent killer of success.
Chris Riley – Sr. Solutions Architect
Ilya Evdokimov is a long-term practitioner and expert in leading Optical Character Recognition (OCR), Data Capture, and Document Processing techniques, technologies, and solutions. With over 15 years of experience spanning enterprise software implementations, mobile application development, cloud-based systems integration, and desktop-level automation, Ilya Evdokimov uses thorough industry knowledge and experience to achieve high efficiency and workflow optimization in the most challenging paper-dependent and digital image capture environments.