The difference often between easy data capture projects and more complex ones has to do with the type of data being collected. For both hand-print and machine print forms, certain fields are easy to capture while others pose challenges. This post is to discuss those “problem fields” and how to address them.
In general fields that are not easily constrained and don’t have a limited character set are problem fields. Fields that are usually very accurate and easy to configure are number fields, dates, phone numbers, etc. Then there are the middle ground fields such as dollar amounts and invoice numbers for example. The problem fields are addresses, proper names, items.
Address fields are for most people surprisingly complex. Many would like to believe that address fields are easy. The only way to very easily capture address fields would be to have for example in the US the entire USPS data base of addresses that they themselves use in their data capture. It is possible to buy this data base. If you don’t have this data base the key to addresses is less constraint. Many think that you should specify a data type for address fields that starts with numbers and ends with text. While this might be great for 60% of the addresses out there, by doing so you made all exception address 0%. It’s best to let it read what it’s going to read and only support it with an existing data base of addresses if you have it.
Proper names is next in complexity to address. Proper names can be a persons name or company names It is possible to constrain the amount of characters and eliminate for the most part numbers, but the structure of many names makes the recognition of them complex. If you have an existing data base of names that would be in the form you will excel at this field. Like addresses, it would not be prudent to create a data type constraining the structure of a name.
Items consist of inventory items, item descriptions, and item codes. Items can either be a breeze or very difficult, and it comes down to the organizations understanding of their structure and if they have supporting data. For example if a company knows exactly how item codes are formed then it’s very easy to accurately process them with an associated data type. The best trick for items is again a data base with supporting data.
As you can see, the common trend is finding a data base with existing supporting data. Knowing the problem fields focuses companies and helps them with a plan of attack to creating very accurate data capture.
Chris Riley – Sr. Solutions Architect
Ilya Evdokimov is a long-term practitioner and expert in leading Optical Character Recognition (OCR), Data Capture and Document Processing techniques, technologies and solutions. With over 15 years of experience spanning enterprise software implementations, mobile applications development, cloud-based systems integration and desktop-level automation, Ilya Evdokimov uses through industry knowledge and experience to achieve high efficiency and workflow optimization in most challenging paper-dependent and digital image capture environments.