Data capture projects are often complex, and much of that complexity comes from the type of data being collected. Some fields are easy to capture, while others pose challenges. Let’s discuss the fields that cause problems and how to address them.
In general, problem fields are those that cannot easily be constrained and do not have a limited character set. Fields that are usually very accurate and easy to configure include numeric fields, dates, and phone numbers. In the middle ground are fields such as dollar amounts and invoice numbers. The problem fields are addresses, proper names, and items.
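To illustrate why constrained fields are easy, here is a minimal Python sketch. It assumes US-style 10-digit phone numbers and month/day/year dates; those formats and the function names are hypothetical, chosen only to show how a small character set and a fixed structure let a recognized value be checked mechanically.

```python
import re
from datetime import datetime

# Hypothetical constrained field: a 10-digit US-style phone number with
# optional parentheses and separators (space, dot, or hyphen).
PHONE_PATTERN = re.compile(r"\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}")


def is_valid_phone(text: str) -> bool:
    """Accept values such as (555) 123-4567 or 555.123.4567."""
    return PHONE_PATTERN.fullmatch(text.strip()) is not None


def is_valid_date(text: str) -> bool:
    """Accept month/day/year dates that actually exist on the calendar."""
    try:
        datetime.strptime(text.strip(), "%m/%d/%Y")
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    # A misread character (B for 3, O for 0) is rejected outright.
    for value in ["(555) 123-4567", "555.123.4567", "555-12B-4567"]:
        print(value, is_valid_phone(value))
    for value in ["02/29/2024", "02/30/2024", "O2/14/2024"]:
        print(value, is_valid_date(value))
```

Because every position in these fields has only a handful of legal characters, a bad read is easy to catch and flag for review.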
Address fields are, to most people’s surprise, complex. Many would like to believe that addresses are easy. The only way to capture address fields easily would be to have, in the US for example, the entire USPS database of addresses that the USPS itself uses for its own data capture. It is possible to buy this database. If you don’t have it, the key to addresses is less constraint. Many think you should specify a data type for address fields that starts with numbers and ends with text. While this might work for 60% of the addresses out there, doing so means every exception address will be captured at 0% accuracy. It’s best to let the engine read what it’s going to read, and only support it with an existing database of addresses if you have one.
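To make the over-constraint problem concrete, here is a small Python sketch. The “starts with numbers, ends with text” rule is written as a regular expression, and the sample addresses are hypothetical. The point is only that legitimate exceptions, such as PO boxes or spelled-out house numbers, can never pass the strict rule no matter how well the engine reads them.

```python
import re

# The over-constrained data type: a house number, whitespace, then text.
STRICT_ADDRESS = re.compile(r"\d+\s+[A-Za-z].*")


def strict_accepts(address: str) -> bool:
    return STRICT_ADDRESS.fullmatch(address.strip()) is not None


def lenient_accepts(address: str) -> bool:
    # Lenient approach: accept whatever was read as long as it is not empty.
    # Any correction is left to a reference address database, if available.
    return bool(address.strip())


# Hypothetical reads; the last three are legitimate "exception" addresses.
samples = [
    "123 Main Street",
    "4500 Oak Ave Apt 2",
    "PO Box 771",
    "One Liberty Plaza",
    "N89W16785 Main St",
]

for s in samples:
    print(f"{s!r:24} strict={strict_accepts(s)} lenient={lenient_accepts(s)}")
```

Under the strict rule the exception addresses are rejected every time, which is the 0% problem described above; the lenient rule keeps them and lets supporting data do the correcting.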
Proper names are next to addresses in complexity. A proper name can be a person’s name or a company name. It is possible to limit the number of characters and, for the most part, exclude digits, but the structure of many names makes them difficult to recognize. If you have an existing database of the names likely to appear on the form, you will excel at this field. As with addresses, it would not be prudent to create a data type that constrains the structure of a name.
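One way to combine loose constraints with a name database is sketched below in Python. The name list, the 60-character limit, and the 0.8 similarity cutoff are all hypothetical values for illustration. The sketch applies only a length limit and a no-digits check (no structural rule), then uses the standard library’s `difflib.get_close_matches` to snap a misread name to the closest known entry.

```python
import difflib
from typing import Optional

# Hypothetical reference list; in practice this would come from an
# existing customer, employee, or vendor database.
KNOWN_NAMES = [
    "Acme Corporation",
    "O'Brien & Sons LLC",
    "Maria de la Cruz",
    "Jean-Luc Picard",
]


def passes_light_constraints(name: str, max_len: int = 60) -> bool:
    """Only loose checks: a length limit and no digits. No structure rule."""
    name = name.strip()
    return 0 < len(name) <= max_len and not any(ch.isdigit() for ch in name)


def match_known_name(ocr_text: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the closest known name, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(
        ocr_text.strip(), KNOWN_NAMES, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None


if __name__ == "__main__":
    print(match_known_name("Acme Corporatlon"))  # misread "i" as "l"
    print(match_known_name("Globex Inc"))        # not in the database
```

Names that are not in the database fall through as `None` and can be sent to manual review instead of being forced into the wrong entry.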
Items include inventory items, item descriptions, and item codes. Items can be either a breeze or very difficult, depending on how well the organization understands their structure and whether it has supporting data. For example, if a company knows exactly how its item codes are formed, it is very easy to process them accurately with an associated data type. Once again, the best trick for items is a database of supporting data.
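Here is a minimal Python sketch of what an “associated data type” for item codes might look like. The code format (two-letter category, four-digit number, size S/M/L), the character-confusion table, and the item master set are all hypothetical. Because the structure says exactly where digits belong, common letter-for-digit misreads can be repaired in that position only, and the result is then checked against supporting data.

```python
import re
from typing import Optional

# Hypothetical item-code structure: category, hyphen, four digits, hyphen, size.
ITEM_CODE = re.compile(r"[A-Z]{2}-\d{4}-[SML]")

# Letters OCR engines often confuse with digits; applied only where the
# known structure requires a digit.
DIGIT_FIXES = str.maketrans({"O": "0", "I": "1", "L": "1", "S": "5", "B": "8"})

# Hypothetical item master file (the "supporting data").
ITEM_MASTER = {"HW-1042-L", "HW-1043-M", "EL-2210-S"}


def normalize_item_code(raw: str) -> Optional[str]:
    """Use the known structure to repair and validate a recognized item code."""
    parts = raw.strip().upper().split("-")
    if len(parts) != 3:
        return None
    category, number, size = parts
    number = number.translate(DIGIT_FIXES)  # the middle part must be digits
    code = f"{category}-{number}-{size}"
    return code if ITEM_CODE.fullmatch(code) else None


def lookup_item(raw: str) -> Optional[str]:
    """Return the code only if it is well formed and exists in the master file."""
    code = normalize_item_code(raw)
    return code if code in ITEM_MASTER else None


if __name__ == "__main__":
    print(lookup_item("hw-1O42-l"))  # letter O read in a digit position
    print(lookup_item("HW-1044-L"))  # well formed, but not a real item
```

The structure check catches malformed reads, and the master-file lookup catches well-formed codes that do not correspond to an actual item.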
As you can see, the common thread is finding a database of existing supporting data. Knowing the problem fields helps companies focus and gives them a plan of attack for achieving very accurate data capture.
Ilya Evdokimov is a long-term practitioner and expert in leading Optical Character Recognition (OCR), Data Capture and Document Processing techniques, technologies and solutions. With over 15 years of experience spanning enterprise software implementations, mobile application development, cloud-based systems integration and desktop-level automation, Ilya Evdokimov uses thorough industry knowledge and experience to achieve high efficiency and workflow optimization in the most challenging paper-dependent and digital image capture environments.