Any form can be classified as “fixed” (aka structured) or “unstructured” (aka semi-structured).
Fixed form – a form that contains the same type of data in exactly the same placement on each page. The document can be a single-page or a multi-page form, but the number of pages is constant and identical in every document. Examples: DMV form, survey, questionnaire, new account application, registration card. Typically, but not always, this form will have boxes or special registration fields and markings for better data placement.
Semi-structured form – a form that contains similar data, but the position of that data may shift or change from document to document. The document may contain multiple pages, and frequently the number of pages varies from document to document depending on content length. Data may be located in different areas from document to document, and may occupy a different amount of space. Some data may be missing. Some data may appear one or more times, such as line items on an invoice. Examples: invoice, utility bill, bill of lading, bank statement.
Two different setup procedures are used depending on which kind of form will be processed in production. Unstructured form procedures CAN be used for either fixed or unstructured forms, because even a fixed form can be considered a semi-structured form with little to no variation. However, fixed form procedures CANNOT be used on unstructured forms, because those procedures require data to be placed in the same relative positions on every document.
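The compatibility rule above can be written down as a small lookup. This is only an illustrative sketch: the names `FormKind`, `Procedure`, and `procedure_applies` are hypothetical and do not come from any particular capture product.

```python
from enum import Enum


class FormKind(Enum):
    FIXED = "fixed"
    SEMI_STRUCTURED = "semi-structured"


class Procedure(Enum):
    FIXED = "fixed"
    SEMI_STRUCTURED = "semi-structured"


def procedure_applies(procedure: Procedure, form_kind: FormKind) -> bool:
    """Return True if the setup procedure can be used on the given kind of form."""
    # Semi-structured (unstructured) procedures tolerate positional variation,
    # so they also cover a fixed form, which is the zero-variation case.
    if procedure is Procedure.SEMI_STRUCTURED:
        return True
    # Fixed form procedures depend on constant data placement,
    # so they only apply to fixed forms.
    return form_kind is FormKind.FIXED
```

For example, `procedure_applies(Procedure.FIXED, FormKind.SEMI_STRUCTURED)` evaluates to `False`, while the semi-structured procedure returns `True` for both kinds of form.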
Some external factors besides the form design itself may make a form better suited to semi-structured processing than to fixed form processing, even though it was originally designed as a fixed form. For example, assume a company has a PDF form on its website for clients to print, fill out and return to the company. Several factors have to be considered before classifying this form as a good candidate for fixed form processing:
- PDF scaling – different users may have their PDF printing software scale the PDF differently before printing, resulting in printed forms of slightly different sizes.
- Printing variations – different users will have different printing hardware, which may differ in printing margins, stretching due to worn-out paper feeding rollers, and color intensity and sharpness (thick vs thin characters).
- Fax vs scan vs original – different users may return the form via different channels. A faxed form will be compressed by the fax machine to add a fax header line to the top of the page. Various fax machines cause stretching and compression depending on the condition of their paper feeding mechanisms. Scanned forms may differ in scanning resolution, skew, and rotation. Paper forms received by the company may be scanned on Xerox and multi-function devices, introducing further variations in scan quality.
A combination of such external factors may make it easier and more rewarding to process these forms using unstructured methods, because of the artificially introduced variations.
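One way to make this judgment repeatable is to record the intake conditions and count how many sources of variation apply. The sketch below is a hypothetical heuristic, not an established rule: the `IntakeConditions` fields, the `suggest_processing` function, and the threshold of more than one variation source are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class IntakeConditions:
    designed_as_fixed: bool         # form layout is constant by design
    printed_by_end_users: bool      # PDF scaling and printer hardware vary
    returned_by_fax: bool           # fax header compression and stretching
    scanned_on_mixed_devices: bool  # resolution, skew, rotation vary


def count_variation_sources(c: IntakeConditions) -> int:
    """Count the external factors that can shift data placement."""
    return sum([c.printed_by_end_users, c.returned_by_fax, c.scanned_on_mixed_devices])


def suggest_processing(c: IntakeConditions, max_variation_sources: int = 1) -> str:
    """Suggest 'fixed' or 'semi-structured' processing (assumed threshold)."""
    if not c.designed_as_fixed:
        return "semi-structured"
    # Assumption: a combination (more than one) of external factors
    # is enough to prefer semi-structured methods.
    if count_variation_sources(c) > max_variation_sources:
        return "semi-structured"
    return "fixed"
```

With these assumptions, a fixed-design form that clients print at home and return by fax is steered to semi-structured processing, while the same form filled out and scanned in-house on a single device stays a fixed form candidate. The threshold should be tuned against real sample batches rather than taken as given.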
Ilya Evdokimov is a long-term practitioner and expert in leading Optical Character Recognition (OCR), Data Capture and Document Processing techniques, technologies and solutions. With over 15 years of experience spanning enterprise software implementations, mobile application development, cloud-based systems integration and desktop-level automation, Ilya Evdokimov uses thorough industry knowledge and experience to achieve high efficiency and workflow optimization in the most challenging paper-dependent and digital image capture environments.