You can read the fine print

by Ilya Evdokimov | Jan 15, 2010 | OCR

As fonts get smaller, reading them with OCR software becomes more challenging. However, there are some key things organizations should be aware of when reading the fine print.

Today's OCR technology can read fonts as small as 8 pt, or even 6 pt, very accurately. It used to be the case that unless you had a 12 pt font, you stood no chance. Thanks to higher-quality scans and more advanced OCR engines, small fonts need not be a problem if the right approaches are used.

Small fonts are more sensitive to image quality and to degradation of the document. For this reason, original source images scanned at 300 DPI or higher are necessary. For normal fonts there is seldom a reason to scan higher than 300 DPI, but for small fonts the goal is to make them appear more or less the same size, in pixels, as regular fonts, so scanning them at 400 to 600 DPI is useful. Clean documents are also very important. A smudge or spill affects small fonts many times more than larger fonts because the lines of text are so close together. Once you have good image quality, you can start the conversion.
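To see why 400 to 600 DPI helps, recall that a typographic point is 1/72 inch, so a font's nominal height in pixels is roughly size_pt × DPI / 72. The Python sketch below is only a back-of-the-envelope calculation under that assumption (the function names are hypothetical, not part of any OCR product). Point size measures the em box, so actual letter heights are somewhat smaller than these figures.

```python
POINTS_PER_INCH = 72  # typographic point: 1 pt = 1/72 inch


def glyph_height_px(font_size_pt: float, dpi: int) -> float:
    """Approximate pixel height of a font's em box when scanned at `dpi`."""
    return font_size_pt * dpi / POINTS_PER_INCH


def dpi_to_match(small_pt: float, regular_pt: float, regular_dpi: int = 300) -> float:
    """DPI at which `small_pt` text spans as many pixels as
    `regular_pt` text scanned at `regular_dpi`."""
    return regular_dpi * regular_pt / small_pt


# Compare common small sizes at a normal and a high scan resolution.
for size in (6, 8, 10, 12):
    print(f"{size:>2} pt: {glyph_height_px(size, 300):5.1f} px @300 DPI, "
          f"{glyph_height_px(size, 600):5.1f} px @600 DPI")
```

By this measure, 6 pt text scanned at 600 DPI spans about as many pixels (50) as 12 pt text at 300 DPI, and 8 pt text needs roughly 450 DPI to match 12 pt at 300 DPI, which fits the 400 to 600 DPI range suggested above.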

The next biggest benefit for small fonts comes from zoning them separately. Zoning is the process of drawing a rubber-band rectangle around the region where the text exists. When small fonts are grouped in the same zone as normal-sized fonts, the OCR software assumes they should all be the same size, and both confidence and accuracy go down. If you zone the small fonts separately, you increase the OCR engine's ability to use experts just for small fonts and improve accuracy on them.
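If line positions are already available from a layout-analysis step, the idea of keeping fine print out of body-text zones can be sketched as grouping consecutive lines of similar height into their own rectangles. This is a minimal illustration, not how any particular OCR product zones a page: the `Line` and `Zone` types, the `zone_by_font_size` function, and the 0.75 height-ratio threshold are all hypothetical assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Line:
    # Bounding box of one detected text line, in pixels (hypothetical input format).
    x0: int
    y0: int
    x1: int
    y1: int

    @property
    def height(self) -> int:
        return self.y1 - self.y0


@dataclass
class Zone:
    # Rubber-band rectangle enclosing a run of similarly sized lines.
    x0: int
    y0: int
    x1: int
    y1: int
    lines: List[Line]

    def add(self, line: Line) -> None:
        # Grow the rectangle so it also encloses the new line.
        self.x0 = min(self.x0, line.x0)
        self.y0 = min(self.y0, line.y0)
        self.x1 = max(self.x1, line.x1)
        self.y1 = max(self.y1, line.y1)
        self.lines.append(line)


def zone_by_font_size(lines: List[Line], ratio: float = 0.75) -> List[Zone]:
    """Group lines, top to bottom, into zones of similar text height.

    A line joins the current zone only if its height is within `ratio`
    (smaller / larger) of the zone's first line; otherwise it starts a new
    zone, so fine print is not mixed with body text.
    """
    zones: List[Zone] = []
    for ln in sorted(lines, key=lambda item: item.y0):
        if zones:
            ref = zones[-1].lines[0].height
            small, large = sorted((ref, ln.height))
            if large > 0 and small / large >= ratio:
                zones[-1].add(ln)
                continue
        zones.append(Zone(ln.x0, ln.y0, ln.x1, ln.y1, [ln]))
    return zones


# Example page: three ~50 px body lines followed by two ~25 px fine-print lines.
page = [
    Line(100, 100, 2400, 150),
    Line(100, 170, 2400, 220),
    Line(100, 240, 2300, 290),
    Line(100, 400, 2400, 425),
    Line(100, 435, 2400, 460),
]
zones = zone_by_font_size(page)
for z in zones:
    print((z.x0, z.y0, z.x1, z.y1), len(z.lines), "lines")
```

Here the body text and the fine print land in two separate rectangles, which is the separation the paragraph above recommends. A real page would also need to consider columns and horizontal position, which this sketch ignores.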

Next time someone tells you to read the fine print, tell them you won't read it; you'll scan and OCR it.

Chris Riley – Sr. Solutions Architect