In the early days of OCR, the preferred approach to increasing accuracy was to institute a printing standard. That standard included two fonts, OCR-A and OCR-B, which the first OCR engines were specially trained for. Today, using these fonts can actually reduce accuracy with modern engines. If you run a modern engine on a document set in OCR-A without any special configuration, the results will initially be less accurate; once you tell the software that the text is OCR-A, it becomes extremely accurate.
Some educational material on OCR processing still discusses these fonts as a living standard. For numbers-only OCR the fonts remain beneficial, because they clearly distinguish digits that resemble letters, such as “1” and “0”. The numbers-only portion of OCR-A is called “Index”. But for the most part the fonts provide no additional benefit in everyday OCR processing. So what happened?
Three major things happened that prevented this standard from taking off:
1. Adoption of OCR technology was very low at the time and limited to special cases, so there was not a large enough user base to embrace the standard.
2. It’s really hard to tell users how to create their documents, especially because the people doing the OCR are often not the creators of the original document and have no say over the printing font. Documents printed in these fonts also look plain and machine-generated.
3. In spite of the standard, OCR engines improved to work very well on the vast majority of fonts, except cursive and heavily stylized ones. Because of this, it quickly became clear that nearly any typeset text could be converted.
As a bit of OCR history, these fonts are an interesting way to explore the rapid growth in the technology’s accuracy. A few specialized engines still work only with the OCR-A and OCR-B fonts, especially for very fast camera OCR of part numbers on product assembly lines, but for the most part the standard is neither required nor widely used.
Chris Riley – Sr. Solution Architect
Ilya Evdokimov is a long-term practitioner and expert in leading Optical Character Recognition (OCR), Data Capture and Document Processing techniques, technologies and solutions. With over 15 years of experience spanning enterprise software implementations, mobile application development, cloud-based systems integration and desktop-level automation, Ilya Evdokimov uses thorough industry knowledge and experience to achieve high efficiency and workflow optimization in the most challenging paper-dependent and digital image capture environments.