Converting Digital Photos to Text – The Reality

by Ilya Evdokimov | Dec 07, 2009 | OCR


Contrary to popular belief, it will be many years before a digital photograph of a document comes close to the accuracy of a document scan. Yes, there are document scanners today that are built around a mounted digital camera, and they are very accurate, but that is not what I’m referring to. I’m talking about photographing documents with your cell phone or handheld digital camera. One would assume that photographing a document at the highest possible resolution would eventually replace document scanning, but that is not the whole story. Even your 12-megapixel digital camera will not beat a 300 DPI document scan when it comes to document imaging. While it is possible to get better and better digital photographs of documents, there is one major problem in converting them with OCR: the OCR engine has to account for many more variable elements, the most complicated being layers.
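To put rough numbers on the resolution comparison, here is a minimal Python sketch. It assumes a 12-megapixel sensor of 4000 x 3000 pixels, a US Letter page (11 x 8.5 inches), and a hypothetical `fill` fraction for how much of the frame the page occupies; none of these figures come from a specific camera. It only computes nominal pixels per inch and ignores focus, lighting, and lens distortion, all of which make a real photograph worse than the number suggests.

```python
def effective_dpi(pixels_long, pixels_short, page_long_in, page_short_in, fill=1.0):
    """Nominal pixels per inch landing on the page.

    `fill` is the fraction of each frame dimension that the page occupies
    (1.0 means the page fills the frame edge to edge). Hypothetical helper.
    """
    dpi_long = pixels_long * fill / page_long_in
    dpi_short = pixels_short * fill / page_short_in
    # The weaker of the two directions limits what OCR can resolve.
    return min(dpi_long, dpi_short)


# Assumed 12 MP sensor (4000 x 3000) photographing a US Letter page.
perfect_framing = effective_dpi(4000, 3000, 11.0, 8.5, fill=1.0)
loose_framing = effective_dpi(4000, 3000, 11.0, 8.5, fill=0.7)

print(f"page fills the frame:   {perfect_framing:.0f} DPI (nominal)")
print(f"page fills 70% of it:   {loose_framing:.0f} DPI (nominal)")
```

Under these assumptions, a perfectly framed shot lands only slightly above 300 DPI on paper, and a loosely framed handheld shot falls below it before blur and uneven lighting are even considered.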

When you take a photograph of a document, there are potentially several different focal points: a table, a finger, the floor. Some of these can easily be mistaken for the flat surface of the document. The OCR engine has to determine which layer or focal point is the actual document and where its borders are. It does this primarily through color detection. In a document scan there is only one focal point, because the document is the entirety of the image, so the OCR engine does not need to guess or modify the image to find it. This increases the accuracy of both document analysis and character reading. The next challenge is perspective.
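The color-based border search described above can be illustrated with a toy Python sketch. This is not how any particular OCR engine works; the function name `find_document_bounds`, the brightness threshold, and the tiny 6 x 6 grayscale grid are all assumptions made for illustration. It treats bright pixels as paper and shows how a single bright stray object in the frame throws off the detected borders.

```python
def find_document_bounds(gray, threshold=200):
    """Return (top, left, bottom, right) of all pixels at or above `threshold`.

    `gray` is a list of rows of brightness values (0-255).
    Returns None if no pixel is bright enough. Hypothetical helper.
    """
    rows = [r for r, row in enumerate(gray) if any(v >= threshold for v in row)]
    cols = [c for c in range(len(gray[0])) if any(row[c] >= threshold for row in gray)]
    if not rows or not cols:
        return None
    return rows[0], cols[0], rows[-1], cols[-1]


DARK_TABLE, PAPER = 50, 240

# A dark table with a bright sheet of paper covering rows 1-4, columns 2-4.
image = [[DARK_TABLE] * 6 for _ in range(6)]
for r in range(1, 5):
    for c in range(2, 5):
        image[r][c] = PAPER

clean_bounds = find_document_bounds(image)

# Add one bright stray object (say, a reflection) in the top-left corner.
image[0][0] = 230
confused_bounds = find_document_bounds(image)

print("paper only:        ", clean_bounds)
print("with stray object: ", confused_bounds)
```

With only paper in view the box hugs the page; with the stray bright spot the box stretches to the corner of the frame, which is the kind of guesswork a flatbed scan never has to do.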

A digital photograph of a document should be taken head-on. Think of the LCD screen on your camera as lying in the same plane as the piece of paper. Any deviation from this causes distortion where, for example, the top edge of the document spans a shorter distance from left to right than the bottom edge. Some capture applications for the iPhone and other mobile devices make you line the document up inside on-screen brackets. This forces the capture to focus only on the document and tells the application, by virtue of the guide, where the borders are, but lining it up is very time-consuming. That brings us to the final point: time.
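The distortion described above, where the top edge of the page comes out shorter than the bottom edge, is commonly corrected with a perspective (homography) transform once the four page corners are known. The sketch below is a plain-Python illustration of that general technique, not the method of any specific capture app. The corner coordinates, the 850 x 1100 target rectangle, and the helper names `solve_linear`, `homography`, and `warp_point` are all assumptions.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src, dst):
    """Eight homography coefficients mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -x * u, -y * u])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -x * v, -y * v])
        b.append(v)
    return solve_linear(a, b)


def warp_point(h, x, y):
    """Apply the homography to one point."""
    h0, h1, h2, h3, h4, h5, h6, h7 = h
    w = h6 * x + h7 * y + 1.0
    return (h0 * x + h1 * y + h2) / w, (h3 * x + h4 * y + h5) / w


# Assumed page corners in the photo (top-left, top-right, bottom-right, bottom-left).
# The camera was tilted, so the top edge is shorter than the bottom edge.
photo_corners = [(120, 80), (480, 80), (560, 700), (40, 700)]
# Target: an upright 8.5 x 11 page at 100 pixels per inch.
flat_corners = [(0, 0), (850, 0), (850, 1100), (0, 1100)]

top_width = photo_corners[1][0] - photo_corners[0][0]
bottom_width = photo_corners[2][0] - photo_corners[3][0]
keystone_ratio = top_width / bottom_width  # 1.0 would mean a head-on shot

h = homography(photo_corners, flat_corners)
corrected = [warp_point(h, x, y) for x, y in photo_corners]
print("top/bottom width ratio:", round(keystone_ratio, 3))
print("corrected corners:", [(round(u), round(v)) for u, v in corrected])
```

Even with a correct transform, the pixels along the far edge of the page were captured at lower density, so straightening the image does not restore the detail a head-on shot or a scan would have recorded.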

It would actually take you much more time to capture a 10-page document with a digital camera than with an ADF or sheet-fed document scanner. Because the quality of the photo is so important when running OCR on a digital photograph, it requires a lot of conscious effort: holding the camera steady, lining up the document, and placing the document on a surface that does not introduce many layers or focal points. Because of this additional effort, it’s actually not saving any time.

I am a fan of blooming technology as well, but for acquiring paper images and converting them there is no better way than a portable or traditional document scanner. In time, digital photographs of documents will become a popular way to capture single-page documents for one-off processing, but as long as paper exists, so will the reality of document scanners.

Sr. Solutions Architect