Answer on StackOverflow: Detecting text in images
This question was answered on www.StackOverflow.com
Is there any good way of detecting whether an image contains text or not?
I’m not looking for a way to retrieve the text, only to detect if there is one or more characters present in the image.
I understand that there is no foolproof way of detecting text; for example, when the font is a bit off standard, it might be hard to recognize. I’m after an “as good as can be” solution.
Detecting whether there is text is nearly the same as extracting it: if you are able to extract text, that confirms text is present. Detection shares roughly 90% of its steps with extraction; the last 10% consists of optimizations for specific languages and text types within OCR that produce better recognition. Most of the heavy lifting happens at the beginning of the process: image binarization and background removal, segmentation into objects, document analysis for layout, object type detection, and recognition of each object separately. For background information, take a look at the blog post I wrote many months ago about detecting and extracting various text via OCR from complex pictures and images. For any given image, work through the following steps in order, and you will be able to decide whether today’s technology can see text in it, and in any other picture.
- Binarization. Convert the image to black and white. After this conversion, can you still see printed text characters? If no, stop: no text can be detected. If yes, proceed to the next step.
- Character separability. Human eyes are more adaptive than any technology and can pick out data even when it is obscured by other objects. In the binarized image, are the visible characters separate from all other elements, i.e. do they avoid touching other characters or elements? If no, stop: during analysis those characters will most likely be treated not as individual characters but as parts of non-text objects such as pictures, logos, or diagrams. If yes, and you can clearly see separate characters, proceed to the next step.
- Rotation. Are the characters on the same ‘baseline’ (can you draw one line below all characters)? Is that line roughly horizontal or vertical? If no, this is usually the end of the process, unless you instruct the OCR software to detect individual characters one by one. If yes, proceed to the next step. (NOTE: If there is a baseline but it sits at a steep angle, as in the “Smoothdealer” picture, the trick is to rotate the picture 15 degrees at a time and pass each rotated variant through OCR. In some variant the text will be near horizontal or near vertical, which OCR can detect. OCR systems today can read text only in horizontal or, for some systems, vertical orientation.)
- Language. OCR needs to be pre-set to look for a specific language, or at least a character set, so you will need to specify the range of possible characters. For example, if you set an English character set, some Russian or Chinese letters will not look like letters (from an English-language perspective) but more like graphics.
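The first two checks above can be approximated in code. What follows is a minimal sketch, not the method any particular OCR product uses: it assumes a grayscale image given as nested lists of values from 0 to 255, uses Otsu's method as one possible binarization, and treats small 8-connected dark blobs as candidate characters. The function names and the `max_frac` size limit are hypothetical and chosen only for illustration.

```python
from collections import deque
from typing import List, Tuple

Gray = List[List[int]]            # rows of grayscale values, 0 (black) to 255 (white)
Binary = List[List[int]]          # rows of 0/1, where 1 marks a dark ("ink") pixel
Box = Tuple[int, int, int, int]   # top, left, bottom, right (inclusive)


def otsu_threshold(img: Gray) -> int:
    """Pick the threshold that maximizes between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * n for i, n in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_dark = sum_dark = 0
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        w_light = total - w_dark
        if w_dark == 0:
            continue          # no pixels at or below t yet
        if w_light == 0:
            break             # every pixel is at or below t
        diff = sum_dark / w_dark - (sum_all - sum_dark) / w_light
        var = w_dark * w_light * diff * diff
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(img: Gray) -> Binary:
    """Mark pixels at or below the Otsu threshold as ink (1), the rest as background (0)."""
    t = otsu_threshold(img)
    return [[1 if v <= t else 0 for v in row] for row in img]


def blob_boxes(binary: Binary) -> List[Box]:
    """Return the bounding box of every 8-connected group of ink pixels."""
    h = len(binary)
    w = len(binary[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny in range(cy - 1, cy + 2):
                    for nx in range(cx - 1, cx + 2):
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def glyph_candidates(img: Gray, max_frac: float = 0.5) -> List[Box]:
    """Keep blobs small enough to be separate characters (assumed size limit)."""
    binary = binarize(img)
    h = len(binary)
    w = len(binary[0]) if h else 0
    return [
        (t, l, b, r) for (t, l, b, r) in blob_boxes(binary)
        if (b - t + 1) <= max_frac * h and (r - l + 1) <= max_frac * w
    ]
```

An empty result from `glyph_candidates` corresponds to stopping at the separability step. A non-empty result only means the image has separate dark shapes worth passing to OCR; it does not prove they are characters.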
Furthermore, the quality of the OCR software determines how effective each step can be. More powerful OCR will be able to successfully process more complex images. For example, when I used Tesseract in the past, it frequently returned nothing, indicating there was no text in the image. Some commercial OCR was able to return text from the same small or very low-quality picture, indicating there was text. Essentially, two entities tell you different things, and you need to know which one is wiser and listen to that one.
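One way to handle engines that disagree is to query them in order of trust and accept the first one that reports any letters or digits. The sketch below assumes each engine is wrapped in a plain function that takes an image and returns a string; the names `looks_like_text` and `detect_with_engines` are hypothetical, and no real OCR API is called here.

```python
from typing import Callable, Dict, Optional


def looks_like_text(ocr_output: str, min_chars: int = 1) -> bool:
    """True if the OCR output holds at least min_chars letters or digits."""
    return sum(ch.isalnum() for ch in ocr_output) >= min_chars


def detect_with_engines(
    image: object,
    engines: Dict[str, Callable[[object], str]],
) -> Optional[str]:
    """Return the name of the first engine (in insertion order) that reports text.

    Put the most trusted engine first. Returns None if no engine finds text.
    """
    for name, run_ocr in engines.items():
        if looks_like_text(run_ocr(image)):
            return name
    return None
```

A `None` result means only that none of the engines tried could see text, not that the image is guaranteed to contain none.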
Also, some OCR has special modes for ‘aggressive’ text extraction, which go even into logos and graphs in order to find and extract every single piece of text, and anything that looks like text. Other OCR may treat the same logo as a picture even if it has text inside, so that text never becomes characters. Think of the Microsoft or Google logos. I know of two commercial products that have this capability for advanced text extraction from within other objects: ABBYY FlexiCapture, an advanced enterprise data capture software, and the OCR-IT Cloud OCR API, which has a TextAgressive analysis and extraction mode.
Using the above methodology, let’s look at every sample provided:
Donald – some characters CAN be detected, but with low probability
Vip House – characters CAN NOT be detected
Smoothdealer – characters CAN be detected with 15 degree rotation tests
Oneplus – characters CAN be detected. Most OCR software supports inverted text.
500PCS – characters CAN be easily detected
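The 15-degree rotation test used for the Smoothdealer sample could be sketched as follows. This is an illustration under assumptions, not a specific product's behavior: the image rotation and the OCR call are passed in as functions, so any imaging library and OCR engine can be plugged in, and `find_text_by_rotation` is a hypothetical name.

```python
from typing import Callable, Optional, Tuple, TypeVar

Image = TypeVar("Image")


def find_text_by_rotation(
    image: Image,
    rotate: Callable[[Image, int], Image],
    run_ocr: Callable[[Image], str],
    step: int = 15,
) -> Optional[Tuple[int, str]]:
    """Rotate the image in fixed steps and OCR each variant.

    Returns (angle, text) for the first angle at which the OCR output
    contains a letter or digit, or None if no rotation yields text.
    """
    for angle in range(0, 360, step):
        text = run_ocr(rotate(image, angle)).strip()
        if any(ch.isalnum() for ch in text):
            return angle, text
    return None
```

With Pillow and pytesseract, for instance, one could pass `lambda im, a: im.rotate(a, expand=True)` as `rotate` and `pytesseract.image_to_string` as `run_ocr`. That pairing is an assumption for illustration; the original answer does not name a rotation tool.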