There are a few niche uses of OCR technology that many people don’t realize exist, and many of them will never affect the average user. But there are two mainstream uses of the technology that affect everyone: OCR can be, and is, used to thwart spammers and even to detect viruses. How, you ask?
For years, spammers have realized that by embedding text in images within their messages, they can avoid the text-analysis processes that detect the keywords that give spam away. But there is a way around this. By OCRing the images, the same text analysis can be run on the extracted text, and the spammer is caught! This is already deployed in some anti-spam applications, and its use will grow as the technology becomes more and more of a commodity. Today it is used primarily in server-side anti-spam detection rather than in client-side applications. I expect that in the future all anti-spam applications will include OCR technology as well. This trick seems obvious when you think about it, but how does OCR prevent viruses?
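The idea of running the same keyword analysis over OCR output can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular anti-spam product works: the OCR engine is passed in as a plain function (`ocr`), and the keyword list, the `spam_score` and `is_spam` helpers, and the threshold are all hypothetical. Real filters use far richer statistical models than a keyword count.

```python
from typing import Callable, Iterable, List

# Hypothetical keyword list, for illustration only.
SPAM_KEYWORDS = {"lottery", "winner", "free money", "act now"}


def spam_score(text: str, keywords: Iterable[str] = SPAM_KEYWORDS) -> int:
    """Count how many spam keywords appear in the text (case-insensitive)."""
    lowered = text.lower()
    return sum(1 for kw in keywords if kw in lowered)


def is_spam(body_text: str, images: List[bytes],
            ocr: Callable[[bytes], str], threshold: int = 2) -> bool:
    """Score the message body together with any text OCR recovers from its images.

    `ocr` is an assumed stand-in for whatever OCR engine the filter uses.
    """
    image_text = "\n".join(ocr(img) for img in images)
    return spam_score(body_text + "\n" + image_text) >= threshold


# Stand-in "OCR" for the demo: treats the image bytes as already-recognized text.
fake_ocr = lambda img: img.decode("utf-8")

print(is_spam("Hello!", [b"You are a LOTTERY winner"], fake_ocr))  # True
print(is_spam("Hello!", [], fake_ocr))                             # False
```

Without the OCR step the keywords hidden in the image are invisible to the filter. With it, the image text goes through exactly the same scoring as the body text.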
If you are familiar with how viruses work, you know that viruses occasionally arrive on your machine as an invited guest of a malware application that is already installed there. Sometimes a seemingly harmless piece of malware is just the first step in getting a malicious virus onto a machine. This works because applications that are already installed are granted greater access to machine resources than applications that are not yet installed. Now here is where it gets even trickier. Usually the virus portion of the attack, or the “payload”, is received from a website or silently downloaded at a certain time. Virus-protection applications are very good at spotting both the malware and the payload when the payload arrives as a text stream. But when the payload arrives as an image containing its code, detection is a little trickier. The attacker is banking on the image passing the virus check; the installed malware then converts the image to text using OCR, compiles it, and runs it secretly. Anti-virus engines are now getting wise to this process: they can OCR the image first to see whether it contains any code, and stop the payload before it even has a chance to run.
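The defensive step described above (OCR the image first, then check whether the recovered text looks like code) can be sketched as follows. This is a hypothetical heuristic, not the detection logic of any real anti-virus engine. The `ocr` callable, the `CODE_PATTERNS` list, the `code_indicator_count` and `block_image` helpers, and the threshold are all assumptions made for illustration.

```python
import re
from typing import Callable

# Hypothetical indicators that recovered text is source code or script, illustrative only.
CODE_PATTERNS = [
    re.compile(r"#include\s*<"),                    # C/C++ header include
    re.compile(r"\bimport\s+\w+"),                  # Python/Java-style import
    re.compile(r"\beval\s*\("),                     # dynamic evaluation
    re.compile(r"\bexec\s*\("),                     # dynamic execution
    re.compile(r"<script\b", re.IGNORECASE),        # embedded script tag
    re.compile(r"\bpowershell\b", re.IGNORECASE),   # shell invocation
    re.compile(r"[;{}]\s*$", re.MULTILINE),         # lines ending like C-family code
]


def code_indicator_count(text: str) -> int:
    """Return how many distinct code indicators appear in the text."""
    return sum(1 for pattern in CODE_PATTERNS if pattern.search(text))


def block_image(image: bytes, ocr: Callable[[bytes], str], threshold: int = 2) -> bool:
    """OCR the image and return True if the recovered text looks like code.

    `ocr` is an assumed stand-in for the engine's OCR component.
    """
    return code_indicator_count(ocr(image)) >= threshold


# Stand-in "OCR" for the demo: treats the image bytes as already-recognized text.
fake_ocr = lambda img: img.decode("utf-8")

print(block_image(b"import os\neval(payload)", fake_ocr))  # True
print(block_image(b"Happy birthday!", fake_ocr))           # False
```

Requiring more than one indicator is a simple guard against false positives, since an ordinary photo caption could plausibly contain a single word like "import". A real engine would pass suspicious text on to its normal signature and heuristic scanners rather than rely on a handful of regular expressions.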
Attackers are tricky, but so are the makers of protection software. Often, the virus makers’ own techniques give away the solution for preventing the attack: in this case, OCR.
Chris Riley – Sr. Solutions Architect
Ilya Evdokimov is a long-term practitioner and expert in leading Optical Character Recognition (OCR), Data Capture and Document Processing techniques, technologies and solutions. With over 15 years of experience spanning enterprise software implementations, mobile application development, cloud-based systems integration and desktop-level automation, Ilya Evdokimov draws on thorough industry knowledge and experience to achieve high efficiency and workflow optimization in the most challenging paper-dependent and digital image capture environments.