IT departments like the latest and greatest computer technology, and why shouldn't they? When shopping for a machine, it is usually true that MORE = BETTER. With OCR, however, organizations are often surprised when a desktop testing machine outperforms their beefy new server. A few very specific factors drive OCR processing performance, and many desktop-grade machines will do an amazing job at OCR if you just hit the right points.
1.) Bus speed. OCR moves images around in memory and on the hard drive rapidly and repeatedly, so the time it takes to move data from point A to point B can quickly become one of your biggest bottlenecks. Let's try an analogy. San Francisco and New York are two very large cities with an enormous capacity for people and things. Say San Francisco is computer memory and New York is a hard drive. If 200 of my friends and I wanted to move from San Francisco to New York with all our stuff, driving 100 or so VW Beetles cross-country would take a LONG time. If we all boarded a jumbo jet, we would be there in a matter of hours. The bus works the same way: the slower the bus speed between memory, hard drive, and CPU, the longer it takes to move and write these image files. Servers often have fast bus speeds, but they also carry a tremendous amount of overhead that gets in the way.
2.) OCR is a CPU HOG. It will use 99% of any single thread while it runs, so investing in a more powerful CPU with more threads is not a bad idea. However, assuming that a server-grade CPU such as the Xeon is better than a desktop CPU such as the Duo might be a mistake, for two simple reasons. First, servers again carry more overhead, which can get in the way of processes that move a lot of data from one place to another. Second, and most important, the chipsets of older, established CPUs are just that: older. They may run at the same speed, but they lack some of the faster math processing found in newer chipsets, which is very good for OCR.
3.) Hard-drive speed is the same story as bus speed. OCR writes images to disk very often, so you want your hard drives to write quickly. Not only should the drive itself be fast, but so should its connection to the motherboard. So far, Serial ATA has proven the fastest option. Servers tend to use SCSI, which is great for redundancy but, because of its overhead, does not promote speed.
4.) Memory is important, but the amount of memory matters less than its speed. 4 GB should be sufficient for most workloads a single machine can handle, while the jump from 266 MHz to 666 MHz memory makes a huge difference.
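To put rough numbers on the road-trip analogy in point 1, here is a minimal Python sketch. It assumes each image transfer costs a fixed per-transfer overhead plus its size divided by the available bandwidth. The function name `transfer_time_s` and every figure below are hypothetical, chosen only for illustration and not measured on any real bus.

```python
def transfer_time_s(num_files: int, file_mb: float, bandwidth_mb_s: float,
                    per_transfer_overhead_s: float = 0.0) -> float:
    """Rough time to move num_files images of file_mb megabytes each.

    Simplified model (an assumption, not a measurement): every transfer pays
    a fixed overhead plus its size divided by the bandwidth in MB/s.
    """
    if bandwidth_mb_s <= 0:
        raise ValueError("bandwidth must be positive")
    per_file = per_transfer_overhead_s + file_mb / bandwidth_mb_s
    return num_files * per_file


if __name__ == "__main__":
    # Hypothetical batch: 1,000 scanned pages at 2 MB each.
    slow = transfer_time_s(1000, 2.0, 100.0)    # narrow "road": about 20 s
    fast = transfer_time_s(1000, 2.0, 1000.0)   # wide "runway": about 2 s
    # Same fast link, but each transfer pays a hypothetical 5 ms overhead.
    fast_with_overhead = transfer_time_s(1000, 2.0, 1000.0, 0.005)  # about 7 s
    print(f"{slow:.1f} s, {fast:.1f} s, {fast_with_overhead:.1f} s")
```

In this toy model the per-transfer overhead, not the raw bandwidth, ends up dominating the fast link's total time, which mirrors the author's point that server overhead can cancel out a fast bus.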
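Because point 2 says OCR saturates each thread it runs on, throughput should scale roughly with the number of threads, minus whatever overhead gets in the way. The back-of-the-envelope sketch below assumes pages are independent and that `efficiency` is a hypothetical fudge factor (1.0 means perfect scaling), not a measured value; the page time is likewise illustrative.

```python
def pages_per_hour(threads: int, seconds_per_page: float,
                   efficiency: float = 1.0) -> float:
    """Estimate OCR throughput when each thread processes one page at a time.

    Assumes pages are independent and each thread is fully busy (OCR near
    100% of a thread); `efficiency` scales the result down for overhead.
    """
    if threads < 1 or seconds_per_page <= 0:
        raise ValueError("need at least one thread and a positive page time")
    return threads * efficiency * 3600.0 / seconds_per_page


if __name__ == "__main__":
    # Hypothetical: 4 seconds of CPU time per page.
    print(pages_per_hour(2, 4.0))                  # 1800.0 (2 threads, ideal)
    print(pages_per_hour(8, 4.0, efficiency=0.5))  # 3600.0 (8 threads, heavy overhead)
```

More threads only pay off to the extent overhead does not eat into each one: in this example the 8-thread machine at 50% efficiency delivers the throughput of just 4 ideal threads.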
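To see why point 4 calls the jump from 266 MHz to 666 MHz huge, you can estimate peak theoretical memory bandwidth as transfer rate times bus width. This sketch assumes the quoted figures are effective transfer rates in millions of transfers per second on a single 64-bit channel, as with a standard desktop DDR module; real-world throughput will be lower than these peaks.

```python
def peak_bandwidth_mb_s(transfer_rate_mt_s: int, bus_width_bits: int = 64,
                        channels: int = 1) -> int:
    """Peak theoretical bandwidth in MB/s.

    transfers per second (millions) x bytes per transfer x number of channels.
    """
    return transfer_rate_mt_s * (bus_width_bits // 8) * channels


if __name__ == "__main__":
    slow = peak_bandwidth_mb_s(266)   # 2128 MB/s
    fast = peak_bandwidth_mb_s(666)   # 5328 MB/s
    print(slow, fast, round(fast / slow, 2))
```

Under these assumptions the faster memory offers roughly 2.5 times the peak bandwidth, before any further gain from a dual-channel setup (`channels=2`).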
If you keep it simple and focus on the factors that REALLY increase OCR performance, you may be surprised to find that in this case you pay less to get more.
Chris Riley – Sr. Solutions Architect
Ilya Evdokimov is a long-term practitioner and expert in leading Optical Character Recognition (OCR), Data Capture and Document Processing techniques, technologies and solutions. With over 15 years of experience spanning enterprise software implementations, mobile application development, cloud-based systems integration and desktop-level automation, Ilya Evdokimov uses thorough industry knowledge and experience to achieve high efficiency and workflow optimization in the most challenging paper-dependent and digital image capture environments.