OCR-IT Cloud OCR API provides access to high-quality OCR from devices and environments where OCR cannot run locally due to technical limitations or other constraints. This lets such environments perform OCR-related tasks without using local resources and without the maintenance and upkeep of a local OCR installation. In some cases, cloud-based OCR is the only way to enable image processing and text recognition. Because images are processed off-device, however, developers should consider several optimization techniques at every stage of their submission process.
The entire conversion workflow can be separated into these logical steps:
- Image capture, creation, optimization
- Transmission to cloud
- Processing
- Transmission back to source
- Text/data processing
Developers can take multiple actions at each stage to achieve the fastest possible processing. Let’s explore each stage separately.
1. Image capture, creation, optimization – preparing the image for submission to processing. This is one of the most important steps in a successful workflow, since every subsequent stage depends on its result. The image should be as clear as possible to achieve higher OCR accuracy. Techniques that help include user guidance and training to capture better images, on-device quality checks, resolution checks, shake detection, and image cleanup that produces a clean, small image for transmission, among others. According to PCWorld field tests, the average 3G upload speed on an iPhone or Android device is about 0.85 Mbps (0.11 MBps). The average photo is about 2.5 MB, so uploading the original photo alone takes about 23 seconds (2.5 MB ÷ 0.11 MBps). However, if the image is binarized before transmission, the resulting black & white image can be about 30 KB, which uploads in well under a second and greatly increases overall processing speed. In fact, nothing affects processing speed more than the size and quality of the image and the time it takes to transmit it.
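The upload arithmetic above can be checked with a few lines of code. This is a minimal sketch that assumes decimal units (1 MB = 1,000,000 bytes, 1 Mbps = 1,000,000 bits per second) and ignores latency, protocol overhead, and retransmissions, so real uploads will be somewhat slower. The function name `transfer_seconds` is illustrative, not part of any OCR-IT API.

```python
def transfer_seconds(size_bytes: float, link_mbps: float) -> float:
    """Estimate transfer time for a payload over a link of the given speed.

    Assumes decimal units (1 Mb = 1,000,000 bits) and ignores latency,
    protocol overhead and retransmissions, so this is a lower bound.
    """
    bits = size_bytes * 8
    return bits / (link_mbps * 1_000_000)


UPLINK_3G_MBPS = 0.85  # average 3G upload speed cited from PCWorld field tests

photo = transfer_seconds(2.5 * 1_000_000, UPLINK_3G_MBPS)  # ~2.5 MB original photo
binarized = transfer_seconds(30 * 1_000, UPLINK_3G_MBPS)   # ~30 KB black & white image

print(f"original photo:  {photo:.1f} s")
print(f"binarized image: {binarized:.2f} s")
```

With these inputs the original photo needs roughly 23.5 seconds on the uplink, while the binarized image needs under 0.3 seconds, about 80 times less.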
Binarization – the process of converting every pixel in the photo to either black or white, which effectively turns the photo into a pure black & white image. See our other blog post for a detailed explanation of binarization.
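To make the idea concrete, here is a minimal sketch of global-threshold binarization using Otsu's method, a well-known textbook technique. It is not the OCR-IT binarization algorithm, whose details are not described here. The sketch assumes the image has already been decoded into a flat list of 8-bit grayscale values (0 to 255); a production pipeline would use an imaging library for decoding and often an adaptive (local) threshold to cope with uneven lighting in phone photos.

```python
from typing import List, Sequence


def otsu_threshold(gray: Sequence[int]) -> int:
    """Return the threshold t (0-255) that maximizes between-class variance.

    Pixels <= t form one class and pixels > t the other.
    Expects integer grayscale values in the range 0..255.
    """
    hist = [0] * 256
    for value in gray:
        hist[value] += 1
    total = len(gray)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_low, sum_low = 0, 0
    for t in range(256):
        weight_low += hist[t]             # number of pixels with value <= t
        if weight_low == 0:
            continue
        weight_high = total - weight_low  # number of pixels with value > t
        if weight_high == 0:
            break
        sum_low += t * hist[t]
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        between = weight_low * weight_high * (mean_low - mean_high) ** 2
        if between > best_var:            # keep the first t with the maximum
            best_var, best_t = between, t
    return best_t


def binarize(gray: Sequence[int]) -> List[int]:
    """Map every grayscale pixel to pure black (0) or pure white (255)."""
    t = otsu_threshold(gray)
    return [255 if value > t else 0 for value in gray]


# Dark "ink" pixels and light "paper" pixels separate cleanly.
print(binarize([10, 12, 15, 200, 210, 220]))
```

Because every output pixel is one of two values, the result can be stored at 1 bit per pixel, which is where the large file size reduction comes from.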
Binarization is one of the most beneficial image preprocessing techniques. OCR-IT now offers a powerful binarization algorithm, available for licensing to all iOS developers. To see the OCR-IT binarization algorithm in practice, try the FotoNote app from the Apple App Store. Contact support@wisetrend.com for additional information.
For office documents from scanners or personal computers, the suggested submission format is TIFF Group 4, which offers a good balance between image quality and small file size.
2. Transmission. The actual connection speed directly affects both the upload time of the image and the monitoring of the process. EDGE, 3G, 4G, dial-up, broadband, and WiFi all have different and variable connection speeds, which change how quickly your submission progresses. Developers are strongly encouraged to test their typical environments and collect statistics on them to calculate average processing speed.
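One way to gather such statistics is to log each upload's size and duration per connection type and estimate future uploads from the observed throughput. The sketch below is a hypothetical helper, not part of OCR-IT's SDK; the connection labels (such as "3G" or "WiFi") are whatever your app reports, and using the median rather than the mean is a design choice that keeps a few stalled uploads from skewing the estimate.

```python
import statistics
from collections import defaultdict
from typing import Dict, List


class UploadStats:
    """Collects observed upload throughput per connection type (hypothetical helper)."""

    def __init__(self) -> None:
        self._samples: Dict[str, List[float]] = defaultdict(list)

    def record(self, connection: str, size_bytes: int, seconds: float) -> None:
        """Store one measured upload as throughput in bytes per second."""
        if seconds <= 0:
            raise ValueError("seconds must be positive")
        self._samples[connection].append(size_bytes / seconds)

    def median_throughput(self, connection: str) -> float:
        """Median bytes/second; raises statistics.StatisticsError if no samples."""
        return statistics.median(self._samples[connection])

    def estimate_seconds(self, connection: str, size_bytes: int) -> float:
        """Predict upload time for a payload from the median observed throughput."""
        return size_bytes / self.median_throughput(connection)


# Illustrative numbers only, not real measurements.
stats = UploadStats()
stats.record("3G", 30_000, 0.25)
stats.record("3G", 30_000, 0.30)
stats.record("3G", 30_000, 0.40)
print(f"expected 3G upload: {stats.estimate_seconds('3G', 30_000):.2f} s")
```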
3. Processing. This stage happens within the OCR-IT cloud. Incoming images are queued, processed, and the data is returned to the submitting source. Processing turnaround depends on the queue in the OCR-IT system at that moment (minor impact), the image file size sent between internal components (minor impact), and the structure and text content of the image (major impact). If the image contains a simple document or a few lines of clean text, recognition itself takes no more than a few seconds. If the image contains a lot of text mixed with graphics, tables, noisy backgrounds, heavy colors, handwriting, or other OCR-unfriendly elements, processing will take longer. For example, a full page of small print, such as the back of a credit card agreement or a newspaper page, may take up to 1-2 minutes to recognize. Large-format and multi-page documents also increase processing time. Processing a 100-page PDF file takes roughly 100 times longer than processing a 1-page PDF file (in reality, it takes less than 100 times as long, because advanced OCR-IT scheduling algorithms process a multi-page document in parts, but this is ignored for the purposes of this explanation).
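Since processing time varies this much, a client should wait for results with a time budget rather than a fixed short timeout. The sketch below assumes a polling-style interface, which this post does not describe: `check_status` is a hypothetical callable you supply that returns the recognized text when the job is done and `None` while it is still queued or processing. The per-page budget of 120 seconds mirrors the "up to 1-2 minutes" figure above for dense pages, and scaling it linearly with page count is a deliberately conservative upper bound.

```python
import time
from typing import Callable, Optional


def wait_for_result(
    check_status: Callable[[], Optional[str]],  # hypothetical: text when done, else None
    pages: int = 1,
    per_page_timeout: float = 120.0,
    initial_delay: float = 1.0,
    max_delay: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a queued OCR job until it returns text or the time budget runs out.

    The budget grows linearly with the page count. Elapsed time is
    approximated by the total time spent sleeping between polls.
    """
    budget = pages * per_page_timeout
    waited = 0.0
    delay = initial_delay
    while True:
        result = check_status()
        if result is not None:
            return result
        if waited >= budget:
            raise TimeoutError(f"no result after about {waited:.0f} s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)  # exponential backoff, capped
```

Exponential backoff keeps simple one-line jobs responsive (the first polls come after 1, 2, and 4 seconds) while avoiding hundreds of requests during a long multi-page job. The `sleep` parameter is injectable so the loop can be tested without real delays. If the service you call returns results synchronously, none of this is needed.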
4. Transmission back to requester. Two factors work in favor of speed, so this stage causes the least concern. First, OCR-IT returns processed text, which is usually much smaller than the original image: the result from a 2.5 MB image could be just a few KB of text. Second, download speeds on most Internet connections are faster than upload speeds, so the content can be downloaded within a few seconds or less.
5. Processing of OCR data. Once the data is returned, the developer may want to do something with it. This step is usually fast and has minimal impact on processing speed. Nevertheless, developers should optimize it as well.
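Optimizing a stage starts with knowing how long it takes. The following is a minimal, hypothetical timing harness using only the Python standard library: wrap each of the five stages in a named block, then compare where the time actually goes on real devices and connections. The stage names are illustrative and simply mirror the list at the top of this post.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


class StageTimer:
    """Accumulates wall-clock time per named workflow stage."""

    def __init__(self) -> None:
        self.totals: Dict[str, float] = {}

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        """Time the enclosed block and add it to the stage's running total."""
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.totals[name] = self.totals.get(name, 0.0) + elapsed

    def slowest(self) -> str:
        """Name of the stage that has consumed the most time so far."""
        return max(self.totals, key=self.totals.__getitem__)

    def report(self) -> str:
        """One line per stage, slowest first, with its share of the total."""
        total = sum(self.totals.values()) or 1.0
        return "\n".join(
            f"{name:<12} {secs:8.3f} s  {100 * secs / total:5.1f}%"
            for name, secs in sorted(self.totals.items(), key=lambda kv: -kv[1])
        )


timer = StageTimer()
with timer.stage("capture"):
    ...  # capture, check and binarize the image
with timer.stage("upload"):
    ...  # send the image to the OCR service
print(timer.report())
```

In practice the upload stage usually dominates on mobile networks, which is why image size reduction in step 1 pays off most.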
Developers are strongly encouraged to review their entire process, then evaluate and optimize every stage. The OCR-IT team will be happy to provide suggestions, feedback, and best practices. Contact us with specific questions and we’ll be glad to assist.