OCR FOR BANK STATEMENTS
I need to be able to OCR bank statements, including getting all the numbers and descriptions in a form that can be processed. How did you cope with the fact that every bank has a different layout?
We have first-hand experience, and I have done it in two different ways in the past.
Full Page OCR
First, you can take the “full-page OCR” approach and then parse the information into your desired data format. A variety of engines offer .NET support, such as ABBYY Engine SDK, or even a completely free-to-start cloud-based on-demand OCR API. This is the more classic approach, which I used for over 10 years until a few years ago. OCR provides you with a complete text-based result, and you use algorithms to extract information. This approach is quite static and usually requires heavy programming, especially if there are multiple variations. There are two potentially troublesome areas to look out for in this approach:
- making sure that OCR provides a consistent layout and text structure so that it can be parsed reliably. If there is a table without gridlines, or just tabular data that could be detected as a table, then OCR may behave unpredictably from document to document, which breaks your parsing down the road.
- making sure that your parsing logic can accommodate various formatting differences and multiple variations of data structures. This is pure programming that requires code changes for adjustments or updates.
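To make the parsing step concrete, here is a minimal Python sketch of turning full-page OCR text into structured rows. Everything specific in it is an assumption for illustration: the layout names `bank_a` and `bank_b`, the regular expressions, and the date formats are made up, and real statements will need their own patterns. It does show why each new statement layout means a code change.

```python
import re
from datetime import datetime
from decimal import Decimal

# Hypothetical layouts: each made-up bank name maps to a line pattern
# and a date format. These are illustrative, not real bank formats.
LAYOUTS = {
    "bank_a": (  # date, description, amount
        re.compile(r"^(?P<date>\d{2}/\d{2}/\d{4})\s+(?P<desc>.+?)\s+(?P<amount>-?[\d,]+\.\d{2})$"),
        "%m/%d/%Y",
    ),
    "bank_b": (  # date, amount, description
        re.compile(r"^(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<amount>-?[\d,]+\.\d{2})\s+(?P<desc>.+)$"),
        "%Y-%m-%d",
    ),
}


def parse_statement(ocr_text, layout):
    """Extract transaction rows from OCR text using the given layout's pattern."""
    pattern, date_format = LAYOUTS[layout]
    rows = []
    for line in ocr_text.splitlines():
        match = pattern.match(line.strip())
        if match is None:
            continue  # skip headers, footers, and lines the pattern does not fit
        rows.append({
            "date": datetime.strptime(match["date"], date_format).date(),
            "description": match["desc"],
            "amount": Decimal(match["amount"].replace(",", "")),
        })
    return rows


sample = (
    "Statement of Account\n"
    "01/15/2020  GROCERY STORE #12   -1,234.56\n"
    "01/16/2020 PAYROLL DEPOSIT 2,000.00\n"
    "Page 1 of 3"
)
for row in parse_statement(sample, "bank_a"):
    print(row["date"], row["description"], row["amount"])
```

Lines that do not fit the pattern are silently skipped here, which is exactly where inconsistent OCR output becomes a problem: a row whose columns were merged or split differently simply disappears unless you add checks for it.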
Dynamic Data Capture
Second, use a modern dynamic data capture system that automates template identification and data extraction. This is the approach I have been using instead of parsing for a few years now, and it is several times faster and more convenient to create and operate. In this process you would use specialized software, such as ABBYY FlexiCapture, which takes care of the two aforementioned issues: variable data formats and different templates. Before processing, it needs to be set up and “trained” to identify the different statement types and where the data is located in each variation. All setup is done through the user interface rather than in code, and you can plug in custom scripts if desired. If it needs to be re-trained for a new template, or trained to capture some data more reliably, that takes a few minutes and requires no coding or programming experience. I have trained accountants to maintain and adjust their invoice templates themselves.
NOTE: FlexiCapture is not a .NET SDK, but an application with complete automation. It can be used interactively by operators, but I typically use it for 100% server-based automation. Once it is set up, I feed images to it as input and get properly formatted text as output in CSV, XML, or as a direct export into my ODBC databases. So you could use it as a ‘black box’ server-based component.
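If you go the ‘black box’ route with CSV output, the only code left on your side is whatever consumes the exported file. Below is a rough Python sketch of that downstream step. The column names `Date`, `Description`, and `Amount` are assumptions for illustration only; the actual columns depend on how the export is configured in the capture project.

```python
import csv
import io
from decimal import Decimal


def load_transactions(csv_file):
    """Read exported rows. Column names are assumed, not a fixed export schema."""
    transactions = []
    for row in csv.DictReader(csv_file):
        transactions.append({
            "date": row["Date"],
            "description": row["Description"].strip(),
            # Strip thousands separators before converting to Decimal.
            "amount": Decimal(row["Amount"].replace(",", "")),
        })
    return transactions


def net_change(transactions):
    """Sum all amounts, e.g. to reconcile against the statement balances."""
    return sum((t["amount"] for t in transactions), Decimal("0"))


# Stand-in for an exported file; in practice you would open the real CSV.
exported = io.StringIO(
    "Date,Description,Amount\n"
    "2020-01-15,COFFEE SHOP,-4.50\n"
    '2020-01-16,PAYROLL,"2,000.00"\n'
)
transactions = load_transactions(exported)
print(len(transactions), net_change(transactions))
```

Note that nothing in this code depends on the bank's layout; that variation is handled inside the capture system, which is the main attraction of this approach.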
I have a sample project for bank statements somewhere, so please let me know if you would like to see it live for yourself.
SOURCE: OCR & Data Capture consultants with 11 years of experience.