<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>OCR and Data Capture Blog</title>
	<atom:link href="http://www.wisetrend.com/ocr_and_data_capture_blog/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.wisetrend.com/ocr_and_data_capture_blog</link>
	<description>All about Extracting Text and Data from Images and Paper</description>
	<lastBuildDate>Thu, 26 Apr 2012 18:22:18 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Why output PDF may decrease in quality and increase in size after OCR</title>
		<link>http://www.wisetrend.com/ocr_and_data_capture_blog/why-output-pdf-may-decrease-in-quality-and-increase-in-size-after-ocr/</link>
		<comments>http://www.wisetrend.com/ocr_and_data_capture_blog/why-output-pdf-may-decrease-in-quality-and-increase-in-size-after-ocr/#comments</comments>
		<pubDate>Thu, 26 Apr 2012 18:22:18 +0000</pubDate>
		<dc:creator>Ilya Evdokimov</dc:creator>
				<category><![CDATA[Accuracy]]></category>
		<category><![CDATA[FineReader]]></category>
		<category><![CDATA[General IT]]></category>
		<category><![CDATA[OCR]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[Recognition Server]]></category>
		<category><![CDATA[Scan Settings]]></category>
		<category><![CDATA[Scanning]]></category>
		<category><![CDATA[imaging]]></category>

		<guid isPermaLink="false">http://www.wisetrend.com/ocr_and_data_capture_blog/?p=240</guid>
		<description><![CDATA[Your original PDF increased in size and decreased in quality after OCR?  You find that a) the overall file size increased substantially and b) the quality of digital pages has decreased when viewed on the screen.]]></description>
			<content:encoded><![CDATA[<p>Suppose your original PDF was a multi-page digital document with text and graphics and had a very small size of a few KB.  The text looked sharp no matter how far you zoomed in.  After processing it through OCR and comparing the result to the original PDF, you find that a) the overall file size increased substantially and b) the quality of the pages has decreased when viewed on the screen.</p>
<p>This result is expected if you are processing and saving to <span style="text-decoration: underline;">PDF Text Under Image</span>.  In this mode, as you specified, the software rasterizes (creates an image of) every page in order to produce output where the page image is visible and the text is stored under it.  This is the reason for the decrease in quality: the visible image is more pixelated than the digital text, which is now hidden under the image.  It is also the reason for the size increase: the result PDF now contains a newly created picture of every page plus the OCR text, whereas the original contained only digital text, and storing a page image takes far more space than storing text.</p>
<p>I tested a digitally generated PDF file containing 10 pages and some color graphics, which should reproduce this common situation well.  No compression or down-sampling was specified in the export settings for this test; enabling them in the PDF export settings can decrease the file size further.</p>
<ul>
<li>Original digital 10-page PDF<br />
Contains digital text and some digital color graphics.<br />
30.5 KB</li>
<li> Processed PDF, Text Only<br />
Digital text is visible, along with some OCR mistakes.  Formatting around graphics has been altered slightly.<br />
15.4 KB</li>
<li>Processed PDF, Text Under Image<br />
Rasterized pixelated picture of each page is visible.  Perfect preservation of original look and formatting.  Text is stored under page pictures for selection and searching.<br />
763 KB</li>
<li> Processed PDF, Text Over Image<br />
Rasterized pixelated picture of each page is visible in some graphics, with good preservation of original formatting, but the text is sharp due to being placed on top of page picture.  OCR inaccuracies are also visible.<br />
67.8 KB</li>
</ul>
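<p>To put the measurements above in proportion, here is a small Python sketch that computes each output size relative to the 30.5 KB original.  The sizes are the ones listed above; the names ORIGINAL_KB, OUTPUT_KB and size_ratio are made up for this illustration.</p>

```python
# File sizes in KB, taken from the test results listed above.
ORIGINAL_KB = 30.5
OUTPUT_KB = {
    "Text Only": 15.4,
    "Text Under Image": 763.0,
    "Text Over Image": 67.8,
}


def size_ratio(output_kb, original_kb=ORIGINAL_KB):
    """Return how many times the size of the original the output file is."""
    return output_kb / original_kb


if __name__ == "__main__":
    for mode, kb in OUTPUT_KB.items():
        print(f"{mode}: {kb} KB, {size_ratio(kb):.1f}x the original")
```

<p>In this test, Text Under Image comes out at roughly 25 times the original size, Text Over Image at a little over 2 times, and Text Only at about half.</p>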
<p>Note that this test applies only to ‘digitally-created’ PDFs, where text already exists in vector form.  For such files, saving to Text Under Image creates a whole new picture layer, which increases the storage size.  If you were to process an ‘image-based’ PDF such as a scan, it would already contain the image of each page before processing.  OCR would add only a text layer, which is small, so the size difference would not be noticeable.</p>
<p>The choice among the three available PDF export types affects the look and size of the output PDF.  PDF Text Under Image is the most commonly used format for archiving, indexing, and preservation, but in some cases it comes at the cost of a size increase if the original PDF did not contain an image layer.  The quality and compression settings available in the software can help decrease the output size.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wisetrend.com/ocr_and_data_capture_blog/why-output-pdf-may-decrease-in-quality-and-increase-in-size-after-ocr/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Clearing list of old projects in ABBYY FlexiCapture startup dialog</title>
		<link>http://www.wisetrend.com/ocr_and_data_capture_blog/clearing-list-of-old-projects-in-abbyy-flexicapture-startup-dialog/</link>
		<comments>http://www.wisetrend.com/ocr_and_data_capture_blog/clearing-list-of-old-projects-in-abbyy-flexicapture-startup-dialog/#comments</comments>
		<pubDate>Wed, 18 Apr 2012 04:28:24 +0000</pubDate>
		<dc:creator>Ilya Evdokimov</dc:creator>
				<category><![CDATA[Fixed]]></category>
		<category><![CDATA[FlexiCapture]]></category>
		<category><![CDATA[General IT]]></category>
		<category><![CDATA[Semi-structured]]></category>
		<category><![CDATA[data capture]]></category>

		<guid isPermaLink="false">http://www.wisetrend.com/ocr_and_data_capture_blog/?p=234</guid>
		<description><![CDATA[When starting ABBYY FlexiCapture, a user is presented with a list of last opened projects.  Frequently, once folders have been moved or projects deleted, paths in that list will no longer be valid.  It may be desirable to clear that list.]]></description>
			<content:encoded><![CDATA[<p>When starting ABBYY FlexiCapture, a user is presented with a list of last opened projects.  Frequently, once folders have been moved or projects deleted, paths in that list will no longer be valid.  It may be desirable to clear that list.</p>
<p><a href="http://www.wisetrend.com/ocr_and_data_capture_blog/wp-content/uploads/2012/04/ABBYY_FlexiCapture_Projects_List.png"><img class="aligncenter size-full wp-image-235" title="ABBYY_FlexiCapture_Projects_List" src="http://www.wisetrend.com/ocr_and_data_capture_blog/wp-content/uploads/2012/04/ABBYY_FlexiCapture_Projects_List.png" alt="ABBYY FlexiCapture Projects List" width="531" height="361" /></a></p>
<p>The only known way to clear that list is through the Registry.</p>
<p>Navigate to \HKEY_CURRENT_USER\Software\ABBYY\FlexiCapture\9.0\Shell\MRU_List and remove unwanted items from the &#8220;List&#8221; node.</p>
<p style="text-align: center;"><a href="http://www.wisetrend.com/ocr_and_data_capture_blog/wp-content/uploads/2012/04/ABBYY_FlexiCapture_Registry_Projects_List.png"><img class="aligncenter size-full wp-image-236" title="ABBYY_FlexiCapture_Registry_Projects_List" src="http://www.wisetrend.com/ocr_and_data_capture_blog/wp-content/uploads/2012/04/ABBYY_FlexiCapture_Registry_Projects_List.png" alt="ABBYY FlexiCapture Registry Projects List" width="465" height="296" /></a></p>
<p>NOTE: Registry editing is dangerous if performed incorrectly.  Please do it at your own risk.  If unsure, copy the original entries into Notepad so they can be restored if something goes wrong during testing.</p>
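<p>As an alternative to copying entries by hand, the key can be saved to a .reg file before editing.  The Python sketch below only builds (and, on Windows, runs) the standard reg export command for the key named above; the backup file name and the function name are assumptions made for this example, and the key path applies to version 9.0.</p>

```python
import subprocess
import sys

# Registry key named in the post (FlexiCapture 9.0; other versions may differ).
MRU_KEY = r"HKEY_CURRENT_USER\Software\ABBYY\FlexiCapture\9.0\Shell\MRU_List"


def build_export_command(backup_file, key=MRU_KEY):
    """Build a 'reg export' command that saves the key to a .reg file."""
    # /y overwrites an existing backup file without prompting.
    return ["reg", "export", key, backup_file, "/y"]


if __name__ == "__main__":
    # Hypothetical backup file name, chosen for this illustration.
    command = build_export_command("flexicapture_mru_backup.reg")
    if sys.platform == "win32":
        subprocess.run(command, check=True)
    else:
        # reg.exe exists only on Windows; elsewhere just show the command.
        print(" ".join(command))
```

<p>The saved file can later be merged back with reg import if the list needs to be restored.</p>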
]]></content:encoded>
			<wfw:commentRss>http://www.wisetrend.com/ocr_and_data_capture_blog/clearing-list-of-old-projects-in-abbyy-flexicapture-startup-dialog/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Error when installing ABBYY Recognition Server on Windows 7 or Windows Server 2008</title>
		<link>http://www.wisetrend.com/ocr_and_data_capture_blog/error-when-installing-abbyy-recognition-server-on-windows-7-or-windows-server-2008/</link>
		<comments>http://www.wisetrend.com/ocr_and_data_capture_blog/error-when-installing-abbyy-recognition-server-on-windows-7-or-windows-server-2008/#comments</comments>
		<pubDate>Tue, 17 Apr 2012 01:28:42 +0000</pubDate>
		<dc:creator>Ilya Evdokimov</dc:creator>
				<category><![CDATA[General IT]]></category>
		<category><![CDATA[OCR]]></category>
		<category><![CDATA[Performance]]></category>

		<guid isPermaLink="false">http://www.wisetrend.com/ocr_and_data_capture_blog/?p=228</guid>
		<description><![CDATA[ABBYY Recognition Server installer may produce MSILog access violation during an install/uninstall/upgrade on Windows 7 or Windows Server 2008.]]></description>
			<content:encoded><![CDATA[<p>In some cases, the ABBYY Recognition Server installer may produce the following error during an install/uninstall/upgrade on Windows 7 or Windows Server 2008:</p>
<div id="_mcePaste">A sharing violation occurred while accessing</div>
<div id="_mcePaste">C:\Users\ilyae\AppData\Local\Temp\AbbyyMsiLog.txt.</div>
<p><a href="http://www.wisetrend.com/ocr_and_data_capture_blog/wp-content/uploads/2012/04/ABBYY_Recognition_Server_Installer_MSILog_Error.png"><img class="aligncenter size-full wp-image-229" title="ABBYY_Recognition_Server_Installer_MSILog_Error" src="http://www.wisetrend.com/ocr_and_data_capture_blog/wp-content/uploads/2012/04/ABBYY_Recognition_Server_Installer_MSILog_Error.png" alt="ABBYY Recognition Server Installer MSILog Error" width="498" height="220" /></a></p>
<p>This issue is due to a potential bug in the installer, and it is easy to resolve.  Create a string value named EnableMSILog and set it to “false” in the HKEY_CURRENT_USER\Software\ABBYY\Debug branch of the Registry.  After the modification, the Registry screen should look like this:</p>
<p style="text-align: center;"><a href="http://www.wisetrend.com/ocr_and_data_capture_blog/wp-content/uploads/2012/04/ABBYY_Recognition_Server_MSILog_Registry.png"><img class="aligncenter size-full wp-image-230" title="ABBYY_Recognition_Server_MSILog_Registry" src="http://www.wisetrend.com/ocr_and_data_capture_blog/wp-content/uploads/2012/04/ABBYY_Recognition_Server_MSILog_Registry.png" alt="ABBYY Recognition Server MSILog Registry" width="543" height="246" /></a></p>
<p>After this modification, restarting the installation should complete smoothly.</p>
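<p>The same Registry change can be scripted instead of made by hand in the Registry Editor.  The Python sketch below builds the standard reg add command for the value described above and runs it only on Windows; the function name is an assumption made for this example, and the REG_SZ type follows from the value being a string.</p>

```python
import subprocess
import sys

# Registry branch named in the post.
DEBUG_KEY = r"HKEY_CURRENT_USER\Software\ABBYY\Debug"


def build_disable_msilog_command(key=DEBUG_KEY):
    """Build a 'reg add' command that creates the string EnableMSILog=false."""
    return [
        "reg", "add", key,
        "/v", "EnableMSILog",  # value name
        "/t", "REG_SZ",        # string type
        "/d", "false",         # value data
        "/f",                  # overwrite an existing value without prompting
    ]


if __name__ == "__main__":
    command = build_disable_msilog_command()
    if sys.platform == "win32":
        subprocess.run(command, check=True)
    else:
        # reg.exe exists only on Windows; elsewhere just show the command.
        print(" ".join(command))
```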
]]></content:encoded>
			<wfw:commentRss>http://www.wisetrend.com/ocr_and_data_capture_blog/error-when-installing-abbyy-recognition-server-on-windows-7-or-windows-server-2008/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to make a copy of a project from ABBYY FlexiCapture</title>
		<link>http://www.wisetrend.com/ocr_and_data_capture_blog/how-to-get-a-copy-of-project-from-abbyy-flexicapture-distributed-edition/</link>
		<comments>http://www.wisetrend.com/ocr_and_data_capture_blog/how-to-get-a-copy-of-project-from-abbyy-flexicapture-distributed-edition/#comments</comments>
		<pubDate>Fri, 06 Apr 2012 18:53:42 +0000</pubDate>
		<dc:creator>Ilya Evdokimov</dc:creator>
				<category><![CDATA[Fixed]]></category>
		<category><![CDATA[General IT]]></category>
		<category><![CDATA[Invoice Processing]]></category>
		<category><![CDATA[Semi-structured]]></category>
		<category><![CDATA[Technology Procurement]]></category>
		<category><![CDATA[data capture]]></category>

		<guid isPermaLink="false">http://www.wisetrend.com/ocr_and_data_capture_blog/?p=226</guid>
		<description><![CDATA[Summary of how to save a copy of project for preservation or backup and restore purposes from ABBYY FlexiCapture.]]></description>
			<content:encoded><![CDATA[<p>Frequently, a System Administrator may want to save a snapshot of the entire project in ABBYY FlexiCapture.  The process is the same for version 9.0 or 10.0, and for Standalone or Distributed Edition.  The purpose of this snapshot copy may be any of the following:</p>
<ul>
<li>to keep a backup copy of the project in its current state</li>
<li>to make a copy of the project before applying new changes, so that the copy can be used for recovery if needed</li>
<li>to send or transfer project settings to a different installation, such as between development and production</li>
</ul>
<p>The snapshot includes global project settings, all templates and document definitions (and naturally their content), and optionally batches with documents.  If you intend to include batches with documents in your exported copy, select those batches when performing the export.  If you do not want any batches included, make sure that no batches are selected.</p>
<p>Open your Administration Station (in Standalone Edition) or Project Setup Station (in Distributed Edition) and do the following:</p>
<ol>
<li>Go to Project menu, and select Export Project.</li>
<li>Browse to a desired destination, give it a meaningful name (we recommend including the date and time) and click Create.</li>
<li>A copy of the project will be saved in your destination.  It looks like a folder with sub-folders, but the folder as a whole is the project, so it should remain intact when you move or send it.</li>
</ol>
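<p>For the naming recommendation in step 2, here is a minimal Python sketch that produces a sortable name with the date and time appended.  The project name and the exact name format are assumptions made for this illustration; FlexiCapture does not require any particular naming scheme.</p>

```python
from datetime import datetime


def export_folder_name(project, when=None):
    """Return a folder name such as 'Invoices_2012-04-06_1853'."""
    when = when or datetime.now()
    # Year-month-day order keeps the exported copies sorted chronologically.
    return f"{project}_{when:%Y-%m-%d_%H%M}"


if __name__ == "__main__":
    # "Invoices" is a hypothetical project name.
    print(export_folder_name("Invoices"))
```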
]]></content:encoded>
			<wfw:commentRss>http://www.wisetrend.com/ocr_and_data_capture_blog/how-to-get-a-copy-of-project-from-abbyy-flexicapture-distributed-edition/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using Scripts for Table Element in ABBYY FlexiCapture 10.0</title>
		<link>http://www.wisetrend.com/ocr_and_data_capture_blog/using-scripts-for-table-element-in-abbyy-flexicapture-10-0/</link>
		<comments>http://www.wisetrend.com/ocr_and_data_capture_blog/using-scripts-for-table-element-in-abbyy-flexicapture-10-0/#comments</comments>
		<pubDate>Mon, 19 Mar 2012 20:58:14 +0000</pubDate>
		<dc:creator>Ilya Evdokimov</dc:creator>
				<category><![CDATA[Fixed]]></category>
		<category><![CDATA[Invoice Processing]]></category>
		<category><![CDATA[Semi-structured]]></category>
		<category><![CDATA[data capture]]></category>

		<guid isPermaLink="false">http://www.wisetrend.com/ocr_and_data_capture_blog/?p=222</guid>
		<description><![CDATA[Table is a powerful compound element for capturing structured two-dimensional data.  When writing rules for a Table element, ABBYY FlexiCapture 9.0 or 10.0 will have two different major behaviors depending on where the actual Script rule is placed.  Currently Table object provides access to Columns collection only, and not Rows collection.  This means that Table.Items [...]]]></description>
			<content:encoded><![CDATA[<p>Table is a powerful compound element for capturing structured two-dimensional data.  When writing rules for a Table element, ABBYY FlexiCapture 9.0 or 10.0 will have two different major behaviors depending on <span style="text-decoration: underline;">where</span> the actual Script rule is placed.  Currently the Table object provides access to the Columns collection only, and not to a Rows collection.  This means that Table.Items returns the number of Columns.  In order to access Rows in a Table, a single column needs to be selected so that me.Field("Column1").Items can be used to return the Rows collection.  Note that the Column is referenced by its column name.</p>
<p>Here are the two different behaviors of the .Items depending on the placement of the script rule:</p>
<ol>
<li>The Script rule is stored in the Table or Column object (which most developers are used to).  In this case the Column object has NO .Items collection.  Columns can be accessed individually by name as a regular field, but the script acts on all cells at once.  The advantage of this placement is simplicity, since row iteration becomes unnecessary.  The disadvantage is that the Cells count is unknown, and we cannot work with individual cells.</li>
<li>The Script rule is stored elsewhere, but NOT in the Table or Column object.  In this case the Column object has .Items or .Items.Item(i), which returns a collection of Cells.  We can iterate through each cell or access individual cells.</li>
</ol>
<p>The above targets working with Columns and the collections of Cells within them.  So in order to process Rows, we need to access the appropriate .Items.Item(i) of each Column, where i is the zero-based Row index.  We must also use approach #2, which is to store our Script rule outside of the Table or Column elements.  The most logical place to store such scripts is right in the Document Section page element (called “Page” in your Definition).</p>
<p>For example, let&#8217;s use some imaginary Table with multiple rows and columns and generate some rules.</p>
<p>a)      If Charges cell is populated and any other cells are empty in a row, then require Verification on those other fields</p>
<p>b)      If Charges cell is empty, then skip Verification on other fields.  (I would even suggest improving this logic by clearing miscellaneous data from the other fields if the Charges cell is empty.)</p>
<p>Here is the VBScript code:</p>
<pre><code>'======================================
' This Rule checks certain conditions in each row of GridLines
'======================================
dim i

'check that columns exist
'large all-inclusive IF-THEN with AND operator
if me.Field("FromDOS").IsMatched and _
    me.Field("ToTOS").IsMatched and _
    me.Field("POS").IsMatched and _
    me.Field("Units").IsMatched and _
    me.Field("CPT").IsMatched and _
    me.Field("DC").IsMatched and _
    me.Field("Charges").IsMatched Then

    'iterate through all rows, the below action within For loop happens per row
    For i = 0 to me.Field("Charges").Items.Count-1
        'check if CHARGES cell has at least one character
        'and other cells FROM, TO, POS, CPT, DC, UNITS are empty
        If Len(me.Field("Charges").Items.Item(i).Text) &gt; 0 and _
        Len(me.Field("FromDOS").Items.Item(i).Text) = 0 and _
        Len(me.Field("ToTOS").Items.Item(i).Text) = 0 and _
        Len(me.Field("POS").Items.Item(i).Text) = 0 and _
        Len(me.Field("CPT").Items.Item(i).Text) = 0 and _
        Len(me.Field("DC").Items.Item(i).Text) = 0 and _
        Len(me.Field("Units").Items.Item(i).Text) = 0 Then
            'make all fields as NeedVerification
            me.Field("FromDOS").Items.Item(i).NeedVerification = true
            me.Field("ToTOS").Items.Item(i).NeedVerification = true
            me.Field("POS").Items.Item(i).NeedVerification = true
            me.Field("CPT").Items.Item(i).NeedVerification = true
            me.Field("DC").Items.Item(i).NeedVerification = true
            me.Field("Units").Items.Item(i).NeedVerification = true
        End If

        'check if CHARGES has no characters
        'and irrelevant if other cells FROM, TO, POS, CPT, DC, UNITS contain data
        If Len(me.Field("Charges").Items.Item(i).Text) = 0 Then
            'make all fields as NOT NeedVerification
            '(Perhaps a better option here is to clear out all other fields)
            me.Field("FromDOS").Items.Item(i).NeedVerification = false
            me.Field("ToTOS").Items.Item(i).NeedVerification = false
            me.Field("POS").Items.Item(i).NeedVerification = false
            me.Field("CPT").Items.Item(i).NeedVerification = false
            me.Field("DC").Items.Item(i).NeedVerification = false
            me.Field("Units").Items.Item(i).NeedVerification = false
        End If        

    Next

End If</code></pre>
<p>This code can be modified to perform any other logic while iterating through any number of Rows and calling on specific Columns by name to perform some actions.</p>
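<p>To try out the row logic without a FlexiCapture project, the first rule can be modeled on plain data.  The Python sketch below is not FlexiCapture API: it assumes each column is simply a list of cell texts, and it mirrors the script as written (a row is flagged when Charges is filled and all six other cells are empty).  The function name and the data layout are made up for this illustration.</p>

```python
# Columns other than Charges that the rule inspects (names from the script above).
OTHER_COLUMNS = ["FromDOS", "ToTOS", "POS", "CPT", "DC", "Units"]


def rows_needing_verification(table):
    """Return indexes of rows where Charges is filled and every other cell is empty.

    `table` maps a column name to its list of cell texts, one entry per row,
    standing in for Column.Items.Item(i).Text in the script.
    """
    flagged = []
    for i, charges in enumerate(table["Charges"]):
        others_empty = all(len(table[name][i]) == 0 for name in OTHER_COLUMNS)
        if len(charges) != 0 and others_empty:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Three sample rows: only the first has Charges filled and all else empty.
    sample = {name: ["", "", ""] for name in OTHER_COLUMNS}
    sample["Charges"] = ["10.00", "", "5.00"]
    sample["CPT"] = ["", "", "99213"]
    print(rows_needing_verification(sample))
```

<p>As in the script, the row index i starts at 0, so the first table row is index 0.</p>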
]]></content:encoded>
			<wfw:commentRss>http://www.wisetrend.com/ocr_and_data_capture_blog/using-scripts-for-table-element-in-abbyy-flexicapture-10-0/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What is the difference between PDF and PDF/Ab produced by ABBYY Recognition Server 3.0?</title>
		<link>http://www.wisetrend.com/ocr_and_data_capture_blog/what-is-the-difference-between-pdf-and-pdfab-produced-by-abbyy-recognition-server-3-0/</link>
		<comments>http://www.wisetrend.com/ocr_and_data_capture_blog/what-is-the-difference-between-pdf-and-pdfab-produced-by-abbyy-recognition-server-3-0/#comments</comments>
		<pubDate>Wed, 14 Mar 2012 05:30:29 +0000</pubDate>
		<dc:creator>Ilya Evdokimov</dc:creator>
				<category><![CDATA[General IT]]></category>
		<category><![CDATA[Scanning]]></category>
		<category><![CDATA[imaging]]></category>

		<guid isPermaLink="false">http://www.wisetrend.com/ocr_and_data_capture_blog/?p=219</guid>
		<description><![CDATA[What is the difference between PDF and PDF/Ab produced by ABBYY Recognition Server 3.0?  ]]></description>
			<content:encoded><![CDATA[<div id="_mcePaste">There are numerous resources that define specifications for the long-term preservation format PDF/A.  In general, this format includes fewer advanced features in order to be compatible across more software versions and platforms, and to remain valid for many years.  This posting specifically explores differences in PDFs produced by ABBYY Recognition Server.</div>
<div></div>
<div id="_mcePaste">Upon opening a PDF/A result file, Adobe Acrobat Reader X detected and confirmed full compliance.  The file had Fast Viewing enabled, as it should.  Looking at the Fonts properties confirmed that all fonts were embedded in the PDF file itself.  Embedded fonts guarantee that the PDF/A file has all necessary fonts within it to display the content accurately.</div>
<div></div>
<div id="_mcePaste">Inspection of file sizes between PDF and PDF/A revealed that PDF/A was about 10% larger than the same PDF file.  This makes logical and technical sense &#8211; the PDF/A file has to include the embedded fonts.  The difference becomes less noticeable as the quantity of pages in a file increases, because a font that has been included once is used throughout all pages that need it, rather than the same font file being embedded repeatedly within each page.  On PDFs with a smaller quantity of pages the file size difference is more noticeable.</div>
<div></div>
<div id="_mcePaste">Here is a simple example:</div>
<div id="_mcePaste"></div>
<div>Assume each page of PDF is 10 KB.</div>
<div id="_mcePaste">Assume fonts have a size of 20 KB.</div>
<div id="_mcePaste"></div>
<div>In a single-page document, the file size will be 10 + 20 KB, a total of 30 KB.  That is three times the size of a standard PDF without embedded fonts.</div>
<div id="_mcePaste"></div>
<div>In a 100-page document, the file size will be 10 * 100 + 20 KB, a total of 1020 KB.  That is only 2% larger than a regular PDF without embedded fonts.</div>
<div id="_mcePaste"></div>
<div>In conclusion, the file size difference between standard PDF and PDF/A is negligible in practice, since PDFs are typically multi-page with repeating fonts.</div>
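<div>The arithmetic in this example can be checked for any page count with a few lines of Python.  This is only a sketch of the simplified model above (10 KB per page, 20 KB of fonts embedded once); the function and constant names are made up for this illustration, and real files will vary.</div>

```python
PAGE_KB = 10   # assumed size of each page, from the example above
FONTS_KB = 20  # assumed size of the embedded fonts, from the example above


def pdfa_overhead_percent(pages, page_kb=PAGE_KB, fonts_kb=FONTS_KB):
    """Return how much larger (in percent) the PDF/A is than the plain PDF."""
    plain_kb = pages * page_kb
    pdfa_kb = plain_kb + fonts_kb  # fonts are embedded once, not per page
    return 100 * (pdfa_kb - plain_kb) / plain_kb


if __name__ == "__main__":
    for pages in (1, 10, 100):
        print(f"{pages} pages: PDF/A is {pdfa_overhead_percent(pages):.0f}% larger")
```

<div>Under these assumptions the overhead is 200% for 1 page, 20% for 10 pages, and 2% for 100 pages.</div>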
]]></content:encoded>
			<wfw:commentRss>http://www.wisetrend.com/ocr_and_data_capture_blog/what-is-the-difference-between-pdf-and-pdfab-produced-by-abbyy-recognition-server-3-0/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What is the best hardware for high-volume OCR?</title>
		<link>http://www.wisetrend.com/ocr_and_data_capture_blog/what-is-the-best-hardware-for-high-volume-ocr/</link>
		<comments>http://www.wisetrend.com/ocr_and_data_capture_blog/what-is-the-best-hardware-for-high-volume-ocr/#comments</comments>
		<pubDate>Wed, 23 Nov 2011 22:10:52 +0000</pubDate>
		<dc:creator>Ilya Evdokimov</dc:creator>
				<category><![CDATA[General IT]]></category>
		<category><![CDATA[OCR]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[Technology Procurement]]></category>
		<category><![CDATA[data capture]]></category>
		<category><![CDATA[imaging]]></category>
		<category><![CDATA[outsourcing]]></category>
		<category><![CDATA[roi]]></category>

		<guid isPermaLink="false">http://www.wisetrend.com/ocr_and_data_capture_blog/?p=216</guid>
		<description><![CDATA[
For all of you out there with theoretical and practical experience, what are the best hardware and server types for large-volume OCR conversion?  What type of processors and other resources have you found most effective and why?  Please give a short description of your OCR work and loads to illustrate your feedback.  Thanks for your [...]]]></description>
			<content:encoded><![CDATA[<div>
<p>For all of you out there with theoretical and practical experience, what are the best hardware and server types for large-volume OCR conversion?  What type of processors and other resources have you found most effective and why?  Please give a short description of your OCR work and loads to illustrate your feedback.  Thanks for your feedback.</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.wisetrend.com/ocr_and_data_capture_blog/what-is-the-best-hardware-for-high-volume-ocr/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Customer feedback about WiseTREND WiseINVOICE 9.0 package for AP/AR</title>
		<link>http://www.wisetrend.com/ocr_and_data_capture_blog/customer-feedback-about-wisetrend-wiseinvoice-9-0-package-for-apar/</link>
		<comments>http://www.wisetrend.com/ocr_and_data_capture_blog/customer-feedback-about-wisetrend-wiseinvoice-9-0-package-for-apar/#comments</comments>
		<pubDate>Mon, 06 Dec 2010 21:16:42 +0000</pubDate>
		<dc:creator>Ilya Evdokimov</dc:creator>
				<category><![CDATA[Accuracy]]></category>
		<category><![CDATA[Invoice Processing]]></category>
		<category><![CDATA[OCR]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[Scanning]]></category>
		<category><![CDATA[Semi-structured]]></category>
		<category><![CDATA[Technology Procurement]]></category>
		<category><![CDATA[data capture]]></category>

		<guid isPermaLink="false">http://wisetrend.com/ocr_and_data_capture_blog/?p=213</guid>
		<description><![CDATA[Here is some early stage feedback from two recent implementations of WiseINVOICE for AP data capture automation:
Todd Gruff, B &#38; G Manufacturing Inc.’s lead for FlexiCapture integration, said the WiseTrend solution and services were top notch.  “We were very impressed with ABBYY FlexiCapture’s ability to capture the data we required.  The software is easy to use [...]]]></description>
			<content:encoded><![CDATA[<p>Here is some early stage feedback from two recent implementations of WiseINVOICE for AP data capture automation:</p>
<p>Todd Gruff, B &amp; G Manufacturing Inc.’s lead for FlexiCapture integration, said the WiseTrend solution and services were top notch.  “We were very impressed with ABBYY FlexiCapture’s ability to capture the data we required.  The software is easy to use and, with the terrific support from WiseTrend, the software&#8217;s ability to be adapted to different variations of Purchase Orders is limitless”.  B &amp; G Manufacturing Inc. is a world-renowned manufacturer and supplier of machined parts.  The company receives high volumes of Purchase Orders that require accurate and efficient processing, and the data gets imported into SAP.</p>
<p>IBT, Inc. is a well-known wholesale industrial supplier interested in optimizing and streamlining Invoice processing and storage.  “After starting our data capture project with the industry’s standard approach, we switched to a more predictable approach using WiseTrend methodology.  After numerous testing this approach has proven to be the best way.  The single complex template would not produce reliable results on different variations.  There was too much time required filling in missed fields and even double checking all seemingly successful fields (aka false positives).  Using this new methodology we can control and rely on the data capture result.  Even when vendors are consistently changing their invoice format, which would cause serious complications and a major need for professional services in the past, using WiseTrend’s method it is quick and efficient to make the change.”  Kevin Thompson and Randy Bledsoe are accountants at IBT, Inc., who now run and maintain their own full data capture system without any IT involvement.  Captured data and images get exported to DocuWare.  The project was led by Toshiba Business Solutions, a major copier integrator and ABBYY VAR.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wisetrend.com/ocr_and_data_capture_blog/customer-feedback-about-wisetrend-wiseinvoice-9-0-package-for-apar/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>WiseTrend Unveils Effective, Predictable and Profitable Approach to Invoice Processing</title>
		<link>http://www.wisetrend.com/ocr_and_data_capture_blog/wisetrend-unveils-effective-predictable-and-profitable-approach-to-invoice-processing/</link>
		<comments>http://www.wisetrend.com/ocr_and_data_capture_blog/wisetrend-unveils-effective-predictable-and-profitable-approach-to-invoice-processing/#comments</comments>
		<pubDate>Wed, 01 Dec 2010 16:34:08 +0000</pubDate>
		<dc:creator>Ilya Evdokimov</dc:creator>
				<category><![CDATA[Invoice Processing]]></category>
		<category><![CDATA[Scanning]]></category>
		<category><![CDATA[Semi-structured]]></category>
		<category><![CDATA[Technology Procurement]]></category>
		<category><![CDATA[data capture]]></category>
		<category><![CDATA[roi]]></category>
		<category><![CDATA[AP]]></category>
		<category><![CDATA[Automation]]></category>
		<category><![CDATA[best practices]]></category>
		<category><![CDATA[semi-structured]]></category>

		<guid isPermaLink="false">http://wisetrend.com/ocr_and_data_capture_blog/?p=208</guid>
		<description><![CDATA[Top data capture and document recognition integrator announces proven AP solution focusing on customer empowerment]]></description>
			<content:encoded><![CDATA[<p style="text-align: center;"><img class="aligncenter" title="WiseTrend WiseINVOICE" src="http://www.wisetrend.com/images/wiseinvoice_invoice_done.png" alt="" width="135" height="170" /></p>
<p>We are proud to announce our new method of servicing invoice processing and AP data capture automation needs.  WiseTrend WiseINVOICE implementation combines the best modern technologies, proven methodology, and user-friendly training focusing on user self-sufficiency.  This solution is part of a growing trend to eliminate runaway costs for professional services while obtaining a reliable AP automation system through a truly customizable, fully functional solution with immediate and accurately calculable ROI.</p>
<p>Up until now, meticulous programming or applying the “one-size-fits-all” approach to Invoice processing were the only solutions in the industry.  This often misleading approach led customers to projects that ran over budget, skyrocketing professional service costs, long integration periods, frustration, and an inability to see a clear return on the technology investment.  As the quantity of document variations increases linearly, the complexity of the system and associated setup increases exponentially.</p>
<p>With over 10 years of data capture experience, we developed a solution that works to optimize the technology and minimize production labor.  Our programmers previously spent weeks un-tangling projects that were based on the current industry lure—a one-size-fits-all solution for invoice processing.  Software manufacturers promise a single solution as a magic pill for all template variations.  Contrary to the promise, we spent many professional service days running regression testing on complex implementations for pre-setup AP projects.  Over many projects, we streamlined the Invoice and Purchase Order project setup to become a consistent and repeatable process.</p>
<p>Our approach provides tangibles that are surprisingly uncommon in the industry, including predictable timeframe for achieving customer’s return on investment as well as removing the often unpredictable run-away professional service costs and deadline extensions.  Following the proven formula, customers can anticipate the accurate calculation of time and effort it would require to build an effective data extraction system for a particular quantity of variations.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wisetrend.com/ocr_and_data_capture_blog/wisetrend-unveils-effective-predictable-and-profitable-approach-to-invoice-processing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Text Parsing vs. Dynamic Data Capture</title>
		<link>http://www.wisetrend.com/ocr_and_data_capture_blog/text-parsing-vs-dynamic-data-capture/</link>
		<comments>http://www.wisetrend.com/ocr_and_data_capture_blog/text-parsing-vs-dynamic-data-capture/#comments</comments>
		<pubDate>Fri, 18 Jun 2010 16:58:40 +0000</pubDate>
		<dc:creator>Ilya Evdokimov</dc:creator>
				<category><![CDATA[Accuracy]]></category>
		<category><![CDATA[Invoice Processing]]></category>
		<category><![CDATA[OCR]]></category>
		<category><![CDATA[Semi-structured]]></category>
		<category><![CDATA[Technology Procurement]]></category>
		<category><![CDATA[data capture]]></category>
		<category><![CDATA[api]]></category>
		<category><![CDATA[form processing]]></category>
		<category><![CDATA[parsing]]></category>

		<guid isPermaLink="false">http://wisetrend.com/ocr_and_data_capture_blog/?p=205</guid>
		<description><![CDATA[Need to extract fields from OCRed text from forms.  The document types may vary hence a generic parser might not work.  What would you suggest to do in this situation?]]></description>
			<content:encoded><![CDATA[<p>I answered this question on Stack Overflow, and it was too important not to duplicate here.</p>
<p>QUESTION<br />
=================</p>
<p>I am extracting text from OCRed TIFF files using a library and dumping it into a database. The text I am extracting is actually from FORMS that have fields like NAME, DOB, COUNTRY, etc.   Since OCR does not know the difference between an actual value and a label, it&#8217;s just dumping all the text. Now I have text in the DB in the following format:</p>
<p>Name:    MyName Address:   My Address</p>
<p>Now the next step is to extract values like <strong>MyName</strong> and <strong>MyAddress</strong> from the DB. The document types may vary, hence a generic parser might not work.</p>
<p>What would you suggest doing in this situation? Should I write different parsers? I am working on .NET.</p>
<p>ANSWER<br />
=================</p>
<div>
<p>Hello. This is a common question for which the OCR industry found a generic solution years ago, and the solution branches into two separate directions. Using OCR for form processing, otherwise known as data extraction, can follow one of the following two methods.</p>
<p>TEXT PARSING &#8211; considered an old approach that still works in many situations. Obviously you are experienced in that and know the pros and cons, so I will be brief here. The pro is that it requires no other technology, just generic programming. The cons are that a) it requires programming, b) it is not very adaptive to variations, c) if the formatting changes over time, you may have to rewrite some spaghetti or legacy code, and d) it requires a near-perfect OCR result in order to find data successfully (i.e. a mis-recognized label may result in missing data). In other words, it is great for quick and simple solutions, but not too adaptive to variations and changes. I did it a lot back in my school and early programming days.</p>
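<p>As a minimal sketch of the text parsing approach (in Python rather than .NET, purely for illustration), the snippet below splits flat OCR text on a fixed list of labels. The <code>LABELS</code> list and the <code>parse_fields</code> helper are hypothetical names introduced here, not part of any library.</p>

```python
import re

# Hypothetical label list; a real set of forms would need its own labels.
LABELS = ["Name", "Address", "DOB", "Country"]


def parse_fields(text, labels=LABELS):
    """Split flat OCR text into a dict of label to value using exact label matches."""
    # Build one alternation such as (Name|Address|DOB|Country) followed by a colon.
    label_pattern = "|".join(re.escape(label) for label in labels)
    splitter = re.compile(r"\b(" + label_pattern + r")\s*:\s*")
    # re.split with a capturing group returns [prefix, label1, value1, label2, value2, ...].
    parts = splitter.split(text)
    fields = {}
    for label, value in zip(parts[1::2], parts[2::2]):
        fields[label] = value.strip()
    return fields


# Clean OCR text: both fields are found.
print(parse_fields("Name:    MyName Address:   My Address"))
# prints {'Name': 'MyName', 'Address': 'My Address'}

# One misrecognized label ("Narne" instead of "Name"): the Name field is silently lost.
print(parse_fields("Narne: MyName Address: My Address"))
# prints {'Address': 'My Address'}
```

<p>The second call illustrates con (d): exact label matching has no tolerance for OCR mistakes, so a single misread label drops the field without any error.</p>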
<p>DYNAMIC DATA CAPTURE &#8211; using some special technology to dynamically locate data. Some technologies do it on the image level and feed clean data to your database. Other technologies do it on the post-OCR text level. I am most familiar with data capture on the image level, as it has several key benefits for the complex projects I have done, so I will talk more about that. The only con is that you may need to invest in a specialized software tool, but that tool provides a lot of benefit. Even a plumber has to invest in tools to do his job. The benefit of image-based data extraction is that it does not depend on perfect post-OCR text: that text is not always perfect, so a text-based extractor has to accommodate mistakes, something that the old text parsing approach cannot do. Also, in text parsing you can use only text, while in image parsing you have a ton of other information, such as lines (like in table columns), white gaps between texts (such as paragraph separators), pictures, logos, checkboxes, etc.</p>
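<p>To illustrate why layout information helps, here is a rough Python sketch that uses white gaps between words to decide where a value ends. It assumes the OCR step can report a bounding box per word; the <code>Word</code> class, the pixel coordinates, and the <code>value_right_of</code> helper are all hypothetical and do not reflect the output format of any particular OCR engine.</p>

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x0: int  # left edge in pixels
    x1: int  # right edge in pixels
    y: int   # vertical center of the word in pixels


def value_right_of(words, label, max_gap=40, line_tolerance=5):
    """Collect words on the label's line to its right, stopping at the first wide white gap."""
    anchor = next((w for w in words if w.text == label), None)
    if anchor is None:
        return None
    # Keep only words on the same text line that start at or after the label's right edge.
    same_line = sorted(
        (w for w in words if line_tolerance >= abs(w.y - anchor.y) and w.x0 >= anchor.x1),
        key=lambda w: w.x0,
    )
    value = []
    previous_right = None
    for word in same_line:
        # A gap wider than max_gap between neighbouring words ends the value.
        if previous_right is not None and word.x0 - previous_right > max_gap:
            break
        value.append(word.text)
        previous_right = word.x1
    return " ".join(value) or None


# Made-up coordinates for the line "Name:    MyName Address:   My Address".
words = [
    Word("Name:", 10, 60, 100),
    Word("MyName", 110, 190, 100),
    Word("Address:", 300, 380, 100),
    Word("My", 420, 450, 100),
    Word("Address", 460, 540, 100),
]
print(value_right_of(words, "Name:"))     # prints MyName
print(value_right_of(words, "Address:"))  # prints My Address
```

<p>A plain text parser sees only a run of characters and cannot tell that the wide gap after MyName separates two fields; the gap threshold here is an arbitrary value chosen for the made-up coordinates.</p>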
<p>For example, I heavily use ABBYY FlexiCapture for these types of extraction (http://www.wisetrend.com/abbyy_flexicapture.shtml). That tool allows me to define what data I need to extract and how it should be extracted. In your case, you would do something like this:</p>
<ol>
<li>Identify the format style, if more than one. If you have multiple formats, you can apply a different set of extraction rules per format.</li>
<li>Locate the label &#8220;Name:&#8221; or some other variation of it, using fuzzy search or rules to accommodate OCR mistakes, if any. Look in a certain area if more than one name occurs on the page.</li>
<li>Locate the area that contains chars of a certain type next to the found label Name. Those chars have to fit certain criteria to be accepted as the MyName field, and all those criteria are defined through the UI (or scripting if you want).</li>
<li>OCR the area content with MyName chars. Another benefit here is that you no longer use generic OCR. You can use very specific OCR settings that apply only to your MyName area &#8211; which increases the accuracy of the OCR and the data. This is most useful for specialized data, such as part numbers, codes, addresses, etc. You can use regular expressions, dictionaries, rules. You can be specific per field. That is not possible when full-page OCR is used.</li>
<li>Send the clean data to the DB. Before you send the data, if you want to guarantee OCR quality, most tools have some kind of Verification capability to visually check (requires a human) the OCRed text against the image.</li>
</ol>
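<p>The idea behind steps 2 through 5 can be sketched in a few lines of Python. This is not how ABBYY FlexiCapture is configured or scripted; it is a text-level illustration using the standard <code>difflib</code> and <code>re</code> modules, and the <code>FIELD_RULES</code> table, the <code>match_label</code> and <code>extract</code> helpers, and the 0.6 similarity cutoff are assumptions made for this example.</p>

```python
import difflib
import re

# Hypothetical per-field rules: accepted label spellings and a pattern the value must match.
FIELD_RULES = {
    "Name": {"labels": ["Name", "Full Name"], "pattern": r"[A-Za-z][A-Za-z .'-]*"},
    "DOB": {"labels": ["DOB", "Date of Birth"], "pattern": r"\d{2}/\d{2}/\d{4}"},
}

# Lower-cased label spelling mapped back to its field name.
LABEL_TO_FIELD = {
    label.lower(): field
    for field, rule in FIELD_RULES.items()
    for label in rule["labels"]
}


def match_label(candidate, cutoff=0.6):
    """Fuzzy-match an OCRed label against the known labels; return the field name or None."""
    close = difflib.get_close_matches(candidate.lower(), list(LABEL_TO_FIELD), n=1, cutoff=cutoff)
    return LABEL_TO_FIELD[close[0]] if close else None


def extract(lines, cutoff=0.6):
    """Return (accepted fields, values that need human verification)."""
    accepted, needs_review = {}, []
    for line in lines:
        label, sep, value = line.partition(":")
        field = match_label(label.strip(), cutoff) if sep else None
        if field is None:
            continue  # not a field we were asked to capture
        value = value.strip()
        # Field-specific rule: only values that fit the field's pattern are accepted.
        if re.fullmatch(FIELD_RULES[field]["pattern"], value):
            accepted[field] = value
        else:
            needs_review.append((field, value))
    return accepted, needs_review


# Misread labels ("Narne", "D0B") are still matched; the unknown Country line is ignored.
print(extract(["Narne: MyName", "D0B: 01/02/1990", "Country: Nowhere"]))
# A value that fails its field rule is routed to verification instead of the database.
print(extract(["DOB: O1/02/199O"]))
```

<p>The cutoff trades missed labels against false matches, so it would need tuning on real documents. A dedicated tool applies the same kind of per-field rules, but on the image level and through its own UI.</p>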
<p>In general, setting up these processes is much quicker and more liberating than code-based text parsing. There are plenty of scripting options and APIs available for those who want to go past the UI or need additional automation.</p>
<p>I have only scratched the surface, but hopefully that provides a start for your research and decision. If I have not addressed anything, please feel free to let me know.</p>
<p>Ilya Evdokimov, Data Capture Expert for 10+ years, CDIA+ Certified</p>
<p>My blog with more data capture stuff is here:  <a rel="nofollow" href="../">http://wisetrend.com/ocr_and_data_capture_blog/</a></p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.wisetrend.com/ocr_and_data_capture_blog/text-parsing-vs-dynamic-data-capture/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

