What is the difference between PDF and PDF/Ab produced by ABBYY Recognition Server 3.0?

Numerous resources define the specifications of PDF/A, a format intended for long-term preservation. In general, PDF/A omits some advanced PDF features so that files stay compatible across software versions and platforms and remain valid for many years. This post specifically explores the differences between PDF and PDF/A files produced by ABBYY Recognition Server.

When a PDF/A result file was opened, Adobe Acrobat Reader X detected it as PDF/A and confirmed full compliance. Fast Web View was enabled, as expected. The Fonts tab of the document properties confirmed that all fonts were embedded in the PDF file itself. Embedding the fonts ensures that the PDF/A file contains every font it needs to display its content accurately.

A comparison of file sizes showed that the PDF/A file was about 10% larger than the equivalent standard PDF. This makes technical sense, because a PDF/A file must embed its fonts. The difference becomes less noticeable as the number of pages in a file increases. Each font is embedded only once and reused on every page that needs it, rather than being embedded again on each page. For files with only a few pages, the size difference is more noticeable.

Here is a simple example:

  • Assume each page of the PDF is 10 KB.
  • Assume the embedded fonts total 20 KB.

In a single-page document, the file size will be 10 + 20 KB, for a total of 30 KB. That is 3x the size of a standard PDF without embedded fonts.

In a 100-page document, the file size will be 10 * 100 + 20 KB, for a total of 1020 KB. That is only 2% larger than a regular PDF without embedded fonts.
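The same arithmetic can be expressed as a small model. This is a simplified sketch under the example's assumptions only. Each page contributes a fixed size, and the fonts are added once per file. Compression, font subsetting, XMP metadata and other PDF/A overhead are ignored. The function names are hypothetical, and the 10 KB and 20 KB defaults are the illustrative figures above, not measurements.

```python
def pdfa_size_kb(pages: int, page_kb: float = 10.0, fonts_kb: float = 20.0) -> float:
    """Hypothetical model: page content plus fonts embedded once per file."""
    return pages * page_kb + fonts_kb


def overhead_percent(pages: int, page_kb: float = 10.0, fonts_kb: float = 20.0) -> float:
    """Size increase of the PDF/A over a plain PDF without embedded fonts, in percent."""
    plain_kb = pages * page_kb  # same pages, no embedded fonts
    return 100 * (pdfa_size_kb(pages, page_kb, fonts_kb) - plain_kb) / plain_kb


# Show how the fixed font cost is spread across more pages.
for n in (1, 10, 100):
    print(f"{n:>3} pages: {pdfa_size_kb(n):.0f} KB, +{overhead_percent(n):.0f}%")
```

Under this model, the overhead works out to 200% for one page, 20% for ten pages, and 2% for one hundred pages. It shrinks in proportion to the page count because the font cost is fixed.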

In conclusion, the file size difference between standard PDF and PDF/A is negligible in most cases, since typical PDFs have many pages that reuse the same fonts.