Ilya Evdokimov is a long-term practitioner and expert in leading Optical Character Recognition (OCR), Data Capture and Document Processing techniques, technologies and solutions. With over 15 years of experience spanning enterprise software implementations, mobile application development, cloud-based systems integration and desktop-level automation, Ilya Evdokimov uses thorough industry knowledge and experience to achieve high efficiency and workflow optimization in the most challenging paper-dependent and digital image capture environments.
There are numerous resources that define specifications for the long-term preservation format PDF/A. In general, this format includes fewer advanced features so that it stays compatible across more software versions and platforms and remains valid for many years. This post specifically explores differences between the PDF and PDF/A files produced by ABBYY Recognition Server.
When a PDF/A result file was opened, Adobe Acrobat Reader X detected and confirmed full compliance. The file had Fast Viewing enabled, as it should. The Fonts properties confirmed that all fonts were embedded in the PDF file itself. Embedded fonts guarantee that the PDF/A file contains all the fonts necessary to display its content accurately.
Comparing file sizes showed that the PDF/A file was about 10% larger than the equivalent PDF file. This makes logical and technical sense: a PDF/A file has to embed its fonts. The difference becomes less noticeable as the number of pages in a file increases, because a font that has been included once is used by all pages that need it, rather than being embedded again on each page. In PDFs with fewer pages, the file size difference is more noticeable.
Here is a simple example:
Assume each page of PDF is 10 KB.
Assume fonts have a size of 20 KB.
In a single-page document, the file size will be 10 + 20 KB, for a total of 30 KB. That is three times the size of a standard PDF without fonts embedded.
In a 100-page document, the file size will be 10 * 100 + 20 KB, for a total of 1020 KB. That is only 2% larger than a regular PDF with no fonts embedded.
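The same arithmetic can be sketched in a few lines of Python to see how the overhead shrinks as the page count grows. This is only an illustration of the simplified model above: the function names are made up for this example, and the 10 KB per page and 20 KB font figures are the assumed values from the example, not measurements from ABBYY Recognition Server output.

```python
def pdf_sizes_kb(pages, page_kb=10, fonts_kb=20):
    """Return (plain_pdf_kb, pdfa_kb) under the simplified model above.

    Assumes every page costs the same and the fonts are embedded once per file.
    """
    plain = pages * page_kb
    pdfa = plain + fonts_kb  # fonts are added once, not once per page
    return plain, pdfa


def overhead_percent(pages, page_kb=10, fonts_kb=20):
    """Percentage by which the PDF/A exceeds the plain PDF in this model."""
    plain, pdfa = pdf_sizes_kb(pages, page_kb, fonts_kb)
    return 100.0 * (pdfa - plain) / plain


for pages in (1, 10, 100):
    plain, pdfa = pdf_sizes_kb(pages)
    print(f"{pages:>3} pages: PDF {plain} KB, PDF/A {pdfa} KB, "
          f"overhead {overhead_percent(pages):.0f}%")
```

With these assumed numbers, the single-page file comes out at 30 KB versus 10 KB (a 200% overhead), while the 100-page file comes out at 1020 KB versus 1000 KB (a 2% overhead).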
In conclusion, the file size difference between standard PDF and PDF/A is negligible, since PDFs are typically multi-page documents with repeating fonts.