PDF/A is targeted at reliable long-term preservation of digital documents with text, raster images and vector graphics as well as associated metadata. The PDF/A format specified in the ISO 19005 standard strives to provide a consistent and robust subset of PDF which can faithfully be reproduced even after a long archiving period, or used for reliable data exchange in enterprise and government environments. Important technical aspects of PDF/A-1, PDF/A-2 and PDF/A-3 are discussed below.
PDF/A-1, the first part of a multi-part standard, was published in 2005 as ISO 19005-1. It is based on PDF 1.4 (the file format of Acrobat 5) and imposes some restrictions regarding the use of color, fonts, annotations, and other elements. There are two flavors of PDF/A-1:
Level B conformance (PDF/A-1b; »b« as in »basic«) ensures that the visual appearance of a document is preservable in the long term. PDF/A-1b ensures that the document will look the same when it is viewed or printed some time in the future.
Level A conformance (PDF/A-1a; »a« as in »accessible«) is based on level B, but adds crucial properties of Tagged PDF: it requires structure information and reliable Unicode text semantics in order to preserve the document’s logical structure and natural reading order. Simply put, PDF/A-1a not only ensures that the document will look the same when it is used in the future, but also that its contents (semantics) can be reliably interpreted and will be accessible to physically impaired users. As an important example, screen reader programs can read Tagged PDF documents to blind users.
PDF has advanced considerably since the publication of PDF/A-1. Among many other milestones, PDF 1.7 (the file format of Acrobat 8) was standardized as ISO 32000-1 in 2008. In order to make numerous new PDF features available in PDF/A workflows, a new part of the standard called PDF/A-2 was published in 2011 as ISO 19005-2.
PDF/A-2 is based on PDF 1.7 and includes many useful additions which are not available in PDF/A-1. These include important file format aspects such as JPEG 2000 compression, optional content (layers), PDF packages and others. PDF/A-2 documents may contain file attachments provided the attached documents themselves conform to PDF/A-1 or PDF/A-2.
Similar to PDF/A-1, PDF/A-2 offers level B and level A conformance. It adds a new flavor called level U conformance. Level U sits in between PDF/A-2a and PDF/A-2b in that it requires reliable Unicode semantics, but not structure information. PDF/A-2u guarantees that the pages can faithfully be reproduced visually and that the text can be extracted and searched.
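The relationship between the conformance levels described so far can be sketched in a few lines of Python. This is only an illustrative model of the text above, not part of any PDF library: the names `Guarantee`, `LEVELS`, `ALLOWED_LEVELS`, `guarantees` and `satisfies` are hypothetical, and the table covers only PDF/A-1 and PDF/A-2.

```python
from enum import Flag, auto


class Guarantee(Flag):
    """Properties a conformance level promises (illustrative names)."""
    VISUAL = auto()     # pages can be reproduced faithfully
    UNICODE = auto()    # text has reliable Unicode semantics
    STRUCTURE = auto()  # Tagged PDF structure information is present


# Levels U and A both build on level B, as described in the text.
LEVELS = {
    "b": Guarantee.VISUAL,
    "u": Guarantee.VISUAL | Guarantee.UNICODE,
    "a": Guarantee.VISUAL | Guarantee.UNICODE | Guarantee.STRUCTURE,
}

# Conformance levels defined by each part of the standard.
ALLOWED_LEVELS = {
    1: {"a", "b"},
    2: {"a", "b", "u"},
}


def guarantees(part: int, level: str) -> Guarantee:
    """Return what a flavor such as PDF/A-2u promises."""
    level = level.lower()
    if part not in ALLOWED_LEVELS:
        raise ValueError(f"PDF/A-{part} is not covered by this sketch")
    if level not in ALLOWED_LEVELS[part]:
        raise ValueError(f"PDF/A-{part} has no conformance level {level!r}")
    return LEVELS[level]


def satisfies(part: int, have: str, need: str) -> bool:
    """True if level `have` promises everything that level `need` does."""
    needed = guarantees(part, need)
    return (guarantees(part, have) & needed) == needed
```

For example, `satisfies(2, "a", "u")` is true because level A includes the Unicode requirement of level U, while `guarantees(1, "u")` raises an error because PDF/A-1 defines no level U.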
PDF/A-2 does not make PDF/A-1 obsolete or force users to migrate to the newer version – after all, this would be absurd for a standard which is targeted at long-term preservation!
Another part of the standard called PDF/A-3 was published in 2012 as ISO 19005-3. PDF/A-3 also supports conformance levels A, B, and U. It differs from PDF/A-2 in the following aspects:
While PDF/A-2 allows only file attachments which conform to PDF/A, PDF/A-3 allows arbitrary file types as attachments to meet the requirements of various user groups.
File attachments are associated with the whole document, a page, or some other part of the document. The relationship between the attached file and the corresponding part of the document must be specified explicitly, e.g. source, alternative, or supplemental data.
Typical PDF/A-3 scenarios include embedding of word processor or spreadsheet source files in a final-form PDF/A document, or the inclusion of machine-readable XML data in a PDF intended for human consumption, e.g. an invoice.
PDF/A viewers are not required to do anything specific with attached non-PDF/A files except for extracting them. The PDF/A standard does not guarantee that attachments can be viewed or otherwise used in the future – it simply allows their presence in an archivable document.
In the same spirit as PDF/A-2, which does not replace PDF/A-1, PDF/A-3 does not replace PDF/A-2. Any part of the PDF/A standard can be used for long-term archiving as appropriate.
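The attachment rules of the three parts can be summarized in a small sketch. It illustrates the rules described above and is not a validator: `Attachment`, `Relationship` and `attachment_allowed` are hypothetical names, the relationship values mirror only the examples named in the text, and the PDF/A part of an attached file is assumed to be known already. The text above does not state the rule for PDF/A-1; the sketch relies on the fact that PDF/A-1 does not permit embedded files.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Relationship(Enum):
    """How an attachment relates to the document (illustrative values)."""
    SOURCE = "source"
    ALTERNATIVE = "alternative"
    SUPPLEMENT = "supplement"


@dataclass
class Attachment:
    filename: str
    pdfa_part: Optional[int] = None               # None: not a PDF/A file
    relationship: Optional[Relationship] = None   # None: not specified


def attachment_allowed(host_part: int, att: Attachment) -> bool:
    """Decide whether a PDF/A-<host_part> document may carry `att`."""
    if host_part == 1:
        # PDF/A-1 does not permit file attachments at all.
        return False
    if host_part == 2:
        # Only attachments that themselves conform to PDF/A-1 or PDF/A-2.
        return att.pdfa_part in (1, 2)
    if host_part == 3:
        # Any file type, but the relationship must be stated explicitly.
        return att.relationship is not None
    raise ValueError(f"unknown PDF/A part: {host_part}")
```

With this model, an XML invoice declared as `Relationship.ALTERNATIVE` is accepted by a PDF/A-3 host but rejected by a PDF/A-2 host, which matches the invoice scenario mentioned above.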
PDFlib GmbH is a founding member of the PDF Association. Founded in 2006 as the PDF/A Competence Center, the PDF Association exists to promote the adoption and implementation of International Standards for PDF technology.
Developers use the PDF Association to share knowledge and experience with PDF technology.
Decision-makers use the PDF Association to learn about the role and capabilities of PDF and PDF’s subset standards in ECM and other electronic document applications.
End-users benefit from improved reliability, quality, functionality, and interoperability in their experience of electronic documents.
PDFlib GmbH Whitepaper: A Technical Introduction to PDF/A
PDF/A Competence Center of the PDF Association
PDF/A-1 standard (ISO 19005-1:2005)
PDF/A-1 Technical Corrigendum 1 (ISO 19005-1:2005/Cor 1:2007)
PDF/A-1 Technical Corrigendum 2 (ISO 19005-1:2005/Cor.2:2010(E))
Technical Notes for PDF/A-1 published by the PDF/A Competence Center
PDF/A-2 standard (ISO 19005-2)
PDF/A-3 standard (ISO 19005-3)