The PDF/A Standards

The PDF/A Family of Archiving Standards

PDF/A is designed for reliable long-term preservation of digital documents containing text, raster images and vector graphics, along with associated metadata. The PDF/A format, specified in the ISO 19005 series of standards, defines a consistent and robust subset of PDF that can be reproduced faithfully even after a long archiving period and that is also suited to reliable data exchange in enterprise and government environments. The major technical aspects of PDF/A-1, PDF/A-2, PDF/A-3 and PDF/A-4 are discussed below.

PDF/A-1

PDF/A-1, the first part of the multi-part standard, was published in 2005 as ISO 19005-1. It is based on PDF 1.4, the file format of Acrobat 5, and restricts the use of color, fonts, annotations, and other elements. PDF/A-1 comes in two flavors, called conformance levels:

  • Level B conformance (PDF/A-1b; »b« as in »basic«) ensures that the visual appearance of a document can be preserved in the long term: the document will look the same whenever it is viewed or printed, in the near or far future.
  • Level A conformance (PDF/A-1a; »a« as in »accessible«) is based on level B, but adds crucial properties of Tagged PDF: it requires structure information and reliable Unicode text semantics in order to preserve the document’s logical structure and natural reading order. Simply put, PDF/A-1a ensures not only that the document will look the same when it is used in the future, but also that its contents can be interpreted reliably and will be accessible to users with disabilities. As an important example, screen reader programs can read Tagged PDF documents aloud to blind users.

PDF/A-2

PDF 1.7, the file format of Acrobat 8, was standardized in 2008 as ISO 32000-1. To make newer PDF features available in PDF/A, a new part of the standard called PDF/A-2 was published in 2011 as ISO 19005-2.

PDF/A-2 is based on PDF 1.7 and includes many additions which are not available in PDF/A-1. These include important file format aspects such as JPEG 2000 compression, optional content (layers), PDF packages and others. PDF/A-2 documents may contain file attachments provided the attached documents themselves conform to PDF/A-1 or PDF/A-2.

Like PDF/A-1, PDF/A-2 offers level B and level A conformance, and it adds a third flavor called level U. Level U sits between PDF/A-2b and PDF/A-2a: it requires reliable Unicode semantics, but not structure information. PDF/A-2u guarantees that the visual appearance of pages can be reproduced faithfully and that the text can be extracted and searched.
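The relationship between the three levels can be sketched as a small lookup table. This is only an illustration of the description above, not part of any PDF library; the names `ConformanceLevel`, `PDFA2_LEVELS` and `weakest_level` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConformanceLevel:
    """Requirements of one conformance level, as described in the text."""
    visual_fidelity: bool   # pages can be reproduced faithfully
    unicode_text: bool      # reliable Unicode semantics (extraction, search)
    structure_info: bool    # Tagged PDF structure information


# PDF/A-2 levels, ordered from least to most demanding.
PDFA2_LEVELS = {
    "b": ConformanceLevel(visual_fidelity=True, unicode_text=False, structure_info=False),
    "u": ConformanceLevel(visual_fidelity=True, unicode_text=True, structure_info=False),
    "a": ConformanceLevel(visual_fidelity=True, unicode_text=True, structure_info=True),
}


def weakest_level(need_unicode: bool, need_structure: bool) -> str:
    """Return the least demanding PDF/A-2 level that meets the given needs."""
    for name in ("b", "u", "a"):
        level = PDFA2_LEVELS[name]
        unicode_ok = level.unicode_text or not need_unicode
        structure_ok = level.structure_info or not need_structure
        if unicode_ok and structure_ok:
            return name
    raise ValueError("no level satisfies the requirements")
```

For example, `weakest_level(need_unicode=True, need_structure=False)` selects level U, while any request for structure information leads to level A.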

PDF/A-2 does not make PDF/A-1 obsolete or force users to migrate to the newer part of the standard; that would be absurd for a standard aimed at long-term preservation.

PDF/A-3

Screenshot: PDF/A-3b conformance display in Acrobat

Another part of the standard, PDF/A-3, was published in 2012 as ISO 19005-3. PDF/A-3 is quite similar to PDF/A-2 and also supports conformance levels A, B, and U. It differs from PDF/A-2 in the following aspects:

  • While PDF/A-2 allows only file attachments which conform to PDF/A, PDF/A-3 allows arbitrary file types as attachments to meet the requirements of various use cases.
  • File attachments are associated with the whole document, a page, or some other part of the document. For each attachment, the kind of relationship between the attached file and the corresponding part of the document (e.g. source, alternative, or supplemental data) must be specified explicitly with the AFRelationship key.

Typical PDF/A-3 scenarios include embedding word processor or spreadsheet source files in a final-form PDF/A document, or including machine-readable XML data in a PDF intended for human consumption, e.g. an invoice. In fact, the ZUGFeRD and Factur-X invoice standards are an important application of PDF/A-3.
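The requirement that every attachment declares its relationship can be illustrated with a small check. This is a minimal sketch that works on plain Python objects rather than on real PDF files. The `Attachment` class, the file names and the function name are hypothetical. The value set lists the AFRelationship names assumed here to be the standard ones (Source, Data, Alternative, Supplement, Unspecified), so anything else is flagged for review rather than declared invalid.

```python
from dataclasses import dataclass
from typing import List

# AFRelationship values assumed to be the standard ones for PDF/A-3.
AF_RELATIONSHIPS = frozenset(
    {"Source", "Data", "Alternative", "Supplement", "Unspecified"}
)


@dataclass
class Attachment:
    filename: str
    af_relationship: str = ""  # value of the AFRelationship key; empty if missing


def attachment_problems(att: Attachment) -> List[str]:
    """Return human-readable findings for one attachment (empty list if none)."""
    if not att.af_relationship:
        return [f"{att.filename}: AFRelationship is missing"]
    if att.af_relationship not in AF_RELATIONSHIPS:
        return [f"{att.filename}: unrecognized AFRelationship {att.af_relationship!r}"]
    return []


# Hypothetical attachments mirroring the scenarios described above.
attachments = [
    Attachment("report.docx", "Source"),       # word processor source file
    Attachment("invoice.xml", "Alternative"),  # machine-readable invoice data
    Attachment("notes.txt"),                   # relationship not declared
]
for att in attachments:
    for problem in attachment_problems(att):
        print(problem)
```

Only the third attachment is reported, because it carries no relationship at all.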

PDF/A-4

PDF/A-4 was published in 2020 as ISO 19005-4. Since it is based on PDF 2.0 (published as ISO 32000-2 in 2017 and updated in 2020), it can take advantage of new PDF features. PDF/A-2 and PDF/A-3 each comprise three conformance levels, which tended to confuse users. PDF/A-4 simplifies matters: PDF/A-4 documents may or may not contain tags, and unlike in previous parts of the standard, no dedicated conformance level is required for tagged documents. This eliminates the previous A/B/U conformance levels. Similarly, PDF/A-4 documents may or may not contain file attachments. The attached files must conform to PDF/A-1, PDF/A-2 or PDF/A-4.

While abandoning the A/B/U conformance levels, PDF/A-4 introduces two new conformance levels:

  • PDF/A-4f allows non-PDF/A file attachments, similar to how PDF/A-3 extends PDF/A-2.
  • PDF/A-4e is targeted at the engineering community. It is intended as the successor to the PDF/E-1 standard ISO 24517-1, which is based on PDF 1.6. The initial plan to define a new flavor PDF/E-2 was cancelled. Instead, PDF/A-4e adds RichMedia annotations for 3D content in U3D or PRC format to the base PDF/A-4 format.

Regarding structure information and accessibility, PDF/A-1a/2a/3a require only the mere presence of tags, but don’t go into detail about the nature and use of PDF tags. PDF/A-4 takes one step backward and one step forward at the same time: while PDF/A-4 is agnostic regarding the presence of tags, it points out the advantages of Tagged PDF for content repurposing and accessibility. For the specifics, the standard references the PDF/UA standard (ISO 14289), which discusses many details of tagging. In addition, PDF/A-4 inherits the rigid regime of PDF tags which is part of the underlying PDF 2.0 specification.

Which part to use?

Just as PDF/A-2 does not replace PDF/A-1, PDF/A-3 does not replace PDF/A-2 and PDF/A-4 does not replace PDF/A-3. Any part of the PDF/A standard can be used for long-term archiving. You simply have to forgo certain PDF features as long as you work with an older part of the PDF/A standard. For example, simple office documents without transparent graphics can still be implemented with PDF/A-1. If you need arbitrary file attachments, use PDF/A-3 or PDF/A-4f. If you need RichMedia/3D content, use PDF/A-4e.
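The guidance above can be condensed into a small decision helper. This is a rough sketch covering only the three criteria mentioned in this section; the function name and parameters are hypothetical. It assumes that transparent graphics require PDF/A-2 or later. Since this section does not say how 3D content and arbitrary attachments combine, the sketch refuses that case instead of guessing.

```python
def suggest_part(needs_transparency: bool = False,
                 needs_arbitrary_attachments: bool = False,
                 needs_3d: bool = False) -> str:
    """Suggest a PDF/A part for the given feature needs (illustrative only)."""
    if needs_3d and needs_arbitrary_attachments:
        # Not covered by the guidance this sketch is based on.
        raise ValueError("combination not covered by this sketch")
    if needs_3d:
        return "PDF/A-4e"
    if needs_arbitrary_attachments:
        return "PDF/A-3 or PDF/A-4f"
    if needs_transparency:
        # Assumption: transparency is unavailable in PDF/A-1.
        return "PDF/A-2 or later"
    return "PDF/A-1"


print(suggest_part())                                  # simple office document
print(suggest_part(needs_arbitrary_attachments=True))  # e.g. embedded source files
print(suggest_part(needs_3d=True))                     # engineering content
```

A real selection would also weigh the required conformance level, the capabilities of the receiving archive, and any PDF 2.0 features the documents rely on.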

PDF Association and PDF/A Competence Center

PDFlib GmbH is a founding member of the PDF Association. Founded in 2006 as the PDF/A Competence Center, this non-profit organisation has grown to more than 150 members from the PDF industry around the world. The PDF Association promotes the adoption and implementation of International Standards for PDF technology.

  • Developers use the PDF Association to share knowledge and experience with PDF technology.
  • Decision-makers use the PDF Association to learn about the role and capabilities of PDF and PDF’s subset standards in ECM and other electronic document applications.
  • End-users benefit from improved reliability, quality, functionality and interoperability in their experience of electronic documents.

PDF/A Validation

PDF/A validation is the process of checking whether a document conforms to the requirements of a particular part of the PDF/A standard. Validation has long been available as part of Acrobat's Preflight component as well as from several independent software vendors. To provide a useful resource for the community, the Open Preservation Foundation (OPF), the PDF Association and the Digital Preservation Coalition (DPC) collaborated on the development of a freely available and reliable PDF/A validator called veraPDF. Its development was funded by the European Commission's PREFORMA project and is supported by the PDF software developer community as organized in the PDF Association.

If you are in doubt regarding the standard conformance of a particular PDF/A document we recommend checking the issue with veraPDF.
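Such a check can also be scripted. The following sketch assumes that the veraPDF command-line tool is installed and available on the search path as `verapdf`, and that it accepts a `--flavour` option naming the PDF/A flavor to validate against (for example `2b`). Both points are assumptions that should be verified against the documentation of the installed veraPDF version. The helper names are hypothetical.

```python
import shutil
import subprocess
from typing import List, Optional


def verapdf_command(pdf_path: str, flavour: Optional[str] = None) -> List[str]:
    """Build the argument list for a veraPDF run (assumed CLI syntax)."""
    cmd = ["verapdf"]
    if flavour is not None:
        # Assumption: --flavour selects the PDF/A flavor, e.g. "1b" or "2b".
        cmd += ["--flavour", flavour]
    cmd.append(pdf_path)
    return cmd


def run_verapdf(pdf_path: str, flavour: Optional[str] = None) -> str:
    """Run veraPDF and return its raw report text without interpreting it."""
    if shutil.which("verapdf") is None:
        raise RuntimeError("verapdf executable not found on PATH")
    result = subprocess.run(
        verapdf_command(pdf_path, flavour),
        capture_output=True,
        text=True,
    )
    return result.stdout


print(verapdf_command("document.pdf", "2b"))
```

The sketch deliberately returns the report unparsed, since the report format and the meaning of the exit code depend on the veraPDF version and its options.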