The PDF/A Family of Archiving Standards

PDF/A is targeted at reliable long-term preservation of digital documents with text, raster images and vector graphics as well as associated metadata. The PDF/A format specified in the ISO 19005 standard strives to provide a consistent and robust subset of PDF which can be reproduced faithfully even after a long archiving period, or used for reliable data exchange in enterprise and government environments. Important technical aspects of PDF/A-1, PDF/A-2 and PDF/A-3 are discussed below.

PDF/A-1

PDF/A-1, the first part of a multi-part standard, was published in 2005 as ISO 19005-1. It is based on PDF 1.4 (the file format of Acrobat 5) and imposes some restrictions regarding the use of color, fonts, annotations, and other elements. There are two flavors of PDF/A-1:

Level B conformance (PDF/A-1b; »b« as in »basic«) ensures that the visual appearance of a document is preservable in the long term. PDF/A-1b ensures that the document will look the same when it is viewed or printed some time in the future.

Level A conformance (PDF/A-1a; »a« as in »accessible«) is based on level B, but adds crucial properties of Tagged PDF: it requires structure information and reliable Unicode text semantics in order to preserve the document’s logical structure and natural reading order. Simply put, PDF/A-1a not only ensures that the document will look the same when it is used in the future, but also that its contents (semantics) can be reliably interpreted and will be accessible to users with disabilities. As an important example, screen readers can read Tagged PDF documents aloud to blind users.

PDF/A-2

The PDF world has advanced considerably since the publication of PDF/A-1. Among many other milestones, PDF 1.7 (the file format of Acrobat 8) was standardized as ISO 32000-1 in 2008. In order to make numerous new PDF features available in PDF/A workflows, a new part of the standard called PDF/A-2 was published in 2011 as ISO 19005-2.

PDF/A-2 is based on PDF 1.7 and includes many useful additions which are not available in PDF/A-1. These include important file format aspects such as JPEG 2000 compression, optional content (layers), PDF packages and others. PDF/A-2 documents may contain file attachments provided the attached documents themselves conform to PDF/A-1 or PDF/A-2.

Similar to PDF/A-1, PDF/A-2 offers level B and level A conformance. It adds a new flavor called level U conformance. Level U sits in between PDF/A-2a and PDF/A-2b in that it requires reliable Unicode semantics, but not structure information. PDF/A-2u guarantees that the pages can faithfully be reproduced visually and that the text can be extracted and searched.
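The relationship between parts and conformance levels can be summarized in a small lookup. The following Python sketch is only an illustration of the description above; the names `Flavor`, `parse_flavor`, `LEVELS_BY_PART` and `GUARANTEES` are hypothetical and not part of any PDF library.

```python
import re
from dataclasses import dataclass

# Conformance levels offered by each part, as described in the text.
LEVELS_BY_PART = {1: {"a", "b"}, 2: {"a", "b", "u"}, 3: {"a", "b", "u"}}

# What each level promises; level U sits between B and A.
GUARANTEES = {
    "b": frozenset({"visual appearance"}),
    "u": frozenset({"visual appearance", "Unicode text"}),
    "a": frozenset({"visual appearance", "Unicode text", "logical structure"}),
}


@dataclass(frozen=True)
class Flavor:
    part: int
    level: str

    @property
    def guarantees(self):
        return GUARANTEES[self.level]


def parse_flavor(name: str) -> Flavor:
    """Parse a flavor name such as 'PDF/A-2u' and reject impossible combinations."""
    match = re.fullmatch(r"PDF/A-(\d)([abu])", name.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a PDF/A flavor name: {name!r}")
    part, level = int(match.group(1)), match.group(2).lower()
    if level not in LEVELS_BY_PART.get(part, set()):
        raise ValueError(f"PDF/A-{part} has no level {level!r} in this sketch")
    return Flavor(part, level)
```

For example, `parse_flavor("PDF/A-2u").guarantees` contains the visual and Unicode guarantees but not logical structure, while `parse_flavor("PDF/A-1u")` raises `ValueError` because level U was only introduced with PDF/A-2.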

PDF/A-2 does not make PDF/A-1 obsolete or force users to migrate to the newer version; after all, this would be absurd for a standard which is targeted at long-term preservation!

PDF/A-3

Another part of the standard called PDF/A-3 was published in 2012 as ISO 19005-3. PDF/A-3 also supports conformance levels A, B, and U. It differs from PDF/A-2 in the following aspects:

While PDF/A-2 allows only file attachments which conform to PDF/A, PDF/A-3 allows arbitrary file types as attachments to meet the requirements of various user groups.

File attachments are associated with the whole document, a page, or some other part of the document. The relationship between the attached file and the corresponding part of the document must be specified explicitly, e.g. source, alternative, or supplemental data.

Typical PDF/A-3 scenarios include embedding word processor or spreadsheet source files in a final-form PDF/A document, or including machine-readable XML data in a PDF intended for human consumption, e.g. an invoice.
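The attachment rules of PDF/A-2 and PDF/A-3 can be expressed as a simple check. The Python sketch below is a simplified model, not a validator: the class and function names are hypothetical, and the relationship names follow the AFRelationship values of PDF/A-3 as we understand them (the text above only names source, alternative and supplemental data).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Relationship(Enum):
    """How an attachment relates to the document part it is associated with."""
    SOURCE = "Source"            # e.g. the spreadsheet the PDF was created from
    DATA = "Data"                # data underlying a visual presentation
    ALTERNATIVE = "Alternative"  # alternative representation of the content
    SUPPLEMENT = "Supplement"    # supplemental material
    UNSPECIFIED = "Unspecified"


@dataclass(frozen=True)
class Attachment:
    filename: str
    pdfa_part: Optional[int] = None          # PDF/A part it conforms to, None if not PDF/A
    relationship: Optional[Relationship] = None


def attachment_allowed(container_part: int, att: Attachment) -> bool:
    """Apply the attachment rules described in the text for PDF/A-2 and PDF/A-3."""
    if container_part == 2:
        # Only attachments which themselves conform to PDF/A-1 or PDF/A-2.
        return att.pdfa_part in (1, 2)
    if container_part == 3:
        # Arbitrary file types, but the relationship must be stated explicitly.
        return att.relationship is not None
    # Other parts are not covered by this sketch.
    raise ValueError(f"part {container_part} is not modeled here")


# Illustrative invoice scenario: XML data inside a human-readable PDF.
invoice_xml = Attachment("invoice.xml", relationship=Relationship.ALTERNATIVE)
```

With these definitions, `attachment_allowed(3, invoice_xml)` is true, whereas `attachment_allowed(2, invoice_xml)` is false because the XML file is not itself a PDF/A document.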

PDF/A viewers are not required to do anything specific with attached non-PDF/A files except for extracting them. The PDF/A standard does not guarantee that attachments can be viewed or otherwise used in the future; it simply allows their presence in an archivable document.

In the same spirit as PDF/A-2, which does not replace PDF/A-1, PDF/A-3 does not replace PDF/A-2. Any part of the PDF/A standard can be used for long-term archiving as appropriate.

PDF/A-4

The next part of the PDF/A series is planned as PDF/A-4 (ISO 19005-4). It will be based on PDF 2.0 (ISO 32000-2).

PDF/A Competence Center

PDFlib GmbH is a founding member of the PDF Association. Founded in 2006 as the PDF/A Competence Center, the PDF Association exists to promote the adoption and implementation of International Standards for PDF technology.

Developers use the PDF Association to share knowledge and experience with PDF technology.

Decision-makers use the PDF Association to learn about the role and capabilities of PDF and PDF’s subset standards in ECM and other electronic document applications.

End-users benefit from improved reliability, quality, functionality and interoperability in their experience of electronic documents.

PDF/A Validation

PDF/A validation means checking whether a given document conforms to the requirements of a particular part of the PDF/A standard. Validation has been available for a long time as part of Acrobat's Preflight component as well as from several independent software vendors. In order to provide a useful resource for the community, the Open Preservation Foundation (OPF), the PDF Association and the Digital Preservation Coalition (DPC) collaborated on the development of a freely available and reliable PDF/A validator called veraPDF. Its development was funded by the European Commission's PREFORMA project and is supported by the PDF software developer community as organized in the PDF Association.

If you are in doubt regarding the standard conformance of a particular PDF/A document, we recommend checking it with veraPDF.
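Such a check can also be scripted. The Python sketch below assumes that the veraPDF command-line tool is installed and available on the PATH as `verapdf`, and that it accepts the `--flavour` and `--format` options; these option names and values are assumptions which should be verified against `verapdf --help` for the installed version. The helper names `verapdf_command` and `run_verapdf` are hypothetical.

```python
import shutil
import subprocess
from typing import List


def verapdf_command(pdf_path: str, flavour: str = "2b",
                    executable: str = "verapdf") -> List[str]:
    """Build the argument list for validating one file against one flavor.

    The option names are assumptions about the veraPDF CLI, see the note above.
    """
    return [executable, "--flavour", flavour, "--format", "text", pdf_path]


def run_verapdf(pdf_path: str, flavour: str = "2b") -> str:
    """Run veraPDF and return its textual report without interpreting it."""
    executable = shutil.which("verapdf")
    if executable is None:
        raise FileNotFoundError("verapdf was not found on the PATH")
    result = subprocess.run(
        verapdf_command(pdf_path, flavour, executable),
        capture_output=True,
        text=True,
    )
    return result.stdout
```

A call such as `run_verapdf("archive.pdf", "1b")` would return the report for a PDF/A-1b check of a hypothetical file archive.pdf. The sketch deliberately leaves the interpretation of the report and of the exit status to the caller.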