Processing PDF/A Documents

Special care must be taken when processing PDF/A documents so that they remain conformant to the standard. Even simple operations may break a document’s conformance. It is therefore crucial to use only PDF/A-aware tools; this guards against the risk of modifying PDF/A documents in a way that violates the standard.

Splitting and Merging

Seemingly minor modifications may result in non-conforming documents. For example, inserting a page into a PDF/A document poses several immediate dangers:

  • If the inserted page stems from a non-PDF/A document, it may use unembedded fonts.
  • Even if the imported page stems from a PDF/A document, problems can arise in several areas. For example, the color characteristics (e.g. the output intent) of the two documents do not necessarily match, which could result in non-conforming output.
  • Even a small operation such as adding a metadata field may violate the standard unless the software properly implements the XMP metadata rules mandated by PDF/A-1, PDF/A-2, and PDF/A-3.
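
The first two dangers in the list above can be illustrated with a minimal Python sketch. It assumes a PDF/A-aware tool has already extracted a summary of each document; the names DocSummary, FontInfo, and insertion_risks are hypothetical and do not belong to any particular library. The sketch only reports risks and does not repair anything.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class FontInfo:
    name: str
    embedded: bool


@dataclass
class DocSummary:
    """Hypothetical summary which a PDF/A-aware tool might extract."""
    is_pdfa: bool
    output_intent: Optional[str]  # e.g. an ICC profile name; None if absent
    fonts: List[FontInfo] = field(default_factory=list)


def insertion_risks(target: DocSummary, source: DocSummary) -> List[str]:
    """List reasons why importing pages from `source` may break `target`."""
    risks = []
    if not source.is_pdfa:
        risks.append("source is not PDF/A")
    for font in source.fonts:
        if not font.embedded:
            risks.append(f"font not embedded: {font.name}")
    # Even two PDF/A documents may declare different output intents.
    if source.output_intent != target.output_intent:
        risks.append("output intents differ")
    return risks


target = DocSummary(True, "sRGB", [FontInfo("Helvetica", True)])
source = DocSummary(False, None, [FontInfo("Arial", False)])
print(insertion_risks(target, source))
# ['source is not PDF/A', 'font not embedded: Arial', 'output intents differ']
```

An empty result only means that none of these particular checks fired; it does not prove that the merged document conforms to PDF/A.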

Any kind of content or metadata processing of PDF/A documents must be performed with PDF/A-aware software to avoid jeopardizing PDF/A conformance.

Digital Signatures

To use digital signatures in PDF/A workflows, the signature software must be PDF/A-aware, i.e. it must observe the rules outlined above.

The bottom line is that only PDF/A-aware tools should be used in PDF/A workflows; otherwise PDF/A conformance may be lost. To avoid PDF/A violations through accidental modification, Adobe Acrobat opens PDF/A documents in read-only mode by default. Once Acrobat’s editing and modification tools are used, PDF/A conformance is no longer guaranteed.
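
A workflow can apply a similar guard before modifying a document. PDF/A documents identify themselves in their XMP metadata through the pdfaid:part and pdfaid:conformance properties. The following Python sketch assumes the XMP packet has already been extracted from the PDF as a string; the function names pdfa_claim and may_modify are hypothetical. Note that it detects only the claim of conformance, which is not the same as validating the document.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"


def pdfa_claim(xmp_packet: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (part, conformance) if the XMP claims PDF/A, else None."""
    root = ET.fromstring(xmp_packet)
    part_tag = f"{{{PDFAID_NS}}}part"
    conf_tag = f"{{{PDFAID_NS}}}conformance"
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        # XMP allows simple properties as attributes or as child elements.
        part = desc.get(part_tag) or desc.findtext(part_tag)
        if part:
            conf = desc.get(conf_tag) or desc.findtext(conf_tag)
            return part.strip(), (conf.strip() if conf else None)
    return None


def may_modify(xmp_packet: str, tool_is_pdfa_aware: bool) -> bool:
    """Treat documents claiming PDF/A as read-only for unaware tools."""
    return tool_is_pdfa_aware or pdfa_claim(xmp_packet) is None
```

For a packet containing pdfaid:part 2 and pdfaid:conformance B, pdfa_claim returns ("2", "B"), and may_modify returns False unless the tool is declared PDF/A-aware.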

Document Assembly and Tagged PDF

Assembling documents from Tagged PDF pages is particularly tricky. On the technical level, the structure hierarchies of the PDF documents involved must be combined, which requires intricate operations on the tagging data structures. The semantic challenges are even more difficult: the document assembly process must take into account the logical entities which are combined. For example, a structure element such as a paragraph or table may span multiple pages. If these pages are separated or combined in a different order, the structure hierarchy is easily broken.

Document assembly with Tagged PDF requires careful planning of all semantic entities involved. For example, the task can be simplified if the workflow ensures that major semantic units such as document sections start on a new page.
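
The planning step can be sketched as a search for page boundaries which no structure element crosses. The following Python sketch assumes that the page range of each relevant structure element (paragraphs, tables, sections, but not the document root, which spans all pages) is already known; the function name safe_split_points and the input format are hypothetical.

```python
from typing import List, Tuple


def safe_split_points(page_count: int,
                      spans: List[Tuple[int, int]]) -> List[int]:
    """Return each page p where cutting after p splits no element.

    `spans` holds (first_page, last_page) per structure element,
    1-based and inclusive.
    """
    safe = []
    for p in range(1, page_count):
        # An element is cut if it starts on or before p and ends after p.
        if not any(first <= p < last for first, last in spans):
            safe.append(p)
    return safe


# A paragraph on pages 1-2, a table on pages 2-3, and two
# single-page elements on pages 4 and 5.
print(safe_split_points(5, [(1, 2), (2, 3), (4, 4), (5, 5)]))
# [3, 4]
```

If every document section starts on a new page, each section boundary appears in the result, so the document can be split or reassembled at those points without cutting through a structure element.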