Requirements

PDF/UA-1 File Format Requirements

PDF/UA-1 is based on ISO 32000-1 (PDF 1.7). It doesn’t add any new features to the PDF file format, but it makes mandatory some aspects that are optional in PDF 1.7. The following conditions must be met in all PDF/UA-1 documents:

  • The document must be tagged. While PDF 1.7 includes some requirements regarding the nesting and relationship of different types of structure elements, PDF/UA-1 extends and clarifies these rules (see below for details).
  • All fonts used in the document must be embedded (except fonts for invisible text, e.g. OCR results).
  • Some layer options are not allowed.
  • External content (reference XObjects as mandated by PDF/X-5) is not allowed.
  • The document title must be specified in the document’s metadata.

Semantic requirements

When creating the structure hierarchy for PDF/UA-1, the following semantic aspects must be obeyed:

  • Tagging must use structure elements which are appropriate for the document structure: if it’s a heading, it must be tagged as heading. If it’s a table, it must be tagged as a table. If it’s a list, it must be tagged as a list.
  • Contents which are not relevant for the document’s meaning must not be included in the document hierarchy, but must instead be tagged as Artifact. Typical examples are running headers and footers, page numbers, and background images.
  • Structure elements must be arranged in logical reading order.
  • Content must be tagged appropriately if the intended information is not otherwise accessible because of the content’s color, format or layout.
  • Text represented in a graphic requires the Alt attribute with an explanation if it doesn’t contain text in a natural language (e.g. font or script samples).
  • Images must provide alternative text; image captions must be marked with a Caption tag.
  • Links must be accompanied by a suitable Link annotation.
  • Groups of graphical elements which logically belong together must be tagged with a single Figure tag.
  • Footnotes, endnotes, note labels and references to locations within the document must be tagged as Note or Reference as appropriate.

Because of the semantic requirements outlined above it is difficult or impossible to automatically convert existing untagged PDF documents to conforming PDF/UA. Similarly, applying OCR techniques to scanned documents is unlikely to result in fully conforming PDF/UA without human intervention. For example, alternative text for images cannot be derived automatically.

Requirements for specific tags

All standard tags defined in PDF 1.7 may be used in PDF/UA-1. If other tags are used, a mapping of those custom tags to the standard tags must be provided in the document’s RoleMap. Various rules must be obeyed regarding certain standard element types:

  • The Figure element used for images and graphics which are not artifacts requires an Alt or ActualText attribute.
  • Table elements must be created for logical tables, but must not be used for tables which serve only layout purposes. Table-related tags must be properly nested, e.g. a Table tag contains table rows (TR), which in turn contain table header cells (TH) or table data cells (TD). The Scope attribute is required for TH (table header) elements.
  • Heading tags must be properly nested. If numbered heading tags (H1, H2, ...) are used these must be properly nested (i.e. levels must not be skipped). If unnumbered heading tags are used (so-called strongly structured documents) the H tag must be used, but not more than once in each node of the structure tree. Heading elements must not have any descendants.
  • The list element type L requires a ListNumbering attribute which designates the numbering system used in the list, e.g. Disc for a simple bullet without numbers, Decimal, or UpperRoman.
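
The rule that numbered heading levels must not be skipped can be illustrated with a small check. This is only a sketch under stated assumptions: the function name and the flat list of tag names in reading order are hypothetical, and a real validator would walk the structure tree of the PDF instead. The sketch also treats a first heading other than H1 as a skipped level, which is an assumption made here.

```python
import re

# Matches numbered heading tags such as H1, H2, H10 (but not the unnumbered H)
NUMBERED_HEADING = re.compile(r"H([1-9][0-9]*)")


def find_skipped_heading_levels(tags):
    """Return (index, previous_level, level) for every heading that skips a level.

    `tags` is a hypothetical flat list of structure tag names in reading order.
    """
    problems = []
    previous_level = 0  # no heading seen yet (assumption: first heading must be H1)
    for index, tag in enumerate(tags):
        match = NUMBERED_HEADING.fullmatch(tag)
        if match is None:
            continue  # not a numbered heading, e.g. P, Table, L
        level = int(match.group(1))
        # Going deeper by more than one level means a level was skipped;
        # going back up by any number of levels is fine.
        if level > previous_level + 1:
            problems.append((index, previous_level, level))
        previous_level = level
    return problems
```

For example, the sequence H1, P, H2, H3, H2, H4 would be reported once, for the final H4 which follows an H2.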

Requirements for specific content types

The following requirements must be met for various types of PDF content:

  • The natural language of text must be declared, either with the Lang document info entry for the whole document or with the Lang attribute of individual structure elements. Invisible text must be tagged as Artifact unless it has a rendered equivalent (e.g. a scanned image).
  • Vector graphics and raster images must be tagged as Figure or Artifact.
  • Annotations and form fields must be included in the structure tree and require certain flags to ensure accessibility.

Unicode requirements

PDF/UA requires proper Unicode semantics for all text in the document. This requirement is rooted in the fact that PDF supports a variety of font and encoding techniques, not all of which support Unicode. For example, PDF supports PostScript Type 1 fonts, which were introduced in the 1980s, while the Unicode Consortium started its work in 1991. PDF/UA requires that supplementary Unicode mapping information be present for fonts which do not contain it internally. But not all Unicode values are acceptable: values in the Private Use Area (PUA) are not allowed since they do not carry any common interpretation (semantics).
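
A check for PUA values in extracted text might look like the following sketch. The function names are hypothetical. The ranges are the private-use ranges defined by the Unicode standard; treating the two supplementary private-use planes the same way as the PUA in the Basic Multilingual Plane is an assumption of this sketch.

```python
# Private-use ranges as defined by the Unicode standard
PUA_RANGES = (
    (0xE000, 0xF8FF),      # Private Use Area in the Basic Multilingual Plane
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B
)


def is_private_use(ch):
    """Return True if the single character `ch` lies in a private-use range."""
    code_point = ord(ch)
    return any(low <= code_point <= high for low, high in PUA_RANGES)


def private_use_positions(text):
    """Return the indices of all private-use characters in `text`."""
    return [index for index, ch in enumerate(text) if is_private_use(ch)]
```

Text for which `private_use_positions()` returns a non-empty list would need replacement Unicode values or an ActualText attribute for the affected glyphs.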

Symbolic fonts are an important area where this PDF/UA requirement holds, e.g. fonts containing logos or pictograms. Since standardized Unicode values are not available for custom symbolic glyphs, suitable Unicode semantics must be provided in an ActualText attribute. The ActualText may be assigned to an individual glyph or a sequence of multiple glyphs, and may contain an arbitrary Unicode string.

As an example, code 0x1A in the common WingDings font contains an image of a computer keyboard with the glyph name keyboard and the Unicode value U+F037 in the PUA range, which is not acceptable in PDF/UA-1. For lack of better substitute text the glyph name could be used to construct suitable ActualText, e.g. »symbol for keyboard«. It should be noted that programmatically constructing ActualText must be considered a makeshift solution; human-selected text is always preferable to machine-generated ActualText.
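
The makeshift construction described above can be sketched as follows. The function name and its parameters are hypothetical, and the »symbol for ...« wording simply follows the example in the text. Private-use values are detected via the Unicode general category Co.

```python
import unicodedata


def fallback_actual_text(unicode_value, glyph_name):
    """Build makeshift ActualText for a glyph (hypothetical helper).

    If the glyph's Unicode value is outside the private-use ranges, the
    character itself is returned. Otherwise the glyph name is used to
    construct a substitute text.
    """
    ch = chr(unicode_value)
    # General category "Co" marks private-use code points
    if unicodedata.category(ch) != "Co":
        return ch
    return f"symbol for {glyph_name}"
```

With the values from the example, `fallback_actual_text(0xF037, "keyboard")` yields the string `symbol for keyboard`. The result should still be reviewed by a human, since glyph names are not guaranteed to be meaningful.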

Other recommendations

While not strictly required in PDF/UA-1, the following items are recommended:

  • Bookmarks are recommended for improved navigation. They should reflect proper reading order and nesting of the content.
  • Tables should include headers.
  • Attachments should be accompanied by a description, and should be accessible in their own right.
  • If present, page labels (e.g. roman page numbers) should be appropriate.

PDF/UA and PDF/A

The archiving standards PDF/A-1a, PDF/A-2a and PDF/A-3a require the use of Tagged PDF. Although there is no direct relationship between PDF/A-1a/2a/3a and PDF/UA-1, a document can conform to both standards at the same time. In fact, if you want to create PDF/A with conformance level A we recommend adhering to the PDF/UA-1 requirements as well in order to improve accessibility. For more information please read the PDF/A pages.

We recommend avoiding PDF/A-1a and working with the newer PDF/A-2a or PDF/A-3a standards instead because there is a minor conflict between PDF/UA-1 and PDF/A-1a: PDF/UA-1 requires the Tabs entry for pages with annotations. This key specifies the tab order for the page’s annotations and must specify »structure order«. However, this key is not available in PDF 1.4 and thus cannot be used in combined PDF/A-1a and PDF/UA-1 documents.
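
The Tabs requirement can be illustrated with a small sketch. It assumes a hypothetical in-memory model in which each page dictionary is represented as a Python dict with the PDF key names as strings; a real check would read the page dictionaries with a PDF library. In PDF 1.7 the value S of the Tabs key designates structure order.

```python
def pages_missing_structure_tab_order(pages):
    """Return the indices of pages which have annotations but no Tabs = S.

    `pages` is a hypothetical list of dicts modelling page dictionaries,
    e.g. {"Annots": [...], "Tabs": "S"}.
    """
    return [
        index
        for index, page in enumerate(pages)
        # Only pages with at least one annotation need the Tabs entry
        if page.get("Annots") and page.get("Tabs") != "S"
    ]
```

A page without annotations is never reported, while a page with annotations is reported if Tabs is missing or specifies row order (R) or column order (C) instead of structure order.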