FontReporter
Version 1.11
Outdated version - only for demonstration!
A Plugin for analyzing fonts in PDF
Copyright © 2005-2017 PDFlib GmbH. All rights reserved.
PDFlib GmbH
Franziska-Bilek-Weg 9, 80339 München, Germany
www.pdflib.com
phone +49 • 89 • 452 33 84-0
fax +49 • 89 • 452 33 84-99
If you have questions check the PDFlib mailing list and archive at
groups.yahoo.com/neo/groups/pdflib/info
Licensing contact: sales@pdflib.com
Support for commercial PDFlib licensees: support@pdflib.com (please include your license number)
You can use PDFlib FontReporter free of charge; however, it is not in the public domain. This software cannot
be sold or redistributed (whether for a fee or at no charge), either stand-alone or in combination with
any other product, without the express written permission of PDFlib GmbH.
This publication and the information herein is furnished as is, is subject to change without notice, and
should not be construed as a commitment by PDFlib GmbH. PDFlib GmbH assumes no responsibility or liability
for any errors or inaccuracies, makes no warranty of any kind (express, implied or statutory) with respect
to this publication, and expressly disclaims any and all warranties of merchantability, fitness for particular
purposes and noninfringement of third party rights.
PDFlib FontReporter is provided »as is« without any warranty, express or implied, including but not limited
to any implied warranties of merchantability and fitness for a particular purpose. In no event will PDFlib
GmbH be liable for any damages, including lost profits, lost savings, or other incidental or consequential
damages.
Although PDFlib FontReporter is not a commercial product, we strive to provide high quality. If you run
into problems you are encouraged to contact us at support@pdflib.com.
Adobe, Acrobat, PostScript, and XMP are trademarks of Adobe Systems Inc. AIX, IBM, OS/390, WebSphere,
iSeries, and zSeries are trademarks of International Business Machines Corporation. ActiveX, Microsoft,
OpenType, and Windows are trademarks of Microsoft Corporation. Apple, Macintosh and TrueType are
trademarks of Apple Computer, Inc. Unicode and the Unicode logo are trademarks of Unicode, Inc. Unix is
a trademark of The Open Group. Java and Solaris are trademarks of Sun Microsystems, Inc. HKS is a registered
trademark of the HKS brand association: Hostmann-Steinberg, K+E Printing Inks, Schmincke. Other
company product and service names may be trademarks or service marks of others.
Contents
1 Installing PDFlib FontReporter
2 Working with FontReporter
2.1 What can you do with FontReporter?
2.2 Overview of PDF Font Formats
2.3 Contents of a Font Report
2.4 Investigate PDF Problems with FontReporter
2.5 Error Messages
A Revision History
1 Installing PDFlib FontReporter
Requirements. PDFlib FontReporter works with the following Acrobat versions:
> Acrobat 8/9/X/XI/DC on Windows
> Acrobat X/XI/DC on macOS
The Plugin doesn’t work with Adobe Reader/Acrobat Reader.
Installing FontReporter on Windows. To install PDFlib FontReporter in Acrobat, the
plugin files must be placed in a subdirectory of the Acrobat plugin folder. This is done
automatically by the plugin installer, but can also be done manually. A typical location
of the plugin folder looks as follows:
C:\Program Files\Adobe\Acrobat XXX\Acrobat\plug_ins\PDFlib FontReporter
For 32-bit versions of Acrobat running on 64-bit Windows the first part should be
C:\Program Files (x86)\...
Installing FontReporter for Acrobat X/XI/DC on macOS. Proceed as follows to install
the plugin for all users:
> Double-click the disk image to mount it. A folder with the plugin files will be visible.
> Copy the plugin folder to the following path in the system’s Library folder:
/Library/Application Support/Adobe/Acrobat/XXX/Plug-ins
Alternatively you can install the plugin only for a single user as follows:
> Click the desktop to make sure you’re in the Finder, hold down the Option key, and
choose Go, Library to open the user’s Library folder.
> Copy the plugin folder to the following path in the user’s Library folder:
/Users/<username>/Library/Application Support/Adobe/Acrobat/XXX/Plug-ins
Multi-lingual interface. PDFlib FontReporter supports multiple languages in the user
interface. Depending on the application language of Acrobat, FontReporter chooses its
interface language automatically. Currently English and German interfaces are available.
If Acrobat runs in any other language mode, FontReporter uses the English interface.
Troubleshooting. If PDFlib FontReporter doesn’t seem to work check the following:
Make sure that in Edit, Preferences, [General...], General the box Use only certified plug-ins is
unchecked. The plugin is not loaded if Acrobat runs in Certified Mode.
2 Working with FontReporter
2.1 What can you do with FontReporter?
FontReporter is a useful tool if you are interested in fonts within PDF documents. It provides
font- and encoding-related information which helps in a variety of situations:
> analyze printing problems (e.g. a particular font causes printing errors)
> investigate text extraction problems (e.g. copying text from a PDF results in garbage)
> visualize Unicode mappings for a font
> find flaws in the PDF creation workflow (e.g. printer driver converted a PostScript
Type 1 font to Type 3)
> test whether ToUnicode mapping tables (required for PDF/A-1a) are present
> identify logos and symbols which are represented as text in a PDF
> learn which fonts are contained in a PDF, and which glyphs they contain (e.g. the file
size is too large because some fonts ended up in the PDF unintentionally)
> check font subsets to see which glyphs are contained in the subset
> learn more about PDF font technology
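One of the checks listed above concerns ToUnicode mapping tables. As a rough illustration of what such a table contains, the following Python sketch parses the bfchar entries of a small, made-up ToUnicode CMap excerpt. The sample data and the function name parse_bfchar are assumptions for illustration only; real tables are stored as streams inside the PDF and may also contain bfrange sections, which this sketch ignores.

```python
import re

# Hypothetical excerpt of a ToUnicode CMap (not taken from a real document).
SAMPLE_CMAP = """
2 beginbfchar
<0003> <0020>
<0024> <0041>
endbfchar
"""

def parse_bfchar(cmap_text):
    """Return {character code: Unicode string} for all bfchar entries."""
    mapping = {}
    # Each bfchar section lists pairs of hex strings: <code> <unicode value>.
    for section in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", section):
            # Destination values are UTF-16BE code units written in hex.
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping

table = parse_bfchar(SAMPLE_CMAP)
```

In this sample, code 0x0003 maps to a space and code 0x0024 maps to the letter A; a font without such a table gives text extraction software nothing comparable to work with.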
Using FontReporter is as easy as bringing up the menu Plug-Ins, PDFlib FontReporter...,
Create Font Report in Acrobat. This will create a font report for all pages of the current
PDF document as a separate PDF. Two pages from typical font reports are shown in Figure
2.1.
Fig. 2.1
Sample font reports
Supported PDF and font formats. FontReporter supports all PDF versions up to
Acrobat DC. All font and encoding formats in PDF are supported, as well as all types of
embedded font data.
Advantages over Acrobat’s font properties panel. All versions of Acrobat including
Adobe Reader provide font information via File, Document Properties..., Fonts. However,
Acrobat’s font overview is limited in use; FontReporter provides the following advantages
compared to Acrobat’s font list:
> FontReporter provides much more information about each font
> FontReporter deals with CJK font names even on Western systems
> FontReporter provides glyph tables containing the glyphs of a font along with their
widths, names, and Unicode values
> FontReporter presents the output as a PDF document so that you can save or print it
> FontReporter is guaranteed to process the full document, regardless of which pages
have already been displayed in Acrobat
PDF text extraction with PDFlib TET. FontReporter is an auxiliary tool to our PDFlib
Text and Image Extraction Toolkit (TET). TET is software for extracting the text and image
contents of PDF documents. It is available both as a standalone program and a programming
library/component which can be integrated into existing software. TET extracts
text from all kinds of PDF documents and normalizes the text to Unicode.
FontReporter can be used to create Unicode mapping tables for PDF documents which
do not contain enough information for extracting text, or which contain wrong Unicode
mapping tables. Fully functional evaluation versions of TET are available for
download from www.pdflib.com.
TET PDF IFilter. TET PDF IFilter extracts text and metadata from PDF documents and
makes it available to search and retrieval software on Windows. This allows PDF documents
to be searched on the local desktop, a corporate server, or the Web. TET PDF IFilter
is based on the patented PDFlib Text and Image Extraction Toolkit (TET). TET PDF IFilter
is a robust implementation of Microsoft’s IFilter indexing interface. It works with all
search and retrieval products which support the IFilter interface, e.g. SharePoint and
SQL Server. Fully functional evaluation versions of TET PDF IFilter are available for
download from www.pdflib.com.
Free TET Plugin. The TET Plugin is a free companion to the FontReporter Plugin. It can
be installed in Adobe Acrobat and allows interactive use of the Text and Image Extraction
Toolkit (TET) with any PDF document that is currently open in Acrobat. Using the
TET plugin you can access TET’s functionality and experiment with TET options. The
TET plugin can freely be downloaded from www.pdflib.com.
2.2 Overview of PDF Font Formats
PDF supports a large number of font formats, the details of which can get confusing. In
order to help you interpret the reports created by FontReporter we provide a quick summary
of PDF font formats and their most important properties.
While the format of a font in a PDF document depends on the format of the original
font used to compose the document, this is not the only factor which plays a role here.
Other factors include the configuration options in the PDF-creating software, the settings
of the printer driver used to generate PostScript data for PDF conversion, the set of
characters in the document, the overall number of used characters, and more.
A particularly important aspect is the distinction between simple fonts and composite
(CID) fonts.
Simple fonts. Simple fonts comprise the PostScript Type 1 (including Multiple Master),
TrueType, and Type 3 types, and are addressed with 8-bit codes. They are therefore limited
to a maximum of 256 characters. Simple fonts use a name-based encoding, which is a
table for mapping the character codes to the glyphs in the font.
Composite (CID) fonts. Composite or CID (character ID) fonts come in PostScript and
TrueType flavors. They can contain up to 65535 characters and are much more flexible
than simple fonts. While CID fonts often use 2-byte codes for addressing the glyphs in
the font, more complicated schemes with a variable number of bytes per character (1-4)
are used for CJK fonts. Instead of an encoding table CID fonts require a CMap (Character
Map) for providing the mapping from character codes to actual glyphs in the font. Dozens
of predefined CMaps are available for common CJK fonts. The font’s character collection
specifies a particular set of Chinese, Japanese, or Korean characters. So-called
Identity CMaps are used (mainly for Western fonts) in order to directly address the
glyphs in a font without any intermediate mapping table.
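As a minimal sketch of the Identity case described above, the following Python function splits a byte string into 2-byte codes, assuming that each code is used directly as the character ID. The function name and sample bytes are hypothetical, and the variable-length schemes used by many CJK CMaps are not covered.

```python
def decode_identity_2byte(data):
    """Split a byte string into 2-byte big-endian codes.

    Under the assumption of an Identity CMap, each code is used
    directly as the CID, so no lookup table is involved.
    """
    if len(data) % 2:
        raise ValueError("expected an even number of bytes")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

cids = decode_identity_2byte(b"\x00\x24\x01\x2c")
```

With 2-byte codes the addressable range grows from 256 to 65536 values, which is what allows CID fonts to hold large character sets.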
Comparison of PDF font formats. Table 2.1 details the font formats supported in PDF,
and explains which original font formats can be converted to these types by the PDF
creation software.
Table 2.1 Font formats in PDF
Name | Notes
Type1 | Classic PostScript Type 1 fonts. In addition to the original Type 1 format they can also be embedded as CFF (Compressed Font Format) under the name Type1C (Type 1 Compressed). These fonts are the result of classic PostScript Type 1 fonts or OpenType fonts with PostScript outlines.
MMType1 | (In Acrobat: MM) Multiple Master fonts are an extension of the Type 1 format, and are rarely used. These fonts are the result of PostScript Type 1 Multiple Master fonts.
Type3 | User-defined fonts, i.e. the glyphs are described by raw vector or image operations instead of a ready-made font. Type 3 fonts are always embedded. They are mainly intended for bitmapped fonts and logo fonts. These fonts are often the result of a printer driver converting a PostScript Type 1 or TrueType font to a bitmap font. Some applications use Type 3 fonts for achieving special effects, such as filling an area with a pattern.
Programs After Market Services (PAMS)
Technical Documentation
[NMP Part No.0275421 + 0275485]
NSD–1 SERIES
CELLULAR
PHONES
NSD–1 issue 2 : 09/00
Copyright © 1999. Nokia Mobile Phones Ltd. All Rights Reserved.
NSD–1
Foreword
PAMS Technical Documentation
AMENDMENT RECORD SHEET
Amendment Number | Date | Inserted By | Comments
- | 08/99 | - | Issue 1
Issue 2 | 09/00 | OJuntune | General information: New variants NSD–1 FW/GW/AW, p.8 & 9; ARS added p.2
Issue 2 | 09/00 | OJuntune | System module: New variant NSD–1AW; updated pages 1, 2, 5, 12
Issue 2 | 09/00 | OJuntune | System module schematic diagrams: 11 new A3 pages: 11 to 25
Issue 2 | 09/00 | OJuntune | Product variants: ARS added p.2
Issue 2 | 09/00 | OJuntune | UIF module: foreword warning p.3 added; NSD–3AY assy parts added; NSD–1FW, 1GW, 1AW data added, p. 7 to 10; ARS added; v..6.3 parts list added p.15
Issue 2 | 09/00 | OJuntune | Parts list: ARS p.2, 0201583, 0201577, 0201549 added; 0201293, 0201294 updated
Issue 2 | 09/00 | OJuntune | Troubleshooting instructions: ARS added, p.3 updated
Page 2
© Nokia Mobile Phones Ltd.
Issue 2 09/00
CONTENTS:
Foreword
NSD–1
SERIES CELLULAR PHONES
SERVICE MANUAL
General Information
System Module
Part Lists (System Module)
UI Module
Product Variants
Service Software and Tuning Instructions
Service Tools
Disassembly/Troubleshooting Instructions
Handsfree Unit HFU–2
Non–serviceable Accessories
Installation Instructions CARK–64
Installation Instructions CARK–91
IMPORTANT
This document is intended for use by qualified service personnel only.
Company Policy
Our policy is of continuous development; details of all technical modifications will
be included with service bulletins.
While every endeavour has been made to ensure the accuracy of this document,
some errors may exist. If any errors are found by the reader, NOKIA MOBILE
PHONES Ltd should be notified in writing.
Please state:
Title of the Document + Issue Number/Date of publication
Latest Amendment Number (if applicable)
Page(s) and/or Figure(s) in error
Please send to:
Nokia Mobile Phones Ltd
PAMS Technical Documentation
PO Box 86
24101 SALO
Finland
Warnings and Cautions
Please refer to the phone’s user guide for instructions relating to operation,
care and maintenance including important safety information. Note also the
following:
Warnings:
1. CARE MUST BE TAKEN ON INSTALLATION IN VEHICLES
FITTED WITH ELECTRONIC ENGINE MANAGEMENT
SYSTEMS AND ANTI–SKID BRAKING SYSTEMS. UNDER
CERTAIN FAULT CONDITIONS, EMITTED RF ENERGY CAN
AFFECT THEIR OPERATION. IF NECESSARY, CONSULT THE
VEHICLE DEALER/MANUFACTURER TO DETERMINE THE
IMMUNITY OF VEHICLE ELECTRONIC SYSTEMS TO RF
ENERGY.
2. THE HANDPORTABLE TELEPHONE MUST NOT BE OPERATED
IN AREAS LIKELY TO CONTAIN POTENTIALLY EXPLOSIVE
ATMOSPHERES EG PETROL STATIONS (SERVICE STATIONS),
BLASTING AREAS ETC.
3. OPERATION OF ANY RADIO TRANSMITTING EQUIPMENT,
INCLUDING CELLULAR TELEPHONES, MAY INTERFERE WITH
THE FUNCTIONALITY OF INADEQUATELY PROTECTED
MEDICAL DEVICES. CONSULT A PHYSICIAN OR THE
MANUFACTURER OF THE MEDICAL DEVICE IF YOU HAVE
ANY QUESTIONS. OTHER ELECTRONIC EQUIPMENT MAY
ALSO BE SUBJECT TO INTERFERENCE.
Cautions:
1. Servicing and alignment must be undertaken by qualified
personnel only.
2. Ensure all work is carried out at an anti–static workstation and that
an anti–static wrist strap is worn.
3. Ensure solder, wire, or foreign matter does not enter the telephone
as damage may result.
4. Use only approved components as specified in the parts list.
5. Ensure all components, modules, screws and insulators are
correctly re–fitted after servicing and alignment. Ensure all cables
and wires are repositioned correctly.
6. All PCs used with NMP Service Software for this product must be
BIOS and operating system “Year 2000 Compliant”.
A Technical
Introduction
to PDF/A-1/2/3/4
PDFlib Whitepaper
The PDF/A Family of Archiving Standards
PDF/A is targeted at reliable long-time preservation of digital documents with text, raster images and
vector graphics as well as associated metadata. The PDF/A format specified in the ISO 19005 standard
series defines a consistent and robust subset of PDF which can faithfully be reproduced even after a
long archiving period or used for reliable data exchange in enterprise and government environments.
This whitepaper discusses the major technical aspects of PDF/A-1, PDF/A-2, PDF/A-3 and PDF/A-4.
PDF/A-1
PDF/A-1, the first part of a multi-part standard, was published in 2005 as ISO 19005-1.
It is based on PDF 1.4, the file format of Acrobat 5, and imposes restrictions regarding the use of color,
fonts, annotations and other elements. There are two flavors of PDF/A-1 (called conformance levels):
> Level B conformance (PDF/A-1b; »b« as in »basic«) ensures that the visual appearance of a document
is preservable in the long term. PDF/A-1b ensures that the document will look the same when it is
viewed or printed in the near or far future.
> Level A conformance (PDF/A-1a; »a« as in »accessible«) is based on level B, but adds crucial properties
of Tagged PDF. It requires structure information and reliable Unicode text semantics in order to
preserve the document’s logical structure and natural reading order. Simply put, PDF/A-1a not only
ensures that the document will look the same when it is used in the future, but also that its contents
can be interpreted reliably and will be accessible to physically impaired users. As an important
example, screen reader programs can read Tagged PDF documents to blind users.
PDF/A-2
PDF 1.7, the file format of Acrobat 8, was standardized as ISO 32000-1 in 2008. In order to make
new PDF features available in PDF/A, a new part of the standard called PDF/A-2 was published in
2011 as ISO 19005-2.
PDF/A-2 is based on PDF 1.7 and includes many additions which are not available in PDF/A-1. These
include important file format aspects such as JPEG 2000 compression, optional content (layers), PDF
packages and others. PDF/A-2 documents may contain file attachments provided the attached documents
themselves conform to PDF/A-1 or PDF/A-2.
Similar to PDF/A-1, PDF/A-2 offers level B and level A conformance. It adds another flavor called level U
conformance. Level U sits in between PDF/A-2a and PDF/A-2b in that it requires reliable Unicode semantics,
but not structure information. PDF/A-2u guarantees that the visual appearance of pages can be
reproduced faithfully and that the text can be extracted and searched.
PDF/A-2 does not make PDF/A-1 obsolete or force users to migrate to the newer part of the standard –
after all, this would be absurd for a standard which is targeted at long-term preservation.
2 Whitepaper: A Technical Introduction to PDF/A, 2021-04 PDFlib GmbH
www.pdflib.com
PDF/A-3
Another part of the standard called PDF/A-3 was published in 2012 as ISO 19005-3. PDF/A-3 is quite
similar to PDF/A-2 and also supports conformance levels A, B, and U. It differs from PDF/A-2 in the following
aspects:
> While PDF/A-2 allows only file attachments which conform to PDF/A, PDF/A-3 allows arbitrary file
types as attachments to meet the requirements of various use cases.
> File attachments are associated with the whole document, a page, or some other part of the document.
For each file attachment the kind of relationship between the attached file and the corresponding
part of the document must be specified explicitly with the AFRelationship key, e.g. source,
alternative, or supplemental data.
Typical PDF/A-3 scenarios include embedding of word processor or spreadsheet source files in a final-form
PDF/A document or the inclusion of machine-readable XML data in a PDF intended for human
consumption, e.g. an invoice. In fact, the ZUGFeRD and Factur-X invoice standards are an important
application of PDF/A-3.
PDF/A-4
PDF/A-4 was published in 2020 as ISO 19005-4. Since it is based on PDF 2.0 (published as ISO
32000-2 in 2017 and updated in 2020) it can take advantage of new PDF features. While PDF/A-2 and
PDF/A-3 each comprise three different conformance levels which tended to confuse users, PDF/A-4
simplifies things since PDF/A-4 documents may or may not contain tags. Unlike previous parts of the
standard no dedicated conformance level is required for tagged PDF/A-4 documents, thus eliminating
the previous A/B/U conformance levels. Similarly, PDF/A-4 documents may or may not contain file attachments.
The attached files must conform to PDF/A-1, PDF/A-2 or PDF/A-4.
While abandoning the A/B/U conformance levels, PDF/A-4 introduces two new conformance levels:
> PDF/A-4f allows non-PDF/A file attachments similar to how PDF/A-3 extends PDF/A-2.
> PDF/A-4e is targeted at the engineering community. It is slated as the successor of the PDF/E-1 standard
ISO 24517-1 which is based on PDF 1.6. The initial plan to define a new flavor PDF/E-2 was
cancelled. Instead, PDF/A-4e adds RichMedia annotations for 3D content in U3D or PRC format to the
base PDF/A-4 format.
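The conformance levels named so far can be collected in a small lookup table. The following Python sketch is only an illustration of the naming scheme described in the text; the function name is_known_flavor and the use of an empty string for base PDF/A-4 without a level are assumptions of this sketch, not terminology from the standard.

```python
# Conformance levels per part, as described in the text above.
# The empty string stands for base PDF/A-4 without a level (an
# assumption of this sketch, not a notation used by the standard).
LEVELS = {
    1: {"a", "b"},
    2: {"a", "b", "u"},
    3: {"a", "b", "u"},
    4: {"", "e", "f"},
}

def is_known_flavor(part, level=""):
    """Return True if the part/level combination is one named in the text."""
    return level.lower() in LEVELS.get(part, set())

checks = [
    is_known_flavor(1, "b"),   # PDF/A-1b
    is_known_flavor(1, "u"),   # level U does not exist in PDF/A-1
    is_known_flavor(2, "U"),   # PDF/A-2u
    is_known_flavor(4, "a"),   # A/B/U were dropped in PDF/A-4
    is_known_flavor(4, "e"),   # PDF/A-4e
]
```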
Regarding structure information and accessibility PDF/A-1a/2a/3a require only the mere presence of
tags, but don’t go into detail regarding the nature and use of PDF tags. PDF/A-4 goes one step backward
and one step forward at the same time: while PDF/A-4 is agnostic regarding the presence of
tags, it points out the advantages of Tagged PDF regarding content repurposing and accessibility. Regarding
the specifics the standard references the PDF/UA standard (ISO 14289) which discusses many
details of Tagging. Also, PDF/A-4 inherits the rigid regime of PDF tags which is part of the underlying
PDF 2.0 specification.
Which part to use?
In the same sense as PDF/A-2 does not replace PDF/A-1, PDF/A-3 does not replace PDF/A-2 and PDF/A-4
does not replace PDF/A-3. Any part of the PDF/A standard can be used for long term archival. You simply
have to relinquish certain PDF features as long as you work with an older part of the PDF/A standard.
For example, simple office documents without transparent graphics can still be implemented with
PDF/A-1. If you need arbitrary file attachments use PDF/A-3 or PDF/A-4f. If you need RichMedia/3D contents
use PDF/A-4e.
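The examples in this section can be restated as a simple lookup from requirement to suggested flavors. The requirement keys and the function name suggest_parts below are hypothetical labels chosen for this sketch; it records only the cases mentioned in the text and does not attempt to combine several requirements.

```python
# Requirement -> flavors named in the text for that requirement.
# The keys are hypothetical labels chosen for this sketch.
SUGGESTED_PARTS = {
    "simple_office_document": ["PDF/A-1"],
    "arbitrary_attachments": ["PDF/A-3", "PDF/A-4f"],
    "richmedia_3d": ["PDF/A-4e"],
}

def suggest_parts(requirement):
    """Return the flavors suggested in the text for a single requirement."""
    try:
        return SUGGESTED_PARTS[requirement]
    except KeyError:
        raise ValueError(f"no suggestion recorded for {requirement!r}") from None

attachment_choice = suggest_parts("arbitrary_attachments")
```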
Technical Concepts in PDF/A
Fundamental
PDF/A requirements
PDF/A requires certain PDF features and prohibits others:
> To guarantee the exact visual reproduction of text all fonts used in a document must be embedded.
The only exception is fonts used for invisible text; these don’t have to be embedded.
> To guarantee exact color reproduction all colors must be specified in a device-independent way.
> Metadata must be embedded using the XMP format. The PDF/A conformance level must be recorded
with specific XMP properties. While PDF/A-1/2/3 impose strict requirements on custom metadata
properties, this has been relaxed in PDF/A-4.
> Encryption is not allowed to make sure that the document contents can always be accessed
without any restriction.
> Certain requirements for annotations and form fields ensure that the visualization is fixed and that
screen and print representation are identical.
In addition to these straightforward requirements, however, PDF/A requires various other PDF features
which are more subtle (e.g. certain entries in font data structures), and prohibits some critical structures,
e.g. certain combinations of TrueType fonts and encodings without guaranteed rendering results.
There are many aspects which must be implemented and checked by software developers before they
arrive at fully standard-conforming PDF/A products. PDF/A is much more than simply »PDF with embedded
fonts and no encryption«.
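The XMP properties which record the PDF/A part and conformance level are commonly written as pdfaid:part and pdfaid:conformance. The following Python sketch reads them from a hand-written sample packet with the standard library. The sample packet and the function name read_pdfa_id are assumptions for illustration; the sample omits the xpacket wrapper found in real files, and finding these properties says nothing about whether a document actually conforms.

```python
import xml.etree.ElementTree as ET

PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hand-written sample packet (hypothetical, not taken from a real document).
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
   <pdfaid:part>2</pdfaid:part>
   <pdfaid:conformance>U</pdfaid:conformance>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""

def read_pdfa_id(xmp_text):
    """Return the pdfaid part and conformance values found in an XMP packet."""
    root = ET.fromstring(xmp_text)
    result = {}
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        for key in ("part", "conformance"):
            # XMP allows simple properties as child elements ...
            elem = desc.find(f"{{{PDFAID_NS}}}{key}")
            if elem is not None and elem.text:
                result[key] = elem.text.strip()
            # ... or as attributes of rdf:Description.
            attr = desc.get(f"{{{PDFAID_NS}}}{key}")
            if attr:
                result[key] = attr
    return result

pdfa_id = read_pdfa_id(SAMPLE_XMP)
```

A claim of conformance in the metadata is only the starting point; a validator still has to check fonts, color, annotations and the other requirements described in this section.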
Specific restrictions
in PDF/A-1
PDF/A-1 reflects the fact that it was the first in the PDF/A family: the standard was created at a time
when important PDF concepts were not yet ready for prime time. As a result, the following features are
prohibited in PDF/A-1, but are allowed in the newer parts:
> All features which require PDF 1.5 or above, e.g. JPEG 2000 compression and layers (optional content).
> Transparency: although transparency is possible in PDF 1.4, it was not considered suitable for archiving
purposes at the time because there was no consistent description of transparency support
available. Since identical behavior in all PDF viewers could not be guaranteed, transparency was completely
banned from PDF/A-1. After the publication of PDF/A-1 the exact semantics of PDF transparency
were clarified and standardized in ISO 32000-1; later standards therefore allow the use of
transparency.
> File attachments were banned from PDF/A-1 to make sure that all document contents are fully archivable.
Device-independent
color specification
In order to ensure consistent color reproduction across output devices and time, PDF/A requires the use
of device-independent color, usually achieved via ICC color profiles or CIE Lab color specifications. The
optional output intent describes the color characteristics of the document with an ICC profile. While
these concepts are widely used in the graphic arts industry, enterprise PDF developers are not necessarily
familiar with color management and must familiarize themselves with ICC profiles and related
concepts.
Raster images, e.g. TIFF and JPEG, play a vital role in document creation. Scanned paper documents and
photographs from digital cameras are common examples of raster image data in document workflows.
Often raster image data is already device-independent, usually by means of an embedded ICC color
profile or standardized color spaces such as sRGB. Such images are ready for use in PDF/A. However,
legacy image data is in many cases device-dependent, such as black-and-white or RGB scans without
an associated ICC profile.
XMP metadata and
extension schemas in
PDF/A-1/2/3
Extensible Metadata Platform (XMP) is an XML-based format modeled after W3C’s RDF (Resource Description
Framework) which forms the foundation of the semantic Web initiative. XMP was standardized in 2012
as ISO 16684-1. PDF/A mandates the use of XMP metadata for storing information about
a document inside the PDF itself. XMP provides a powerful and flexible framework for storing standard
and custom metadata properties (see separate PDFlib Whitepaper on XMP).
The XMP specification includes more than a dozen predefined schemas with hundreds of properties for
common document and image characteristics. The most widely used predefined XMP schema is called
the Dublin Core. It includes properties such as Title, Creator, Subject, and Description.
XMP is extensible by its nature, i.e. company- or industry-specific metadata requirements can be addressed
with custom schemas. PDF/A supports this concept. However, in order to ensure automated
retrieval PDF/A mandates that a machine-readable description of custom metadata must be included
in the metadata. This is achieved with an »XMP extension schema description«: a part of the XMP
metadata describes the structure of custom XMP metadata properties.
Metadata in PDF/A-4
The convoluted concept of XMP extension schemas introduced with PDF/A-1 didn’t really catch on with
developers and users. The industry had to struggle for several years to work out those details about
extension schema processing which were missing from the standard text. This led to frustration, since
on the one hand it was hard to correctly add custom metadata properties to PDF/A, and on the other
hand applications which didn’t use custom properties nevertheless triggered XMP-related errors in
PDF/A validators. PDF/A-4 eliminates these problems in a radical way by completely getting rid of XMP
extension schema descriptions. They are replaced with a machine-readable schema description according
to the Relax NG standard, published in 2014 as ISO 16684-2. However, unlike the required extension
schemas in PDF/A-1/2/3, schema descriptions are optional in PDF/A-4.
Another source of problems was the requirement to synchronize XMP metadata with entries in the
document information dictionary. This so-called crosswalk was underspecified and even got some
details wrong in the first published version of PDF/A-1. Since PDF 2.0, the basis of PDF/A-4, almost completely
deprecates document info entries, PDF/A-4 no longer requires metadata synchronization.
PDF/A-1/2/3 Level A
conformance: Tagged PDF
PDF/A-1a, PDF/A-2a and PDF/A-3a require the use of Tagged PDF. While plain PDF only places visible contents
on a page, Tagged PDF requires that the document’s logical structure is recorded within the structure
hierarchy. Tagged PDF offers predefined structure element types for common parts of a document
such as headings, tables and lists. So-called marked content items can be considered the equivalent
of tagged content in markup languages. They refer to elements in this structure tree. Similar to HTML
and XML, Tagged PDF supports attributes for structure elements. For example, table elements can carry
attributes regarding the row or column spanning properties of table cells.
Level A conformance also requires that all text in the document has Unicode semantics available (see
below) and that logical words are separated by space characters.
PDF/UA-1 (Universal Accessibility) clarifies many aspects of Tagged PDF. It was published in 2012
as ISO 14289. Although there is no direct relationship between the two standards, a PDF/A document can
at the same time conform to PDF/UA. In fact, if you want to create PDF/A-1/2/3 with conformance level
A we recommend adhering to the PDF/UA requirements in order to improve accessibility. For more
information refer to the PDFlib Whitepaper on PDF/UA.
PDF/A-4 abandons level A conformance and simply mentions the advantages of Tagged PDF for content
recovery. The standard references PDF/UA for further guidance, i.e. the recommendation above is now
included in the standard.
PDF/A-2/3 Level U
conformance:
Unicode requirements
PDF/A-2 and PDF/A-3 offer level U conformance in addition to levels A and B. Level U requires proper
Unicode semantics for all text in the document, but does not mandate Tagged PDF. This requirement is
rooted in the fact that PDF supports a variety of font and encoding techniques, not all of which support
Unicode. For example, PDF supports PostScript Type 1 fonts, a format which is deprecated or no longer
supported in many current operating systems and applications. This format was introduced in
the 1980s, while the Unicode consortium started its work in 1991. PDF/A conformance levels A and U
require that supplementary Unicode mapping information must be present for fonts which do not
contain it internally. But not all Unicode values are acceptable: values in the Private Use Area (PUA) are
not allowed since they don’t carry any common interpretation.
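A check for Private Use Area values is easy to sketch. The following Python function flags characters in the three private-use ranges defined by Unicode; the function name pua_characters is hypothetical, and whether the standard's wording is meant to cover all three ranges is not stated in the text above, so treating them alike is an assumption of this sketch.

```python
# Private-use ranges defined by Unicode: the BMP Private Use Area and
# the two supplementary private-use planes.
PUA_RANGES = (
    (0xE000, 0xF8FF),
    (0xF0000, 0xFFFFD),
    (0x100000, 0x10FFFD),
)

def pua_characters(text):
    """Return the characters of text which lie in a private-use range."""
    return [ch for ch in text
            if any(lo <= ord(ch) <= hi for lo, hi in PUA_RANGES)]

flagged = pua_characters("A\ue000B")
```

Extracted text for which this list is non-empty would need different Unicode values or, as described below for symbolic fonts, an ActualText attribute.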
Symbolic fonts are an important area where this PDF/A requirement holds, e.g. fonts containing logos
or pictograms. Since standardized Unicode values are not available for custom symbolic glyphs, suitable
Unicode semantics must be provided in an ActualText marked content attribute for the text. While this
attribute is commonly used only in Tagged PDF, it can also be supplied in untagged documents – and
this is what level U conformance requires. The ActualText attribute can be assigned to an individual
glyph or a sequence of multiple glyphs.
PDF/A-4 eliminates level U conformance, but recommends level U Unicode properties for all documents.
However, this is not a strict requirement.
Annotations and
PDF/A-4 Level E
conformance
PDF supports a variety of annotation types (also called comments) which enrich documents. Some
annotation types are prohibited in PDF/A; allowed annotations must adhere to several rules.
In PDF/A-1 Sound and Movie annotations are not permitted since »support for multimedia content is
outside the scope« of the standard. In the same spirit PDF/A-2 and PDF/A-3 disallow the newer 3D and
Screen annotation types. PDF/A-4 prohibits Sound, Screen and Movie annotations.
In addition, PDF/A-4 introduces conformance level E. It can be considered the successor of the PDF/E
standard for PDF in engineering which didn’t find widespread adoption. PDF/A-4e allows 3D and RichMedia
annotations in support of interactive applications. Regarding 3D data the standard recommends
RichMedia annotations instead of 3D annotations.
Another new condition in PDF/A-4 which stems from PDF 2.0 is the requirement to have annotation
appearances included in the document. These describe the graphical representation of an annotation.
While the annotation dictionary contains a description of its visual representation (such as border
style, color, font etc.) the task of creating the visual representation from the description is up to the PDF
viewer and not standardized. In order to ensure reliable rendering of annotations the PDF creation software
must include the visual representation of the appearance of all annotation types except Popup
and Link.
File Attachments and
PDF/A-4 Level F
conformance
Attachments can be embedded in a PDF document on the document level or on a page with the help of
FileAttachment annotations. Rules for embedded files differ substantially among PDF/A parts:
> > PDF/A-1 completely prohibits attachments.
5 Whitepaper: A Technical Introduction to PDF/A, 2021-04 PDFlib GmbH
www.pdflib.com
> > PDF/A-2 allows attachments, but the embedded documents must conform to PDF/A-1 or PDF/A-2.
> > PDF/A-3 allows attachments with arbitrary content types.
> > PDF/A-4 allows attachments which conform to PDF/A-1, PDF/A-2 or PDF/A-4. It also introduces a dedicated
conformance level F which allows arbitrary content types.
PDF/A viewers are not required to do anything specific with attached non-PDF/A files except for extracting
them. The PDF/A standard does not guarantee that attachments can be viewed or otherwise
used in the future – it simply uses PDF/A as a carrier document.
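The attachment rules listed above can be summarized as a small lookup table. The following Python sketch is only an illustration of those rules as stated in this whitepaper: the names ALLOWED_ATTACHMENTS and attachment_allowed are hypothetical, and the sketch assumes that the PDF/A part of each attached file (or the fact that it is not PDF/A at all) has already been determined by a validator.

```python
from typing import Optional

# Container flavor -> PDF/A parts accepted for attached files.
# None means "arbitrary content types"; an empty set means "no attachments at all".
ALLOWED_ATTACHMENTS = {
    "PDF/A-1": frozenset(),
    "PDF/A-2": frozenset({"PDF/A-1", "PDF/A-2"}),
    "PDF/A-3": None,
    "PDF/A-4": frozenset({"PDF/A-1", "PDF/A-2", "PDF/A-4"}),
    "PDF/A-4f": None,
}


def attachment_allowed(container: str, attachment_part: Optional[str]) -> bool:
    """attachment_part is e.g. "PDF/A-2", or None for a file which is not PDF/A."""
    allowed = ALLOWED_ATTACHMENTS[container]
    if allowed is None:
        return True  # arbitrary attachments are permitted
    return attachment_part in allowed


# A spreadsheet (not PDF/A) may travel in PDF/A-3, but not in PDF/A-2.
print(attachment_allowed("PDF/A-3", None), attachment_allowed("PDF/A-2", None))
```

The table deliberately says nothing about whether an attachment can be rendered later; as noted above, the carrier document makes no such promise.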
Digital Signatures
Digital signatures in PDF documents can be used to check the document’s integrity, authenticate the
person who created the signature, and determine the date and time of signature. Digital signatures are
part of PDF 1.4 and are allowed in PDF/A. Multiple document signatures using PDF’s incremental update
feature are also allowed. However, the signatures must meet certain requirements for PDF/A:
> > If the signature has a visual appearance (e.g. an image or a textual representation of the signer’s
name) this appearance must meet the same PDF/A requirements as other document parts (device-independent
color, fonts embedded, etc.).
> > PDF/A-2 and PDF/A-3 contain additional requirements regarding technical details of the signature.
The standard also recommends including timestamps and certificate revocation information in the
signature.
> > PDF/A-4 allows one certification signature, one or more approval signatures and one or more timestamp
signatures. All signatures must conform to an appropriate PAdES profile.
Conforming PDF/A Viewers
While conforming PDF/A documents are PDF documents, not all PDF viewers are necessarily conforming
PDF/A viewers. This is caused by additional requirements imposed on PDF viewers by the PDF/A
standard. The concept of a »PDF reader« as defined in the standard includes tools for viewing the contents
of a document interactively, but also encompasses non-interactive tools such as a Raster Image
Processor (RIP). While basic rendering of a document on screen or paper is specified in ISO 32000, PDF/A
further qualifies several aspects of rendering including the following:
> > While plain PDF viewers are free to ignore ICC-based color specifications and may use the alternate
color space instead, conforming PDF/A readers must always use the device-independent color information.
> > Conforming PDF/A readers must ignore certain device-specific information in a document, e.g. black
generation and undercolor removal (these are device-specific features for the graphic arts industry).
> > Conforming PDF/A readers are not allowed to render documents with fonts which may happen to
be available locally on the viewing system. Instead, only the fonts embedded in the document are
allowed for rendering.
> > Starting with PDF/A-2, conforming viewers must ignore old-style document information fields and
must fully rely on XMP metadata.
PDF/A Validation
PDF/A validation is the process of checking whether a document conforms to the requirements of a
particular part of the PDF/A standard. Validation has been available for a long time as part of Acrobat’s
Preflight component as well as from several independent software vendors. In order to provide a useful
resource for the community the Open Preservation Foundation (OPF), the PDF Association and the
Digital Preservation Coalition (DPC) collaborated in the development of a freely available and reliable
PDF/A validator called veraPDF. Its development has been funded by the European Commission’s
Preforma project and is supported by the PDF software developer community as organized in the PDF
Association.
If you are in doubt regarding the standard conformance of a particular PDF/A document we recommend
checking it with veraPDF.
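Such a check can be scripted. The following Python sketch assumes that the veraPDF command-line launcher is installed under the name verapdf and accepts the --flavour option to select the PDF/A flavor (for example 1b or 2b), as described in the veraPDF documentation; the helper names verapdf_command and run_verapdf are hypothetical. The sketch returns the raw report and does not attempt to interpret it.

```python
import shutil
import subprocess
from typing import Optional


def verapdf_command(pdf_path: str, flavour: str = "2b") -> list[str]:
    """Build the command line for validating pdf_path against one PDF/A flavor."""
    return ["verapdf", "--flavour", flavour, pdf_path]


def run_verapdf(pdf_path: str, flavour: str = "2b") -> Optional[str]:
    """Return veraPDF's report as text, or None if the tool is not installed."""
    if shutil.which("verapdf") is None:  # assumption: launcher is named "verapdf"
        return None
    result = subprocess.run(
        verapdf_command(pdf_path, flavour), capture_output=True, text=True
    )
    return result.stdout
```

Keeping the command construction separate from the process call makes it easy to adapt the options to the installed veraPDF version.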
Processing PDF/A Documents
Special care must be taken when processing PDF/A documents in order to maintain standard conformance.
Even simple operations may spoil a document’s conformance. It is therefore crucial to deploy
only tools which are PDF/A-aware to guard against the risk that PDF/A documents are modified in a
way which violates the standard.
Splitting and Merging
Even simple operations may result in non-conforming documents. For example, inserting a page in a
PDF/A document poses several immediate dangers:
> > If the inserted page stems from a non-PDF/A document, it may use unembedded fonts.
> > Even if the imported page stems from a PDF/A document dangers lurk in multiple areas. For example,
the color characteristics (e.g. output intent) of both documents don’t necessarily match, which
could result in non-conforming output.
> > A small operation such as adding a metadata field may violate the standard unless the software
properly implements the rules for XMP metadata as mandated by PDF/A-1/2/3.
Any kind of content or metadata processing applied to PDF/A documents must be performed with PDF/A-aware
software to avoid jeopardizing PDF/A conformance.
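The page insertion dangers listed above can be expressed as a simple pre-flight check. The following Python sketch is purely illustrative: the PageSource class, its fields and the insertion_risks function are hypothetical and do not correspond to any PDFlib API. It assumes that the relevant properties of both documents have already been determined by PDF/A-aware software.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageSource:
    """Hypothetical summary of a document taking part in a page insertion."""
    is_pdfa: bool
    all_fonts_embedded: bool
    output_intent: Optional[str]  # e.g. an identifier of the ICC output profile


def insertion_risks(target: PageSource, imported: PageSource) -> list[str]:
    """List the dangers named in the text which apply to this insertion."""
    risks = []
    if not imported.is_pdfa:
        risks.append("imported page does not come from a PDF/A document")
    if not imported.all_fonts_embedded:
        risks.append("imported page uses unembedded fonts")
    if imported.output_intent != target.output_intent:
        risks.append("output intents of the two documents differ")
    return risks


target = PageSource(is_pdfa=True, all_fonts_embedded=True, output_intent="sRGB")
scan = PageSource(is_pdfa=False, all_fonts_embedded=False, output_intent=None)
for risk in insertion_risks(target, scan):
    print("Risk:", risk)
```

An empty result does not prove conformance of the merged output; it only means that none of the dangers modeled here was detected.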
Digital Signatures
In order to make use of digital signatures in PDF/A workflows the signature software must be aware of
PDF/A, i.e. observe the rules outlined above.
The bottom line is that only PDF/A-aware tools must be used in PDF/A workflows; otherwise PDF/A
conformance may be spoiled. In order to avoid PDF/A violations through accidental modification Adobe
Acrobat opens PDF/A documents in read-only mode by default. Once the available editing and modification
tools in Acrobat are used, PDF/A conformance is no longer guaranteed.
Document assembly and Tagged PDF
Assembling documents from Tagged PDF pages is particularly tricky. On the technical level the
structure hierarchies of the involved PDF documents must be combined, which involves convoluted
operations with the Tagging data structures. Even more difficult are semantic challenges: the
document assembly process must take into account the logical entities which are combined. For
example, a structure element such as a paragraph or table may span multiple pages. If these pages are
separated or combined in a different order the structure hierarchy is easily spoiled.
Document assembly with Tagged PDF requires careful planning of all involved semantic entities. For
example, the task can be simplified if the workflow ensures that major semantic units like document
sections start on a new page.
PDF/A Support in PDFlib GmbH Products
Creating PDF/A with PDFlib
PDFlib GmbH introduced PDF/A functionality in its products in 2006. PDFlib products were the first
with support for XMP extension schemas. All products in the PDFlib product family support all flavors
of PDF/A-1, PDF/A-2 and PDF/A-3 (PDF/A-4 support in development). PDFlib provides application developers
with a toolkit which allows the following PDF/A-related operations:
> > create PDF/A from scratch, e.g. based on text from a database
> > convert raster images (e.g. scans) to PDF/A
> > process existing PDF/A documents, e.g. merge or split
> > work with ICC profiles and device-independent color to deal with all color management issues
> > create PDF/A level A with structure information (Tagged PDF), also in combination with PDF/UA
> > assemble Tagged PDF/A from existing tagged pages
> > attach XMP metadata to the generated documents, including XMP extension schemas
> > attach PDF/A documents to PDF/A-2 or arbitrary file types to PDF/A-3
All of these operations can be implemented with simple PDFlib calls. Sample code for a variety of
programming languages and development environments is provided with the PDFlib distribution. Additional
programming techniques for PDF/A are available in the PDFlib Cookbook.
Creating PDF/A-conforming output with PDFlib is achieved by the following means:
> > PDFlib automatically takes care of several formal settings for PDF/A, such as PDF version number and
required XMP identification entries.
> > The PDFlib application program must explicitly use certain function calls and options (e.g. for font
embedding).
> > The PDFlib application program must refrain from using certain other function calls and option settings
(e.g. encryption).
If the PDFlib application program obeys these rules, valid PDF/A output is guaranteed. If PDFlib
detects a violation of the PDF/A creation rules it throws an exception which must be handled by the application.
No PDF output is created in case of an exception; there is no risk of creating non-conforming
output. Details of required and prohibited operations are discussed in the PDFlib documentation.
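The XMP identification entries mentioned above are the pdfaid:part and pdfaid:conformance properties in the namespace http://www.aiim.org/pdfa/ns/id/. The following Python sketch shows how they might be read from an XMP packet. It is not PDFlib code: the function name read_pdfa_identification is hypothetical, the sample packet is made up for illustration, and the sketch assumes that the XMP packet has already been extracted from the PDF as a string.

```python
import xml.etree.ElementTree as ET
from typing import Optional

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"


def read_pdfa_identification(xmp_packet: str) -> tuple[Optional[str], Optional[str]]:
    """Return (part, conformance) from an XMP packet; missing entries are None."""
    found = {}
    root = ET.fromstring(xmp_packet)
    for description in root.iter(f"{{{RDF_NS}}}Description"):
        for name in ("part", "conformance"):
            qname = f"{{{PDFAID_NS}}}{name}"
            # XMP allows simple properties as XML attributes or as child elements.
            value = description.get(qname)
            if value is None:
                child = description.find(qname)
                value = child.text if child is not None else None
            if value is not None:
                found[name] = value.strip()
    return found.get("part"), found.get("conformance")


# Made-up sample packet identifying a PDF/A-2b document.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
      <pdfaid:part>2</pdfaid:part>
      <pdfaid:conformance>B</pdfaid:conformance>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""

print(read_pdfa_identification(SAMPLE_XMP))
```

These entries are only a claim of conformance made by the producing software; they do not replace validation of the document.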
Processing PDF/A with PDFlib+PDI
Additional rules apply when importing pages from existing PDF/A-conforming documents. When dealing
with existing PDF/A documents, PDFlib+PDI carefully examines the PDF/A properties of all input
and output documents to make sure that the output still conforms to PDF/A. For additional control the
output intent of an imported document can be copied to the output PDF, effectively cloning the PDF/A
color properties of an existing document. Similarly, XMP metadata from imported documents can be
cloned or merged.
Creating PDF/A level A with PDFlib
PDF/A conformance level A can be regarded as level B plus Tagged PDF. PDFlib’s support for PDF/A level
A is based on the features for producing Tagged PDF: each content item can be placed at a particular
location in the document’s structure tree; content items which are not relevant for the document
structure (e.g. headers and footers, pagination) can be tagged as Artifacts, which means that they will
be ignored when the document is read aloud by software or converted to some other format. Alternative
text can be attached to images and vector graphics. PDFlib automatically tags tables and Artifacts,
which is a big time-saver for the developer. PDFlib checks the supplied tags to make sure that the
structure element nesting and attributes conform to ISO 32000. For example, heading or list tags must
be properly nested.
Integrated support for PDF/UA makes it easy to create PDF output which is both accessible and archivable.
Note that you need detailed knowledge about the document’s logical structure in order to create
Tagged PDF. PDFlib takes care of the PDF-related details, but it cannot infer the document structure
from its contents.
PDF/A-conforming signatures with PLOP DS
PDFlib PLOP DS is a toolkit for applying digital signatures to PDF documents according to the PAdES
signature standards required for signatures according to European eIDAS regulations. PLOP DS applies
signatures to PDF/A documents such that the signed output also conforms to PDF/A.
PDFlib
PDFlib GmbH
Franziska-Bilek-Weg 9
80339 München, Germany
support@pdflib.com
www.pdflib.com/knowledge-base/pdfa
PDFlib GmbH is completely focused on PDF technology. Customers worldwide have used PDFlib products
since 1997. The company closely follows development and market trends, such as ISO standards for PDF.
PDFlib GmbH products are distributed all over the world with major markets in North America, Europe,
and Japan.
Founded in 2006 as the PDF/A Competence Center, the PDF Association broadened its scope in 2011 to cover
all aspects of PDF technology. Today, it provides an industry meeting-place, and a platform for members
to exercise thought-leadership in the community.
> > Developers use the PDF Association to share knowledge and experience with PDF technology.
> > Decision-makers use the PDF Association to learn about the role and capabilities of PDF and PDF’s
subset standards in ECM and other electronic document applications.
> > End-users benefit from improved reliability, quality, functionality and interoperability in their
experience of electronic documents.