PDFlib FontReporter
Version 1.11
Outdated version - only for demonstration!
A Plugin for analyzing fonts in PDF

Copyright © 2005-2017 PDFlib GmbH. All rights reserved.
PDFlib GmbH
Franziska-Bilek-Weg 9, 80339 München, Germany
www.pdflib.com
phone +49 • 89 • 452 33 84-0
fax +49 • 89 • 452 33 84-99

If you have questions, check the PDFlib mailing list and archive at groups.yahoo.com/neo/groups/pdflib/info

Licensing contact: sales@pdflib.com
Support for commercial PDFlib licensees: support@pdflib.com (please include your license number)

You can use PDFlib FontReporter free of charge; however, it is not in the public domain. This software cannot be sold or redistributed (whether for a fee or at no charge), either stand-alone or in combination with any other product, without the express written permission of PDFlib GmbH.

This publication and the information herein are furnished as is, are subject to change without notice, and should not be construed as a commitment by PDFlib GmbH. PDFlib GmbH assumes no responsibility or liability for any errors or inaccuracies, makes no warranty of any kind (express, implied or statutory) with respect to this publication, and expressly disclaims any and all warranties of merchantability, fitness for particular purposes and noninfringement of third party rights.

PDFlib FontReporter is provided »as is« without any warranty, express or implied, including but not limited to any implied warranties of merchantability and fitness for a particular purpose. In no event will PDFlib GmbH be liable for any damages, including lost profits, lost savings, or other incidental or consequential damages. Although PDFlib FontReporter is not a commercial product, we strive to provide high quality. If you run into problems, you are encouraged to contact us at support@pdflib.com.

Adobe, Acrobat, PostScript, and XMP are trademarks of Adobe Systems Inc.
AIX, IBM, OS/390, WebSphere, iSeries, and zSeries are trademarks of International Business Machines Corporation. ActiveX, Microsoft, OpenType, and Windows are trademarks of Microsoft Corporation. Apple, Macintosh and TrueType are trademarks of Apple Computer, Inc. Unicode and the Unicode logo are trademarks of Unicode, Inc. Unix is a trademark of The Open Group. Java and Solaris are trademarks of Sun Microsystems, Inc. HKS is a registered trademark of the HKS brand association: Hostmann-Steinberg, K+E Printing Inks, Schmincke. Other company, product, and service names may be trademarks or service marks of others.

Contents
1 Installing PDFlib FontReporter
2 Working with FontReporter
2.1 What can you do with FontReporter?
2.2 Overview of PDF Font Formats
2.3 Contents of a Font Report
2.4 Investigate PDF Problems with FontReporter
2.5 Error Messages
A Revision History

1 Installing PDFlib FontReporter

Requirements. PDFlib FontReporter works with the following Acrobat versions:
> Acrobat 8/9/X/XI/DC on Windows
> Acrobat X/XI/DC on macOS
The plugin doesn’t work with Adobe Reader/Acrobat Reader.

Installing FontReporter on Windows. To install PDFlib FontReporter in Acrobat, the plugin files must be placed in a subdirectory of the Acrobat plugin folder. The plugin installer does this automatically, but you can also do it manually. A typical location of the plugin folder looks as follows:
C:\Program Files\Adobe\Acrobat XXX\Acrobat\plug_ins\PDFlib FontReporter
For 32-bit versions of Acrobat running on 64-bit Windows, the first part should be C:\Program Files (x86)\...

Installing FontReporter for Acrobat X/XI/DC on macOS. Proceed as follows to install the plugin for all users:
> Double-click the disk image to mount it. A folder with the plugin files will be visible.
> Copy the plugin folder to the following path in the system’s Library folder:
/Library/Application Support/Adobe/Acrobat/XXX/Plug-ins
Alternatively, you can install the plugin for a single user only, as follows:
> Click the desktop to make sure you’re in the Finder, hold down the Option key, and choose Go, Library to open the user’s Library folder.
> Copy the plugin folder to the following path in the user’s Library folder:
/Users//Library/Application Support/Adobe/Acrobat/XXX/Plug-ins

Multi-lingual interface. PDFlib FontReporter supports multiple languages in the user interface. FontReporter chooses its interface language automatically, based on the application language of Acrobat. Currently, English and German interfaces are available. If Acrobat runs in any other language mode, FontReporter uses the English interface.

Troubleshooting. If PDFlib FontReporter doesn’t seem to work, check the following: make sure that the box Use only certified plug-ins in Edit, Preferences, [General...], General is unchecked. The plugin is not loaded if Acrobat runs in Certified Mode.

2 Working with FontReporter

2.1 What can you do with FontReporter?

FontReporter is a useful tool if you are interested in fonts within PDF documents. It provides font- and encoding-related information, which helps in a variety of situations:
> analyze printing problems (e.g. a particular font causes printing errors)
> investigate text extraction problems (e.g. copying text from a PDF results in garbage)
> visualize Unicode mappings for a font
> find flaws in the PDF creation workflow (e.g. the printer driver converted a PostScript Type 1 font to Type 3)
> test whether ToUnicode mapping tables (required for PDF/A-1a) are present
> identify logos and symbols which are represented as text in a PDF
> learn which fonts are contained in a PDF, and which glyphs they contain (e.g.
the file size is too large because some fonts ended up in the PDF unintentionally)
> check font subsets to see which glyphs are contained in the subset
> learn more about PDF font technology

Using FontReporter is as easy as choosing Plug-Ins, PDFlib FontReporter..., Create Font Report in Acrobat. This creates a font report for all pages of the current PDF document as a separate PDF. Two pages from typical font reports are shown in Figure 2.1.

Fig. 2.1 Sample font reports

Supported PDF and font formats. FontReporter supports all PDF versions up to Acrobat DC. All font and encoding formats in PDF are supported, as well as all types of embedded font data.

Advantages over Acrobat’s font properties panel. All versions of Acrobat, including Adobe Reader, provide font information via File, Document Properties..., Fonts. However, Acrobat’s font overview is of limited use; FontReporter provides the following advantages compared to Acrobat’s font list:
> FontReporter provides much more information about each font
> FontReporter deals with CJK font names even on Western systems
> FontReporter provides glyph tables containing the glyphs of a font along with their widths, names, and Unicode values
> FontReporter presents the output as a PDF document so that you can save or print it
> FontReporter is guaranteed to process the full document, regardless of which pages have already been displayed in Acrobat

PDF text extraction with PDFlib TET. FontReporter is an auxiliary tool to our PDFlib Text and Image Extraction Toolkit (TET). TET is software for extracting the text and image contents of PDF documents. It is available both as a standalone program and as a programming library/component which can be integrated into existing software. TET extracts text from all kinds of PDF documents and normalizes the text to Unicode.
FontReporter can be used to create Unicode mapping tables for PDF documents which either do not contain enough information for extracting text or contain wrong Unicode mapping tables. Fully functional evaluation versions of TET are available for download from www.pdflib.com.

TET PDF IFilter. TET PDF IFilter extracts text and metadata from PDF documents and makes it available to search and retrieval software on Windows. This allows PDF documents to be searched on the local desktop, a corporate server, or the Web. TET PDF IFilter is based on the patented PDFlib Text and Image Extraction Toolkit (TET). TET PDF IFilter is a robust implementation of Microsoft’s IFilter indexing interface. It works with all search and retrieval products which support the IFilter interface, e.g. SharePoint and SQL Server. Fully functional evaluation versions of TET PDF IFilter are available for download from www.pdflib.com.

Free TET Plugin. The TET Plugin is a free companion to the FontReporter Plugin. It can be installed in Adobe Acrobat and allows interactive use of the Text and Image Extraction Toolkit (TET) with any PDF document that is currently open in Acrobat. Using the TET plugin, you can access TET’s functionality and experiment with TET options. The TET plugin can be downloaded free of charge from www.pdflib.com.

2.2 Overview of PDF Font Formats

PDF supports a large number of font formats, the details of which can get confusing. To help you interpret the reports created by FontReporter, we provide a quick summary of PDF font formats and their most important properties.

While the format of a font in a PDF document depends on the format of the original font used to compose the document, this is not the only factor which plays a role here.
Other factors include the configuration options in the PDF-creating software, the settings of the printer driver used to generate PostScript data for PDF conversion, the set of characters in the document, the overall number of characters used, and more. A particularly important aspect is the distinction between simple fonts and composite (CID) fonts.

Simple fonts. Simple fonts comprise the PostScript Type 1 (including Multiple Master), TrueType, and Type 3 types, and are addressed with 8-bit codes. They are therefore limited to a maximum of 256 characters. Simple fonts use a name-based encoding, which is a table for mapping the character codes to the glyphs in the font.

Composite (CID) fonts. Composite or CID (character ID) fonts come in PostScript and TrueType flavors. They can contain up to 65535 characters and are much more flexible than simple fonts. While CID fonts often use 2-byte codes for addressing the glyphs in the font, CJK fonts use more complicated schemes with a variable number of bytes per character (1-4). Instead of an encoding table, CID fonts require a CMap (Character Map), which maps character codes to the actual glyphs in the font. Dozens of predefined CMaps are available for common CJK fonts. The font’s character collection specifies a particular set of Chinese, Japanese, or Korean characters. So-called Identity CMaps are used (mainly for Western fonts) to address the glyphs in a font directly, without any intermediate mapping table.

Comparison of PDF font formats. Table 2.1 details the font formats supported in PDF, and explains which original font formats the PDF creation software can convert to these types.

Table 2.1 Font formats in PDF
Name | Notes
Type1 | Classic PostScript Type 1 fonts. In addition to the original Type 1 format, they can also be embedded as CFF (Compact Font Format) under the name Type1C (Type 1 Compressed). These fonts are the result of classic PostScript Type 1 fonts or OpenType fonts with PostScript outlines.
MMType1 | (In Acrobat: MM) Multiple Master fonts are an extension of the Type 1 format, and are rarely used. These fonts are the result of PostScript Type 1 Multiple Master fonts.
Type3 | User-defined fonts, i.e. the glyphs are described by raw vector or image operations instead of a ready-made font. Type 3 fonts are always embedded. They are mainly intended for bitmapped fonts and logo fonts. These fonts are often the result of a printer driver converting a PostScript Type 1 or TrueType font to a bitmap font. Some applications use Type 3 fonts for achieving special effects, such as filling an area with a pattern.

Programs After Market Services (PAMS)
Technical Documentation
[NMP Part No. 0275421 + 0275485]
NSD–1 SERIES CELLULAR PHONES
NSD–1 issue 2: 09/00
Copyright © 1999. Nokia Mobile Phones Ltd. All Rights Reserved.

NSD–1 Foreword PAMS Technical Documentation

AMENDMENT RECORD SHEET
Amendment Number | Date | Inserted By | Comments
Issue 1 | 08/99 | |
Issue 2 | 09/00 | OJuntune | General information: new variants NSD–1 FW/GW/AW, p.8 & 9; ARS added p.2
Issue 2 | 09/00 | OJuntune | System module: new variant NSD–1AW; updated pages 1, 2, 5, 12
Issue 2 | 09/00 | OJuntune | System module schematic diagrams: 11 new A3 pages: 11 to 25
Issue 2 | 09/00 | OJuntune | Product variants: ARS added p.2
Issue 2 | 09/00 | OJuntune | UIF module: foreword warning p.3 added; NSD–3AY assy parts added; NSD–1FW, 1GW, 1AW data added, p.7 to 10; ARS added; v.6.3 parts list added p.15
Issue 2 | 09/00 | OJuntune | Parts list: ARS p.2; 0201583, 0201577, 0201549 added; 0201293, 0201294 updated
Issue 2 | 09/00 | OJuntune | Troubleshooting instructions: ARS added, p.3 updated

Page 2 © Nokia Mobile Phones Ltd.
Issue 2 09/00

CONTENTS:
Foreword
NSD–1 SERIES CELLULAR PHONES SERVICE MANUAL
General Information
System Module
Part Lists (System Module)
UI Module
Product Variants
Service Software and Tuning Instructions
Service Tools
Disassembly/Troubleshooting Instructions
Handsfree Unit HFU–2
Non–serviceable Accessories
Installation Instructions CARK–64
Installation Instructions CARK–91

IMPORTANT
This document is intended for use by qualified service personnel only.

Company Policy
Our policy is one of continuous development; details of all technical modifications will be included with service bulletins.
While every endeavour has been made to ensure the accuracy of this document, some errors may exist. If the reader finds any errors, NOKIA MOBILE PHONES Ltd should be notified in writing. Please state:
Title of the Document + Issue Number/Date of publication
Latest Amendment Number (if applicable)
Page(s) and/or Figure(s) in error
Please send to:
Nokia Mobile Phones Ltd
PAMS Technical Documentation
PO Box 86
24101 SALO
Finland

Warnings and Cautions
Please refer to the phone’s user guide for instructions relating to operation, care and maintenance, including important safety information. Note also the following:
Warnings:
1. CARE MUST BE TAKEN ON INSTALLATION IN VEHICLES FITTED WITH ELECTRONIC ENGINE MANAGEMENT SYSTEMS AND ANTI–SKID BRAKING SYSTEMS. UNDER CERTAIN FAULT CONDITIONS, EMITTED RF ENERGY CAN AFFECT THEIR OPERATION. IF NECESSARY, CONSULT THE VEHICLE DEALER/MANUFACTURER TO DETERMINE THE IMMUNITY OF VEHICLE ELECTRONIC SYSTEMS TO RF ENERGY.
2. THE HANDPORTABLE TELEPHONE MUST NOT BE OPERATED IN AREAS LIKELY TO CONTAIN POTENTIALLY EXPLOSIVE ATMOSPHERES, EG PETROL STATIONS (SERVICE STATIONS), BLASTING AREAS ETC.
3.
OPERATION OF ANY RADIO TRANSMITTING EQUIPMENT, INCLUDING CELLULAR TELEPHONES, MAY INTERFERE WITH THE FUNCTIONALITY OF INADEQUATELY PROTECTED MEDICAL DEVICES. CONSULT A PHYSICIAN OR THE MANUFACTURER OF THE MEDICAL DEVICE IF YOU HAVE ANY QUESTIONS. OTHER ELECTRONIC EQUIPMENT MAY ALSO BE SUBJECT TO INTERFERENCE.
Cautions:
1. Servicing and alignment must be undertaken by qualified personnel only.
2. Ensure all work is carried out at an anti–static workstation and that an anti–static wrist strap is worn.
3. Ensure solder, wire, or foreign matter does not enter the telephone, as damage may result.
4. Use only approved components as specified in the parts list.
5. Ensure all components, modules, screws and insulators are correctly re–fitted after servicing and alignment. Ensure all cables and wires are repositioned correctly.
6. All PCs used with NMP Service Software for this product must have a BIOS and operating system that are “Year 2000 Compliant”.

A Technical Introduction to PDF/A-1/2/3/4
PDFlib Whitepaper

The PDF/A Family of Archiving Standards
PDF/A is targeted at reliable long-term preservation of digital documents with text, raster images and vector graphics, as well as associated metadata. The PDF/A format specified in the ISO 19005 standard series defines a consistent and robust subset of PDF which can be faithfully reproduced even after a long archiving period, or used for reliable data exchange in enterprise and government environments. This whitepaper discusses the major technical aspects of PDF/A-1, PDF/A-2, PDF/A-3 and PDF/A-4.

PDF/A-1
PDF/A-1, the first standard within a series of multiple parts, was published in 2005 as ISO 19005-1. It is based on PDF 1.4, the file format of Acrobat 5, and imposes restrictions regarding the use of color, fonts, annotations and other elements.

There are two flavors of PDF/A-1 (called conformance levels):
> Level B conformance (PDF/A-1b; »b« as in »basic«) ensures that the visual appearance of a document is preservable in the long term. PDF/A-1b ensures that the document will look the same when it is viewed or printed in the near or far future.
> Level A conformance (PDF/A-1a; »a« as in »accessible«) is based on level B, but adds crucial properties of Tagged PDF. It requires structure information and reliable Unicode text semantics in order to preserve the document’s logical structure and natural reading order. Simply put, PDF/A-1a not only ensures that the document will look the same when it is used in the future, but also that its contents can be interpreted reliably and will be accessible to physically impaired users. As an important example, screen reader programs can read Tagged PDF documents to blind users.

PDF/A-2
PDF 1.7, the file format of Acrobat 8, was standardized as ISO 32000-1 in 2008. To make new PDF features available in PDF/A, a new part of the standard called PDF/A-2 was published in 2011 as ISO 19005-2. PDF/A-2 is based on PDF 1.7 and includes many additions which are not available in PDF/A-1. These include important file format aspects such as JPEG 2000 compression, optional content (layers), PDF packages and others. PDF/A-2 documents may contain file attachments, provided the attached documents themselves conform to PDF/A-1 or PDF/A-2.

Like PDF/A-1, PDF/A-2 offers level B and level A conformance. It adds another flavor called level U conformance. Level U sits between PDF/A-2a and PDF/A-2b in that it requires reliable Unicode semantics, but not structure information. PDF/A-2u guarantees that the visual appearance of pages can be reproduced faithfully and that the text can be extracted and searched.
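Within one part, the conformance levels build on each other: level B preserves the visual appearance, level U adds reliable Unicode text, and level A adds Tagged PDF on top (PDF/A-1 has no level U). The following Python sketch expresses that ordering. It parses identifiers such as »PDF/A-2u« and checks whether a document claiming one level also meets a lower level of the same part. `parse_identifier` and `meets_level` are hypothetical helpers, not part of any PDFlib product. The sketch assumes that each higher level strictly adds requirements, and it covers only PDF/A-1 and PDF/A-2.

```python
from typing import Tuple

# Rank within a part: B (visual appearance) < U (+ Unicode) < A (+ Tagged PDF).
LEVEL_RANK = {"B": 0, "U": 1, "A": 2}

# Conformance levels defined for PDF/A-1 and PDF/A-2.
LEVELS_PER_PART = {1: {"A", "B"}, 2: {"A", "B", "U"}}


def parse_identifier(identifier: str) -> Tuple[int, str]:
    """Split an identifier such as 'PDF/A-2u' into (2, 'U')."""
    prefix = "PDF/A-"
    if not identifier.upper().startswith(prefix) or len(identifier) < len(prefix) + 2:
        raise ValueError(f"not a PDF/A identifier: {identifier!r}")
    rest = identifier[len(prefix):]
    part, level = int(rest[:-1]), rest[-1].upper()  # int() raises ValueError on junk
    if level not in LEVELS_PER_PART.get(part, set()):
        raise ValueError(f"level {level} is not defined for PDF/A-{part}")
    return part, level


def meets_level(identifier: str, required_level: str) -> bool:
    """True if a document claiming `identifier` also meets `required_level`
    of the same part (assumes strictly cumulative levels)."""
    part, level = parse_identifier(identifier)
    required = required_level.upper()
    if required not in LEVELS_PER_PART[part]:
        return False  # e.g. level U does not exist in PDF/A-1
    return LEVEL_RANK[level] >= LEVEL_RANK[required]


for ident, req in [("PDF/A-2a", "U"), ("PDF/A-2u", "A"), ("PDF/A-1a", "B"), ("PDF/A-1a", "U")]:
    print(ident, "meets level", req, ":", meets_level(ident, req))
```

The ranking is a plain dictionary because the ordering is the same in every part. Only the set of levels that exist differs between parts, and that is kept in a separate table.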
PDF/A-2 does not make PDF/A-1 obsolete or force users to migrate to the newer part of the standard; after all, this would be absurd for a standard which is targeted at long-term preservation.

2 Whitepaper: A Technical Introduction to PDF/A, 2021-04 PDFlib GmbH www.pdflib.com

PDF/A-3
Another part of the standard, called PDF/A-3, was published in 2012 as ISO 19005-3. PDF/A-3 is quite similar to PDF/A-2 and also supports conformance levels A, B, and U. It differs from PDF/A-2 in the following aspects:
> While PDF/A-2 allows only file attachments which conform to PDF/A, PDF/A-3 allows arbitrary file types as attachments to meet the requirements of various use cases.
> File attachments are associated with the whole document, a page, or some other part of the document. For each file attachment, the kind of relationship to the corresponding part of the document (e.g. source, alternative, or supplemental data) must be specified explicitly with the AFRelationship key.

Typical PDF/A-3 scenarios include embedding word processor or spreadsheet source files in a final-form PDF/A document, or including machine-readable XML data in a PDF intended for human consumption, e.g. an invoice. In fact, the ZUGFeRD and Factur-X invoice standards are an important application of PDF/A-3.

PDF/A-4
PDF/A-4 was published in 2020 as ISO 19005-4. Since it is based on PDF 2.0 (published as ISO 32000-2 in 2017 and updated in 2020), it can take advantage of new PDF features. PDF/A-2 and PDF/A-3 each comprise three different conformance levels, which tended to confuse users. PDF/A-4 simplifies this: PDF/A-4 documents may or may not contain tags. Unlike previous parts of the standard, PDF/A-4 does not require a dedicated conformance level for tagged documents, which eliminates the previous A/B/U conformance levels. Similarly, PDF/A-4 documents may or may not contain file attachments.
The attached files must conform to PDF/A-1, PDF/A-2 or PDF/A-4. While abandoning the A/B/U conformance levels, PDF/A-4 introduces two new conformance levels:
> PDF/A-4f allows non-PDF/A file attachments, similar to how PDF/A-3 extends PDF/A-2.
> PDF/A-4e is targeted at the engineering community. It is slated as the successor of the PDF/E-1 standard ISO 24517-1, which is based on PDF 1.6. The initial plan to define a new flavor PDF/E-2 has been cancelled. Instead, PDF/A-4e adds RichMedia annotations for 3D content in U3D or PRC format to the base PDF/A-4 format.

Regarding structure information and accessibility, PDF/A-1a/2a/3a only require the presence of tags and do not go into detail about the nature and use of PDF tags. PDF/A-4 takes one step backwards and one step forwards at the same time. PDF/A-4 is agnostic regarding the presence of tags, but it points out the advantages of Tagged PDF for content repurposing and accessibility. For the specifics, the standard references the PDF/UA standard (ISO 14289), which discusses many details of tagging. Also, PDF/A-4 inherits the rigid regime of PDF tags which is part of the underlying PDF 2.0 specification.

Which part to use? Just as PDF/A-2 does not replace PDF/A-1, PDF/A-3 does not replace PDF/A-2 and PDF/A-4 does not replace PDF/A-3. Any part of the PDF/A standard can be used for long-term archiving. You simply have to give up certain PDF features when you work with an older part of the PDF/A standard. For example, simple office documents without transparent graphics can still be implemented with PDF/A-1. If you need arbitrary file attachments, use PDF/A-3 or PDF/A-4f. If you need RichMedia/3D contents, use PDF/A-4e.

Technical Concepts in PDF/A

Fundamental PDF/A requirements
PDF/A requires certain PDF features and prohibits others:
> To guarantee the exact visual reproduction of text, all fonts used in a document must be embedded.
The only exceptions are fonts used for invisible text; these don’t have to be embedded.
> To guarantee exact color reproduction, all colors must be specified in a device-independent way.
> Metadata must be embedded using the XMP format. The PDF/A conformance level must be recorded with specific XMP properties. PDF/A-1/2/3 impose strict requirements on custom metadata properties; PDF/A-4 relaxes these requirements.
> Encryption is not allowed, so that the document contents can always be accessed without any restriction.
> Certain requirements for annotations and form fields ensure that the visualization is fixed and that screen and print representation are identical.

In addition to these straightforward requirements, however, PDF/A requires various other PDF features which are more subtle (e.g. certain entries in font data structures), and prohibits some critical structures, e.g. certain combinations of TrueType fonts and encodings without guaranteed rendering results. Software developers must implement and check many aspects before they arrive at fully standard-conforming PDF/A products. PDF/A is much more than simply »PDF with embedded fonts and no encryption«.

Specific restrictions in PDF/A-1
PDF/A-1 reflects the fact that it was the first in the PDF/A family: the standard was created at a time when important PDF concepts were not yet ready for prime time. As a result, the following features are prohibited in PDF/A-1, but are allowed in the newer parts:
> All features which require PDF 1.5 or above, e.g. JPEG 2000 compression and layers (optional content).
> Transparency: although transparency is possible in PDF 1.4, it was not considered suitable for archiving purposes at the time because no consistent description of transparency support was available. Since identical behavior in all PDF viewers could not be guaranteed, transparency was completely banned from PDF/A-1. After the publication of PDF/A-1, the exact semantics of PDF transparency were clarified and standardized in ISO 32000-1. Later parts therefore allow the use of transparency.
> File attachments were banned from PDF/A-1 to make sure that all document contents are fully archivable.

Device-independent color specification
To ensure consistent color reproduction across output devices and time, PDF/A requires the use of device-independent color, usually achieved via ICC color profiles or CIE Lab color specifications. The optional output intent describes the color characteristics of the document with an ICC profile. While these concepts are widely used in the graphic arts industry, enterprise PDF developers are not necessarily familiar with color management and must familiarize themselves with ICC profiles and related concepts.

Raster images, e.g. TIFF and JPEG, play a vital role in document creation. Scanned paper documents and photographs from digital cameras are common examples of raster image data in document workflows. Raster image data is often already device-independent, usually by means of an embedded ICC color profile or standardized color spaces such as sRGB. Such images are ready for use in PDF/A. However, legacy image data is in many cases device-dependent, such as black-and-white or RGB scans without an associated ICC profile.

XMP metadata and extension schemas in PDF/A-1/2/3
The Extensible Metadata Platform (XMP) is an XML-based format modeled after W3C’s RDF (Resource Description Framework), which forms the foundation of the semantic Web initiative. XMP was standardized in 2012 as ISO 16684-1. PDF/A mandates the use of XMP metadata for storing information about a document inside the PDF itself. XMP provides a powerful and flexible framework for storing standard and custom metadata properties (see separate PDFlib Whitepaper on XMP).
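PDF/A identification in XMP uses the pdfaid schema (namespace http://www.aiim.org/pdfa/ns/id/) with the properties pdfaid:part and pdfaid:conformance. The following Python sketch builds only the RDF fragment carrying these properties. It does not produce a complete XMP packet with the x:xmpmeta wrapper and xpacket processing instructions, and it does not handle extension schemas. `pdfa_identification` is a hypothetical helper, not a PDFlib API. The allowed conformance letters follow the levels described in this whitepaper. The pdfaid:rev entry for PDF/A-4 is an assumption based on ISO 19005-4, so check it against the standard before relying on it.

```python
import xml.etree.ElementTree as ET
from typing import Optional

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"

# Use the conventional prefixes when serializing.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("pdfaid", PDFAID_NS)

# Conformance letters per PDF/A part; None means "no conformance letter",
# which only plain PDF/A-4 uses.
VALID_CONFORMANCE = {
    1: {"A", "B"},
    2: {"A", "B", "U"},
    3: {"A", "B", "U"},
    4: {None, "E", "F"},
}


def pdfa_identification(part: int, conformance: Optional[str] = None) -> ET.Element:
    """Build an rdf:RDF fragment with the PDF/A identification properties."""
    if part not in VALID_CONFORMANCE:
        raise ValueError(f"unknown PDF/A part: {part}")
    level = conformance.upper() if conformance is not None else None
    if level not in VALID_CONFORMANCE[part]:
        raise ValueError(f"conformance {conformance!r} is not defined for PDF/A-{part}")

    rdf = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(rdf, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": ""})
    ET.SubElement(desc, f"{{{PDFAID_NS}}}part").text = str(part)
    if level is not None:
        ET.SubElement(desc, f"{{{PDFAID_NS}}}conformance").text = level
    if part == 4:
        # Assumption: PDF/A-4 also records the revision year of ISO 19005-4.
        ET.SubElement(desc, f"{{{PDFAID_NS}}}rev").text = "2020"
    return rdf


print(ET.tostring(pdfa_identification(2, "u"), encoding="unicode"))
```

Conformance letters are validated before any XML is built. A PDF/A-1 file therefore cannot be labelled with level U by mistake.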
The XMP specification includes more than a dozen predefined schemas with hundreds of properties for common document and image characteristics. The most widely used predefined XMP schema is called the Dublin Core. It includes properties such as Title, Creator, Subject, and Description.

XMP is extensible by its nature, i.e. company- or industry-specific metadata requirements can be addressed with custom schemas. PDF/A supports this concept. However, to ensure automated retrieval, PDF/A mandates that a machine-readable description of custom metadata must be included in the metadata. This is achieved with an »XMP extension schema description«: a part of the XMP metadata describes the structure of custom XMP metadata properties.

Metadata in PDF/A-4
The convoluted concept of XMP extension schemas introduced with PDF/A-1 didn’t really catch on with developers and users. The industry had to struggle for several years to work out those details about extension schema processing which were missing from the standard text. This led to frustration on two fronts. It was hard to add custom metadata properties to PDF/A correctly. At the same time, applications which didn’t use custom properties nevertheless triggered XMP-related errors in PDF/A validators.

PDF/A-4 eliminates these problems in a radical way by completely getting rid of XMP extension schema descriptions. They are replaced with a machine-readable schema description according to the RELAX NG standard, published in 2014 as ISO 16684-2. However, unlike the required extension schemas in PDF/A-1/2/3, schema descriptions are optional in PDF/A-4.

Another source of problems was the requirement to synchronize XMP metadata with entries in the document information dictionary. This so-called crosswalk was underspecified and even got some details wrong in the first published version of PDF/A-1. Since PDF 2.0, the basis of PDF/A-4, almost completely deprecates document info entries, PDF/A-4 no longer requires metadata synchronization.

PDF/A-1/2/3 Level A conformance: Tagged PDF
PDF/A-1a, PDF/A-2a and PDF/A-3a require the use of Tagged PDF. Plain PDF only places visible contents on a page. Tagged PDF additionally requires that the document’s logical structure is recorded in the structure hierarchy. Tagged PDF offers predefined structure element types for common parts of a document such as headings, tables and lists. So-called marked content items can be considered the equivalent of tagged content in markup languages. They refer to elements in this structure tree. As in HTML and XML, Tagged PDF supports attributes for structure elements. For example, table elements can carry attributes regarding the row or column spanning properties of table cells. Level A conformance also requires that all text in the document has Unicode semantics available (see below) and that logical words are separated by space characters.

PDF/UA-1 (Universal Accessibility) clarifies many aspects of Tagged PDF. It was published in 2012 as ISO 14289. Although there is no direct relationship between the two standards, a PDF/A document can conform to PDF/UA at the same time. In fact, if you want to create PDF/A-1/2/3 with conformance level A, we recommend adhering to the PDF/UA requirements to improve accessibility. For more information, refer to the PDFlib Whitepaper on PDF/UA.

PDF/A-4 abandons level A conformance and simply mentions the advantages of Tagged PDF for content recovery. The standard references PDF/UA for further guidance, i.e. the recommendation above is now included in the standard.

PDF/A-2/3 Level U conformance: Unicode requirements
PDF/A-2 and PDF/A-3 offer level U conformance in addition to levels A and B. Level U requires proper Unicode semantics for all text in the document, but does not mandate Tagged PDF.
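Unicode values alone cannot prove that text has proper Unicode semantics. One symptom is easy to check, though: Private Use code points, which carry no standardized meaning. These are U+E000 to U+F8FF in the Basic Multilingual Plane, U+F0000 to U+FFFFD in plane 15, and U+100000 to U+10FFFD in plane 16. The following Python sketch scans text for such characters. It assumes the text has already been extracted from the PDF, for example with a text extraction tool such as TET. `find_private_use` is a hypothetical helper, and the sketch is not a PDF/A validator.

```python
from typing import List, Tuple


def is_private_use(ch: str) -> bool:
    """True for the BMP Private Use Area and the supplementary
    Private Use Areas in planes 15 and 16."""
    cp = ord(ch)
    return (0xE000 <= cp <= 0xF8FF
            or 0xF0000 <= cp <= 0xFFFFD
            or 0x100000 <= cp <= 0x10FFFD)


def find_private_use(text: str) -> List[Tuple[int, str]]:
    """Return (position, 'U+XXXX') for every Private Use character in `text`."""
    return [(i, f"U+{ord(ch):04X}") for i, ch in enumerate(text) if is_private_use(ch)]


sample = "Logo \ue001 and \U000F0000 end"
print(find_private_use(sample))  # → [(5, 'U+E001'), (11, 'U+F0000')]
```

An equivalent test is `unicodedata.category(ch) == "Co"` from the Python standard library, since "Co" is the general category assigned to Private Use code points. The explicit ranges are used here to keep the rule visible.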
This requirement is rooted in the fact that PDF supports a variety of font and encoding techniques, not all of which support Unicode. For example, PDF supports PostScript Type 1 fonts, a format which is deprecated or no longer supported in many current operating systems and applications. This format was introduced in the 1980s, while the Unicode Consortium started its work in 1991. PDF/A conformance levels A and U require supplementary Unicode mapping information for fonts which do not contain it internally. However, not all Unicode values are acceptable: values in the Private Use Area (PUA) are not allowed since they don’t carry any common interpretation.

Symbolic fonts, e.g. fonts containing logos or pictograms, are an important area where this PDF/A requirement applies. Since standardized Unicode values are not available for custom symbolic glyphs, suitable Unicode semantics must be provided for the text in an ActualText marked content attribute. While this attribute is commonly used only in Tagged PDF, it can also be supplied in untagged documents, and this is what level U conformance requires. The ActualText attribute can be assigned to an individual glyph or to a sequence of multiple glyphs.

PDF/A-4 eliminates level U conformance, but recommends level U Unicode properties for all documents. However, this is not a strict requirement.

Annotations and PDF/A-4 Level E conformance
PDF supports a variety of annotation types (also called comments) which enrich documents. Some annotation types are prohibited in PDF/A; allowed annotations must adhere to several rules. In PDF/A-1, Sound and Movie annotations are not permitted since »support for multimedia content is outside the scope« of the standard. In the same spirit, PDF/A-2 and PDF/A-3 additionally disallow the newer 3D and Screen annotation types. PDF/A-4 prohibits Sound, Screen and Movie annotations.

In addition, PDF/A-4 introduces conformance level E. It can be considered the successor of the PDF/E standard for PDF in engineering, which didn’t find widespread adoption. PDF/A-4e allows 3D and RichMedia annotations in support of interactive applications. Regarding 3D data, the standard recommends RichMedia annotations instead of 3D annotations.

Another new condition in PDF/A-4, which stems from PDF 2.0, is the requirement to include annotation appearances in the document. An appearance describes the graphical representation of an annotation. While the annotation dictionary contains a description of its visual properties (such as border style, color, font etc.), the task of creating the visual representation from this description is up to the PDF viewer and is not standardized. In order to ensure reliable rendering of annotations, the PDF creation software must include the appearance for all annotation types except Popup and Link.

File Attachments and PDF/A-4 Level F conformance
Attachments can be embedded in a PDF document on the document level or on a page with the help of FileAttachment annotations. Rules for embedded files differ substantially among PDF/A parts:
> PDF/A-1 completely prohibits attachments.
> PDF/A-2 allows attachments, but the embedded documents must conform to PDF/A-1 or PDF/A-2.
> PDF/A-3 allows attachments with arbitrary content types.
> PDF/A-4 allows attachments which conform to PDF/A-1, PDF/A-2 or PDF/A-4. It also introduces a dedicated conformance level F which allows arbitrary content types.
5 Whitepaper: A Technical Introduction to PDF/A, 2021-04 PDFlib GmbH www.pdflib.com
PDF/A viewers are not required to do anything specific with attached non-PDF/A files except for extracting them. The PDF/A standard does not guarantee that attachments can be viewed or otherwise used in the future; it simply uses PDF/A as a carrier document.
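The annotation and attachment rules described above can be condensed into a small lookup table. The following Python sketch is illustrative only: the function names `annotation_allowed` and `attachment_allowed` are hypothetical, the checks cover only the annotation types named in this section, and the level codes ("a", "b", "u", "e", "f", and an empty string for plain PDF/A-4) are an assumed convention. We also assume, as the text implies, that PDF/A-4 reserves 3D and RichMedia annotations for level E. A real validator checks many more conditions, e.g. annotation types which the underlying PDF specification does not define at all.

```python
from typing import Optional

# Annotation subtypes which this section names as prohibited, per PDF/A part.
# Simplified: a real validator also rejects subtypes which the underlying
# PDF specification does not define.
PROHIBITED_ANNOTATIONS = {
    1: {"Sound", "Movie", "FileAttachment"},  # PDF/A-1 prohibits all attachments
    2: {"Sound", "Movie", "3D", "Screen"},
    3: {"Sound", "Movie", "3D", "Screen"},
    4: {"Sound", "Movie", "Screen"},
}

# Assumption: subtypes which PDF/A-4 permits only in conformance level E.
LEVEL_E_ONLY = {"3D", "RichMedia"}


def annotation_allowed(part: int, level: str, subtype: str) -> bool:
    """Return True if an annotation /Subtype may appear in PDF/A-<part><level>."""
    if subtype in PROHIBITED_ANNOTATIONS[part]:  # KeyError for unknown parts
        return False
    if part == 4 and subtype in LEVEL_E_ONLY:
        return level == "e"
    return True


def attachment_allowed(part: int, level: str, embedded_part: Optional[int]) -> bool:
    """Return True if a file may be embedded in PDF/A-<part><level>.

    embedded_part is the PDF/A part the attached file conforms to,
    or None if the attachment is not a PDF/A document.
    """
    if part == 1:
        return False                      # no attachments at all
    if part == 2:
        return embedded_part in (1, 2)
    if part == 3:
        return True                       # arbitrary content types
    if part == 4:
        # Level F allows arbitrary content; otherwise PDF/A-1, -2 or -4 only.
        return level == "f" or embedded_part in (1, 2, 4)
    raise ValueError(f"unknown PDF/A part: {part}")
```

For example, `annotation_allowed(4, "", "3D")` returns `False` while `annotation_allowed(4, "e", "3D")` returns `True`. Keeping the rules in data rather than in nested conditionals makes it easy to compare the PDF/A parts side by side.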
Digital Signatures
Digital signatures in PDF documents can be used to check the document’s integrity, authenticate the person who created the signature, and determine the date and time of signature. Digital signatures are part of PDF 1.4 and are allowed in PDF/A. Multiple document signatures using PDF’s incremental update feature are also allowed. However, the signatures must meet certain requirements for PDF/A:
> If the signature has a visual appearance (e.g. an image or a textual representation of the signer’s name), this appearance must meet the same PDF/A requirements as other document parts (device-independent color, embedded fonts, etc.).
> PDF/A-2 and PDF/A-3 contain additional requirements regarding technical details of the signature. The standard also recommends including timestamps and certificate revocation information in the signature.
> PDF/A-4 allows one certification signature, one or more approval signatures and one or more timestamp signatures. All signatures must conform to an appropriate PAdES profile.

Conforming PDF/A Viewers
Although conforming PDF/A documents are PDF documents, not every PDF viewer is a conforming PDF/A viewer, because the PDF/A standard imposes additional requirements on PDF viewers. The concept of a »PDF reader« as defined in the standard includes tools for viewing the contents of a document interactively, but also encompasses non-interactive tools such as a Raster Image Processor (RIP). While basic rendering of a document on screen or paper is specified in ISO 32000, PDF/A further qualifies several aspects of rendering, including the following:
> While plain PDF viewers are free to ignore ICC-based color specifications and may use the alternate color space instead, conforming PDF/A readers must always use the device-independent color information.
> Conforming PDF/A readers must ignore certain device-specific information in a document, e.g. black generation and undercolor removal (these are device-specific features for the graphic arts industry).
> Conforming PDF/A readers are not allowed to render documents with fonts which happen to be available locally on the viewing system. Instead, only the fonts embedded in the document may be used for rendering.
> Starting with PDF/A-2, conforming viewers must ignore old-style document information fields and must fully rely on XMP metadata.

PDF/A Validation
PDF/A validation is the process of checking whether a document conforms to the requirements of a particular part of the PDF/A standard. Validation has long been available as part of Acrobat’s Preflight component as well as from several independent software vendors. In order to provide a useful resource for the community, the Open Preservation Foundation (OPF), the PDF Association and the Digital Preservation Coalition (DPC) collaborated in developing a freely available and reliable PDF/A validator called veraPDF. Its development was funded by the European Commission’s Preforma project and is supported by the PDF software developer community as organized in the PDF Association. If you are in doubt regarding the standard conformance of a particular PDF/A document, we recommend checking it with veraPDF.

Processing PDF/A Documents
Special care must be taken when processing PDF/A documents in order to maintain standard conformance. Even simple operations may spoil a document’s conformance. It is therefore crucial to deploy only PDF/A-aware tools to guard against the risk that PDF/A documents are modified in a way which violates the standard.

Splitting and Merging
Even simple operations may result in non-conforming documents.
For example, inserting a page into a PDF/A document poses several immediate dangers:
> If the inserted page stems from a non-PDF/A document, it may use unembedded fonts.
> Even if the imported page stems from a PDF/A document, dangers lurk in multiple areas. For example, the color characteristics (e.g. output intent) of the two documents don’t necessarily match, which could result in non-conforming output.
> A small operation such as adding a metadata field may violate the standard unless the software properly implements the rules for XMP metadata as mandated by PDF/A-1/2/3.
Any kind of content or metadata processing of PDF/A documents must be performed with PDF/A-aware software to avoid jeopardizing PDF/A conformance.

Digital Signatures
In order to use digital signatures in PDF/A workflows, the signature software must be aware of PDF/A, i.e. observe the rules outlined above. The bottom line is that only PDF/A-aware tools must be used in PDF/A workflows; otherwise PDF/A conformance may be spoiled. In order to avoid PDF/A violations through accidental modification, Adobe Acrobat opens PDF/A documents in read-only mode by default. Once the available editing and modification tools in Acrobat are used, PDF/A conformance is no longer guaranteed.

Document assembly and Tagged PDF
Assembling documents from Tagged PDF pages is particularly tricky. On the technical level, the structure hierarchies of the involved PDF documents must be combined, which involves complex operations on the tagging data structures. Semantic challenges are even more difficult: the document assembly process must take into account the logical entities which are combined. For example, a structure element such as a paragraph or table may span multiple pages. If these pages are separated or combined in a different order, the structure hierarchy is easily spoiled. Document assembly with Tagged PDF requires careful planning of all involved semantic entities.
For example, the task can be simplified if the workflow ensures that major semantic units such as document sections start on a new page.

PDF/A Support in PDFlib GmbH Products

Creating PDF/A with PDFlib
PDFlib GmbH introduced PDF/A functionality in its products in 2006. PDFlib products were the first with support for XMP extension schemas. All products in the PDFlib product family support all flavors of PDF/A-1, PDF/A-2 and PDF/A-3 (PDF/A-4 support is in development). PDFlib provides application developers with a toolkit which allows the following PDF/A-related operations:
> create PDF/A from scratch, e.g. based on text from a database
> convert raster images (e.g. scans) to PDF/A
> process existing PDF/A documents, e.g. merge or split
> work with ICC profiles and device-independent color to deal with all color management issues
> create PDF/A level A with structure information (Tagged PDF), also in combination with PDF/UA
> assemble Tagged PDF/A from existing tagged pages
> attach XMP metadata to the generated documents, including XMP extension schemas
> attach PDF/A documents to PDF/A-2 or arbitrary file types to PDF/A-3
All of these operations can be implemented with simple PDFlib calls. Sample code for a variety of programming languages and development environments is provided with the PDFlib distribution. Additional programming techniques for PDF/A are available in the PDFlib Cookbook.
PDFlib achieves PDF/A-conforming output by the following means:
> PDFlib automatically takes care of several formal settings for PDF/A, such as the PDF version number and the required XMP identification entries.
> The PDFlib application program must explicitly use certain function calls and options (e.g. for font embedding).
> The PDFlib application program must refrain from using certain other function calls and option settings (e.g. encryption).
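The three kinds of rules listed above (automatic formal settings, required options, prohibited options) can be pictured as a fail-fast check which runs before any output is written. The following Python sketch does not use the PDFlib API: `DocumentSettings`, `PdfaViolation` and `prepare_pdfa` are hypothetical names, and only a simplified subset of the rules is modeled (encryption, font embedding, the PDF version and the `pdfaid:part`/`pdfaid:conformance` XMP identification entries).

```python
from dataclasses import dataclass, field
from typing import Dict, List


class PdfaViolation(Exception):
    """Raised before any output is produced if a PDF/A rule is broken."""


# PDF version on which each PDF/A part is based (set automatically).
PDF_VERSION_FOR_PART = {1: "1.4", 2: "1.7", 3: "1.7", 4: "2.0"}


@dataclass
class DocumentSettings:
    pdfa_part: int                  # 1, 2, 3 or 4
    conformance: str = ""           # e.g. "a", "b", "u"; may be empty for PDF/A-4
    encrypt: bool = False           # prohibited in PDF/A
    unembedded_fonts: List[str] = field(default_factory=list)
    pdf_version: str = ""           # filled in by prepare_pdfa()
    xmp: Dict[str, str] = field(default_factory=dict)  # filled in by prepare_pdfa()


def prepare_pdfa(settings: DocumentSettings) -> DocumentSettings:
    """Check a simplified subset of the PDF/A rules, then apply formal settings."""
    # Prohibited option: encryption is not allowed in PDF/A.
    if settings.encrypt:
        raise PdfaViolation("encryption is not allowed in PDF/A")
    # Required option: all fonts must be embedded.
    if settings.unembedded_fonts:
        raise PdfaViolation("fonts not embedded: " + ", ".join(settings.unembedded_fonts))
    # Formal settings handled automatically (simplified subset of the
    # XMP identification entries).
    settings.pdf_version = PDF_VERSION_FOR_PART[settings.pdfa_part]
    settings.xmp["pdfaid:part"] = str(settings.pdfa_part)
    if settings.conformance:
        settings.xmp["pdfaid:conformance"] = settings.conformance.upper()
    return settings


# Usage: the second call fails before anything would be written.
ok = prepare_pdfa(DocumentSettings(pdfa_part=2, conformance="b"))
print(ok.pdf_version, ok.xmp)  # prints: 1.7 {'pdfaid:part': '2', 'pdfaid:conformance': 'B'}
try:
    prepare_pdfa(DocumentSettings(pdfa_part=1, conformance="b", unembedded_fonts=["Helvetica"]))
except PdfaViolation as exc:
    print("rejected:", exc)  # prints: rejected: fonts not embedded: Helvetica
```

Raising an exception instead of emitting a warning mirrors the design goal that a rule violation never leads to partially written, non-conforming output.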
If the PDFlib application program obeys these rules, valid PDF/A output is guaranteed. If PDFlib detects a violation of the PDF/A creation rules, it throws an exception which must be handled by the application. No PDF output is created in case of an exception, so there is no risk of creating non-conforming output. Details of required and prohibited operations are discussed in the PDFlib documentation.

Processing PDF/A with PDFlib+PDI
Additional rules apply when importing pages from existing PDF/A-conforming documents. When dealing with existing PDF/A documents, PDFlib+PDI carefully examines the PDF/A properties of all input and output documents to make sure that the output still conforms to PDF/A. For additional control, the output intent of an imported document can be copied to the output PDF, effectively cloning the PDF/A color properties of an existing document. Similarly, XMP metadata from imported documents can be cloned or merged.

Creating PDF/A level A with PDFlib
PDF/A conformance level A can be regarded as level B plus Tagged PDF. PDFlib’s support for PDF/A level A is based on its features for producing Tagged PDF: each content item can be placed at a particular location in the document’s structure tree; content items which are not relevant for the document structure (e.g. headers and footers, pagination) can be tagged as Artifacts, which means that they will be ignored when the document is read aloud by software or converted to some other format. Alternative text can be attached to images and vector graphics. PDFlib automatically tags tables and Artifacts, which is a big time-saver for the developer. PDFlib checks the supplied tags to make sure that the structure element nesting and attributes conform to ISO 32000. For example, heading or list tags must be properly nested. Integrated support for PDF/UA makes it easy to create PDF output which is both accessible and archivable.
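Nesting checks of this kind can be pictured as a walk over the structure tree which compares each element with a table of permitted parent/child relationships. The following Python sketch is not PDFlib’s implementation: `StructElem` and `nesting_errors` are hypothetical names, the default parent "Document" is an assumed stand-in for the root of the structure tree, and the rule tables are a deliberately small, simplified subset of the ISO 32000 rules for lists and tables.

```python
from dataclasses import dataclass, field
from typing import List

# Simplified subset of permitted children for list and table elements
# (illustrative only; ISO 32000 and PDF/UA define the complete rules).
ALLOWED_CHILDREN = {
    "L": {"LI", "Caption"},
    "LI": {"Lbl", "LBody"},
    "Table": {"TR", "THead", "TBody", "TFoot", "Caption"},
    "THead": {"TR"},
    "TBody": {"TR"},
    "TFoot": {"TR"},
    "TR": {"TH", "TD"},
}

# Elements which may only appear below one of the listed parents.
REQUIRED_PARENTS = {
    "LI": {"L"},
    "TR": {"Table", "THead", "TBody", "TFoot"},
    "TH": {"TR"},
    "TD": {"TR"},
}


@dataclass
class StructElem:
    tag: str                                    # structure type, e.g. "L" or "TD"
    children: List["StructElem"] = field(default_factory=list)


def nesting_errors(elem: StructElem, parent: str = "Document") -> List[str]:
    """Return one message per element whose position violates the rules above."""
    errors: List[str] = []
    allowed = ALLOWED_CHILDREN.get(parent)
    required = REQUIRED_PARENTS.get(elem.tag)
    if (allowed is not None and elem.tag not in allowed) or (
        required is not None and parent not in required
    ):
        errors.append(f"<{elem.tag}> must not be a child of <{parent}>")
    # Recurse: each child is checked against this element as its parent.
    for child in elem.children:
        errors.extend(nesting_errors(child, elem.tag))
    return errors


good_list = StructElem("L", [StructElem("LI", [StructElem("Lbl"), StructElem("LBody")])])
bad_list = StructElem("L", [StructElem("TD")])
print(nesting_errors(good_list))  # prints: []
print(nesting_errors(bad_list))   # prints: ['<TD> must not be a child of <L>']
```

Checking both directions (what a parent accepts, and where a child may appear) catches misplaced elements such as a table cell directly below the document root, for which the parent side alone has no rule.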
Note that you need detailed knowledge about the document’s logical structure in order to create Tagged PDF. PDFlib takes care of the PDF-related details, but it cannot infer the document structure from its contents.

PDF/A-conforming signatures with PLOP DS
PDFlib PLOP DS is a toolkit for applying digital signatures to PDF documents according to the PAdES signature standards, which are required for signatures under the European eIDAS regulation. PLOP DS applies signatures to PDF/A documents such that the signed output also conforms to PDF/A.

PDFlib GmbH
Franziska-Bilek-Weg 9
80339 München, Germany
support@pdflib.com
www.pdflib.com/knowledge-base/pdfa

PDFlib GmbH is completely focused on PDF technology. Customers worldwide have been using PDFlib products since 1997. The company closely follows development and market trends, such as ISO standards for PDF. PDFlib GmbH products are distributed all over the world, with major markets in North America, Europe, and Japan.

Founded in 2006 as the PDF/A Competence Center, the PDF Association broadened its scope in 2011 to cover all aspects of PDF technology. Today, it provides an industry meeting-place and a platform for members to exercise thought-leadership in the community.
> Developers use the PDF Association to share knowledge and experience with PDF technology.
> Decision-makers use the PDF Association to learn about the role and capabilities of PDF and PDF’s subset standards in ECM and other electronic document applications.
> End-users benefit from improved reliability, quality, functionality and interoperability in their experience of electronic documents.