XMP Metadata
The term metadata literally means »data about data«. Metadata has been described as the business card of a digital document. It usually comprises a set of properties; in the Extensible Metadata Platform (XMP) each property has a specific, well-defined meaning.
Extensible Metadata Platform (XMP)
Extensible Metadata Platform (XMP) is an XML-based format modeled after W3C’s RDF (Resource Description Framework), which forms the foundation of the Semantic Web initiative. XMP was standardized in 2012 as ISO 16684-1:2012 (revised in 2019).
XMP metadata travels with the file, and can be embedded in many common file formats including PDF, TIFF, and JPEG. Metadata properties are grouped in schemas. Each schema is identified by a unique namespace URI and holds an arbitrary number of properties. While namespace URIs look very similar to the familiar Web addresses (actually, they often look the same), it's important to note that they do not identify a particular Web page. In fact, namespace URIs are not required to point to any resource - they are simply unique identifiers for some entity used in XMP.
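The role of the namespace URI as a pure identifier can be illustrated with a small Python sketch. It is not taken from the XMP specification; the prefix `foo` and the helper `split_qualified_name` are made up for illustration. Two fragments use different prefixes for the same namespace URI, and an XML parser treats them as the same property because only the URI counts. The URI is never fetched.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the Dublin Core schema; used only as an identifier.
DC_URI = "http://purl.org/dc/elements/1.1/"

# Same property, same namespace URI, different (arbitrary) prefixes.
with_dc_prefix = ET.fromstring(
    '<dc:format xmlns:dc="http://purl.org/dc/elements/1.1/">application/pdf</dc:format>'
)
with_foo_prefix = ET.fromstring(
    '<foo:format xmlns:foo="http://purl.org/dc/elements/1.1/">application/pdf</foo:format>'
)


def split_qualified_name(tag: str) -> tuple[str, str]:
    """Split ElementTree's '{uri}local' notation into (uri, local name)."""
    if not tag.startswith("{"):
        return "", tag  # element without a namespace
    uri, _, local = tag[1:].partition("}")
    return uri, local


# The parser resolves both prefixes to the same URI-qualified name.
same_property = with_dc_prefix.tag == with_foo_prefix.tag
uri, local = split_qualified_name(with_dc_prefix.tag)
```

No network access takes place at any point: the parser compares the URI strings and nothing more.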
The XMP specification includes more than a dozen predefined schemas with hundreds of properties for common document and image characteristics. The most widely used predefined XMP schema is called the Dublin Core, or dc. It includes general properties such as Title, Creator, Subject, and Description. In addition to predefined schemas, custom schemas can be defined to cover company- or industry-specific metadata requirements.
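As a rough illustration of how Dublin Core properties appear in serialized XMP, the following Python sketch parses a small hand-written packet and reads the title and creators. The sample values and the function name `read_dublin_core` are assumptions made up for this example; real packets are normally wrapped in `<?xpacket ...?>` processing instructions and contain many more properties.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map for the lookups below; the prefixes are conventional.
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical sample packet, written by hand for illustration only.
XMP_PACKET = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>
        <rdf:Alt>
          <rdf:li xml:lang="x-default">Annual Report</rdf:li>
        </rdf:Alt>
      </dc:title>
      <dc:creator>
        <rdf:Seq>
          <rdf:li>Jane Doe</rdf:li>
          <rdf:li>John Smith</rdf:li>
        </rdf:Seq>
      </dc:creator>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""


def read_dublin_core(packet: str) -> dict:
    """Return the default title and the ordered creator list of a packet."""
    root = ET.fromstring(packet)
    # dc:title is a language alternative (rdf:Alt); take its first entry.
    title = root.find(".//dc:title/rdf:Alt/rdf:li", NS)
    # dc:creator is an ordered list (rdf:Seq) of names.
    creators = root.findall(".//dc:creator/rdf:Seq/rdf:li", NS)
    return {
        "title": title.text if title is not None else None,
        "creators": [c.text for c in creators],
    }


metadata = read_dublin_core(XMP_PACKET)
```

A generic XML parser is enough for such a simple case; a dedicated XMP toolkit is the safer choice when all serialization variants permitted by the standard must be handled.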
The Dublin Core has been standardized as ISO 15836 (published in 2003, revised in 2009): »Information and documentation - The Dublin Core metadata element set«.
XMP is implemented in all Adobe publishing products and supported by dozens of independent software vendors and user groups. Adobe Bridge, part of the Creative Suite, deals with XMP metadata in various file formats.
PDF and XMP
XMP for PDF documents was introduced with Acrobat 5 and PDF 1.4 in 2001. Before that, the sole carrier of metadata in PDF was a set of simple key/value pairs, the so-called document info entries. While document info entries are still supported in Acrobat and PDF, XMP metadata is a much more powerful concept and allows metadata to survive format conversions, e.g. from scanned TIFF to PDF.
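The relationship between document info entries and XMP can be sketched as a simple lookup table in Python. The correspondence below follows the commonly documented mapping between the standard document info keys and XMP properties (as used for PDF/A, for example); it should be treated as an assumption and checked against the relevant standard before relying on it. The function name `info_to_xmp` is hypothetical.

```python
# Commonly documented correspondence between standard document info keys
# and XMP properties (an assumption for illustration; verify against the
# applicable standard).
INFO_TO_XMP = {
    "Title": "dc:title",
    "Author": "dc:creator",
    "Subject": "dc:description",
    "Keywords": "pdf:Keywords",
    "Creator": "xmp:CreatorTool",
    "Producer": "pdf:Producer",
    "CreationDate": "xmp:CreateDate",
    "ModDate": "xmp:ModifyDate",
}


def info_to_xmp(info: dict) -> dict:
    """Rename known document info keys to XMP property names.

    Values are copied unchanged. Date values would additionally need
    conversion from the PDF date format to ISO 8601, which is not done
    here. Keys without a standard counterpart are skipped.
    """
    return {INFO_TO_XMP[key]: value for key, value in info.items() if key in INFO_TO_XMP}


xmp_properties = info_to_xmp({"Title": "Annual Report", "Author": "Jane Doe", "Custom": "x"})
```

The flat key/value model of document info entries maps onto only a handful of XMP properties, whereas XMP itself also supports structured values such as ordered lists and language alternatives.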
XMP mandated by ISO standards for PDF
Various ISO standards specify PDF subsets for certain application domains, such as the graphic arts industry, archiving, or engineering. Except for the prepress standards PDF/X-1 and PDF/X-3, which were introduced as early as 2001 and 2002, all ISO standards for PDF include the use of XMP metadata; in most of them XMP is mandatory, ISO 32000 being the exception. This includes PDF/A, PDF/UA, PDF/E, PDF/X-4/5, and ISO 32000-1 (PDF 1.7). ISO 32000-2:2017 »Document management - Portable Document Format - PDF 2.0« deprecates old-style document info entries (with the exception of the CreationDate and ModDate entries) in favor of XMP metadata.
Summary of XMP support in PDFlib products
PDFlib products include the following kinds of support for XMP in PDF (for details please refer to the product-specific pages):
- PDFlib: create PDF documents with XMP metadata on document, page or image level. Sample code for importing and embedding XMP with PDFlib can be found in the PDFlib Cookbook. Sample code for XMP handling for PDF/A, including XMP extension schemas, is also available.
- PDFlib+PDI, PDFlib Personalization Server (PPS): combine PDF documents with control over XMP metadata
- PLOP and PLOP DS: insert or remove XMP in existing PDF documents
- pCOS: extract XMP from PDF on document, page or image level
- PDFlib Text and Image Extraction Toolkit (TET): include XMP in XML created from PDF documents
- TET PDF IFilter: make XMP metadata searchable with Windows search engines
XMP Resources
Adobe's main XMP page:
http://www.adobe.com/products/xmp.html
XMP 2012 specification and Adobe's XMP developer page with the XMP Toolkit:
http://www.adobe.com/devnet/xmp.html
XMP standard ISO 16684-1 (first published in 2012, revised in 2019): »Extensible metadata platform (XMP) specification - Part 1: Data model, serialization and core properties«