The term metadata literally means »data about data«. Metadata has been described as the business card of a digital document. It usually comprises a set of properties, and in the Extensible Metadata Platform (XMP) each property has a specific, defined meaning.
Extensible Metadata Platform (XMP) is an XML-based format modeled after W3C’s RDF (Resource Description Framework), which forms the foundation of the Semantic Web initiative. In 2012 XMP was standardized as ISO 16684-1:2012.
XMP metadata travels with the file, and can be embedded in many common file formats including PDF, TIFF, and JPEG. Metadata properties are grouped in schemas. Each schema is identified by a unique namespace URI and holds an arbitrary number of properties. While namespace URIs look very similar to the familiar Web addresses (actually, they often look the same), it's important to note that they do not identify a particular Web page. In fact, namespace URIs are not required to point to any resource – they are simply unique identifiers for some entity used in XMP.
The XMP specification includes more than a dozen predefined schemas with hundreds of properties for common document and image characteristics. The most widely used predefined XMP schema is called the Dublin Core, or dc. It includes general properties such as Title, Creator, Subject, and Description. In addition to predefined schemas, custom schemas can be defined to cover company- or industry-specific metadata requirements.
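The grouping of properties into schemas identified by namespace URIs can be illustrated with a small Python sketch. It parses a hand-written XMP packet using only the standard library and groups the properties by namespace URI. The sample packet, the function name properties_by_schema and the property values are assumptions made for illustration. The sketch ignores the attribute shorthand and nested structures that real XMP may use.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"   # Dublin Core schema
XMP_NS = "http://ns.adobe.com/xap/1.0/"      # XMP basic schema

# Hand-written sample packet (illustrative values only).
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:xmp="http://ns.adobe.com/xap/1.0/">
   <dc:title>
    <rdf:Alt><rdf:li xml:lang="x-default">Sample document</rdf:li></rdf:Alt>
   </dc:title>
   <dc:creator>
    <rdf:Seq><rdf:li>Jane Doe</rdf:li></rdf:Seq>
   </dc:creator>
   <xmp:CreatorTool>ExampleApp 1.0</xmp:CreatorTool>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""


def properties_by_schema(xmp_text):
    """Return {namespace URI: {property name: value}} for an XMP packet.

    Array properties (rdf:Alt, rdf:Seq, rdf:Bag) become lists of strings,
    simple properties become plain strings.
    """
    root = ET.fromstring(xmp_text)
    result = {}
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        for prop in desc:
            # ElementTree tags look like "{namespace-uri}localname".
            ns, _, local = prop.tag[1:].partition("}")
            items = [li.text for li in prop.iter(f"{{{RDF_NS}}}li")]
            value = items if items else (prop.text or "").strip()
            result.setdefault(ns, {})[local] = value
    return result


if __name__ == "__main__":
    for ns, props in properties_by_schema(SAMPLE_XMP).items():
        print(ns, props)
```

Note that the namespace URI is used purely as a dictionary key here. The code never attempts to fetch it, which matches its role as an identifier rather than a Web address.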
The Dublin Core has been standardized as ISO 15836 (published in 2003, revised in 2009): »Information and documentation — The Dublin Core metadata element set«.
XMP is implemented in all Adobe publishing products and supported by dozens of independent software vendors and user groups. Adobe Bridge, part of the Creative Suite, deals with XMP metadata in various file formats.
XMP for PDF documents was introduced with Acrobat 5 and PDF 1.4 in 2001. The predecessor of XMP in PDF consisted of simple key/value pairs, so-called document info entries, which served as the sole carrier of metadata prior to the introduction of XMP. While document info entries are still supported in Acrobat and PDF, XMP metadata is a much more powerful concept and allows metadata to survive format conversions, e.g. from scanned TIFF to PDF.
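The relationship between the flat document info entries and XMP properties can be sketched as a simple lookup table. The mapping below follows the correspondence commonly used for the standard info keys, for example in the PDF/A-1 specification. The function name info_to_xmp and the sample values are assumptions for illustration. A real conversion would also have to translate PDF date strings and build proper XMP arrays.

```python
# Commonly used correspondence between standard document info keys
# and XMP properties (prefix:name). Treat it as illustrative, not normative.
INFO_TO_XMP = {
    "Title": "dc:title",
    "Author": "dc:creator",
    "Subject": "dc:description",
    "Keywords": "pdf:Keywords",
    "Creator": "xmp:CreatorTool",
    "Producer": "pdf:Producer",
    "CreationDate": "xmp:CreateDate",
    "ModDate": "xmp:ModifyDate",
}


def info_to_xmp(info):
    """Split info entries into (mapped XMP properties, unmapped custom keys)."""
    mapped, unmapped = {}, {}
    for key, value in info.items():
        target = INFO_TO_XMP.get(key)
        if target is not None:
            mapped[target] = value
        else:
            # Custom info keys have no predefined XMP counterpart.
            unmapped[key] = value
    return mapped, unmapped


if __name__ == "__main__":
    sample_info = {"Title": "Annual report", "Author": "Jane Doe", "Department": "Sales"}
    print(info_to_xmp(sample_info))
```

Custom keys such as the hypothetical Department entry end up in the unmapped group. In XMP they would need a custom schema with its own namespace URI.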
Please read our XMP Whitepaper for more information about XMP in PDF.
There are various ISO standards which specify PDF subsets for certain application domains, such as the graphic arts industry, archiving, or engineering. Except for the prepress standards PDF/X-1 and PDF/X-3, which were introduced in 2001 and 2002, all ISO standards for PDF include the use of XMP metadata; XMP is mandatory in most of them, with ISO 32000 as an exception. Unless mentioned otherwise, all standards are based on XMP 2005:
PDF/A-1 in ISO 19005-1 (published in 2005): »Electronic document file format for long-term preservation – Use of PDF 1.4«. PDF/A-1 requires XMP for identifying conforming files and supports custom metadata through XMP extension schemas. XMP support in PDF/A-1 is based on the XMP 2004 specification.
PDF/A-2 in ISO 19005-2 (published in 2011): »Electronic document file format for long-term preservation – Part 2: Use of ISO 32000-1 (PDF/A-2)«
PDF/A-3 in ISO 19005-3 (published in 2012): »Electronic document file format for long-term preservation – Part 3: Use of ISO 32000-1 with support for embedded files (PDF/A-3)«
PDF/E-1 in ISO 24517-1 (published in 2008): »Engineering document format using PDF – Use of PDF 1.6«. XMP support in PDF/E is almost identical to PDF/A-1, except that it is based on the newer XMP 2005 specification.
PDF/X-4 in ISO 15930-7 (published in 2008, revised in 2010): »Complete exchange of printing data (PDF/X-4) and partial exchange of printing data with external profile reference (PDF/X-4p) using PDF 1.6«. As in PDF/A-1, XMP is required to express standard conformance in PDF/X-4.
PDF/X-5 in ISO 15930-8 (published in 2008, revised in 2010): »Partial exchange of printing data using PDF 1.6 (PDF/X-5)«. PDF/X-5 documents reference other PDF/X documents, where the target of such a reference is identified by using various XMP entries. This makes XMP a crucial component of PDF/X-5.
ISO 32000-1 (published in 2008, http://www.adobe.com/devnet/acrobat/pdfs/PDF32000_2008.pdf): »Document management – Portable document format – PDF 1.7«. ISO 32000 is the standardized version of PDF 1.7. The technical content is identical to PDF 1.7, which fully supports XMP metadata.
PDF/VT in ISO 16612-2 (published in 2010): »Variable data exchange – Part 2: Using PDF/X-4 and PDF/X-5 (PDF/VT-1 and PDF/VT-2)«. For more information on PDF/VT see the PDF/VT pages in our knowledge base.
PDF/UA-1 in ISO 14289-1 (published in 2012): »Document management applications – Electronic document file format enhancement for accessibility – Part 1: Use of ISO 32000-1 (PDF/UA-1)«
ISO 32000-2 (still in the standardization process as of 2016): »Document management – Portable Document Format – PDF 2.0«. PDF 2.0 will deprecate old-style document info entries (with the exception of the CreationDate and ModDate entries) in favor of XMP metadata.
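As noted for PDF/A-1 above, several of these standards use XMP to identify conforming files. The following Python sketch reads the PDF/A identification properties from an XMP packet. It assumes the pdfaid schema with the namespace URI http://www.aiim.org/pdfa/ns/id/ and the properties part and conformance. The function name pdfa_claim and the sample packets are hypothetical. The result is only the claim made by the metadata; it does not validate the file against the standard.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"  # PDF/A identification schema

# Sample packets (illustrative): element form and attribute shorthand.
SAMPLE_ELEMENTS = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
   <pdfaid:part>1</pdfaid:part>
   <pdfaid:conformance>B</pdfaid:conformance>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""

SAMPLE_ATTRIBUTES = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/"
      pdfaid:part="2" pdfaid:conformance="U"/>
 </rdf:RDF>
</x:xmpmeta>"""

SAMPLE_NO_CLAIM = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""/>
 </rdf:RDF>
</x:xmpmeta>"""


def pdfa_claim(xmp_text):
    """Return the claimed PDF/A flavor, e.g. 'PDF/A-1b', or None if absent."""
    root = ET.fromstring(xmp_text)
    found = {}
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        for name in ("part", "conformance"):
            qname = f"{{{PDFAID_NS}}}{name}"
            child = desc.find(qname)
            if child is not None and child.text:
                found[name] = child.text.strip()   # element form
            elif qname in desc.attrib:
                found[name] = desc.attrib[qname]   # attribute shorthand
    if "part" not in found:
        return None
    return f"PDF/A-{found['part']}{found.get('conformance', '').lower()}"


if __name__ == "__main__":
    for packet in (SAMPLE_ELEMENTS, SAMPLE_ATTRIBUTES, SAMPLE_NO_CLAIM):
        print(pdfa_claim(packet))
```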
PDFlib products include the following kinds of support for XMP in PDF (for details please refer to our XMP Whitepaper and the product-specific pages):
PDFlib: create PDF documents with XMP metadata on document, page or image level. Sample code for importing and embedding XMP with PDFlib can be found in the PDFlib Cookbook. XMP handling for PDF/A including XMP extension schemas is demonstrated here.
pCOS: extract XMP from PDF on document, page or image level
PDFlib Text and Image Extraction Toolkit (TET): include XMP in XML created from PDF documents
TET PDF IFilter: make XMP metadata searchable with Windows search engines
Adobe's main XMP page:
XMP 2012 specification and Adobe’s XMP developer's page with XMP Toolkit:
XMP 2004 specification (only relevant for PDF/A-1):
XMP 2005 specification:
XMP standard ISO 16684-1 (published in 2012): »Extensible metadata platform (XMP) specification — Part 1: Data model, serialization and core properties«