Please note that the new major release TET 5 is available here.
This page covers the previous version, TET 4.4, which will remain available in parallel for some time. TET 5 is recommended for all new customers; TET 4.4 is available only for existing integration customers.
PDFlib TET (Text Extraction Toolkit) reliably extracts text, images and metadata from PDF documents. TET provides the text contents of a PDF as Unicode strings, along with detailed glyph and font information and the position of the text on the page. Raster images are extracted in common raster formats. TET optionally converts PDF documents to an XML-based format called TETML, which contains text and metadata as well as resource information.
TET contains advanced content analysis algorithms for determining word boundaries, grouping text into columns and removing redundant text. Using the integrated pCOS interface you can retrieve arbitrary objects from the PDF, such as metadata, interactive elements, etc.
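To give a rough idea of what determining word boundaries from glyph positions involves, here is a minimal Python sketch. It is not TET's algorithm and does not use the TET API; the `Glyph` class, the `group_into_words` function and the `gap_ratio` threshold are hypothetical names chosen for illustration, and a real implementation would also have to consider font metrics, rotation and multi-line layout.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """Hypothetical glyph record: character plus horizontal placement."""
    char: str
    x: float      # left edge of the glyph on the page
    width: float  # horizontal extent of the glyph


def group_into_words(glyphs, gap_ratio=0.3):
    """Join the glyphs of one text line into words.

    A new word starts whenever the gap to the previous glyph is wider
    than gap_ratio times the previous glyph's width (an assumed heuristic).
    """
    words = []
    current = ""
    prev = None
    for glyph in sorted(glyphs, key=lambda g: g.x):
        if prev is not None:
            gap = glyph.x - (prev.x + prev.width)
            if gap > gap_ratio * prev.width:
                words.append(current)
                current = ""
        current += glyph.char
        prev = glyph
    if current:
        words.append(current)
    return words


line = [Glyph("H", 0, 5), Glyph("i", 5, 5), Glyph("y", 14, 5), Glyph("o", 19, 5)]
print(group_into_words(line))
```

The sketch only shows why glyph positions matter: PDF content frequently carries no explicit space characters, so word boundaries have to be inferred from geometry.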
With PDFlib TET you can:
Implement the PDF indexer for a search engine
Repurpose the text and images in PDFs
Convert the contents of PDFs to other formats
Process PDFs based on their contents, e.g. splitting based on headings (requires PDFlib+PDI in addition to TET)
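As an illustration of content-based processing, the following Python sketch computes page ranges for splitting a document at headings. It assumes that the text of each page has already been extracted into a list of strings; the function name `split_by_headings` and the heading pattern are hypothetical, and the sketch does not call the TET or PDFlib+PDI APIs. Writing the actual output PDFs would be a separate step.

```python
import re


def split_by_headings(pages, heading_pattern=r"^Chapter \d+"):
    """Group 1-based page numbers into sections.

    pages: list of strings, one per page (assumed already extracted).
    A page whose text matches heading_pattern starts a new section;
    pages before the first heading are collected as front matter.
    """
    heading_re = re.compile(heading_pattern, re.MULTILINE)
    sections = []
    for number, text in enumerate(pages, start=1):
        match = heading_re.search(text)
        if match or not sections:
            title = match.group(0) if match else "(front matter)"
            sections.append((title, [number]))
        else:
            sections[-1][1].append(number)
    return sections


pages = ["Title page", "Chapter 1\nIntroduction", "more text", "Chapter 2\nDetails"]
for title, numbers in split_by_headings(pages):
    print(title, numbers)
```

Each resulting section lists the pages that would go into one output document.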
PDFlib TET 4.4 is available under commercial licensing.
PDFlib TET can be downloaded as a fully functional package; certain volume restrictions apply unless a valid license key is applied.
The PDFlib GmbH License Guide discusses details on licensing and support including update and upgrade conditions.
For mobile and embedded systems please contact email@example.com.
Ask us for extended licenses (unlimited distribution) at firstname.lastname@example.org.
As with all EU businesses, PDFlib GmbH’s invoicing and VAT handling are governed by EU law:
Customers from non-EU countries will not be charged VAT.
Customers from Germany will be charged 19% VAT (MwSt.).
Customers from all other EU countries must provide a valid VAT identification number. Orders without a valid VAT ID cannot be processed.
For more information please see here.