PDFlib TET 4.4 – Text Extraction Toolkit

Extract text from PDF: PDFlib TET PDF IFilter

Important: New major release TET 5 available

Please note that the new major release TET 5 is now available.

This page covers the previous version, TET 4.4, which will remain available in parallel for some time. TET 5 is recommended for all new customers; TET 4.4 is available only for existing integration customers.

What is PDFlib TET?

PDFlib TET (Text Extraction Toolkit) reliably extracts text, images and metadata from PDF documents. TET makes available the text contents of a PDF as Unicode strings, plus detailed glyph and font information as well as the position on the page. Raster images are extracted in common raster formats. TET optionally converts PDF documents to an XML-based format called TETML which contains text and metadata as well as resource information.

TET contains advanced content analysis algorithms for determining word boundaries, grouping text into columns and removing redundant text. Using the integrated pCOS interface you can retrieve arbitrary objects from the PDF, such as metadata, interactive elements, etc.
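To make the idea of determining word boundaries from glyph positions more concrete, here is a minimal sketch in Python. It is not TET's algorithm and does not use the TET API: the `Glyph` class, the `group_into_words` function and the `gap_factor` threshold are hypothetical names invented for this illustration. It assumes all glyphs sit on one horizontal line and starts a new word wherever the gap between two neighbouring glyphs exceeds a fraction of the font size.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """Hypothetical glyph record; not a TET data structure."""
    char: str
    x: float      # left edge of the glyph on the page, in points
    width: float  # advance width of the glyph, in points
    size: float   # font size, in points


def group_into_words(glyphs, gap_factor=0.25):
    """Split one line of glyphs into words at large horizontal gaps.

    gap_factor is an assumed threshold: a gap wider than
    gap_factor * font size is treated as a word boundary.
    """
    words = []
    current = ""
    prev = None
    # Process glyphs left to right, regardless of their order in the input.
    for g in sorted(glyphs, key=lambda g: g.x):
        if prev is not None:
            gap = g.x - (prev.x + prev.width)
            if gap > gap_factor * prev.size:
                words.append(current)
                current = ""
        current += g.char
        prev = g
    if current:
        words.append(current)
    return words


line = [
    Glyph("P", 0, 6, 10), Glyph("D", 6, 6, 10), Glyph("F", 12, 6, 10),
    Glyph("t", 22, 4, 10), Glyph("e", 26, 4, 10),
    Glyph("x", 30, 4, 10), Glyph("t", 34, 4, 10),
]
print(group_into_words(line))  # ['PDF', 'text']
```

Real PDF pages also involve rotated text, multiple columns, kerning and overlapping or duplicated glyphs, so a fixed gap threshold like this is only a starting point for understanding the problem.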

With PDFlib TET you can:

Implement the PDF indexer for a search engine

Repurpose the text and images in PDFs

Convert the contents of PDFs to other formats

Process PDFs based on their contents, e.g. splitting based on headings (requires PDFlib+PDI in addition to TET)

PDFlib TET 4.4 is available under commercial licensing.

The downloadable PDFlib TET package is fully functional, but is subject to certain volume restrictions unless a valid license key is applied.

Details of the commercial TET license can be found on our licensing web page or in our General License and Support Conditions, which are also included in the distribution.

The PDFlib GmbH License Guide discusses details on licensing and support including update and upgrade conditions.

For mobile and embedded systems please contact sales@pdflib.com.

Ask us for extended licenses (unlimited distribution) at premium@pdflib.com.

Value Added Tax (VAT)

As with all EU businesses, PDFlib GmbH’s invoicing and VAT handling are governed by EU law:

Customers from non-EU countries will not be charged VAT.

Customers from Germany will be charged 19% VAT (MwSt.).

Customers from all other EU countries must provide a valid VAT identification number. Orders without a valid VAT ID cannot be processed.

For more information please see here.