Process the text contents of PDF documents
Simple text extractor
Create a list of all unique words in the document.
Create a sorted list of all words in the document along with the page numbers where the words occur.
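The two word-list samples above can be sketched independently of any extraction library. This minimal Python sketch assumes the page texts have already been extracted into a list of strings, one per page; the function names `unique_words` and `word_index` are hypothetical and not part of TET, and the simple `\w+` tokenization is an assumption that real documents may need to refine.

```python
import re
from collections import defaultdict

WORD_RE = re.compile(r"\w+")  # assumption: a "word" is a run of word characters


def unique_words(pages):
    """Return a sorted list of the distinct (lowercased) words on all pages."""
    words = set()
    for text in pages:
        words.update(w.lower() for w in WORD_RE.findall(text))
    return sorted(words)


def word_index(pages):
    """Map each word to the sorted list of 1-based page numbers it occurs on."""
    index = defaultdict(set)
    for page_no, text in enumerate(pages, start=1):
        for w in WORD_RE.findall(text):
            index[w.lower()].add(page_no)
    return {w: sorted(nums) for w, nums in sorted(index.items())}


if __name__ == "__main__":
    # Hypothetical page texts standing in for extracted PDF content.
    pages = ["The quick brown fox", "The lazy dog and the fox"]
    print(unique_words(pages))
    for word, page_numbers in word_index(pages).items():
        print(word, page_numbers)
```

Using a set per word keeps each page number unique even when a word occurs several times on the same page.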
Analyze font information in PDF documents
Identify the locations in a PDF where a particular font is used; print the page number, location, and start of text for each hit
Extract raster images from PDF documents
Find out image resolutions
Count images in a PDF according to various interpretations
Resource-based image extractor based on PDFlib TET
Simple image reader
PDF image extractor based on PDFlib TET.
TET and PDFlib
Modify or enhance PDF documents with PDFlib+PDI based on their text contents
Enhance PDFs with TET and PDFlib+PDI.
Generate bookmarks based on specific page content.
Split a document into smaller parts based on page contents.
Highlight text on imported pages based on some criteria.
Find text with TET, hide it with a white rectangle, and add the replacement text on top of it.
Automatically create a table of contents based on typographic rules.
Highlight unmapped glyphs (i.e. glyphs for which TET could not determine a Unicode mapping).
Highlight text in certain fonts.
TETML and XSLT
Convert PDF documents to TETML and process TETML with XSLT
Simple TETML converter
Convert TETML to HTML.
Generate input for the Solr enterprise search server.
Extract raw text from TETML input.
Extract XMP metadata from TETML.
Extract a table to CSV file.
Create a concordance.
Words in a document that use a particular font in a size larger than a specified value
Font occurrences with page number and position
Font and glyph statistics
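As an alternative to XSLT, raw text can also be pulled from TETML with a general XML parser. The following Python sketch assumes that word text is stored in elements whose local name is `Text`; the sample document and its namespace URI are simplified, hypothetical stand-ins and not real TETML output, so element names should be checked against the TETML actually produced.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def tetml_text(xml_string):
    """Collect the contents of all elements named 'Text' in document order."""
    root = ET.fromstring(xml_string)
    return [el.text for el in root.iter()
            if local_name(el.tag) == "Text" and el.text]


# Hypothetical, heavily simplified TETML-like input (namespace is made up).
SAMPLE = (
    '<TET xmlns="urn:example:tetml"><Document><Pages><Page number="1">'
    "<Content><Para>"
    "<Word><Text>Hello</Text></Word>"
    "<Word><Text>PDF</Text></Word>"
    "</Para></Content></Page></Pages></Document></TET>"
)

if __name__ == "__main__":
    print(" ".join(tetml_text(SAMPLE)))
```

Matching on the local name avoids hard-coding a namespace URI, which may differ between TETML versions.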
Extract text and images from attachments.
Classify the pages in a document according to text or image content.
Restrict text extraction to a particular area on the page.
Process multiple documents in a loop.