TET Cookbook


Text Extraction

back_of_the_book_index: Create a sorted list of all words in the document along with the page numbers where the words occur.
concordance: Create a sorted list of unique words in a document along with counts.
glyphinfo: Simple PDF glyph dumper based on PDFlib TET.
text_extractor: PDF text extractor based on PDFlib TET.
text_from_annotations: Extract text from annotations with PDFlib TET and the pCOS interface.
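
The word-list samples above (concordance and back_of_the_book_index) both reduce to simple bookkeeping once the page text is available. The following is a minimal sketch of that bookkeeping only, not the Cookbook's own code: it makes no TET calls and assumes the text of each page has already been extracted into a list of strings. The names `pages`, `concordance`, and `back_of_the_book_index` are hypothetical and chosen for illustration.

```python
import re
from collections import Counter, defaultdict

# Assumption: one string per page, already extracted by some text extractor.
# This sample data is made up for illustration.
pages = [
    "PDF text extraction with TET",
    "TET extracts text and glyph details",
]

# Simplistic word pattern; a real extractor would supply proper word boundaries.
WORD = re.compile(r"\w+")


def concordance(pages):
    """Return (word, count) pairs for all unique words, sorted alphabetically."""
    counts = Counter(w.lower() for text in pages for w in WORD.findall(text))
    return sorted(counts.items())


def back_of_the_book_index(pages):
    """Return (word, page numbers) pairs sorted by word; pages count from 1."""
    index = defaultdict(set)
    for pageno, text in enumerate(pages, start=1):
        for w in WORD.findall(text):
            index[w.lower()].add(pageno)
    return [(w, sorted(nums)) for w, nums in sorted(index.items())]


if __name__ == "__main__":
    for word, count in concordance(pages):
        print(f"{word} {count}")
    for word, nums in back_of_the_book_index(pages):
        print(f"{word} {', '.join(map(str, nums))}")
```

Lowercasing the words merges "TET" and "tet" into one entry; whether that is desirable depends on the document, so treat it as a design choice of this sketch.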