TET Cookbook

emptycheck Check whether a specified area on the page is empty, i.e. does not contain any text, vector graphics, or images.
extract_highlighted_text Extract text under Highlight annotations.
get_attachments Extract the text from the document and recursively from all embedded PDF attachments.
identify_ocr Classify the pages in a document based on the page content.
multiple_documents Generalized form of the simple text extractor for multiple documents.
region_of_interest Restrict text extraction to a particular "region of interest", i.e. some area on the page based on knowledge about the document layout.
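
The emptycheck and region_of_interest topics both come down to testing page objects against a rectangle. The following is a minimal, library-independent sketch of that geometry in Python; it does not use the TET API. The Box and Word classes, the helper functions, and the sample coordinates are hypothetical and exist only for illustration. Coordinates are assumed to be in points with the origin at the lower-left corner of the page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle: lower-left (llx, lly), upper-right (urx, ury)."""
    llx: float
    lly: float
    urx: float
    ury: float

    def intersects(self, other: "Box") -> bool:
        # True if the two rectangles share any interior area.
        return (self.llx < other.urx and other.llx < self.urx
                and self.lly < other.ury and other.lly < self.ury)

    def contains(self, other: "Box") -> bool:
        # True if `other` lies completely inside this rectangle.
        return (self.llx <= other.llx and other.urx <= self.urx
                and self.lly <= other.lly and other.ury <= self.ury)


@dataclass(frozen=True)
class Word:
    """Hypothetical stand-in for an extracted word and its bounding box."""
    text: str
    box: Box


def text_in_region(words, region):
    """Join the words that lie completely inside the region of interest."""
    return " ".join(w.text for w in words if region.contains(w.box))


def region_is_empty(object_boxes, region):
    """Report whether no object box (text, vector, image) touches the region."""
    return not any(region.intersects(b) for b in object_boxes)


# Hypothetical page content on a 595 x 842 point page.
words = [
    Word("Invoice", Box(50, 780, 120, 800)),
    Word("Total", Box(400, 100, 440, 115)),
    Word("42.00", Box(450, 100, 500, 115)),
]

header = Box(0, 750, 595, 842)
totals = Box(350, 90, 550, 130)
middle = Box(0, 300, 595, 600)

header_text = text_in_region(words, header)
totals_text = text_in_region(words, totals)
middle_empty = region_is_empty([w.box for w in words], middle)
```

The sketch uses strict containment for extraction and any overlap for the emptiness check, so a word that straddles the region boundary is excluded from the extracted text but still makes the region count as non-empty. Whether that matches the behavior of the actual Cookbook samples is an assumption.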