TET Cookbook



emptycheck: Check whether a specified area on the page is empty, i.e. does not contain any text, vector graphics or image.
get_attachments: Extract the text from the document and recursively from all embedded PDF attachments.
identify_ocr: Classify the pages in a document based on the page content.
multiple_documents: Generalized form of the simple text extractor for multiple documents.
region_of_interest: Restrict text extraction to a particular "region of interest", i.e. some area on the page based on knowledge about the document layout.
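
The geometry behind emptycheck and region_of_interest can be illustrated without the TET library. The following is a minimal sketch in plain Python, not the Cookbook code: it assumes the page content is already available as bounding boxes in page coordinates, and the names Box, overlaps, is_region_empty and text_in_region are hypothetical helpers invented for this illustration, not TET API calls.

```python
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    """Axis-aligned rectangle given by lower-left and upper-right corners (hypothetical type)."""
    llx: float
    lly: float
    urx: float
    ury: float


def overlaps(a: Box, b: Box) -> bool:
    """Return True if the two rectangles share any interior area."""
    return a.llx < b.urx and b.llx < a.urx and a.lly < b.ury and b.lly < a.ury


def contains(outer: Box, inner: Box) -> bool:
    """Return True if inner lies completely inside outer (edges may touch)."""
    return (outer.llx <= inner.llx and outer.lly <= inner.lly
            and inner.urx <= outer.urx and inner.ury <= outer.ury)


def is_region_empty(region: Box, content_boxes: List[Box]) -> bool:
    """emptycheck idea: the region is empty if no content box overlaps it."""
    return not any(overlaps(region, box) for box in content_boxes)


def text_in_region(region: Box, words: List[Tuple[str, Box]]) -> List[str]:
    """region_of_interest idea: keep only words whose box lies inside the region."""
    return [text for text, box in words if contains(region, box)]


# Made-up sample data: two words with their bounding boxes on a page.
words = [
    ("Invoice", Box(50, 700, 120, 715)),
    ("Total", Box(400, 100, 440, 112)),
]
boxes = [box for _, box in words]

header_area = Box(0, 650, 300, 800)
blank_area = Box(200, 300, 300, 400)

header_text = text_in_region(header_area, words)
blank_is_empty = is_region_empty(blank_area, boxes)
header_is_empty = is_region_empty(header_area, boxes)
```

In this sketch a word counts as inside the region only when its whole box fits. A real extractor may apply a different rule, for example testing a reference point of each glyph, so the containment test should be treated as one possible design choice.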