| Special | |
| emptycheck | Check whether a specified area on the page is empty, i.e. contains no text, vector graphics, or images. |
| extract_highlighted_text | Extract text under Highlight annotations. |
| get_attachments | Extract the text from the document and, recursively, from all embedded PDF attachments. |
| identify_ocr | Classify the pages in a document based on the page content. |
| multiple_documents | Generalized form of the simple text extractor for multiple documents. |
| region_of_interest | Restrict text extraction to a particular "region of interest", i.e. an area on the page chosen from knowledge of the document layout. |