Special |
emptycheck | Check whether a specified area on the page is empty, i.e. contains no text, vector graphics, or images. |
extract_highlighted_text | Extract text under Highlight annotations. |
get_attachments | Extract the text from the document and, recursively, from all embedded PDF attachments. |
identify_ocr | Classify the pages in a document based on the page content. |
multiple_documents | Generalized form of the simple text extractor for multiple documents. |
region_of_interest | Restrict text extraction to a particular "region of interest", i.e. some area on the page based on knowledge about the document layout. |
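
The emptycheck and region_of_interest topics both come down to rectangle tests against the objects on a page. The following is a minimal, library-independent sketch of that idea. `Box`, `PageObject`, `is_area_empty`, and `text_in_region` are hypothetical names chosen for illustration and are not part of any extraction library's API; the sketch assumes each page object has already been reduced to an axis-aligned bounding box in page coordinates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle: lower-left (llx, lly), upper-right (urx, ury)."""
    llx: float
    lly: float
    urx: float
    ury: float

    def intersects(self, other: "Box") -> bool:
        # True if the two rectangles share some interior area.
        return (self.llx < other.urx and other.llx < self.urx
                and self.lly < other.ury and other.lly < self.ury)

    def contains(self, other: "Box") -> bool:
        # True if `other` lies completely inside this rectangle.
        return (self.llx <= other.llx and other.urx <= self.urx
                and self.lly <= other.lly and other.ury <= self.ury)


@dataclass(frozen=True)
class PageObject:
    """Hypothetical page object: kind is "text", "vector", or "image"."""
    kind: str
    box: Box
    text: str = ""


def is_area_empty(objects, area):
    """An area counts as empty if no text, vector, or image object touches it."""
    return not any(obj.box.intersects(area) for obj in objects)


def text_in_region(objects, region):
    """Keep only text objects that lie entirely within the region of interest."""
    return [obj.text for obj in objects
            if obj.kind == "text" and region.contains(obj.box)]


# Made-up sample page (coordinates in points, origin at the lower left).
page = [
    PageObject("text", Box(50, 700, 200, 715), "Invoice"),
    PageObject("text", Box(50, 100, 300, 112), "Footer note"),
    PageObject("image", Box(400, 650, 550, 780)),
]
header = Box(0, 600, 595, 842)   # assumed header area of the layout
middle = Box(0, 300, 595, 500)   # assumed blank band in the middle

print(text_in_region(page, header))
print(is_area_empty(page, middle))
print(is_area_empty(page, header))
```

Here the region filter requires full containment while the emptiness check uses intersection. That is a design choice for this sketch: an object that only partly overlaps an area already makes it non-empty, but is not treated as belonging to the region.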