TET 3 offers a variety of new features, including raster image and table extraction, XML output, a new Cookbook, and a number of TET connectors to interface TET with other software. New workarounds allow text extraction from PDF documents that previously could not be processed, e.g. some PDFs generated with the TeX typesetting system.
TET optionally represents PDF contents in an XML flavor called TETML. TETML exposes a variety of PDF information in a form that can easily be processed with common XML tools. It contains the actual text plus, optionally, font and position information, resource details (fonts, images, colorspaces), and metadata.
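As a rough illustration of processing such output with common XML tools, the sketch below uses Python's standard `xml.etree.ElementTree` module to collect the text on each page. The sample document, its namespace URI, and the element names (`Page`, `Word`, `Text`) are simplified assumptions made for this example, not the actual TETML schema; check them against the TETML that TET really produces before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified TETML-like sample. Real TETML uses its own
# namespace and carries many more elements and attributes.
SAMPLE = """<TET xmlns="urn:example:tetml">
  <Document filename="sample.pdf">
    <Pages>
      <Page number="1">
        <Para>
          <Word><Text>Hello</Text></Word>
          <Word><Text>world</Text></Word>
        </Para>
      </Page>
    </Pages>
  </Document>
</TET>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def words_per_page(xml_text):
    """Map each page number (as a string) to the list of words found on it."""
    root = ET.fromstring(xml_text)
    pages = {}
    for page in root.iter():
        if local_name(page.tag) != "Page":
            continue
        # Gather every non-empty Text element below this page, in document order.
        words = [
            el.text
            for el in page.iter()
            if local_name(el.tag) == "Text" and el.text
        ]
        pages[page.get("number")] = words
    return pages


if __name__ == "__main__":
    for number, words in words_per_page(SAMPLE).items():
        print(number, " ".join(words))
```

Matching on local names keeps the sketch independent of the exact namespace URI, which is convenient for a quick script but less strict than a namespace-aware query.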
A PDF/A validator is a tool which checks whether or not a document conforms to the PDF/A standard. PDFlib GmbH conducted a comprehensive test of PDF/A validation tools.
After conducting the Bavaria tests we can certainly say that the status of PDF/A validation is considerably better than in the early days, and the Isartor test suite and PDF/A TechNotes have significantly influenced the quality of validation.
While the Bavaria report reveals shortcomings in the current product generation, we hope to contribute to further improvements and to help vendors of validation tools to enhance the accuracy of their products.
The minor update PDFlib 7.0.4 has been released today.
The new version of TET extracts text from PDF documents, retrieves raster image data and tables, and converts PDF documents to XML.
The PDFlib Cookbook has been extended with many new topics.
A list of new topics can be found here.
The PDFlib Cookbook is a collection of PDFlib coding fragments for solving specific problems. The Cookbook topics help you develop solutions based on the PDFlib product family. You can browse Java code and PDF output for all Cookbook topics, or download a package with all code samples along with auxiliary input data.