PDFlib TET 5 - Unique Advantages for Image Extraction


Color Spaces and Compression

Figure: names of all color spaces in PDF

Raster image data in PDF may be encoded in a combination of eleven color spaces and nine compression filters, but common image file formats such as JPEG and TIFF support only a subset of these combinations. TET’s image engine balances the characteristics of the PDF image with the capabilities of the image output formats. Regardless of the internal structure of the PDF image, the pixel image is extracted in one of the common image file formats.

TET processes all color spaces which may be used in PDF.
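
For reference, the eleven color spaces mentioned above can be written down as a small lookup table. This is only an illustrative sketch based on the families defined in the PDF specification; the names `PDF_COLOR_SPACES` and `family_of` are made up for this example and are not part of the TET API.

```python
# The eleven PDF color spaces, grouped into the three families
# used by the PDF specification (ISO 32000).
PDF_COLOR_SPACES = {
    "device": ["DeviceGray", "DeviceRGB", "DeviceCMYK"],
    "cie_based": ["CalGray", "CalRGB", "Lab", "ICCBased"],
    "special": ["Indexed", "Pattern", "Separation", "DeviceN"],
}

# Flat list of all names, in the order given above.
ALL_COLOR_SPACES = [
    name for names in PDF_COLOR_SPACES.values() for name in names
]


def family_of(name: str) -> str:
    """Return the family of a PDF color space name (hypothetical helper)."""
    for family, names in PDF_COLOR_SPACES.items():
        if name in names:
            return family
    raise ValueError(f"not a PDF color space: {name}")
```

Separation and DeviceN, the two spot color spaces discussed in the next section, both fall into the "special" family.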


Spot Colors

Screenshot: spot color channels in Adobe Photoshop

Screenshot: spot channel options in Adobe Photoshop

In addition to CMYK process colors, images in PDF may use custom spot colors. Technically, these color spaces are known as Separation (single channel) and DeviceN (multiple channels).

TET creates TIFF output with additional spot color channels. This is intended for applications which need superior color fidelity and cannot accept any color conversion. If an image with DeviceN color includes only a subset of the common CMYK process colors (e.g. only Cyan and Magenta), the missing process channels are added so that plain CMYK output can be created.

However, not all applications are able to handle spot color channels; some are restricted to plain TIFF output. In this case, TET can be instructed to emit a spot color channel as grayscale TIFF to facilitate processing.
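
The completion of a process color subset to plain CMYK can be pictured with a short sketch. It is not TET's implementation: the function `complete_to_cmyk` and its data layout (one flat list of 8-bit samples per colorant, with 0 meaning no ink) are assumptions made for this illustration only.

```python
PROCESS_COLORANTS = ("Cyan", "Magenta", "Yellow", "Black")


def complete_to_cmyk(channels, width, height):
    """Return four sample planes in C, M, Y, K order.

    `channels` maps a colorant name to a flat list of width*height
    8-bit samples (assumed layout, 0 = no ink). Process colorants
    missing from the input are added as empty planes.
    """
    size = width * height
    for name, plane in channels.items():
        if name not in PROCESS_COLORANTS:
            # A true spot colorant cannot be expressed as plain CMYK
            # without color conversion, so this sketch rejects it.
            raise ValueError(f"{name} is not a CMYK process colorant")
        if len(plane) != size:
            raise ValueError(f"{name} plane has {len(plane)} samples, expected {size}")
    # Copy existing planes; fill in all-zero planes for missing colorants.
    return [list(channels.get(name, [0] * size)) for name in PROCESS_COLORANTS]


# Usage: a 2x1 pixel DeviceN image that uses only Cyan and Magenta.
planes = complete_to_cmyk({"Cyan": [255, 0], "Magenta": [0, 128]}, 2, 1)
```

In this example the Yellow and Black planes come back filled with zeros, so the result can be written as an ordinary four-channel CMYK image.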

Photoshop displays spot color channels of extracted TIFF images in the Channels window (top). Double-clicking one of the icons reveals the alternate color (bottom).


Merging Fragmented Images

Screenshot: image fragmented in small irregular parts

Screenshot: fragment image parts recombined to a single large image

The images in many PDF documents are broken into small fragments by the software producing the PDF. What appears to be a single image on the page may actually consist of many small pieces. For example, Microsoft Office applications often produce heavily fragmented images which consist of hundreds or thousands of small fragments. Adobe InDesign often segments images into fragments of varying size in a process called "transparency flattening". TET detects fragmented images and merges the pieces into a usable larger image, provided the combined result forms a rectangular pixel grid. Only with image merging can fragmented images reasonably be repurposed.

Although the image is segmented into smaller parts (top), TET extracts it as a single reusable image (bottom).
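
The condition that merged fragments must form a rectangular pixel grid can be illustrated with a minimal sketch. It is a simplified model and not TET's actual algorithm: fragments are assumed to be axis-aligned rectangles with positive size in a shared pixel coordinate system, and the names `Fragment` and `merged_bounds` are hypothetical. A real implementation would also have to compare properties such as color space and resolution.

```python
from typing import NamedTuple, Optional, Sequence


class Fragment(NamedTuple):
    x: int       # left edge in pixels
    y: int       # top edge in pixels
    width: int   # assumed > 0
    height: int  # assumed > 0


def merged_bounds(fragments: Sequence[Fragment]) -> Optional[Fragment]:
    """Return the merged rectangle if the fragments tile it exactly,
    otherwise None (gaps or overlaps present)."""
    if not fragments:
        return None
    left = min(f.x for f in fragments)
    top = min(f.y for f in fragments)
    right = max(f.x + f.width for f in fragments)
    bottom = max(f.y + f.height for f in fragments)

    # The fragments must cover exactly the area of their bounding box.
    covered = sum(f.width * f.height for f in fragments)
    if covered != (right - left) * (bottom - top):
        return None

    # With equal areas, the absence of pairwise overlaps implies
    # that there are no gaps either.
    for i, a in enumerate(fragments):
        for b in fragments[i + 1:]:
            if (a.x < b.x + b.width and b.x < a.x + a.width
                    and a.y < b.y + b.height and b.y < a.y + a.height):
                return None
    return Fragment(left, top, right - left, bottom - top)


# Usage: three fragments of varying size that together form a 6x5 image.
tiles = [Fragment(0, 0, 4, 2), Fragment(4, 0, 2, 2), Fragment(0, 2, 6, 3)]
result = merged_bounds(tiles)
```

Removing the second fragment from `tiles` leaves a hole in the upper right corner, and `merged_bounds` then returns `None` because the remaining pieces no longer form a rectangle.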