
PDF Unbound

We believe that digital publishing should take full advantage of the opportunity to make e-books more valuable to users. We want to work with you to unlock the full potential of your digital publications at every stage of the publishing workflow using our cross-platform PDF Unbound toolkit.

At the core of PDF Unbound is our semantic analysis engine, which reconstructs the logical structure of input documents. PDF Unbound uses AI that learns to adapt to specific document types, so the appearance of the output can change substantially without upsetting the structure and flow of the document.

Zinio magazines, PressReader, and ActiveTextbook-based projects use the PDF Unbound toolkit to drive their digital publishing workflows.

What a Digital Publication Should Be

As digital materials are prepared for publication, there is an opportunity for many enhancements over a plain document. It starts with small things: clickable references to figures and tables for navigation, live URL links, and text selection and highlighting.

Beyond that, digital publications should meet certain expectations: text that displays well on any platform and screen size, images that can be viewed at full resolution, full-text search, indexing, and useful tables of contents.

In some cases, the digital publication can be enriched with multimedia or interactive elements. All of these tasks have to be done during the publishing process. The PDF Unbound toolkit streamlines the digital publishing workflow so you can move quickly from input document to final presentation.

The Digital Publication Workflow

We see the transformation of an input document into a fully digital publication as a four-part process. Evident Point’s digital publishing workflow tools are with you at every step.


1. Parsing and Extraction

The first step in preparing an input document for presentation is to parse the document and extract text and other objects.

Evident Point has developed its own SDK, called Odyssey, for parsing PDF files, extracting PDF objects (such as text, images, fonts, and vector graphics), and rendering either all objects or only marked ones, for an entire document or a single page.

During this step, we also extract text (including word and paragraph reconstruction for PDFs) and detect navigation points in the text, such as URL links and references to figures.
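Detecting links of this kind can be illustrated with regular expressions over the extracted text. The sketch below is a minimal illustration under our own assumptions, not the Odyssey SDK's API: the patterns, the `Link` record, and anchor ids such as `figure-3` are hypothetical, and real documents need far more robust rules (for example, references broken across lines).

```python
import re
from dataclasses import dataclass

# Hypothetical patterns: bare URLs, and "Figure N", "Fig. N", "Table N.M" references.
URL_RE = re.compile(r"https?://[^\s<>()]+|www\.[^\s<>()]+")
REF_RE = re.compile(r"\b(Figure|Fig\.|Table)\s+(\d+(?:\.\d+)*)")


@dataclass
class Link:
    kind: str    # "url" or "reference"
    text: str    # the span as it appears in the text
    target: str  # URL, or a hypothetical anchor id such as "figure-3"
    start: int
    end: int


def detect_links(text: str) -> list[Link]:
    """Find URLs and figure/table references, returned in reading order."""
    links = []
    for m in URL_RE.finditer(text):
        url = m.group(0).rstrip(".,;:")  # drop sentence punctuation after the URL
        target = url if url.startswith("http") else "http://" + url
        links.append(Link("url", url, target, m.start(), m.start() + len(url)))
    for m in REF_RE.finditer(text):
        prefix = "table" if m.group(1) == "Table" else "figure"
        links.append(Link("reference", m.group(0), f"{prefix}-{m.group(2)}", m.start(), m.end()))
    return sorted(links, key=lambda link: link.start)
```

Each `Link` keeps its character offsets, so a renderer could wrap exactly that span in a clickable element.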

Finally, we extract high-resolution images from the original document and store them so they can be displayed on demand, when a reader clicks through from the low-resolution versions in the e-book body.
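A minimal sketch of the on-demand pattern, assuming the e-book body is XHTML: the body embeds only a small rendition of each image, wrapped in a link to the stored full-resolution file, so the larger file is fetched only when the reader follows the link. The `ExtractedImage` type, its field names, and the file paths are hypothetical.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class ExtractedImage:
    image_id: str
    lowres_path: str  # small rendition embedded in the e-book body
    hires_path: str   # full-resolution original, fetched only when clicked
    alt: str = ""


def body_markup(img: ExtractedImage) -> str:
    """Return XHTML for a thumbnail that links to the stored high-resolution image."""
    return (
        f'<a id="{escape(img.image_id)}" href="{escape(img.hires_path)}">'
        f'<img src="{escape(img.lowres_path)}" alt="{escape(img.alt)}"/></a>'
    )
```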


2. Structure Reconstruction

Evident Point has developed a unique AI-based approach that enables flexible presentation of final published documents by reconstructing document structure from print-ready formats such as PDF.

Our semantic analysis engine recognizes the role of different document fragments and their organization into sections, chapters, articles, etc.

PDF Unbound then builds a logical model of the document, along with a user interface for customizing the reconstruction with specific rules. The resulting DOM (document object model) allows faithful conversion to ePUB3 and other structured formats.
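To illustrate what a logical model adds over a flat page, here is a minimal sketch that assumes each text block has already been assigned a role (a heading with a level, or a paragraph) and is in reading order. It nests the blocks into sections and serializes them as nested XHTML `<section>` elements, the kind of markup used in ePUB3 content documents. The `Node` type, the `(role, level, text)` tuples, and both functions are hypothetical; this is not PDF Unbound's internal DOM.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Node:
    role: str       # "document", "section", or a leaf role such as "paragraph"
    text: str = ""
    level: int = 0  # heading depth for sections; 0 otherwise
    children: list["Node"] = field(default_factory=list)


def build_tree(blocks: list[tuple[str, int, str]]) -> Node:
    """Nest a reading-ordered list of (role, level, text) blocks: each heading
    opens a section holding everything up to the next heading of the same or
    higher rank (equal or smaller level number)."""
    root = Node("document")
    stack = [root]
    for role, level, text in blocks:
        if role == "heading":
            # Close open sections at the same or a deeper level.
            while stack[-1].role == "section" and stack[-1].level >= level:
                stack.pop()
            section = Node("section", text, level)
            stack[-1].children.append(section)
            stack.append(section)
        else:
            stack[-1].children.append(Node(role, text))
    return root


def to_xhtml(node: Node) -> str:
    """Serialize the tree as nested <section> elements; leaf blocks become <p>."""
    inner = "".join(to_xhtml(child) for child in node.children)
    if node.role == "document":
        return f"<body>{inner}</body>"
    if node.role == "section":
        h = max(1, min(node.level, 6))  # HTML only has h1..h6
        return f"<section><h{h}>{escape(node.text)}</h{h}>{inner}</section>"
    return f"<p>{escape(node.text)}</p>"
```

Because the output is generated from roles rather than page positions, the same tree could be serialized to a completely different layout.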

The main difference between our methods and those of our competitors is that we use adaptive procedures that can be "taught" to work with a specific type of document (e.g. scientific paper, newspaper, or textbook).
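The text does not describe how these procedures are taught. As a stand-in, the sketch below uses a deliberately simple technique, a nearest-centroid classifier: one set of centroids is learned per document type from labelled example blocks, and a new block gets the role whose centroid is nearest. The three layout features, the role names, and the sample values are hypothetical, and this is not a description of Evident Point's engine.

```python
import math
from collections import defaultdict

# Hypothetical layout features for one text block:
# (font size in points, 1.0 if bold else 0.0, word count)
Features = tuple[float, float, float]


def train(examples: list[tuple[Features, str]]) -> dict[str, Features]:
    """Learn one centroid (mean feature vector) per role from labelled blocks."""
    totals: dict[str, list[float]] = defaultdict(lambda: [0.0, 0.0, 0.0])
    counts: dict[str, int] = defaultdict(int)
    for features, role in examples:
        for i, value in enumerate(features):
            totals[role][i] += value
        counts[role] += 1
    return {role: tuple(t / counts[role] for t in vec) for role, vec in totals.items()}


def classify(features: Features, centroids: dict[str, Features]) -> str:
    """Return the role whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda role: math.dist(features, centroids[role]))


# One profile per document type, each trained on its own labelled examples.
newspaper = train([
    ((28, 1, 8), "headline"), ((30, 1, 6), "headline"),
    ((9, 0, 120), "body"), ((9, 0, 90), "body"),
    ((7, 0, 15), "caption"), ((7, 1, 20), "caption"),
])
```

A textbook profile would be trained the same way on textbook samples. Because the features are not normalized, the word count dominates the distance here; a real system would scale its features or use a stronger model.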

An example page with structure elements as recognized by PDF Unbound is shown below.


3. Enrichment

Once the structure of the document is reconstructed, the next step is to add any desired supplementary materials. Documents can be enriched with multimedia overlays, attached high-resolution images, video, and audio. They can also be made interactive by integrating elements such as quizzes, comment threads, and dictionaries.
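Enrichment depends on the reconstructed structure, because each supplementary item needs a stable place in the document to attach to. A minimal sketch, assuming every node in the logical model has an id: items are attached by anchor id and rejected if their kind is not one of those listed above or the anchor does not exist. The `Enrichment` and `Publication` types, the kind identifiers, and the id scheme are hypothetical.

```python
from dataclasses import dataclass, field

# Kinds of supplementary material named in the text (identifiers are hypothetical).
KINDS = {"overlay", "image", "video", "audio", "quiz", "comments", "dictionary"}


@dataclass
class Enrichment:
    kind: str
    anchor_id: str  # id of the document node the item attaches to
    resource: str   # path or URI of the attached asset


@dataclass
class Publication:
    node_ids: set[str]  # ids produced by structure reconstruction
    enrichments: list[Enrichment] = field(default_factory=list)

    def enrich(self, item: Enrichment) -> None:
        """Attach an item, rejecting unknown kinds and dangling anchors."""
        if item.kind not in KINDS:
            raise ValueError(f"unsupported enrichment kind: {item.kind}")
        if item.anchor_id not in self.node_ids:
            raise KeyError(f"no node with id {item.anchor_id!r}")
        self.enrichments.append(item)

    def for_node(self, node_id: str) -> list[Enrichment]:
        """Return every item attached to the given node."""
        return [e for e in self.enrichments if e.anchor_id == node_id]
```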

The PDF Unbound toolkit supports every type of attachment that ActiveTextbook can use, which makes it an ideal way to prepare material for ActiveTextbook.


4. Presentation

PDF Unbound accepts the following input formats: PDF, ePUB2/3, MS Word, HTML, PowerPoint or InDesign XML. From these inputs, PDF Unbound can create publications in the following formats: enhanced PDF, reflowable or fixed-format ePUB3, plain HTML, specialized XML formats such as XMLNews, or the ActiveTextbook project format (MDP).

Custom input and output formats can be made available on request. Because our semantic analysis identifies the role of each element, we can generate output that looks substantially different from the input document without upsetting the document’s structure and flow.

We can also include adaptive formatting tools that choose the best layout and visualization for the content and the presentation window.
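One way such adaptive formatting could work is sketched below, using a common typographic heuristic: keep each column near 66 characters wide, estimating the average character width as half the font size. The breakpoint, font sizes, three-column cap, and the rule that wide tables force a single column are all hypothetical choices for illustration, not PDF Unbound's actual logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    columns: int
    font_size_px: int
    figures: str  # "thumbnail" (tap to enlarge) or "inline"


def choose_layout(window_width_px: int, has_wide_tables: bool) -> Layout:
    """Pick a layout from the window width and a simple content signal."""
    small_screen = window_width_px < 600  # hypothetical breakpoint
    font = 16 if small_screen else 18
    column_width = 66 * font // 2  # about 66 characters at roughly 0.5em each
    columns = max(1, min(3, window_width_px // column_width))
    if has_wide_tables:
        columns = 1  # wide tables need the full window width
    return Layout(columns, font, "thumbnail" if small_screen else "inline")
```

The content signal (`has_wide_tables`) overrides the width-based choice, reflecting that the layout depends on both the content and the window.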

PDF Unbound can target a specific presentation tool for visualization of the finished product; it is optimized for use with ActiveTextbook as the presentation tool.