Command-line Convertor

The command-line PDF to HTML convertor is distributed as the PDFToHTML.jar package, which can be downloaded and run directly on any Java-enabled platform.

To convert a PDF file to an HTML web page, type:

java -jar PDFToHTML.jar <input_file> [<output_file>] [<options>]

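where <input_file> is the PDF file to convert and the optional <output_file> names the resulting HTML file.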

Library

Pdf2Dom may be used as a DOM interface to the Apache PDFBox™ library. The following example shows how to obtain a DOM model from a PDF file:

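import org.apache.pdfbox.pdmodel.PDDocument;
import org.fit.pdfdom.PDFDomTree;
import org.w3c.dom.Document;

// load the PDF file using PDFBox (PDDocument.load is the PDFBox 2.x API)
PDDocument pdf = PDDocument.load(new java.io.File("file.pdf"));
// create the DOM parser
PDFDomTree parser = new PDFDomTree();
// parse the file and get the DOM Document
Document dom = parser.createDOM(pdf);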

See the PDFDomTree API documentation for more information.
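
Once a DOM Document is available, one plausible next step is to write it out as HTML text. The following is a minimal sketch that uses only the standard JAXP Transformer from the JDK, assuming the Document returned by createDOM is a regular org.w3c.dom.Document. The DomSerializer class and its toHtmlString method are hypothetical helpers introduced here for illustration and are not part of Pdf2Dom; the main method builds a small stand-in document so that the sketch runs without PDFBox on the classpath.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper (not part of Pdf2Dom): serializes a W3C DOM Document to text.
public class DomSerializer {

    // Serialize the given DOM using the JAXP identity transformer with HTML output rules.
    public static String toHtmlString(Document dom) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.METHOD, "html");
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.setOutputProperty(OutputKeys.INDENT, "no");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(dom), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the Document that parser.createDOM(pdf) would return.
        Document dom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
        Element html = dom.createElement("html");
        Element body = dom.createElement("body");
        Element paragraph = dom.createElement("p");
        paragraph.setTextContent("Hello");
        body.appendChild(paragraph);
        html.appendChild(body);
        dom.appendChild(html);

        System.out.println(toHtmlString(dom));
    }
}
```

In real use, the stand-in document in main would be replaced by the Document obtained from the parser, and the resulting string could be written to a file with java.nio.file.Files. Going through a Transformer keeps the sketch independent of any Pdf2Dom-specific output method.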

API Documentation

The Pdf2Dom API documentation is generated from the latest snapshot.