Documentation - Pdf2Dom parser

Command-line Convertor

The command-line PDF to HTML convertor is contained in the PDFToHTML.jar package, which may be downloaded and executed directly on any Java-enabled platform.

To convert a PDF file to an HTML web page, type:

java -jar PDFToHTML.jar <input_file> [<output_file>] [<options>]

where

<input_file> is the path to the source PDF file to be converted.
<output_file> is an optional name of the output HTML file. If not specified, the output name will be the same as the input name with the html suffix.
Options:
- -fm=[mode] Font handler mode. [mode] is one of EMBED_BASE64, SAVE_TO_DIR, IGNORE
- -fdir=[path] Directory to extract fonts to. [path] is the font extraction directory, e.g. dir/my-font-dir
- -im=[mode] Image handler mode. [mode] is one of EMBED_BASE64, SAVE_TO_DIR, IGNORE
- -idir=[path] Directory to extract images to. [path] is the image extraction directory, e.g. dir/my-image-dir
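
As an illustration of how the arguments and options above combine, the following sketch assembles a complete command line in Java. The class name PdfToHtmlCommand, the method build, and the file names are hypothetical and used only for this example; it also assumes PDFToHTML.jar sits in the working directory. Only the options listed above are used.

```java
import java.util.ArrayList;
import java.util.List;

class PdfToHtmlCommand {

    // Builds the argument list for:
    //   java -jar PDFToHTML.jar <input_file> <output_file> -fm=... -im=... [-idir=...]
    // fontMode and imageMode are expected to be one of
    // EMBED_BASE64, SAVE_TO_DIR, IGNORE (the modes listed in the options).
    static List<String> build(String input, String output,
                              String fontMode, String imageMode, String imageDir) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-jar");
        cmd.add("PDFToHTML.jar");          // assumption: jar is in the working directory
        cmd.add(input);
        cmd.add(output);
        cmd.add("-fm=" + fontMode);
        cmd.add("-im=" + imageMode);
        if (imageDir != null) {
            cmd.add("-idir=" + imageDir);  // passed only when a directory is given
        }
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build("file.pdf", "file.html",
                "EMBED_BASE64", "SAVE_TO_DIR", "dir/my-image-dir");
        // Print the command so it can be inspected or pasted into a shell.
        System.out.println(String.join(" ", cmd));
        // To launch the convertor from Java instead, something like
        //   new ProcessBuilder(cmd).inheritIO().start().waitFor();
        // could be used (this requires the jar to be present).
    }
}
```

Here the fonts are embedded in the HTML output while the images are written to dir/my-image-dir.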

Library

Pdf2Dom may be used as a DOM interface to the Apache PDFBox™ library. The following example shows how to obtain a DOM model from a PDF file:

import org.apache.pdfbox.pdmodel.PDDocument;
import org.fit.pdfdom.PDFDomTree;
import org.w3c.dom.Document;

// load the PDF file using PDFBox
PDDocument pdf = PDDocument.load(new java.io.File("file.pdf"));
// create the DOM parser
PDFDomTree parser = new PDFDomTree();
// parse the file and get the DOM Document
Document dom = parser.createDOM(pdf);
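
Once a DOM Document has been obtained, it can be processed with the standard Java XML APIs. The following is a minimal sketch, not part of Pdf2Dom itself, that serializes an org.w3c.dom.Document to an HTML string using javax.xml.transform. The class name DomSerializer and the method toHtml are hypothetical; the small document built in main is only a stand-in so the example runs without a PDF file.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomSerializer {

    // Serializes any W3C DOM document to a string using the "html" output method.
    static String toHtml(Document dom) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.METHOD, "html");
        transformer.setOutputProperty(OutputKeys.INDENT, "no");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(dom), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in document; in practice this would be the Document
        // returned by parser.createDOM(pdf).
        Document dom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element html = dom.createElement("html");
        Element body = dom.createElement("body");
        body.setTextContent("Hello");
        html.appendChild(body);
        dom.appendChild(html);

        System.out.println(toHtml(dom));
    }
}
```

Passing the dom object from the example above to toHtml in place of the stand-in document would yield the converted page as a string, which can then be written to a file or processed further.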

See the PDFDomTree API documentation for more information.

API Documentation

The Pdf2Dom API documentation is generated from the latest snapshot.
