The command-line PDF to HTML converter is contained in the PDFToHTML.jar
package, which may be downloaded and executed directly on any Java-enabled platform.
To convert a PDF file to an HTML web page, just type:
java -jar PDFToHTML.jar <input_file> [<output_file>] [<options>]
where
<input_file>
is the path to the source PDF file to be converted.
<output_file>
is an optional name of the output HTML file. If not specified, the output name will be the same as the input name with the html suffix.
Options:
-fm=[mode]
Font handler mode. [mode] = EMBED_BASE64, SAVE_TO_DIR, IGNORE
-fdir=[path]
Directory to extract fonts to. [path] = font extract directory, e.g. dir/my-font-dir
-im=[mode]
Image handler mode. [mode] = EMBED_BASE64, SAVE_TO_DIR, IGNORE
-idir=[path]
Directory to extract images to. [path] = image extract directory, e.g. dir/my-image-dir

Pdf2Dom may be used as a DOM interface to the Apache PDFBox™ library. The following example shows how to obtain a DOM model from a PDF file:
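A minimal sketch of that usage, assuming Pdf2Dom and PDFBox 2.x are on the classpath. The class name PdfToDomExample and the path file.pdf are placeholders chosen for illustration; with PDFBox 3.x the document would likely be loaded through Loader.loadPDF instead of PDDocument.load.

```java
import java.io.File;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.fit.pdfdom.PDFDomTree;
import org.w3c.dom.Document;

public class PdfToDomExample {
    public static void main(String[] args) throws Exception {
        // Load the PDF file using PDFBox ("file.pdf" is a placeholder path).
        try (PDDocument pdf = PDDocument.load(new File("file.pdf"))) {
            // Create the Pdf2Dom parser.
            PDFDomTree parser = new PDFDomTree();
            // Parse the document and obtain a standard W3C DOM Document.
            Document dom = parser.createDOM(pdf);
            // The result can be processed with the usual DOM APIs.
            System.out.println(dom.getDocumentElement().getNodeName());
        }
    }
}
```

The returned object is an ordinary org.w3c.dom.Document, so it can be traversed or serialized with the standard Java XML tooling.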
See the PDFDomTree API documentation for more information.
The Pdf2Dom API documentation is generated from the latest snapshot.
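For the command-line converter described earlier, the invocation can also be assembled from Java code. The following is only a sketch: the class PdfToHtmlCommand and its build method are hypothetical helpers, not part of Pdf2Dom, and the file names are placeholders. It builds the same argument list as the java -jar command line shown above and uses only the options listed in this document.

```java
import java.util.ArrayList;
import java.util.List;

public class PdfToHtmlCommand {
    // Builds the arguments for:
    //   java -jar PDFToHTML.jar <input_file> [<output_file>] [<options>]
    // Passing null as outputFile omits it, leaving the output name to the tool's default.
    static List<String> build(String inputFile, String outputFile, String... options) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-jar");
        cmd.add("PDFToHTML.jar");
        cmd.add(inputFile);
        if (outputFile != null) {
            cmd.add(outputFile);
        }
        for (String option : options) {
            cmd.add(option);
        }
        return cmd;
    }

    public static void main(String[] args) {
        // Save images to a directory and embed fonts as Base64 (placeholder file names).
        List<String> cmd = build("input.pdf", "output.html",
                "-fm=EMBED_BASE64", "-im=SAVE_TO_DIR", "-idir=dir/my-image-dir");
        System.out.println(String.join(" ", cmd));
        // To actually run the converter, PDFToHTML.jar must be in the working directory:
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

Keeping the command as a list of separate arguments avoids shell quoting problems when paths contain spaces.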