Stable Releases

The stable releases are available via the file release system. The distribution packages contain the library sources and the necessary third-party libraries. A binary library package is available too.

The release also contains a separate command-line converter, PDFToHTML.jar, that converts PDF files to HTML:

java -jar PDFToHTML.jar <infile> [<outfile>]
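The same command can also be launched from Java code, for example when converting files from another application. The following is only a minimal sketch: the class name PdfToHtmlRunner and its methods are hypothetical and not part of Pdf2Dom, and it assumes that the java executable is on the PATH and that PDFToHTML.jar is in the working directory.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Pdf2Dom) that runs the converter
// as an external process.
final class PdfToHtmlRunner {

    // Builds the argument list: java -jar <jarPath> <infile> [<outfile>]
    static List<String> buildCommand(String jarPath, String infile, String outfile) {
        List<String> cmd = new ArrayList<String>();
        cmd.add("java");          // assumes java is on the PATH
        cmd.add("-jar");
        cmd.add(jarPath);
        cmd.add(infile);
        if (outfile != null) {    // the output file is optional
            cmd.add(outfile);
        }
        return cmd;
    }

    // Starts the converter, forwards its output to this process,
    // and returns the exit code of the child process.
    static int convert(String jarPath, String infile, String outfile)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(jarPath, infile, outfile))
                .inheritIO()
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: PdfToHtmlRunner <infile> [<outfile>]");
            return;
        }
        String outfile = args.length > 1 ? args[1] : null;
        // Assumption: PDFToHTML.jar is located in the working directory.
        System.exit(convert("PDFToHTML.jar", args[0], outfile));
    }
}
```

Building the argument list separately from starting the process keeps the command easy to inspect or log before it is run.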

Maven

Stable releases are also available as Maven artifacts. Add the following dependency to your project:

<dependency>
    <groupId>net.sf.cssbox</groupId>
    <artifactId>pdf2dom</artifactId>
    <version>1.8</version>
</dependency>

Git Repository

The code is hosted in a public repository on GitHub. You can obtain the most recent code using:

git clone https://github.com/radkovo/Pdf2Dom.git

The repository includes the latest improvements and bug fixes, so the repository version of Pdf2Dom usually gives better results than the latest release. However, some of the improvements may not be fully tested.

Requirements

Pdf2Dom has been developed and tested with Java 7+.

The PDF parser is based on the Apache PDFBox library. This library and all its dependencies are necessary for compiling and running Pdf2Dom. The unchanged versions of these libraries are included in the release packages and the repository.

Optionally, Pdf2Dom may be used as a front-end for the CSSBox rendering engine, which allows CSSBox to render PDF files.