CSSBox is an (X)HTML/CSS rendering engine written in pure Java. Its primary purpose is to provide a complete and further processable information about the rendered page contents and layout. However, it may be also used for browsing the rendered documents in Java Swing applications.
The input of the rendering engine is the document DOM tree and a set of style sheets referenced from the document. The output is an object-oriented model of the page layout. This model can be directly displayed but mainly, it is suitable for further processing by the layout analysis algorithms as for example the page segmentation or information extraction algorithms.
The core CSSBox library may be also used for obtaining a bitmap or vector (SVG) image of the rendered document. Using the SwingBox package, CSSBox may be used as an interactive web browser component in a Java Swing application.
CSSBox is licensed under the terms of the GNU Lesser General Public License version 3.
This research was supported by the Research Plan No. MSM 0021630528 – Security-Oriented Research in Information Technology.