CSSBox is an (X)HTML/CSS rendering engine written in pure Java. Its primary purpose is to provide complete and further processable information about the rendered page contents and layout.
This manual gives a short overview of the CSSBox usage. It shows how to render a document and it explains how the resulting page is represented and how the basic information about the individual parts can be obtained.
More detailed information about the individual classes can be obtained from the API documentation.
Any feedback on CSSBox and/or this manual is welcome via the CSSBox website.
The input of the rendering engine is a document DOM tree. The engine is able to automatically load the style sheets referenced in the document and it computes the effective style of each element. Afterwards, the document layout is computed.
CSSBox generally expects an implementation of the DOM on its input, represented by its root Document node. How the DOM is obtained is not important for CSSBox. However, in most situations, the DOM is obtained by parsing an HTML or XML file. Therefore, CSSBox provides a framework for binding a parser to the layout engine. Moreover, it contains a default parser implementation that may be used as is or replaced by a custom implementation when required. The default implementation is based on the NekoHTML parser and Xerces 2. The details about using a different parser are described in the Custom Document Sources and Parsers section.
With the default implementation, the following code reads and parses the document based on its URL:
//Open the network connection
DocumentSource docSource = new DefaultDocumentSource(urlstring);
//Parse the input document
DOMSource parser = new DefaultDOMSource(docSource);
Document doc = parser.parse(); //doc represents the obtained DOM
For the initial DOM and style sheet processing, a DOMAnalyzer object is used. It is initialized with the DOM tree and the base URL:
DOMAnalyzer da = new DOMAnalyzer(doc, docSource.getURL());
da.attributesToStyles(); //convert the HTML presentation attributes to inline styles
da.addStyleSheet(null, CSSNorm.stdStyleSheet(), DOMAnalyzer.Origin.AGENT); //use the standard style sheet
da.addStyleSheet(null, CSSNorm.userStyleSheet(), DOMAnalyzer.Origin.AGENT); //use the additional style sheet
da.addStyleSheet(null, CSSNorm.formsStyleSheet(), DOMAnalyzer.Origin.AGENT); //(optional) use the forms style sheet
da.getStyleSheets(); //load the author style sheets
The attributesToStyles() method converts some HTML presentation attributes to CSS styles (e.g. the <font> tag attributes, table attributes and some more). If (X)HTML interpretation is not required, this method need not be called. When used, this method should be called before getStyleSheets() is used.
The addStyleSheet() method is used to add a style sheet to the document. The style sheet is passed as a text string containing the CSS code. In our case, we add two built-in style sheets that represent the standard document style. These style sheets are imported as the user agent style sheets according to the CSS specification. The CSSNorm.stdStyleSheet() method returns the default style sheet recommended by the CSS specification and the CSSNorm.userStyleSheet() contains some additional CSSBox definitions not covered by the standard.
Optionally, the CSSNorm.formsStyleSheet() includes a basic style of form input fields. This style sheet may be used for basic rendering of the form fields when their rendering and functionality is not implemented in any other way in the application.
Finally, the getStyleSheets() method loads and processes all the internal and external style sheets referenced from the document including the inline style definitions. In case of external style sheets, CSSBox tries to obtain the file from the corresponding URL, if accessible.
The resulting DOMAnalyzer object represents the document code together with the associated style.
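The built-in style sheets above are added with the user agent origin. As a rough sketch of why the origin matters: in the CSS cascade, for normal declarations, an author declaration overrides a user declaration, which in turn overrides a user agent declaration. The class and enum below are hypothetical stand-ins written for this illustration, not CSSBox or jStyleParser classes, and the sketch deliberately ignores !important and selector specificity.

```java
/** Illustration only: hypothetical names, not part of the CSSBox API. */
final class OriginSketch {

    /** Style sheet origins in ascending precedence for normal declarations. */
    enum SheetOrigin { AGENT, USER, AUTHOR }

    /** A single declared value of one property together with its origin. */
    static final class Declaration {
        final SheetOrigin origin;
        final String value;

        Declaration(SheetOrigin origin, String value) {
            this.origin = origin;
            this.value = value;
        }
    }

    /** Returns the value from the highest-precedence origin; later entries win ties. */
    static String winner(java.util.List<Declaration> declarations) {
        Declaration best = null;
        for (Declaration d : declarations) {
            if (best == null || d.origin.compareTo(best.origin) >= 0) {
                best = d;
            }
        }
        return best == null ? null : best.value;
    }

    public static void main(String[] args) {
        java.util.List<Declaration> margin = new java.util.ArrayList<Declaration>();
        margin.add(new Declaration(SheetOrigin.AGENT, "1em 0")); // default style
        System.out.println(winner(margin)); // prints 1em 0

        margin.add(new Declaration(SheetOrigin.AUTHOR, "0"));    // page style
        System.out.println(winner(margin)); // prints 0
    }
}
```

Because the default styles sit at the lowest precedence, any style sheet supplied by the page itself can override them.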
By default, the DOMAnalyzer assumes that the page is being rendered on a standard desktop computer screen. During the style sheet processing, it uses the "screen" media type and some default display feature values for evaluating the possible media queries.
A different media type or feature values may be specified by creating a new media specification represented as a MediaSpec object from the jStyleParser API. The typical usage would be the following:
//we will use the "screen" media type for rendering
MediaSpec media = new MediaSpec("screen");
//specify some media feature values
media.setDimensions(1000, 800); //set the visible area size in pixels
media.setDeviceDimensions(1600, 1200); //set the display size in pixels
//use the media specification in the analyzer
DOMAnalyzer da = new DOMAnalyzer(doc, docSource.getURL());
da.setMediaSpec(media);
//... continue with the DOMAnalyzer initialization as above
The BoxBrowser demo shows a basic usage of the media specifications in a Swing application.
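To illustrate why the media feature values matter, the following minimal sketch evaluates a width range condition such as "(min-width: 800px) and (max-width: 1200px)" against the visible area width. It is plain Java with a hypothetical class and method name and does not reproduce how jStyleParser evaluates media queries internally; it only shows the comparison that the min-width and max-width features imply.

```java
/** Illustration only: hypothetical names, not part of the jStyleParser API. */
final class MediaQuerySketch {

    /** True when minWidth <= visibleWidth <= maxWidth (all values in pixels). */
    static boolean widthInRange(int visibleWidth, int minWidth, int maxWidth) {
        return visibleWidth >= minWidth && visibleWidth <= maxWidth;
    }

    public static void main(String[] args) {
        int visibleWidth = 1000; // the width passed to media.setDimensions(1000, 800)

        // @media (min-width: 800px) and (max-width: 1200px)
        System.out.println(widthInRange(visibleWidth, 800, 1200)); // prints true
        // @media (max-width: 600px), i.e. no lower bound
        System.out.println(widthInRange(visibleWidth, 0, 600));    // prints false
    }
}
```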
The whole layout engine is represented by a graphical BrowserCanvas object. The simplest way of creating the layout is passing the initial viewport dimensions to the BrowserCanvas constructor. Then, the layout is computed automatically by creating an instance of this object. The remaining constructor arguments are the root DOM element, the DOMAnalyzer used for obtaining the element styles and the document base URL used for loading images and other referenced content.
BrowserCanvas browser = new BrowserCanvas(da.getRoot(), da, new java.awt.Dimension(1000, 600), url);
When further browser configuration is required, the BrowserCanvas may be created without specifying the viewport dimensions. Then, the layout is not computed automatically and it must be created by a subsequent call of the createLayout method. Before creating the layout, the browser configuration may be changed.
BrowserCanvas browser = new BrowserCanvas(da.getRoot(), da, url);
//... modify the browser configuration here ...
browser.createLayout(new java.awt.Dimension(1000, 600));
Optionally, the createLayout method allows specifying different values for the preferred total canvas size and for the visible area size and position (the CSS viewport size):
BrowserCanvas browser = new BrowserCanvas(da.getRoot(), da, url);
//... modify the browser configuration here ...
browser.createLayout(new java.awt.Dimension(1200, 600), new java.awt.Rectangle(0, 0, 1000, 600));
In this case, the preferred size of the resulting page is 1200x600 pixels (it may be adjusted during the layout computation based on the page contents) and the size of the visible area is 1000x600 pixels; the visible area is in the top left corner of the rendered page. The visible area size and position are used during the layout computation and they may influence the positions of positioned elements according to the CSS specification.
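As a rough illustration of how the visible area can influence positioned elements: under CSS, a box with position: fixed is placed relative to the viewport, which corresponds to the visible area here and not to the whole canvas. The sketch below uses plain integers and hypothetical class and method names (none of them belong to CSSBox), assumes zero margins, and covers only the horizontal left and right offsets.

```java
/** Illustration only: hypothetical names, not part of the CSSBox API. */
final class FixedPositionSketch {

    /** The visible area (CSS viewport) within the page canvas, in pixels. */
    static final class VisibleArea {
        final int x, y, width, height;

        VisibleArea(int x, int y, int width, int height) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    /** Page x-coordinate of a fixed box placed with the CSS 'left' offset. */
    static int xFromLeft(VisibleArea v, int left) {
        return v.x + left;
    }

    /** Page x-coordinate of a fixed box of the given width placed with the CSS 'right' offset. */
    static int xFromRight(VisibleArea v, int right, int boxWidth) {
        return v.x + v.width - right - boxWidth;
    }

    public static void main(String[] args) {
        // The values from the text: a 1000x600 visible area in the top left corner
        VisibleArea visible = new VisibleArea(0, 0, 1000, 600);
        int canvasWidth = 1200;

        // A 200px wide box with 'right: 10px' is aligned to the visible area...
        System.out.println(xFromRight(visible, 10, 200)); // prints 790
        // ...and not to the right edge of the whole canvas
        System.out.println(canvasWidth - 10 - 200);       // prints 990
    }
}
```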
Setting the visible area size automatically updates the used media specification (see the previous section) so that the same size of the visible area is used for evaluating the CSS media queries. However, this behavior may be disabled by calling setAutoMediaUpdate(false) before creating the layout. In that case, the visible area size used for the layout computation may be different from the size used for media queries.
In all cases, the created browser object can be directly used both for displaying the rendered document and for obtaining the created layout model. The details of the browser configuration are described in the Configuration Options section.
The BrowserCanvas class is directly derived from the Swing javax.swing.JPanel class. Therefore, it can be directly used as a Swing user interface component. The size of the component is automatically adjusted according to the resulting document layout. The basic document displaying is shown in the SimpleBrowser example.
BrowserCanvas provides a simple display of the rendered page with no interactive elements. For obtaining an interactive browser component with text selection and clickable links, the SwingBox extension should be used.
Current browser configuration is represented using a BrowserConfig object that may be accessed using the browser's getConfig() method. The following configuration options are available:
[...] true.
[...] true.
[...] <img> elements
[...] <body> element background for the whole canvas according to the HTML specification
[...] <object> elements.
[...] true.
browser.getConfig().setDefaultFont(java.awt.Font.SERIF, "Times New Roman");
browser.getConfig().setDefaultFont(java.awt.Font.SANS_SERIF, "Arial");
browser.getConfig().setDefaultFont(java.awt.Font.MONOSPACED, "Courier New");
CSSBox contains two generic abstract classes that represent the document source and the parser and provides their default implementations:
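DocumentSource represents the document source. Its default implementation, DefaultDocumentSource, uses the URLConnection mechanism extended by the support of the data: URL scheme.
DOMSource represents the parser. Its default implementation, DefaultDOMSource, is based on the NekoHTML parser and Xerces 2.
The default implementations may be used for obtaining the DOM from a URL easily as shown in the Document Loading section. Moreover, CSSBox uses these implementations for obtaining the documents referenced from the HTML code such as images and embedded objects.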
When a different implementation of the document source or the parser is required (e.g. for obtaining the documents from a non-standard source, using a different parser implementation, etc.), it is possible to create a custom implementation of the appropriate abstract class. Then, the new implementation may be registered using the browser configuration browser.getConfig().registerDocumentSource() and browser.getConfig().registerDOMSource() methods as mentioned above in the Configuration Options section.
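The idea of a custom document source can be sketched as follows. Note that the abstract class below is a hypothetical stand-in written for this example, not the CSSBox DocumentSource class; the set of abstract methods that a real implementation must provide should be taken from the API documentation. The sketch only shows the general pattern of serving a document from memory instead of fetching it from a URL.

```java
/** Illustration only: hypothetical stand-ins, not the CSSBox classes. */
final class InMemorySourceSketch {

    /** Stand-in for an abstract document source (assumed shape). */
    abstract static class SourceStandIn {
        abstract String getContentType();
        abstract java.io.InputStream getInputStream() throws java.io.IOException;
    }

    /** Serves a document held in a string instead of opening a connection. */
    static final class StringSource extends SourceStandIn {
        private final String html;

        StringSource(String html) {
            this.html = html;
        }

        @Override
        String getContentType() {
            return "text/html; charset=utf-8";
        }

        @Override
        java.io.InputStream getInputStream() {
            byte[] bytes = html.getBytes(java.nio.charset.StandardCharsets.UTF_8);
            return new java.io.ByteArrayInputStream(bytes);
        }
    }

    /** Reads the whole source as UTF-8 text, as a parser bound to the source would. */
    static String readAll(SourceStandIn source) {
        try (java.io.InputStream in = source.getInputStream()) {
            java.io.ByteArrayOutputStream out = new java.io.ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), java.nio.charset.StandardCharsets.UTF_8);
        } catch (java.io.IOException e) {
            throw new java.io.UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SourceStandIn source = new StringSource("<html><body><p>Hello</p></body></html>");
        System.out.println(source.getContentType());
        System.out.println(readAll(source));
    }
}
```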
The resulting document layout is represented as a tree of boxes. Each box creates a rectangular area in the resulting page and it corresponds to a particular rendered HTML element. There may be multiple boxes corresponding to a single element; e.g. a multi-line paragraph <p> is split into several line boxes. A box may be either composed of child boxes or it may correspond to a particular part of the document content, which may be a text string or replaced content (e.g. images).
Each box is represented by an object which extends the
Box abstract class. There exist
several box types that roughly correspond to the computed value of the CSS display
property for the corresponding element. Figure 1 shows the type hierarchy of boxes.
Figure 1: Box type hierarchy
The root node of the box tree is always represented by a Viewport object and it represents the browser viewport. It always has a single child, which is called the root box. The root box corresponds to the root element of the HTML code passed to the BrowserCanvas for rendering. Usually, it is the <body> element.
The viewport and the root box are obtained using the getViewport() and getRootBox() methods of the BrowserCanvas.
In the following chapters, we will mention the most important methods that can be used for obtaining information about the resulting document layout. For other methods, see the CSSBox API reference.
The basic box properties are defined in the Box abstract class. They are mostly related to the box position and size.
During the layout, the box position is first computed relative to the containing box and, in the next step, the absolute position in the page is computed. The box occupies a rectangular area in the page which includes the box contents, borders and margins. The content of the box is always inside this area and it is again a rectangle which is equal to or smaller than the whole box. The following methods can be used for obtaining the positions and sizes:
[...] java.awt.Rectangle object.
[...] null if this is the root box.
A text box always corresponds to a DOM node of the type Text. It is represented as a TextBox object. When the text is split into several lines, multiple boxes correspond to a single DOM node. In this case, each of them contains a corresponding substring of the text.
An element box corresponds to a DOM node of the type Element. It is represented as an object of the abstract ElementBox class. It can be implemented as an inline box or a block box as described below. The common methods of the element boxes are the following:
Each element box may contain any number of nested child boxes that are indexed from 0 to n. When there exist multiple element boxes that correspond to a single DOM node, all these boxes share all the child boxes. The child boxes that belong to a particular element box can be determined using their index: the getStartChild() and the getEndChild() methods give the bounds of the index range of the child boxes that belong to this particular element box (the loop below treats the end index as exclusive). Therefore, the normal way of processing all the child boxes of an element box b is the following:
for (int i = b.getStartChild(); i < b.getEndChild(); i++) {
    Box sub = b.getSubBox(i);
    //process the child box here...
}
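The sharing of child boxes can be pictured with the following simplified model. It is a sketch in plain Java with hypothetical names, not the ElementBox implementation: strings stand in for child boxes, and the end index is assumed to be exclusive, in line with the i < b.getEndChild() condition of the loop above.

```java
/** Illustration only: a simplified model, not the CSSBox ElementBox class. */
final class ChildRangeSketch {

    /** One of several boxes created for the same element. */
    static final class Fragment {
        final java.util.List<String> sharedChildren; // the same list in every fragment
        final int startChild;                        // index of the first own child
        final int endChild;                          // index after the last own child (assumed exclusive)

        Fragment(java.util.List<String> sharedChildren, int startChild, int endChild) {
            this.sharedChildren = sharedChildren;
            this.startChild = startChild;
            this.endChild = endChild;
        }

        /** Collects only the children that belong to this fragment. */
        java.util.List<String> ownChildren() {
            java.util.List<String> own = new java.util.ArrayList<String>();
            for (int i = startChild; i < endChild; i++) {
                own.add(sharedChildren.get(i));
            }
            return own;
        }
    }

    public static void main(String[] args) {
        // Four child boxes of one inline element whose content was split into two lines
        java.util.List<String> children = java.util.Arrays.asList("Hello", "big", "wide", "world");
        Fragment firstLine = new Fragment(children, 0, 2);
        Fragment secondLine = new Fragment(children, 2, 4);

        System.out.println(firstLine.ownChildren());  // prints [Hello, big]
        System.out.println(secondLine.ownChildren()); // prints [wide, world]
    }
}
```

Both fragments reference the same list, so iterating from 0 to the list size on either of them would also visit children that are laid out in the other fragment.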
The overview of the related ElementBox methods follows:
The dimensions of an element box are modelled according to the CSS Box Model. In addition to the content size discussed in the Box interface description, it consists of padding, border width and margin. All these dimensions are represented by objects of a special LengthSet class that contain the top, left, bottom and right values of the dimension.
There are two values of margin available: a computed value of the margin that corresponds to the specified style, and an effective margin value (emargin) that takes margin collapsing into account and is used during the layout.
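The arithmetic behind these dimensions can be sketched as follows. The Edges class is a hypothetical stand-in for the idea of a set of four edge values, not the LengthSet class itself, and the collapse method follows the margin collapsing rule of CSS 2.1 (section 8.3.1) for two adjoining vertical margins. Whether CSSBox computes its emargin values in exactly this way is not claimed here.

```java
/** Illustration only: hypothetical names, not part of the CSSBox API. */
final class BoxModelSketch {

    /** The four edge values of one dimension (padding, border or margin), in pixels. */
    static final class Edges {
        final int top, left, bottom, right;

        Edges(int top, int left, int bottom, int right) {
            this.top = top;
            this.left = left;
            this.bottom = bottom;
            this.right = right;
        }

        /** The same value on all four edges. */
        static Edges uniform(int value) {
            return new Edges(value, value, value, value);
        }
    }

    /** Total width: content plus horizontal padding, border and margin. */
    static int totalWidth(int contentWidth, Edges padding, Edges border, Edges margin) {
        return contentWidth
                + padding.left + padding.right
                + border.left + border.right
                + margin.left + margin.right;
    }

    /** Collapsed size of two adjoining vertical margins (CSS 2.1, section 8.3.1). */
    static int collapse(int a, int b) {
        int maxPositive = Math.max(0, Math.max(a, b)); // largest positive margin, or 0
        int minNegative = Math.min(0, Math.min(a, b)); // most negative margin, or 0
        return maxPositive + minNegative;
    }

    public static void main(String[] args) {
        int width = totalWidth(300, Edges.uniform(10), Edges.uniform(1), Edges.uniform(20));
        System.out.println(width);            // prints 362

        System.out.println(collapse(20, 30)); // prints 30
        System.out.println(collapse(20, -5)); // prints 15
    }
}
```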
The following methods can be used for obtaining the individual values:
[...] display: CSS property. The returned value is one of the following constants defined in the ElementBox class: DISPLAY_ANY, DISPLAY_BLOCK, DISPLAY_INLINE, DISPLAY_INLINE_BLOCK, DISPLAY_INLINE_TABLE, DISPLAY_LIST_ITEM, DISPLAY_NONE, DISPLAY_RUN_IN, DISPLAY_TABLE, DISPLAY_TABLE_CAPTION, DISPLAY_TABLE_CELL, DISPLAY_TABLE_COLUMN, DISPLAY_TABLE_COLUMN_GROUP, DISPLAY_TABLE_FOOTER_GROUP, DISPLAY_TABLE_HEADER_GROUP, DISPLAY_TABLE_ROW, DISPLAY_TABLE_ROW_GROUP.
Inline boxes are the element boxes with the value of the CSS display: property equal to inline. These boxes are represented as the InlineBox objects. They have no special properties in addition to the properties defined in the ElementBox class.
Block boxes, with the display: value set to block, are represented by the BlockBox objects. In addition, the boxes with other values of the display: property are represented by several special classes derived from this class:
These objects differ mainly in the way the boxes and their contents are laid out on the page. From the point of view of the resulting layout model, they have no special properties in addition to the properties defined in the ElementBox class.
For each box, a VisualContext object is defined that gathers the information about the current text font and color. This object is obtained using the getVisualContext() method of the box. The following methods can be used for obtaining the appropriate information:
[...] java.awt.Font object. From this object most of the font properties can be obtained.
[...] normal or small-caps.
[...] none, underline, overline, line-through or blink.
The background color is only applicable to element boxes as mentioned in Visual Properties of the Element Box.
The CSSBox library and this manual are under development. Any feedback is welcome mainly via the forums and the bug tracker available via the project page. We will be very happy if you want to contribute to the code. In this case, please contact the author.