CSSBox Manual

Introduction
Basic usage
Rendered Document Model
Feedback

Introduction

CSSBox is an (X)HTML/CSS rendering engine written in pure Java. Its primary purpose is to provide complete information about the rendered page contents and layout in a form that is suitable for further processing.

This manual gives a short overview of CSSBox usage. It shows how to render a document, explains how the resulting page is represented, and describes how basic information about its individual parts can be obtained.

More detailed information about the individual classes can be obtained from the API documentation.

Any feedback on CSSBox and/or this manual is welcome via the CSSBox website.

Basic usage

The input of the rendering engine is a document DOM tree. The engine is able to automatically load the style sheets referenced in the document and it computes the effective style of each element. Afterwards, the document layout is computed.

Document Loading

CSSBox generally expects a DOM implementation on its input, represented by its root Document node. How the DOM is obtained is not important for CSSBox. However, in most situations, the DOM is obtained by parsing an HTML or XML file. Therefore, CSSBox provides a framework for binding a parser to the layout engine. It also contains a default parser implementation that can be used directly or replaced by a custom implementation when required. The default implementation is based on the NekoHTML parser and Xerces 2. The details of using a different parser are described in the Custom Document Sources and Parsers section.

With the default implementation, the following code reads and parses the document based on its URL:

//Open the network connection 
DocumentSource docSource = new DefaultDocumentSource(urlstring);

//Parse the input document
DOMSource parser = new DefaultDOMSource(docSource);
Document doc = parser.parse(); //doc represents the obtained DOM
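
Since CSSBox only needs a DOM Document on its input, the tree may also come from a parser other than the default one. The following is a minimal sketch, assuming the input is well-formed XHTML, that builds an org.w3c.dom.Document with the XML parser bundled in the JDK. The JdkDomExample class and its parseXhtml method are hypothetical names used only for this illustration. Whether CSSBox treats such a tree exactly like one produced by the default NekoHTML-based parser is not covered here.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of CSSBox
final class JdkDomExample {

    /** Parses well-formed XHTML markup into a DOM tree using only the JDK parser. */
    static Document parseXhtml(String markup) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(markup)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Malformed markup or parser setup problems end up here
            throw new IllegalArgumentException("Cannot parse markup", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseXhtml("<html><body><p>Hello</p></body></html>");
        // The resulting doc is an ordinary DOM Document
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```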

For the initial DOM and style sheet processing, a DOMAnalyzer object is used. It is initialized with the DOM tree and the base URL:

DOMAnalyzer da = new DOMAnalyzer(doc, docSource.getURL());
da.attributesToStyles(); //convert the HTML presentation attributes to inline styles
da.addStyleSheet(null, CSSNorm.stdStyleSheet(), DOMAnalyzer.Origin.AGENT); //use the standard style sheet
da.addStyleSheet(null, CSSNorm.userStyleSheet(), DOMAnalyzer.Origin.AGENT); //use the additional style sheet
da.addStyleSheet(null, CSSNorm.formsStyleSheet(), DOMAnalyzer.Origin.AGENT); //(optional) use the forms style sheet
da.getStyleSheets(); //load the author style sheets

The attributesToStyles() method converts some HTML presentation attributes to CSS styles (e.g. the <font> tag attributes, table attributes and some more). If (X)HTML interpretation is not required, this method need not be called. When used, this method should be called before getStyleSheets() is used.

The addStyleSheet() method adds a style sheet to the document. The style sheet is passed as a text string containing the CSS code. In our case, we add two built-in style sheets that represent the standard document style. These style sheets are imported as user agent style sheets according to the CSS specification. The CSSNorm.stdStyleSheet() method returns the default style sheet recommended by the CSS specification and CSSNorm.userStyleSheet() returns some additional CSSBox definitions not covered by the standard.

Optionally, the CSSNorm.formsStyleSheet() includes a basic style of form input fields. This style sheet may be used for basic rendering of the form fields when their rendering and functionality is not implemented in any other way in the application.

Finally, the getStyleSheets() method loads and processes all the internal and external style sheets referenced from the document including the inline style definitions. In case of external style sheets, CSSBox tries to obtain the file from the corresponding URL, if accessible.

The resulting DOMAnalyzer object represents the document code together with the associated style.
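
The origin passed to addStyleSheet() matters because the CSS cascade ranks declarations by origin. The following sketch illustrates that idea only for normal (non-important) declarations with equal specificity: author rules override user rules, which override user agent rules. All names in it (CascadeOriginSketch, Declaration, winner) are hypothetical and it does not reproduce the actual CSSBox or jStyleParser implementation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of cascade ordering by origin, not CSSBox code
final class CascadeOriginSketch {

    /** Origins in ascending precedence for normal declarations. */
    enum Origin { AGENT, USER, AUTHOR }

    static final class Declaration {
        final Origin origin;
        final String value;

        Declaration(Origin origin, String value) {
            this.origin = origin;
            this.value = value;
        }
    }

    /** Higher origin wins; within the same origin the later declaration wins. */
    static String winner(List<Declaration> declarations) {
        Declaration best = null;
        for (Declaration d : declarations) {
            if (best == null || d.origin.compareTo(best.origin) >= 0) {
                best = d;
            }
        }
        return best == null ? null : best.value;
    }

    public static void main(String[] args) {
        List<Declaration> color = Arrays.asList(
                new Declaration(Origin.AGENT, "black"),
                new Declaration(Origin.AUTHOR, "red"),
                new Declaration(Origin.AGENT, "blue"));
        System.out.println(winner(color)); // the author value takes precedence
    }
}
```

The sketch ignores selector specificity and !important declarations, both of which take part in the real cascade.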

Media Support

By default, the DOMAnalyzer assumes that the page is being rendered on a standard desktop computer screen. During the style sheet processing, it uses the "screen" media type and some default display feature values for evaluating the possible media queries.

A different media type or feature values may be specified by creating a new media specification represented as a MediaSpec object from the jStyleParser API. The typical usage would be the following:

//we will use the "screen" media type for rendering
MediaSpec media = new MediaSpec("screen");

//specify some media feature values
media.setDimensions(1000, 800); //set the visible area size in pixels
media.setDeviceDimensions(1600, 1200); //set the display size in pixels

//use the media specification in the analyzer
DOMAnalyzer da = new DOMAnalyzer(doc, docSource.getURL());
da.setMediaSpec(media);
//... continue with the DOMAnalyzer initialization as above

The BoxBrowser demo shows a basic usage of the media specifications in a Swing application.
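
To make the role of the feature values more concrete, the following sketch shows how a width-based media query such as (min-width: 800px) could be evaluated against the visible area width. It is a simplified illustration under the assumption that only the width feature is considered; the MediaQuerySketch class and its method are hypothetical and are not part of the jStyleParser API.

```java
// Hypothetical illustration of width-based media query matching
final class MediaQuerySketch {

    /**
     * Returns true when the visible area width lies within the optional bounds.
     * A null bound means that the query does not constrain that side.
     */
    static boolean matchesWidth(int visibleWidth, Integer minWidth, Integer maxWidth) {
        boolean aboveMin = (minWidth == null) || visibleWidth >= minWidth;
        boolean belowMax = (maxWidth == null) || visibleWidth <= maxWidth;
        return aboveMin && belowMax;
    }

    public static void main(String[] args) {
        // With a 1000 px wide visible area, (min-width: 800px) matches
        System.out.println(matchesWidth(1000, 800, null));
        // ... while (max-width: 600px) does not
        System.out.println(matchesWidth(1000, null, 600));
    }
}
```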

Obtaining the Layout

The whole layout engine is represented by a graphical BrowserCanvas object. The simplest way of creating the layout is to pass the initial viewport dimensions to the BrowserCanvas constructor. The layout is then computed automatically when the object is created. The remaining constructor arguments are the root DOM element, the DOMAnalyzer used for obtaining the element styles and the document base URL used for loading images and other referenced content.

BrowserCanvas browser = 
        new BrowserCanvas(da.getRoot(),
                          da,
                          new java.awt.Dimension(1000, 600),
                          url);

When further browser configuration is required, the BrowserCanvas may be created without specifying the viewport dimensions. Then, the layout is not computed automatically and it must be created by a subsequent call of the createLayout method. Before creating the layout, the browser configuration may be changed.

BrowserCanvas browser = 
        new BrowserCanvas(da.getRoot(), da, url);
//... modify the browser configuration here ...
browser.createLayout(new java.awt.Dimension(1000, 600));

Optionally, the createLayout method allows specifying different values for the preferred total canvas size and for the visible area size and position (the CSS viewport size):

BrowserCanvas browser = 
        new BrowserCanvas(da.getRoot(), da, url);
//... modify the browser configuration here ...
browser.createLayout(new java.awt.Dimension(1200, 600), new java.awt.Rectangle(0, 0, 1000, 600));

In this case, the preferred size of the resulting page is 1200x600 pixels (it may be adjusted during the layout computation based on the page contents) and the size of the visible area is 1000x600 pixels; the visible area is in the top left corner of the rendered page. The visible area size and position are used during the layout computation and they may influence the positions of positioned elements according to the CSS specification.

Setting the visible area size automatically updates the used media specification (see the previous section) so that the same size of the visible area is used for evaluating the CSS media queries. However, this behavior may be disabled by calling setAutoMediaUpdate(false) before creating the layout. In that case, the visible area size used for the layout computation may be different from the size used for media queries.

In all cases, the created browser object can be directly used for both displaying the rendered document and for obtaining the created layout model. The details of the browser configuration are described in the Configuration Options section.

Displaying the document

The BrowserCanvas class is directly derived from the Swing javax.swing.JPanel class. Therefore, it can be directly used as a Swing user interface component. The size of the component is automatically adjusted according to the resulting document layout. The basic document displaying is shown in the SimpleBrowser example.

BrowserCanvas provides a simple display of the rendered page with no interactive elements. For obtaining an interactive browser component with text selection and clickable links, the SwingBox extension should be used.

Configuration Options

The current browser configuration is represented by a BrowserConfig object that may be accessed using the browser's getConfig() method. The following configuration options are available:

browser.getConfig().setLoadImages(boolean)

Configures whether to load the referenced content images automatically. The default value is true.

browser.getConfig().setLoadBackgroundImages(boolean)

Configures whether to load the CSS background images automatically. The default value is true.

browser.getConfig().setImageLoadTimeout(int)

Configures the timeout for loading images. The default value is 500 ms.

browser.getConfig().useHTML(boolean)

Configures whether the engine should use the HTML extensions or not. Currently, the HTML extensions include the following:

Creating replaced boxes for <img> elements
Using the <body> element background for the whole canvas according to the HTML specification
Support for embedded <object> elements
Special handling of certain elements such as named anchors

The default value is true.

browser.getConfig().setDefaultFont(String logical, String physical)

Configures the default physical fonts that should be used instead of the logical Java families. The typical usage is the following:

        browser.getConfig().setDefaultFont(java.awt.Font.SERIF, "Times New Roman");
        browser.getConfig().setDefaultFont(java.awt.Font.SANS_SERIF, "Arial");
        browser.getConfig().setDefaultFont(java.awt.Font.MONOSPACED, "Courier New");

browser.getConfig().registerDocumentSource(Class<? extends DocumentSource>)

browser.getConfig().registerDOMSource(Class<? extends DOMSource>)

Register the DocumentSource and DOMSource implementation used for automatic loading of the referenced documents. See the Custom Document Sources and Parsers section for details.

Custom Document Sources and Parsers

CSSBox contains two generic abstract classes that represent the document source and the parser and provides their default implementations:

DocumentSource represents a generic source of documents that is able to obtain a document based on its URL. The default DefaultDocumentSource implementation uses the standard Java URLConnection mechanism extended by the support of data: URL scheme.
DOMSource represents a parser that is able to create a DOM from a document source. The default DefaultDOMSource implementation is based on the NekoHTML parser.

The default implementations may be used for obtaining the DOM from a URL easily, as shown in the Document Loading section. Moreover, CSSBox uses these implementations for obtaining the documents referenced from the HTML code, such as images and embedded objects.

When a different implementation of the document source or the parser is required (e.g. for obtaining the documents from a non-standard source, using a different parser implementation, etc.), it is possible to create a custom implementation of the appropriate abstract class. Then, the new implementation may be registered using the browser configuration browser.getConfig().registerDocumentSource() and browser.getConfig().registerDOMSource() methods as mentioned above in the Configuration Options section.
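
As an illustration of what a non-standard source has to deal with, the following sketch decodes the payload of a data: URL by hand. It is not the DefaultDocumentSource code; DataUrlSketch and its methods are hypothetical names, the sketch assumes Java 10 or newer, and it assumes that percent-encoded payloads are UTF-8 text.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical illustration of data: URL handling, not CSSBox code
final class DataUrlSketch {

    /** Extracts the payload bytes of a data: URL. */
    static byte[] decode(String url) {
        if (!url.startsWith("data:")) {
            throw new IllegalArgumentException("Not a data: URL: " + url);
        }
        int comma = url.indexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("Missing ',' separator in data: URL");
        }
        // The header holds the media type and an optional ";base64" marker
        String header = url.substring("data:".length(), comma);
        String payload = url.substring(comma + 1);
        if (header.endsWith(";base64")) {
            return Base64.getDecoder().decode(payload);
        }
        // URLDecoder treats '+' as a space, so literal plus signs are protected first
        String text = URLDecoder.decode(payload.replace("+", "%2B"), StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Convenience wrapper for textual payloads. */
    static String decodeToString(String url) {
        return new String(decode(url), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decodeToString("data:text/plain;base64,SGVsbG8=")); // prints "Hello"
    }
}
```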

Rendered Document Model

The resulting document layout is represented as a tree of boxes. Each box creates a rectangular area in the resulting page and it corresponds to a particular rendered HTML element. There may be multiple boxes corresponding to a single element; e.g. a multi-line paragraph <p> is split into several line boxes. A box may be either composed of child boxes or it may correspond to a particular part of the document content, which may be a text string or replaced content (e.g. images).

Each box is represented by an object which extends the Box abstract class. There exist several box types that roughly correspond to the computed value of the CSS display property for the corresponding element. Figure 1 shows the type hierarchy of boxes.

Figure 1: Box type hierarchy

The root node of the box tree is always a Viewport object that represents the browser viewport. It always has a single child, which is called the root box. The root box corresponds to the root element of the HTML code passed to the BrowserCanvas for rendering. Usually, it is the <body> element.

The viewport and the root box are obtained using the getViewport() and getRootBox() methods of the BrowserCanvas.

In the following sections, we mention the most important methods that can be used for obtaining information about the resulting document layout. For other methods, see the CSSBox API reference.

Basic Box Properties

The basic box properties are defined in the Box abstract class. They are mostly related to the box position and size.

Box Position and Size

During the layout, the box position is first computed relative to the containing box and, in the next step, the absolute position in the page is computed. The box occupies a rectangular area in the page which includes the box contents, borders and margins. The content of the box is always inside this area and it is again a rectangle which is equal to or smaller than the whole box. The following methods can be used for obtaining the positions and sizes:

getAbsoluteBounds(): Returns the absolute box position and size on the page including all the borders and margins. The result is a java.awt.Rectangle object.
getBounds(): Returns the box position and size relative to the top-left corner of the containing block.
getAbsoluteContentX(): The absolute X coordinate of the top left corner of the box contents.
getAbsoluteContentY(): The absolute Y coordinate of the top left corner of the box contents.
getContentX(): The X coordinate of the top left corner of the box contents relative to the containing block.
getContentY(): The Y coordinate of the top left corner of the box contents relative to the containing block.
getContentWidth(): The width of the box content without any margins and borders.
getContentHeight(): The height of the box content without any margins and borders.
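
The relation between the relative and the absolute coordinates can be sketched as follows. This is a simplified model, assuming that each box stores its position relative to its containing block and that the offsets simply add up along the chain of containing blocks; it ignores the padding, border and margin of the containing blocks. The SketchBox class is a hypothetical stand-in and not the CSSBox Box class.

```java
// Hypothetical illustration of relative vs. absolute positions, not CSSBox code
final class BoxPositionSketch {

    /** Minimal stand-in for a box positioned relative to its containing block. */
    static final class SketchBox {
        final SketchBox containingBlock; // null for the topmost box
        final int x;                     // X relative to the containing block
        final int y;                     // Y relative to the containing block

        SketchBox(SketchBox containingBlock, int x, int y) {
            this.containingBlock = containingBlock;
            this.x = x;
            this.y = y;
        }
    }

    /** Sums the relative offsets up the chain to obtain page coordinates {x, y}. */
    static int[] absolutePosition(SketchBox box) {
        int absX = 0;
        int absY = 0;
        for (SketchBox b = box; b != null; b = b.containingBlock) {
            absX += b.x;
            absY += b.y;
        }
        return new int[] { absX, absY };
    }

    public static void main(String[] args) {
        SketchBox root = new SketchBox(null, 0, 0);
        SketchBox block = new SketchBox(root, 10, 20);
        SketchBox inline = new SketchBox(block, 5, 7);
        int[] p = absolutePosition(inline);
        System.out.println(p[0] + "," + p[1]); // prints "15,27"
    }
}
```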

Box Tree Structure

getParent(): Returns the parent box of this box or null if this is the root box.
getContainingBlock(): Returns the containing block for this box.
getViewport(): The corresponding Viewport object (the root of the box tree).
getNode(): The DOM node this box corresponds to. There may be multiple boxes corresponding to a single DOM node.

Text Boxes

A text box always corresponds to a DOM node of the type Text. It is represented as a TextBox object. When the text is split into several lines, multiple boxes correspond to a single DOM node. In this case, each of them contains the corresponding substring of the text.

Obtaining the Text Content

getText(): Returns the text string corresponding to this node.

Element Boxes

An element box corresponds to a DOM node of the type Element. It is represented as an object of the abstract ElementBox class. It can be implemented as an inline box or a block box as described below. The common methods of the element boxes are the following:

Box Tree Structure

Each element box may contain any number of nested child boxes that are indexed from 0 to n. When there exist multiple element boxes that correspond to a single DOM node, all these boxes share all the child boxes. The child boxes that belong to a particular element box can be determined using their index: the getStartChild() and getEndChild() methods give the index of the first child box that belongs to this particular element box and the index of the first child box that no longer belongs to it. Therefore, the normal way of processing all the child boxes of an element box b is the following:

for (int i = b.getStartChild(); i < b.getEndChild(); i++)
{
    Box sub = b.getSubBox(i);
    //process the child box here...
}

The overview of the related ElementBox methods follows:

getStartChild(): Returns the index of the first child box contained in this element box.
getEndChild(): Returns the index of the first child box not contained in this element box.
getSubBox(int): Returns the child box with the given index.
getSubBoxNumber(): The number of child boxes in this box.
getElement(): The corresponding DOM Element.
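
The sharing of child boxes between several element boxes can be illustrated with a small self-contained model. The classes below are hypothetical stand-ins that only mirror the method names listed above; they are not the CSSBox classes. Two element boxes created for the same DOM node share one child list and each of them processes only its own index range.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of shared child boxes, not CSSBox code
final class BoxWalkSketch {

    /** Stand-in for a box in the tree. */
    interface Node { }

    /** Stand-in for a text box holding a substring of the text. */
    static final class Text implements Node {
        final String text;

        Text(String text) { this.text = text; }

        String getText() { return text; }
    }

    /** Stand-in for an element box that owns the slice [start, end) of a shared child list. */
    static final class Element implements Node {
        final List<Node> shared;
        final int start;
        final int end;

        Element(List<Node> shared, int start, int end) {
            this.shared = shared;
            this.start = start;
            this.end = end;
        }

        int getStartChild() { return start; }
        int getEndChild() { return end; }
        Node getSubBox(int index) { return shared.get(index); }
    }

    /** Recursively concatenates the text of all child boxes owned by the given box. */
    static String collectText(Node node) {
        if (node instanceof Text) {
            return ((Text) node).getText();
        }
        Element e = (Element) node;
        StringBuilder sb = new StringBuilder();
        for (int i = e.getStartChild(); i < e.getEndChild(); i++) {
            sb.append(collectText(e.getSubBox(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Node> children = Arrays.<Node>asList(
                new Text("Hello "), new Text("wide "), new Text("world"));
        Element firstLine = new Element(children, 0, 2);  // owns children 0 and 1
        Element secondLine = new Element(children, 2, 3); // owns child 2
        System.out.println(collectText(firstLine));
        System.out.println(collectText(secondLine));
    }
}
```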

Box Sizes

The dimensions of an element box are modelled according to the CSS Box Model. In addition to the content size discussed in the Basic Box Properties section, they consist of the padding, border width and margin. All these dimensions are represented by objects of a special LengthSet class that contain the top, left, bottom and right values of the dimension.

There are two margin values available: a computed value of the margin that corresponds to the specified style and an effective margin value (emargin) that takes margin collapsing into account and is used during the layout.

The following methods can be used for obtaining the individual values:

getPadding(): Returns the padding sizes.
getBorder(): The border sizes.
getMargin(): The computed margin sizes.
getEMargin(): The effective margin sizes used during the layout with margin collapsing.
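
The arithmetic behind these values can be sketched as follows. The first method adds up the horizontal parts of the CSS box model; the second one applies the CSS 2.1 rule for two adjoining vertical margins, which is the kind of adjustment that makes the emargin value differ from the computed margin. BoxModelSketch and Edges are hypothetical names; Edges is only a stand-in for a set of four lengths and is not the LengthSet class. How CSSBox itself computes the emargin values is not shown here.

```java
// Hypothetical illustration of box model arithmetic, not CSSBox code
final class BoxModelSketch {

    /** Stand-in for a set of four edge lengths in pixels. */
    static final class Edges {
        final int top;
        final int right;
        final int bottom;
        final int left;

        Edges(int top, int right, int bottom, int left) {
            this.top = top;
            this.right = right;
            this.bottom = bottom;
            this.left = left;
        }
    }

    /** Total horizontal space: content plus padding, border and margin on both sides. */
    static int outerWidth(int contentWidth, Edges padding, Edges border, Edges margin) {
        return contentWidth
                + padding.left + padding.right
                + border.left + border.right
                + margin.left + margin.right;
    }

    /**
     * Collapses two adjoining vertical margins following CSS 2.1:
     * the largest positive margin plus the most negative margin.
     */
    static int collapse(int first, int second) {
        int positive = Math.max(0, Math.max(first, second));
        int negative = Math.min(0, Math.min(first, second));
        return positive + negative;
    }

    public static void main(String[] args) {
        Edges padding = new Edges(5, 5, 5, 5);
        Edges border = new Edges(1, 1, 1, 1);
        Edges margin = new Edges(10, 10, 10, 10);
        System.out.println(outerWidth(100, padding, border, margin)); // prints 132
        System.out.println(collapse(20, 10));                         // prints 20
    }
}
```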

Visual Properties of the Element Box

getDisplay(): Represents the value of the display: CSS property. The returned value is one of the following constants defined in the ElementBox class: DISPLAY_ANY, DISPLAY_BLOCK, DISPLAY_INLINE, DISPLAY_INLINE_BLOCK, DISPLAY_INLINE_TABLE, DISPLAY_LIST_ITEM, DISPLAY_NONE, DISPLAY_RUN_IN, DISPLAY_TABLE, DISPLAY_TABLE_CAPTION, DISPLAY_TABLE_CELL, DISPLAY_TABLE_COLUMN, DISPLAY_TABLE_COLUMN_GROUP, DISPLAY_TABLE_FOOTER_GROUP, DISPLAY_TABLE_HEADER_GROUP, DISPLAY_TABLE_ROW, DISPLAY_TABLE_ROW_GROUP.
getBgcolor(): The background color of the element or null when transparent.

Inline Boxes

Inline boxes are the element boxes whose CSS display: property has the value inline. These boxes are represented as InlineBox objects. They have no special properties in addition to the properties defined in the ElementBox class.

Block Boxes

Block boxes, with the display: value set to block, are represented by BlockBox objects. In addition, the boxes with other values of the display: property are represented by several special classes derived from this class.

These objects differ mainly in how the boxes and their contents are laid out on the page. From the point of view of the resulting layout model, they have no special properties in addition to the properties defined in the ElementBox class.

Fonts and Colors

For each box, a VisualContext object is defined that gathers the information about the current text font and color. This object is obtained using the getVisualContext() method of the box. The following methods can be used for obtaining the appropriate information:

getColor(): Current text color.
getFont(): Returns the current font represented as the java.awt.Font object. From this object most of the font properties can be obtained.
getFontVariant(): Current font variant in the CSS syntax - i.e. normal or small-caps.
getTextDecoration(): Current text decoration in the CSS syntax - i.e. none, underline, overline, line-through or blink.

The background color is only applicable to element boxes as mentioned in Visual Properties of the Element Box.

Feedback

The CSSBox library and this manual are under development. Any feedback is welcome mainly via the forums and the bug tracker available via the project page. We will be very happy if you want to contribute to the code. In this case, please contact the author.
