Javazoid

Report writers, iText and XML...

One of the topics I have been toying with recently is the idea of creating report writer functionality without using a fully functional, big-ticket-item report writer.

My idea is to create a package to write PDF or Excel or DOC files -- and not necessarily all 3 -- from a bunch of JDBC data. I did some research and found that there are quite a few open source packages that support one format or another.

One package that did catch my eye was iText, a PDF writer by Bruno Lowagie and Paulo Soares. What attracted me to this product was the web site, which has an extensive set of documentation... and there's nothing like a well-documented product to make you feel confident. Lowagie and Soares have done a major amount of work in creating a smooth component that will write out PDF files. iText is amazing, a real credit to open source initiatives...

My interest in iText was its capability to write out PDF in a table-like structure built around its PdfPTable class. You can set up headers, columns, ...

One problem with using iText -- at least from the standpoint of writing a report writer -- is how to get the data from a structure like a JDBC resultset and format it with iText. To format the data with iText, it would be advantageous to scroll backwards in the resultset. However, scrollable resultsets aren't always supported.

Another problem is that iText, while it does a great job of writing to the PDF file structure, provides no interface for the user to specify the text to be written. This is a big problem for report writers, because you need to specify a large number of variables -- anything from the datasource items (particularly the sql...) to a variety of formatting issues (like the 3rd column is numeric and should be left aligned).

I have very little time or desire to write the kind of interface you would find in a commercial report writer product. Quite often, I find these graphical interfaces difficult to learn and often overkill.

After thinking about it for a while, I concluded that an xml structure would be about the best interface I could put together: it would be the quickest to code, and it wouldn't be much of a stretch for it to serve as the user interface for the report specification.

If you look at most simple xml files, you see that the structure is pretty much self-explanatory. For my report specification, I concluded that the report required a number of top-level variables with a series of collections like columns, bands, summaries. To help the user, I provide a report.xml file that outlines the structure with sample columns, bands, summaries.

Much of the specification demands the user select from a range of possibilities. For example, I decided to support only two paper types (although iText has a lot more...). To tell the user that they could select from either LETTER or LEGAL, I added a comment field with the range of possibilities. I could have set up an XML DTD structure to limit the range, but, for simplicity I decided the comment would suffice.
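Because the allowed values live only in a comment, nothing stops a typo from reaching the report code, so the reader has to check them itself. Here is a minimal sketch of such a check. The class `SpecValues` and its method `requireOneOf` are hypothetical names made up for this illustration, and the value lists are simply the ones given in the template's comments.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of the original source: checks a value read
// from the report spec against the range documented in the template comments.
public class SpecValues {
    public static final List<String> PAGE_SIZES = Arrays.asList("LETTER", "LEGAL");
    public static final List<String> PAGE_LAYOUTS = Arrays.asList("PORTRAIT", "LANDSCAPE");

    // Returns the trimmed, upper-cased value if it is allowed; otherwise throws
    // with a message naming the element and listing the accepted values.
    public static String requireOneOf(String element, String value, List<String> allowed) {
        String normalized = (value == null) ? "" : value.trim().toUpperCase(Locale.ROOT);
        if (!allowed.contains(normalized)) {
            throw new IllegalArgumentException(
                "<" + element + "> must be one of " + allowed + " but was '" + value + "'");
        }
        return normalized;
    }

    public static void main(String[] args) {
        System.out.println(requireOneOf("pageSize", " letter ", PAGE_SIZES));
        try {
            requireOneOf("pageSize", "A4", PAGE_SIZES);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A check like this keeps the spec file simple while still giving the user a readable error instead of a failure deep inside the PDF code.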


<?xml version='1.0' encoding='utf-8'?>
<!-- A Report Definition -->
<!-- this is a template file you can use to build more reports -->
<report>
<name></name> <!-- identifier not used in the actual report -->
<fileName>customers.pdf</fileName><!-- the PDF file output -->
<title>Customers</title> <!-- title goes in the center of the header on every page -->
<fontSize>8</fontSize> <!-- font for data -->
<fontFamily>Times-Roman</fontFamily>
<!-- one of: Courier-Bold Courier-Oblique Courier-BoldOblique Helvetica Helvetica-Bold Helvetica-Oblique Helvetica-BoldOblique Symbol Times-Roman Times-Bold Times-BoldItalic ZapfDingbats -->
<headerFontFamily>Times-Bold</headerFontFamily><!-- font for header/columns/bands/summaries -->
<headerFontSize>8</headerFontSize>
<margin> <!-- pixels for your margins -->
<margintop>30</margintop>
<marginbottom>10</marginbottom>
<marginleft>10</marginleft>
<marginright>10</marginright>
</margin>
<pageSize>LETTER</pageSize> <!-- LETTER or LEGAL for now -->
<pageLayout>LANDSCAPE</pageLayout> <!-- PORTRAIT or LANDSCAPE -->
<author>gervase</author> <!-- creator of the report -->
<subject>testing</subject> <!-- subject, which appears in PDF properties dlg -->


<bands> <!-- support breaks on multiple data elements -->
<bandColumn>
<bandTitle>State</bandTitle> <!-- caption for the band -->
<bandSqlName>State</bandSqlName><!-- sql column -->
</bandColumn>

</bands>
<columns> <!-- collection of columns -->
<column>
<coltitle>Lastname</coltitle> <!-- the title appears at the top of each page -->
<widthPercent>5</widthPercent> <!-- width, should add up to 100 for all columns -->
<alignment>LEFT</alignment> <!-- one of: LEFT, RIGHT, CENTER -->
<sqlName>lastname</sqlName> <!-- reference the column name in your SQL -->
</column>

</columns>
<summaries> <!-- collection of summary fields -->
<summary>
<summaryCaption>Customer Total:</summaryCaption>
<summarySqlColumn>order_total</summarySqlColumn> <!-- match this column to the column's sqlName you wish to sum on -->
<summaryFunction>Total</summaryFunction>
<summaryFormatPattern>#,###</summaryFormatPattern>
</summary>

</summaries>

<!-- DATA -->
<sql>
SELECT state,lastname,order_total from mytable
</sql>

<sqlDriver>com.driver.mmsql...</sqlDriver> <!-- params for SqlData class: make sure the driver is on the classpath -->
<sqlConnect>jdbc</sqlConnect>
<sqlUsername>user</sqlUsername>
<sqlPassword>pwd</sqlPassword>


</report>

As you can see, the interface is a little busy. There are quite a few comments, but I decided that the user could, when they created their own xml spec, delete the more obvious comments.

That was the easy part... My biggest issue is that I know just about nothing about XML. You would think that you could do a Google search and find a plethora of code examples that would let you read the kind of structure you see above.

But no... the truth is that XML, despite being a widely-known standard and a relatively non-rocket-science data structure, is already a confusing mire of "specifications", with implementations that each render their own version of how the spec should be read in. In fact, if you know nothing about XML, it would be better to write your own parser. You could probably knock the job off in a good night, whereas if you venture into XML parsing, you'll probably spend quite a few nights researching and following false trails.

I spent several nights tracking down info on DOM and JDOM and SAX and every other flavour until I found a book by Elliotte Rusty Harold called Processing XML with Java at http://www.ibiblio.org/xml/books/xmljava/. This site has the full text version of Harold's book. I feel I need to digress here... I really think the future of books on Java is this kind of site instead of my basement collection, which seems outdated after two weeks. In terms of trees killed and the amount of real estate my collection requires, I wonder why the concept of on-line books hasn't taken off better than it has. Anyway, you can read the entire contents of Processing XML with Java on-line and Harold hopes you will like it enough to buy a copy.

From my point of view, it is an excellent introduction and serves to weed through the entire history of specification and implementation in a readable manner. It also contains a chapter called "Reading XML", a topic which doesn't seem to be covered anywhere on the net. You would think that the topic of how to read an XML structure after parsing it would be a popular one. However, most sites show you how to create one, how to traverse a DOM tree and put it in a Swing tree, how to deal with whitespace, how to write from Java objects to XML, but almost nothing on reading in XML.

Processing XML has a wonderful chapter on that topic. Harold takes you through the varieties of XML parser implementation -- SAX, DOM, JAXP, JDOM, dom4j, ElectricXML and XMLPULL. For me, it was a little lengthy, but it clarified a great deal of confusion. It also helped me decide that JAXP, whose bundled parser comes from the Apache Crimson project and which now ships with JDK 1.4, was a no-brainer, largely because it seems to be the simplest means of reading in the structure I need to create the report specification.

Not that it is all that simple or error-free. My first discovery was that you cannot simply create a collection structure like columns and expect the parser to read in your Column structure. What a document-wide lookup by element name will do is hand back every element with that name, in document order, no matter which structure it sits in. The trick is to make sure that a variable of Column (say, sqlName) isn't read in from another variable in another structure (say Band, which is a column-like structure that would also have a sqlName variable because it needs to access an SQL column).

You must make sure that the parser can differentiate between Band variables and Column variables. The only way I could figure to do it was to make sure there were no name collisions between Band and Column. Therefore, you'll note that Band has a field called bandSqlName and Summary has a summarySqlColumn field. When I thought about it, it was obvious that I had better prefix all the Band fields with a "band" identifier. In a way, this kind of sucks, but that's XML.
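To make the collision concrete, here is a small sketch, assuming a cut-down spec in which Band and Column both use a plain sqlName element (exactly the naming the real template avoids). The class name `NameCollisionDemo` and its helpers are made up for the illustration. It uses only the standard JAXP and DOM calls.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustration only: a cut-down spec in which Band and Column both use <sqlName>.
public class NameCollisionDemo {
    public static final String XML =
        "<report>"
        + "<bands><bandColumn><sqlName>state</sqlName></bandColumn></bands>"
        + "<columns><column><sqlName>lastname</sqlName></column></columns>"
        + "</report>";

    public static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Text of the index-th <tag> element found anywhere in the document.
    public static String valueAt(Document doc, String tag, int index) {
        NodeList matches = doc.getElementsByTagName(tag);
        return matches.item(index).getFirstChild().getNodeValue();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        // The document-wide lookup sees both elements, in document order.
        System.out.println(doc.getElementsByTagName("sqlName").getLength()); // 2
        // Index 0 was meant to be the first column, but it is the band's value.
        System.out.println(valueAt(doc, "sqlName", 0)); // state
    }
}
```

With distinct names such as bandSqlName and sqlName, index 0 of each list can only ever refer to the structure you meant.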

When it all came down to it, reading the XML required very little code. In fact, there are two steps in reading the XML files:

Step 1. Parse the document.


DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document doc = builder.parse(this.xmlFile);

Step 2. Pull items out of the Document object.

Report report = new Report();
report.author = getValueForElement("author", 0, doc);
report.fileName = getValueForElement("fileName", 0, doc);
report.pageLayout = getValueForElement("pageLayout", 0, doc);
report.pageSize = getValueForElement("pageSize", 0, doc);
report.subject = getValueForElement("subject", 0, doc);
report.name = getValueForElement("name", 0, doc);
report.sql = getValueForElement("sql", 0, doc);
report.title = getValueForElement("title", 0, doc);


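private String getValueForElement(String element, int index, Document document) {
    // Look up the index-th element with this tag name, anywhere in the document.
    Node node = document.getElementsByTagName(element).item(index);
    if (node == null) return "";          // no element at that index
    Node result = node.getFirstChild();   // the text node, when the element has content
    if (result == null) return "";        // empty element such as <name></name>
    return result.getNodeValue();
}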
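For anyone who wants to try the two steps without the rest of the report code, here is a self-contained sketch. It makes a few assumptions that are not in the original: the spec is read from a String rather than from this.xmlFile, the class and method names (`SpecReaderSketch`, `parse`, `valueOf`) are invented, and it uses `getTextContent()`, a DOM Level 3 method that needs Java 5 or later rather than the JDK 1.4 discussed above. It also trims the value, which matters for the sql element because its text starts and ends with a line break.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch only: both steps in one place, reading the spec from a String.
public class SpecReaderSketch {
    // Step 1: parse the document.
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Step 2: text of the index-th <element>, trimmed; "" if missing or empty.
    public static String valueOf(Document doc, String element, int index) {
        NodeList matches = doc.getElementsByTagName(element);
        Node node = matches.item(index);   // item() returns null when out of range
        return (node == null) ? "" : node.getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<report>\n"
            + "<name></name> <!-- identifier -->\n"
            + "<title>Customers</title>\n"
            + "<sql>\nSELECT state,lastname,order_total from mytable\n</sql>\n"
            + "</report>";
        Document doc = parse(xml);
        System.out.println("[" + valueOf(doc, "title", 0) + "]");
        System.out.println("[" + valueOf(doc, "sql", 0) + "]");
        System.out.println("[" + valueOf(doc, "name", 0) + "]");   // empty element
        System.out.println("[" + valueOf(doc, "author", 0) + "]"); // element not present
    }
}
```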

Reading the collections proved to be pretty simple, too. Using the document's getElementsByTagName() method, you can refer to a node by its index.

private Column[] buildColumns(Document document) {
    Column thisColumn;
    int size = document.getElementsByTagName("column").getLength();
    Column[] columns = new Column[size];

    for (int i = 0; i < size; i++) {
        thisColumn = new Column();
        thisColumn.setTitle(getValueForElement("coltitle", i, document));
        thisColumn.setWidthPercent(this.getInteger(
            getValueForElement("widthPercent", i, document)));
        thisColumn.setAlignment(
            getValueForElement("alignment", i, document));
        thisColumn.setSqlName(getValueForElement("sqlName", i, document));
        columns[i] = thisColumn;
    }

    return columns;
}
}
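The index-based lookup above relies on every column carrying every field, in the same order, and on no other structure reusing the names. One possible alternative, offered only as a sketch and not as what the report code does, is to search inside each column element, since DOM's Element also has a getElementsByTagName() method. The names `ScopedColumnsSketch`, `childValue` and `readColumns` are hypothetical, the result is a plain String array rather than the Column class, and `getTextContent()` again assumes Java 5 or later.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch of a scoped lookup: each field is searched for inside its own <column>.
public class ScopedColumnsSketch {
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // First <tag> descendant of parent, trimmed; "" when the parent has none.
    public static String childValue(Element parent, String tag) {
        NodeList matches = parent.getElementsByTagName(tag);
        return matches.getLength() == 0 ? "" : matches.item(0).getTextContent().trim();
    }

    // One {coltitle, sqlName} pair per <column>, in document order.
    public static String[][] readColumns(Document doc) {
        NodeList columns = doc.getElementsByTagName("column");
        String[][] result = new String[columns.getLength()][];
        for (int i = 0; i < columns.getLength(); i++) {
            Element column = (Element) columns.item(i);
            result[i] = new String[] {
                childValue(column, "coltitle"), childValue(column, "sqlName") };
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // The band reuses <sqlName>, and the first column leaves it out.
        String xml = "<report>"
            + "<bands><bandColumn><sqlName>state</sqlName></bandColumn></bands>"
            + "<columns>"
            + "<column><coltitle>Lastname</coltitle></column>"
            + "<column><coltitle>Total</coltitle><sqlName>order_total</sqlName></column>"
            + "</columns></report>";
        for (String[] column : readColumns(parse(xml))) {
            System.out.println(column[0] + " -> '" + column[1] + "'");
        }
    }
}
```

In this sketch the band's sqlName never leaks into a column, and a field missing from one column does not shift the values of the columns after it. The trade-off is a little more code per collection.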

All told, the problem I was seeking to solve was pretty simple once I found Harold's book. I even found myself dipping into some of the other chapters...

Resources

You can obtain the latest version of iText and see one of open source's best sets of documentation at Bruno Lowagie's site.

Co-author Paulo Soares maintains a more unofficial site that has the latest source code along with a number of useful tools.

If you are interested in reading XML, take a look at Processing XML with Java by Elliotte Rusty Harold. You can read the full text at the author's web site.

Copyright (c) Gervase Gallant 2002.