Nothing makes you want Groovy more than XML

I’m in Delaware this week teaching a course in Java Web Services using RAD7. The materials include a chapter on basic XML parsing using Java. An exercise at the end of the chapter presented the students with a trivial XML file, similar to:

  <book isbn="1932394842">
    <title>Groovy in Action</title>
    <author>Dierk Koenig</author>
  <book isbn="1590597583">
    <title>Definitive Guide to Grails</title>
    <author>Graeme Rocher</author>
  <book isbn="0978739299">
    <title>Groovy Recipes</title>
    <author>Scott Davis</author>

(with different books, of course) and asked the students to find a book with a particular isbn number and print it’s title and author values.

I sighed and went to work, producing a solution roughly like this:

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ParseLibrary {
    public static void main(String[] args) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        Document doc = null;
        try {
            DocumentBuilder builder = factory.newDocumentBuilder();
            doc = builder.parse("books.xml");
        } catch (Exception e) {
        NodeList books = doc.getElementsByTagName("book");
        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i);
            if (book.getAttribute("isbn").equals("1932394842")) {
                NodeList children = book.getChildNodes();
                for (int j = 0; j < children.getLength(); j++) {
                    Node child = children.item(j);
                    if (child.getNodeType() == Node.ELEMENT_NODE) {
                        if (child.getNodeName().equals("title")) {
                            System.out.println("Title: "
                                + child.getFirstChild().getNodeValue());
                        } else if (child.getNodeName().equals("author")) {
                            System.out.println("Author: "
                                + child.getFirstChild().getNodeValue());

The materials didn’t supply a DTD, so I didn’t have any ID attributes to make it easier to get to the book I wanted. That meant I was reduced to continually using getElementsByTagName(String). I certainly didn’t want to traverse the tree, what with all those whitespace nodes containing the carriage-return/line-feed characters. So I found the book nodes, cast them to Element (because only Elements have attributes), found the book I wanted, got all of its children, found the title and author child elements, then grabbed their text values, remembering to go to the element’s first child before doing so.

What an unsightly mess. The only way to simplify it significantly would be to use a 3rd partly library, which the students didn’t have, and it would still be pretty ugly.

One of the students said, “I kept waiting for you to say, ‘this is the hard way, now for the easy way,’ but you never did.”

I couldn’t resist replying, “well, if I had Groovy available, the whole program reduces to:

def library = new XmlSlurper().parse('books.xml')
def book = library.books.find { it.@isbn == '1932394842' }
println "Title: ${book.title}\nAuthor: ${}"

“and I could probably shorted that if I thought about it. How’s that for easy?”

On the bright side, as a result I may have sold another Groovy course. 🙂 For all of Groovy’s advantages over raw Java (and I keep finding more all the time), nothing sells it to Java developers like dealing with XML.


Groovyness with Excel and XML

Today in class one of the students mentioned that they need to read data from an Excel spreadsheet supplied by one of their clients and transform the data into XML adhering to their own schema.

I’ve thought about similar problems for some time and looked at the various Java APIs for accessing Excel. I spent a fair amount of time working with the POI project at Apache, which is a poor substitute but at least worked.

On the XML side, the Java libraries have gotten better, but working with XML in Java is rarely fun. I know the Apache group has built a few helper projects to make it easier, but I haven’t used them that much. In class, students don’t really want to talk about other projects; they want to know what’s in the standard libraries.

In short, I know I could write the necessary code to take data out of Excel and write it out to XML, but it would be long and awkward. It certainly wouldn’t be much fun.

Now, though, I’m spending a lot of time with Groovy. I’m working my way through the book Groovy in Action (Manning), which has jumped to the top of my favorite technical books list. I’m still learning, but I knew there was a Groovy library for accessing Excel, and I knew Groovy had a “builder” for outputting XML. I just needed to see how to write the actual code. I set up a sample Excel spreadsheet with a few rows of data and went to work.

Here’s the result. It’s about 25 lines of code all told. In other words, it’s almost trivial. I’m amazed.

package com.kousenit;

import org.codehaus.groovy.scriptom.ActiveXProxy

def addresses = new File('addresses.xls').canonicalPath
def xls = new ActiveXProxy('Excel.Application')

// get the workbooks object
def workbooks = xls.Workbooks
def workbook = workbooks.Open(addresses)

// select the active sheet
def sheet = workbook.ActiveSheet

// get the XML builder ready
def builder = new groovy.xml.MarkupBuilder()
builder.people {

for (row in 2..1000) {
def ID = sheet.Range("A${row}").Value.value
if (!ID) break

// use the builder to write out each person
person (id: ID) {
name {
firstName sheet.Range("B${row}").Value.value
lastName sheet.Range("C${row}").Value.value

address {
street sheet.Range("D${row}").Value.value
city sheet.Range("E${row}").Value.value
state sheet.Range("F${row}").Value.value
zip sheet.Range("G${row}").Value.value

// close the workbook without asking for saving the file
workbook.Close(false, null, false)
// quits excel

I’d call that a successful experiment. It certainly was a happy one. I know I’ll do more in the future. I’d bet that somebody with more experience could show me how to condense that even further.

Groovy is just plain fun, and I haven’t felt that way about Java for a long, long time.

Brush with (semi-)greatness

This week I’m teaching an XML class in NYC.  It’s actually a basic XML class along with some XML schema training, in order to help the client work with data coming from external sources.  I’ll know more when the class starts tomorrow, but I expect to work with fairly sophisticated schemas.

Since the class is in New York and I hate driving through New York (despite my recent activity there), I decided to take the train from New Haven.  NH is about a 40 minute drive for me, but I can pick up the Acela there, which is clean, fast, and about a thousand times more comfortable than any plane I’ve been on in last year.

My hotel is just a block away from Madison Square Garden.  I decided that it might be fun to go to a Knicks game.  I’ve never been to one.  Heck, I’ve only been to one NBA game at all, and that was years ago.  All I remember is sitting in the nose-bleed section at a Philadelphia 76ers game way back in the 80s, though it might have been the 70s.

I only needed one ticket, so I asked for the best seat available.  The result was that I blew my entertainment budget for the next six months (which is sad, because I already spent that last month, but so be it) and wound up with a floor seat about six rows behind the Knicks’ bench.  Isiah Thomas himself blocked my view of the free-throw line.

As it turned out, the guy next to me was Isiah’s nephew.  (I’d mention his name here but I can’t remember it. :()  He continually interpreted Isiah’s signals for me, i.e., “now they’re gonna go full court press followed by a 3-2 zone,” or “this next play will be a screen for Marbury.”  It was really fun that way.  I was just glad I hadn’t said anything about Isiah before I realized who I was sitting with.

As a fan of Bill Simmons, I’m abundantly aware of Isiah’s staggering weaknesses as a coach and especially as a GM.  When the guy next to me claimed the Knicks would be in the finals in three years, I managed not to say anything, but just barely.

The Knicks had a 3 point lead near the end, then gave up a 3, then took a shot that was clearly goal-tended but wasn’t called, then gave up another 3.  Finally, with less than ten seconds to go and a 3 point deficit, Marbury decided to drive to the basket (??) and his shot was blocked, effectively ending the game.  The crowd went home unhappy, but I had fun.

The SOA bandwagon

Just a quick post this morning from sunny Dover, Delaware. I’m doing an XML class this week with a brief introduction to web services.

Web services is hot, but mostly because the buzzwords “service oriented architecture” is hot. I can understand the motivation: high level IT executives see all these systems they’ve spent so many millions of dollars on and wonder why they can’t all work together.

(Insert your own, “can’t we all just get along?” joke here.)

A web service is generally interpreted these days as an XML API for a system. Wrap them all in XML APIs and suddenly you’ve achieved integration through the sophisticated use of text files. Of course, the devil is always in the details. Integration isn’t always so easy, and a true service oriented architecture is more than just an XML wrapper — to get the real benefit you should create common baseline services that everyone can access, and ultimately the individual systems themselves dissolve into assemblies of services. That’s not nearly as likely to happen.

In analogy with the heady days of the late 90s when we were “web enabling” everything, I like to call this phase “web service enabling.” Just as back then some systems were a lot easier to web enable than others, so today the costs and benefits of web service enabling systems varies widely based on their original designs.

It’s an interesting topic, though. We’ll see how it plays out, given the growing developer antipathy towards XML (favoring JSON and “convention over configuration” instead).

Processing XML using Java

Someday, and that day may never come, I’ll remember that in a DOM tree the text value of a node is stored in its first child, not in the node itself. It’s one of the things I always emphasize to my students, but manage to forget when I have to do the actual processing.

Last week in my Rensselaer class I thought I’d show the students how to access’s web service and then display a simple result. I already have a developer token from there. I even have an Associate’s ID that I’ll use someday when I get around to building my own book recommendations site on top of Amazon. I’ve only been toying with that idea for a bout a year and a half now, though, so maybe I should wait a bit longer. Sigh.

Anyway, I used the REST approach to access the Amazon web service. That means I built up a giant URL with all the Amazon ECS (eCommerce Service) parameters appended, used it as an argument to a URL constructor, opened the connection, and even got back the response. It went something like:

String asin = request.getParameter(“asin”);
StringBuffer buffer = new StringBuffer();

String urlString = buffer.toString();

try {
URL url = new URL(urlString);
URLConnection conn = url.openConnection();

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(conn.getInputStream());
Book b = parseBook(doc);

request.setAttribute(“book”, b);
} catch (Exception e) {  // wimp out and just catch them all, like Pokemon
I’d already defined constants for the base url, etc. Then opening the URLConnection automatically sends the request and returns an XML response; in this case the MEDIUM group from Amazon. Then I thought parsing it would be no problem.

After messing it up for a while, I went back to basics. It’s easy enough to print an XML response to the console, thanks to the coolness of the Java default TransformerFactory:

TransformerFactory factory = TransformerFactory.newInstance();
Transformer xform = factory.newTransformer();

// that’s the default xform; use a stylesheet to get a real one
xform.transform(new DOMSource(doc), new StreamResult(System.out));

and voila, an XML tree is printed to the console in nicely formatted form. With that I was able to prove that I was indeed getting back the proper response tree.

All that’s left is to parse the resulting document. At first I went with the ever-popular doc.getElementsByTagName(String), but I noticed that the resulting document had a default namespace defined.

I needed that when I tried this experiment before, because last time I wrote an XSLT stylesheet to transform the tree into HTML directly.

I’ve decided, however, that the XSLT approach isn’t really what I want. I don’t like building XHTML documents using XSLT templates. Plus, I can’t handle individual elements that way. I think the right OO approach is to treat the XML source as just a database of a different kind. In other words, build a BookDAO class to extract the book elements from the XML file, instantiate a Book, and return it.

There was no need to keep hitting Amazon’s web service each time I’m developing, though, so I copied the output to a text file and started trying to extract the required info.

That turned out not to be a simple as I anticipated. First came the namespace issue referred to above. Despite the presence of the default namespace, I had more success with doc.getElementsByTagname(String) than with doc.getElementsByTagnameNS(String). Maybe that’s because I didn’t explicitly set “namespace awareness” on my DocumentBuilderFactory. At any rate, I finally was able to access an element through the NodeList.

Quick aside: life would be a lot easier if Amazon would add “id” attributes to its elements. With the explosion of interest in Ajax and the subsequent necessity to process XML using JavaScript, maybe they’ll change it.

Anyway, using getElementsByTagName(String) returned a NodeList, from which I extracted a Node, and then kept getting null for the result of node.getNodeValue().

Well, duh. For the zillionth time, the text of a node in a DOM tree is stored not in the node, but in the value of its first (text) child.

In other words:

private String extractValue(Document doc, String s) {
NodeList list = doc.getElementsByTagName(s);
Node n = list.item(0);
return n.getFirstChild().getNodeValue();

Ain’t nuthin’ to it. If I ever get around to switching to JDOM, then life will no doubt be much easier. There’s always the Apache Commons projects like Betwixt or Digester, or even BeanUtils. They’re in the queue, right after Hibernate (still making progress), Spring (on its way), JSF (working on it), and Tapestry (some day), not to mention Ruby on Rails, ….