Nothing makes you want Groovy more than XML

I’m in Delaware this week teaching a course in Java Web Services using RAD7. The materials include a chapter on basic XML parsing using Java. An exercise at the end of the chapter presented the students with a trivial XML file, similar to:


<library>
  <book isbn="1932394842">
    <title>Groovy in Action</title>
    <author>Dierk Koenig</author>
  </book>
  <book isbn="1590597583">
    <title>Definitive Guide to Grails</title>
    <author>Graeme Rocher</author>
  </book>
  <book isbn="0978739299">
    <title>Groovy Recipes</title>
    <author>Scott Davis</author>
  </book>
</library>

(with different books, of course) and asked the students to find the book with a particular ISBN and print its title and author values.

I sighed and went to work, producing a solution roughly like this:


import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ParseLibrary {
    public static void main(String[] args) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        Document doc = null;
        try {
            DocumentBuilder builder = factory.newDocumentBuilder();
            doc = builder.parse("books.xml");
        } catch (Exception e) {
            e.printStackTrace();
            return;
        }
        NodeList books = doc.getElementsByTagName("book");
        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i);
            if (book.getAttribute("isbn").equals("1932394842")) {
                NodeList children = book.getChildNodes();
                for (int j = 0; j < children.getLength(); j++) {
                    Node child = children.item(j);
                    if (child.getNodeType() == Node.ELEMENT_NODE) {
                        if (child.getNodeName().equals("title")) {
                            System.out.println("Title: "
                                + child.getFirstChild().getNodeValue());
                        } else if (child.getNodeName().equals("author")) {
                            System.out.println("Author: "
                                + child.getFirstChild().getNodeValue());
                        }
                    }
                }
            }
        }
    }
}

The materials didn’t supply a DTD, so I didn’t have any ID attributes to make it easier to get to the book I wanted. That meant I was reduced to continually using getElementsByTagName(String). I certainly didn’t want to traverse the tree, what with all those whitespace nodes containing the carriage-return/line-feed characters. So I found the book nodes, cast them to Element (because only Elements have attributes), found the book I wanted, got all of its children, found the title and author child elements, then grabbed their text values, remembering to go to the element’s first child before doing so.
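To make the whitespace problem concrete, here is a minimal sketch (not part of the course materials) that parses a one-book version of the file from a string and counts what actually sits under a book element. The class name WhitespaceNodes and the helpers countChildren, childText and firstBook are made up for illustration. It also uses getTextContent(), a DOM Level 3 call, in place of the getFirstChild().getNodeValue() pair from the solution above, and it assumes a default, non-validating parser.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; the names here are illustrative only.
class WhitespaceNodes {
    // A one-book version of the exercise file, indented the same way.
    static final String XML =
          "<library>\n"
        + "  <book isbn=\"1932394842\">\n"
        + "    <title>Groovy in Action</title>\n"
        + "    <author>Dierk Koenig</author>\n"
        + "  </book>\n"
        + "</library>";

    // Parses XML and returns the first book element.
    // Checked exceptions are wrapped to keep the sketch short.
    static Element firstBook() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            return (Element) doc.getElementsByTagName("book").item(0);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Counts the direct children of parent that have the given node type.
    static int countChildren(Node parent, short nodeType) {
        NodeList children = parent.getChildNodes();
        int count = 0;
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == nodeType) {
                count++;
            }
        }
        return count;
    }

    // Returns the text of the first descendant element with the given tag name,
    // or null if there is none.
    static String childText(Element parent, String tagName) {
        NodeList matches = parent.getElementsByTagName(tagName);
        return matches.getLength() == 0 ? null : matches.item(0).getTextContent();
    }

    public static void main(String[] args) {
        Element book = firstBook();
        // The indentation between the tags shows up as text nodes.
        System.out.println("Text nodes: " + countChildren(book, Node.TEXT_NODE));
        System.out.println("Element nodes: " + countChildren(book, Node.ELEMENT_NODE));
        System.out.println("Title: " + childText(book, "title"));
        System.out.println("Author: " + childText(book, "author"));
    }
}
```

With the indentation shown, the book element should report three whitespace-only text nodes alongside its two element children, which is why a plain walk over getChildNodes() needs the ELEMENT_NODE check.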

What an unsightly mess. The only way to simplify it significantly would be to use a third-party library, which the students didn’t have, and it would still be pretty ugly.

One of the students said, “I kept waiting for you to say, ‘this is the hard way, now for the easy way,’ but you never did.”

I couldn’t resist replying, “Well, if I had Groovy available, the whole program reduces to:


def library = new XmlSlurper().parse('books.xml')
def book = library.book.find { it.@isbn == '1932394842' }
println "Title: ${book.title}\nAuthor: ${book.author}"

“and I could probably shorten that if I thought about it. How’s that for easy?”

On the bright side, as a result I may have sold another Groovy course. 🙂 For all of Groovy’s advantages over raw Java (and I keep finding more all the time), nothing sells it to Java developers like dealing with XML.

24 responses to “Nothing makes you want Groovy more than XML”

  1. Well if they can’t find and install Jaxen it’s unlikely they’re going to find and install Groovy.

    Also for the task at hand your code is way wordy. What’s below is shorter and could still benefit from a couple of methods to make the main body more readable. It’s not quite as efficient as yours but if you’re going to go to Groovy efficiency isn’t your primary driver anyway.

    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;

    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;

    public class ParseLibrary {
        public static void main(String[] args) throws Exception {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse("books.xml");

            NodeList books = doc.getElementsByTagName("book");
            for (int i = 0; i < books.getLength(); i++) {
                Element book = (Element) books.item(i);
                if (book.getAttribute("isbn").equals("1932394842")) {
                    NodeList titles = book.getElementsByTagName("title");
                    if (titles != null) for (int t = 0; t < titles.getLength(); t++)
                        System.out.println("Title: " + titles.item(t).getFirstChild().getNodeValue());

                    NodeList authors = book.getElementsByTagName("author");
                    if (authors != null) for (int a = 0; a < authors.getLength(); a++)
                        System.out.println("Author: " + authors.item(a).getFirstChild().getNodeValue());

                    break; // just one book per isbn
                }
            }
        }
    }

  2. Or it would be more readable if your comments let me format it properly.

  3. Hi Brett,

    Yes, your code is somewhat shorter, but I’d still take the Groovy solution any time. And as for Jaxen, yes, that helps a lot, but Groovy not only makes XML easier, it makes everything easier.

    One thing is indisputable, though. As much as I like the overall product, WordPress is a truly lousy way to display source code.

    Thanks for commenting, though. 🙂

  4. I hear ya – I just had the same Groovy XML experience:

    The original Java code (and a so-so Groovy impl): http://www.juliesoft.com/blog/jon/index.php/2008/03/09/groovy-is-coming/

    The final Groovy code: http://www.juliesoft.com/blog/jon/index.php/2008/03/12/groovy-micro-benchmark-revisited-groovy-is-fast/

  5. And yes, WordPress’s formatting could be better (that’s why I just use screenshots for my code!!).

    🙂

  6. Jon, those are very interesting results. I’m glad you found a way to get the efficiency back. Personally, I worry a lot less about efficiency in a technology as new as Groovy, figuring that’ll come automatically with time. I’ve heard many reports of progress in that area already.

    And Brett, you’re right, I should think about doing screen shots for my code. What a pain, though. My current system is to paste in the code, then go to code view and add tabs and sprinkle in <pre> and <code> tags as necessary. It’s a really lousy system.

  7. Good article. BTW,

    library.books.find { it.@isbn == '1932394842' }

    should be

    library.book.find { it.@isbn == '1932394842' } // 'book' should be singular

  8. Of course, you’re right. I really am going to have to start pasting in images of my source code rather than trying to just type it into WordPress.

    Thanks for catching that.

  9. I too like Groovy, but I also think XPath expressions can be used just as easily as Groovy expressions to extract a particular node.

  10. Hello!,

  11. Michael Mellinger

    This line:
    def book = library.books.find { it.@isbn == '1932394842' }

    Should be:
    def book = library.book.find { it.@isbn == '1932394842' }

    def library = new XmlSlurper().parse('books.xml')
    def book = library.book.find { it.@isbn == '1932394842' }
    println "Title: ${book.title}\nAuthor: ${book.author}"

  12. Thanks for the typo catch. Entering code in WordPress is really annoying. 🙂

  13. For displaying source code in wordpress, use the syntaxhighlighter plugin: http://wordpress.org/extend/plugins/syntaxhighlighter/

    Just wrap your code in [sourcecode language=’css’]code here[/sourcecode]. Languages are defined on the plugin’s homepage. (Even though not all languages are implemented, you can still use ‘java’, and it does a good job with Groovy.)

    Have a look here to see it in action:
    http://www.javathinking.com/?p=95

  14. Hey, it looks like you do have the plugin installed, because my comment is rendering code using it! It should say:

    |sourcecode language=’css’|code here|/sourcecode|

    where | should really be [ and ]

  15. Paul, that is so sweet! I had no idea the plugin was installed here, at WordPress. I guess maybe it should have been obvious, but I didn’t see it documented anywhere.

    Chalk that up as yet another thing that I wish I realized years ago. 🙂

    Thanks!

  16. Well, I’m afraid you’ve succeeded at selling another Groovy course, but not at teaching them good Java programming. Maybe next time you could consider presenting your students with something cleaner? 😉 e.g.

    [sourcecode language=”java”]
    import java.io.FileInputStream;

    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathExpression;
    import javax.xml.xpath.XPathFactory;

    import org.xml.sax.InputSource;

    public class ParseLibrary {
        public static void main( String[] args ) {

            XPathFactory xpathFactory = XPathFactory.newInstance();
            XPath xpath = xpathFactory.newXPath();
            XPathExpression xpathExpression = null;
            try {
                // Select the title of the book whose isbn attribute matches
                xpathExpression = xpath.compile( "/library/book[@isbn = '1932394842']/title" );
                InputSource is = new InputSource( new FileInputStream( "/home/ja/books.xml" ) );
                String title = xpathExpression.evaluate( is );
                System.out.println( "The title is: " + title );
            } catch( Exception e ) {
                e.printStackTrace();
            }
        }
    }
    [/sourcecode]

  17. But still the Groovy-way beats the hell out of Java…

  18. […] enough, you will realize Java feels stifling. “Dicing all this XML sure would be easier in Groovy,” you will think. You will notice and understand dynamic language zealots. The cool kids […]

  19. love the way Groovy allows me to work with XML! Thanks for this post!

  20. Thanks for the helpful post!
    I was thinking if you could help me with my issue?
    What would be the best way to programmatically remove all the nodes from the whole XML document, so the xml looks like this:

    Dierk Koenig

    Graeme Rocher

    Scott Davis

    I would like to do that using Groovy. Any help?

  21. I am so sold by the “Groovy-way” that I wrote a smallish Java library to largely (somewhat?) mimic it. Please check it out at https://github.com/MorganConrad/xen. It’s still *very* preliminary, but if you check out test/GeocoderDemo.java it more or less matches the example from “Making Java Groovy”.

  22. Very impressive. 🙂 I’ll have to give it a try next time I have to deal with XML.

  23. @Jose, your example is a clean way using 2 languages – java & XPath… the groovy example demonstrates how easy it is for groovy programmers to parse XML w/o learning another language such as XPath…
