I’m in Delaware this week teaching a course in Java Web Services using RAD7. The materials include a chapter on basic XML parsing using Java. An exercise at the end of the chapter presented the students with a trivial XML file, similar to:
<library>
    <book isbn="1932394842">
        <title>Groovy in Action</title>
        <author>Dierk Koenig</author>
    </book>
    <book isbn="1590597583">
        <title>Definitive Guide to Grails</title>
        <author>Graeme Rocher</author>
    </book>
    <book isbn="0978739299">
        <title>Groovy Recipes</title>
        <author>Scott Davis</author>
    </book>
</library>
(with different books, of course) and asked the students to find a book with a particular isbn number and print its title and author values.
I sighed and went to work, producing a solution roughly like this:
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ParseLibrary {
    public static void main(String[] args) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        Document doc = null;
        try {
            DocumentBuilder builder = factory.newDocumentBuilder();
            doc = builder.parse("books.xml");
        } catch (Exception e) {
            e.printStackTrace();
            return;
        }

        // No DTD, so no ID attributes: collect every <book> and check each one
        NodeList books = doc.getElementsByTagName("book");
        for (int i = 0; i < books.getLength(); i++) {
            // Only Elements have attributes, hence the cast
            Element book = (Element) books.item(i);
            if (book.getAttribute("isbn").equals("1932394842")) {
                NodeList children = book.getChildNodes();
                for (int j = 0; j < children.getLength(); j++) {
                    Node child = children.item(j);
                    // Skip the whitespace text nodes between the child elements
                    if (child.getNodeType() == Node.ELEMENT_NODE) {
                        // An element's text lives in its first child node
                        if (child.getNodeName().equals("title")) {
                            System.out.println("Title: "
                                    + child.getFirstChild().getNodeValue());
                        } else if (child.getNodeName().equals("author")) {
                            System.out.println("Author: "
                                    + child.getFirstChild().getNodeValue());
                        }
                    }
                }
            }
        }
    }
}
The materials didn’t supply a DTD, so I didn’t have any ID attributes to make it easier to get to the book I wanted. That meant I was reduced to continually using getElementsByTagName(String). I certainly didn’t want to traverse the tree, what with all those whitespace nodes containing the carriage-return/line-feed characters. So I found the book nodes, cast them to Element (because only Elements have attributes), found the book I wanted, got all of its children, found the title and author child elements, then grabbed their text values, remembering to go to the element’s first child before doing so.
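To make the whitespace problem concrete, here is a minimal sketch that counts what the parser actually hands back under a single book element. The class name WhitespaceNodes, its helper methods, and the one-book SAMPLE string are made up for illustration; the assumption is that the default non-validating parser is in use with no DTD, as in the exercise.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class WhitespaceNodes {
    // Hypothetical one-book sample, indented the way the exercise file is.
    static final String SAMPLE =
        "<library>\n"
        + "  <book isbn=\"1932394842\">\n"
        + "    <title>Groovy in Action</title>\n"
        + "    <author>Dierk Koenig</author>\n"
        + "  </book>\n"
        + "</library>";

    // Parses an XML string with the default (non-validating) DOM parser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Counts the direct children of the first <book> that have the given node type.
    static int countBookChildren(String xml, short nodeType) {
        Node book = parse(xml).getElementsByTagName("book").item(0);
        NodeList children = book.getChildNodes();
        int count = 0;
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == nodeType) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("Element children: "
            + countBookChildren(SAMPLE, Node.ELEMENT_NODE));
        System.out.println("Text children:    "
            + countBookChildren(SAMPLE, Node.TEXT_NODE));
    }
}
```

Run against the sample, this reports two element children (title and author) and three text children, which are nothing but the line breaks and indentation around them. That is why the solution has to check for Node.ELEMENT_NODE before looking at node names.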
What an unsightly mess. The only way to simplify it significantly would be to use a third-party library, which the students didn’t have, and it would still be pretty ugly.
One of the students said, “I kept waiting for you to say, ‘this is the hard way, now for the easy way,’ but you never did.”
I couldn’t resist replying, “well, if I had Groovy available, the whole program reduces to:
def library = new XmlSlurper().parse('books.xml')
def book = library.book.find { it.@isbn == '1932394842' }
println "Title: ${book.title}\nAuthor: ${book.author}"
“and I could probably shorten that if I thought about it. How’s that for easy?”
On the bright side, as a result I may have sold another Groovy course. 🙂 For all of Groovy’s advantages over raw Java (and I keep finding more all the time), nothing sells it to Java developers like dealing with XML.