Processing XML using Java

Someday, and that day may never come, I’ll remember that in a DOM tree the text value of a node is stored in its first child, not in the node itself. It’s one of the things I always emphasize to my students, but manage to forget when I have to do the actual processing.

Last week in my Rensselaer class I thought I’d show the students how to access’s web service and then display a simple result. I already have a developer token from there. I even have an Associate’s ID that I’ll use someday when I get around to building my own book recommendations site on top of Amazon. I’ve only been toying with that idea for a bout a year and a half now, though, so maybe I should wait a bit longer. Sigh.

Anyway, I used the REST approach to access the Amazon web service. That means I built up a giant URL with all the Amazon ECS (eCommerce Service) parameters appended, used it as an argument to a URL constructor, opened the connection, and even got back the response. It went something like:

String asin = request.getParameter(“asin”);
StringBuffer buffer = new StringBuffer();

String urlString = buffer.toString();

try {
URL url = new URL(urlString);
URLConnection conn = url.openConnection();

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(conn.getInputStream());
Book b = parseBook(doc);

request.setAttribute(“book”, b);
} catch (Exception e) {  // wimp out and just catch them all, like Pokemon
I’d already defined constants for the base url, etc. Then opening the URLConnection automatically sends the request and returns an XML response; in this case the MEDIUM group from Amazon. Then I thought parsing it would be no problem.

After messing it up for a while, I went back to basics. It’s easy enough to print an XML response to the console, thanks to the coolness of the Java default TransformerFactory:

TransformerFactory factory = TransformerFactory.newInstance();
Transformer xform = factory.newTransformer();

// that’s the default xform; use a stylesheet to get a real one
xform.transform(new DOMSource(doc), new StreamResult(System.out));

and voila, an XML tree is printed to the console in nicely formatted form. With that I was able to prove that I was indeed getting back the proper response tree.

All that’s left is to parse the resulting document. At first I went with the ever-popular doc.getElementsByTagName(String), but I noticed that the resulting document had a default namespace defined.

I needed that when I tried this experiment before, because last time I wrote an XSLT stylesheet to transform the tree into HTML directly.

I’ve decided, however, that the XSLT approach isn’t really what I want. I don’t like building XHTML documents using XSLT templates. Plus, I can’t handle individual elements that way. I think the right OO approach is to treat the XML source as just a database of a different kind. In other words, build a BookDAO class to extract the book elements from the XML file, instantiate a Book, and return it.

There was no need to keep hitting Amazon’s web service each time I’m developing, though, so I copied the output to a text file and started trying to extract the required info.

That turned out not to be a simple as I anticipated. First came the namespace issue referred to above. Despite the presence of the default namespace, I had more success with doc.getElementsByTagname(String) than with doc.getElementsByTagnameNS(String). Maybe that’s because I didn’t explicitly set “namespace awareness” on my DocumentBuilderFactory. At any rate, I finally was able to access an element through the NodeList.

Quick aside: life would be a lot easier if Amazon would add “id” attributes to its elements. With the explosion of interest in Ajax and the subsequent necessity to process XML using JavaScript, maybe they’ll change it.

Anyway, using getElementsByTagName(String) returned a NodeList, from which I extracted a Node, and then kept getting null for the result of node.getNodeValue().

Well, duh. For the zillionth time, the text of a node in a DOM tree is stored not in the node, but in the value of its first (text) child.

In other words:

private String extractValue(Document doc, String s) {
NodeList list = doc.getElementsByTagName(s);
Node n = list.item(0);
return n.getFirstChild().getNodeValue();

Ain’t nuthin’ to it. If I ever get around to switching to JDOM, then life will no doubt be much easier. There’s always the Apache Commons projects like Betwixt or Digester, or even BeanUtils. They’re in the queue, right after Hibernate (still making progress), Spring (on its way), JSF (working on it), and Tapestry (some day), not to mention Ruby on Rails, ….

One response to “Processing XML using Java”

  1. The part of using transformer factory to print xml directly to console helped.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.