Processing XML using Java

Someday, and that day may never come, I’ll remember that in a DOM tree the text value of an element is stored in its first child (a text node), not in the element itself. It’s one of the things I always emphasize to my students, but I manage to forget it whenever I have to do the actual processing.

Last week in my Rensselaer class I thought I’d show the students how to access Amazon’s web service and then display a simple result. I already have a developer token from Amazon. I even have an Associates ID that I’ll use someday when I get around to building my own book recommendations site on top of Amazon. I’ve only been toying with that idea for about a year and a half now, though, so maybe I should wait a bit longer. Sigh.

Anyway, I used the REST approach to access the Amazon web service. That means I built up a giant URL with all the Amazon ECS (eCommerce Service) parameters appended, used it as an argument to a URL constructor, opened the connection, and even got back the response. It went something like:

String asin = request.getParameter("asin");
StringBuffer buffer = new StringBuffer();
// ... append the base URL and the ECS parameters (including asin) ...
String urlString = buffer.toString();

try {
    URL url = new URL(urlString);
    URLConnection conn = url.openConnection();

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    // getInputStream() is what actually sends the request
    Document doc = builder.parse(conn.getInputStream());
    Book b = parseBook(doc);

    request.setAttribute("book", b);
} catch (Exception e) {  // wimp out and just catch them all, like Pokémon
    e.printStackTrace();
}

I’d already defined constants for the base URL, etc. Calling getInputStream() on the URLConnection sends the request and returns the XML response; in this case the MEDIUM response group from Amazon. After that, I thought parsing it would be no problem.
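For anyone wondering what “built up a giant URL” looks like, here’s a minimal sketch of appending name/value pairs to a base URL with java.net.URLEncoder. The endpoint and the parameter names (Operation, ItemId, ResponseGroup) are placeholders modeled loosely on an ECS item lookup, not copied from Amazon’s documentation, so check the real parameter list before relying on them.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class QueryUrlBuilder {

    // URL-encodes one name or value. UTF-8 is always supported,
    // so the checked exception should never actually happen.
    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Appends each name=value pair to the base URL as a query string
    public static String buildUrl(String baseUrl, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(baseUrl);
        char separator = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(separator).append(enc(e.getKey()))
              .append('=').append(enc(e.getValue()));
            separator = '&';
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical endpoint and parameters, for illustration only
        Map<String, String> params = new LinkedHashMap<>();
        params.put("Operation", "ItemLookup");
        params.put("ItemId", "0123456789");
        params.put("ResponseGroup", "Medium");
        System.out.println(buildUrl("http://example.com/onca/xml", params));
    }
}
```

Using a LinkedHashMap keeps the parameters in insertion order, which makes the generated URL easier to eyeball while debugging.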

After messing it up for a while, I went back to basics. It’s easy enough to print an XML response to the console, thanks to the coolness of the Java default TransformerFactory:

TransformerFactory factory = TransformerFactory.newInstance();
Transformer xform = factory.newTransformer();

// that’s the default xform; use a stylesheet to get a real one
xform.transform(new DOMSource(doc), new StreamResult(System.out));

and voila, an XML tree is printed to the console in nicely formatted form. With that I was able to prove that I was indeed getting back the proper response tree.
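One caveat: the default (identity) transformer doesn’t add line breaks on its own, so the output only looks nicely formatted if the response already contained whitespace. Setting OutputKeys.INDENT asks the serializer to indent. Here’s a small sketch that does that and captures the result in a String instead of writing to System.out; the class and method names are mine, for illustration, and the exact indentation is up to the JDK’s serializer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XmlDump {

    // Serializes a DOM Document with the identity transform,
    // asking the serializer to indent the output
    public static String toXmlString(Document doc) throws Exception {
        Transformer xform = TransformerFactory.newInstance().newTransformer();
        xform.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        xform.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Item><Title>Some Book</Title></Item>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        System.out.println(toXmlString(doc));
    }
}
```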

All that’s left is to parse the resulting document. At first I went with the ever-popular doc.getElementsByTagName(String), but I noticed that the resulting document had a default namespace defined.

I had to account for that namespace when I tried this experiment before, because last time I wrote an XSLT stylesheet to transform the tree into HTML directly.

I’ve decided, however, that the XSLT approach isn’t really what I want. I don’t like building XHTML documents using XSLT templates. Plus, I can’t handle individual elements that way. I think the right OO approach is to treat the XML source as just a database of a different kind. In other words, build a BookDAO class to extract the book elements from the XML file, instantiate a Book, and return it.
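Here’s a minimal sketch of what such a BookDAO might look like. The Book class, the findBook method, and the element names Title and Author are all assumptions for illustration; since getElementsByTagName searches the whole tree, it doesn’t matter how deeply those elements are nested in the real response. It takes an InputStream so the same code can read either the live response or a saved copy, and it uses getTextContent(), a DOM Level 3 method (Java 5 and later) that returns all the text under an element.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class BookDAO {

    // Hypothetical value class holding just the fields this sketch extracts
    public static class Book {
        private final String title;
        private final String author;
        public Book(String title, String author) { this.title = title; this.author = author; }
        public String getTitle() { return title; }
        public String getAuthor() { return author; }
    }

    // Parses the XML and builds a Book from the first Title and Author elements
    public Book findBook(InputStream xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(xml);
        return new Book(text(doc, "Title"), text(doc, "Author"));
    }

    // Text of the first element with the given name, or null if there is none
    private static String text(Document doc, String tag) {
        NodeList list = doc.getElementsByTagName(tag);
        Node n = list.item(0);
        return n == null ? null : n.getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Item><ItemAttributes><Author>Jane Doe</Author>"
                + "<Title>A Sample Title</Title></ItemAttributes></Item>";
        Book b = new BookDAO().findBook(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(b.getTitle() + " by " + b.getAuthor());
    }
}
```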

There was no need to keep hitting Amazon’s web service every time I ran my code during development, though, so I copied the output to a text file and started trying to extract the required info from that.

That turned out not to be as simple as I anticipated. First came the namespace issue referred to above. Despite the presence of the default namespace, I had more success with doc.getElementsByTagName(String) than with doc.getElementsByTagNameNS(String, String). That’s because I didn’t explicitly set namespace awareness on my DocumentBuilderFactory, and it’s off by default: the parser records no namespace information, so the NS version has nothing to match against, while the plain version simply matches the unprefixed element names. At any rate, I finally was able to access an element through the NodeList.
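Here’s a small experiment that shows the difference. The namespace URI and element names are made up for illustration; the point is how the two lookup methods behave with namespace awareness off (the default) and on.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NamespaceDemo {

    // Hypothetical default namespace, standing in for the one in the real response
    static final String NS = "http://example.com/ecs/sample";
    static final String XML = "<ItemLookupResponse xmlns=\"" + NS + "\">"
            + "<Title>Some Book</Title></ItemLookupResponse>";

    static Document parse(boolean namespaceAware) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);  // false is the default
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
    }

    public static void main(String[] args) throws Exception {
        Document plain = parse(false);
        // The unprefixed tag name matches; the parser ignored the namespace
        System.out.println(plain.getElementsByTagName("Title").getLength());        // 1
        // No namespace info was recorded, so the NS lookup finds nothing
        System.out.println(plain.getElementsByTagNameNS(NS, "Title").getLength());  // 0

        Document aware = parse(true);
        // With namespace awareness on, the namespace-qualified lookup works
        System.out.println(aware.getElementsByTagNameNS(NS, "Title").getLength());  // 1
    }
}
```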

Quick aside: life would be a lot easier if Amazon would add “id” attributes to its elements. With the explosion of interest in Ajax and the subsequent necessity to process XML using JavaScript, maybe they’ll change it.
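Even then, there’s a catch: Document.getElementById only finds attributes that have been declared to be of type ID (by a DTD or schema, or at runtime via Element.setIdAttribute), so a plain attribute named “id” isn’t enough on its own. A sketch with a made-up document:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class IdLookupDemo {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<Items><Item id=\"b1\"><Title>Some Book</Title></Item></Items>");
        // Nothing declares "id" to be of type ID, so the lookup comes back empty
        System.out.println(doc.getElementById("b1"));               // null

        // Mark the attribute as an ID (DOM Level 3, Java 5+), then try again
        Element item = (Element) doc.getElementsByTagName("Item").item(0);
        item.setIdAttribute("id", true);
        System.out.println(doc.getElementById("b1").getTagName());  // Item
    }
}
```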

Anyway, using getElementsByTagName(String) returned a NodeList, from which I extracted a Node, and then kept getting null for the result of node.getNodeValue().

Well, duh. For the zillionth time, the text of a node in a DOM tree is stored not in the node, but in the value of its first (text) child.
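A quick demonstration with a made-up Title element: the element’s own getNodeValue() is null, the string lives in its first child (a Text node), and since DOM Level 3 (Java 5) getTextContent() will also fetch it directly.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class TextNodeDemo {

    // Parses a small XML string and returns the first element with the given tag
    static Node firstElement(String xml, String tag) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName(tag).item(0);
    }

    public static void main(String[] args) throws Exception {
        Node title = firstElement("<Title>Some Book</Title>", "Title");
        // Element nodes have no value of their own
        System.out.println(title.getNodeValue());                  // null
        // The text is held by the element's first child, a Text node
        Node text = title.getFirstChild();
        System.out.println(text.getNodeType() == Node.TEXT_NODE);  // true
        System.out.println(text.getNodeValue());                   // Some Book
        // DOM Level 3 shortcut: all descendant text, concatenated
        System.out.println(title.getTextContent());                // Some Book
    }
}
```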

In other words:

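private String extractValue(Document doc, String s) {
    NodeList list = doc.getElementsByTagName(s);
    Node n = list.item(0);  // first matching element, or null if there isn't one
    if (n == null || n.getFirstChild() == null) {
        return null;        // no such element, or it has no text
    }
    // the text lives in the element's first (text) child, not in the element
    return n.getFirstChild().getNodeValue();
}
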
Ain’t nuthin’ to it. If I ever get around to switching to JDOM, then life will no doubt be much easier. There are always the Apache Commons projects like Betwixt or Digester, or even BeanUtils. They’re in the queue, right after Hibernate (still making progress), Spring (on its way), JSF (working on it), and Tapestry (some day), not to mention Ruby on Rails, ….

Sometimes it all works out

I’ve had a couple of airline adventures this week. My class in Philadelphia ended early last Friday, so I went to the airport expecting to spend hours in the US Airways club. Not a bad thing, incidentally, since I had a lot of work to do. As it turned out, though, I was able to get a standby seat on a 1:45 pm flight that was already boarding. Since my original flight was set to leave at 8 pm, that was quite a savings.

I did manage to board. It was a middle seat in the back, but I figured it was okay because the flight is only about 45 minutes. Then, just before the doors closed, lightning was detected within a three-mile radius of the airport. Everything shut down.

Apparently that’s a new OSHA regulation. Even though the skies were only overcast, we had to sit there waiting for a ground crew. That all seems reasonable, of course, but the problem was that the showers must have been hovering right on the edge of the three-mile radius, because they kept saying, “now we can leave,” and then ten minutes later, “wait, no we can’t.” This went on for three (!) hours.

We did eventually take off. I got into Hartford about three hours later than I’d expected when I boarded, but there were a couple of very silver linings. First, I still arrived much earlier than my original flight would have gotten me there. Second, when I glanced at the monitors in Bradley Airport, I noticed that my original, later flight had been cancelled!

Whew, that was close.

The other eventful time happened yesterday when I was ready to fly to Chicago (the class is in Naperville, actually).  I arrived at Bradley at about 1 pm for a 2:07 pm flight on United.  The plane was actually from a United partner and was one of those small commuter planes.  When I arrived at the airport I discovered the flight was delayed until 4:30 pm.  I did the usual thing of staying in the small US Airways club at Bradley (one of the best investments I’ve ever made — I renewed it while I was waiting).

The flight kept getting later.  First it was 4:52, then 5:07, then 5:35.  I asked and was told weather was not an issue, but they didn’t know any details.  As it turned out, there was another flight to Chicago on a much bigger plane leaving at 5:30.  They offered the passengers a chance to get on standby for that flight, but I decided against it.

My flight eventually boarded at 5:52.  When I got on, however, I realized that most of the passengers had taken the 5:30 flight and therefore my flight was nearly empty.  The plane was small, too, but not so small that it didn’t have two rows of first class seats.  I told the flight attendant that I had planned to upgrade, and she let me move into first. 🙂  I then had a very pleasant trip to O’Hare, only about 4 1/2 hours later than I expected.

Airline travel is worse than almost any other form of transportation, but sometimes it works out.

Hibernate clues

I get it now.  In order to use Hibernate with Derby/Cloudscape, I need to use the “identity” generator:

<class name="Location" table="LOCATIONS" schema="EARTHLINGS">
    <id name="id" type="integer">
        <generator class="identity"/>
    </id>
    <property name="address" column="STREET_ADDRESS" />
    <!-- … other properties … -->
</class>

This makes sense, of course, since in the db build script I have

create table locations (
    id integer generated always as identity,
    street_address varchar(30),
    -- … other column defs …
);

Now I can use this “earthlings” schema dreamed up by Capstone Courseware in my Hibernate class.

I also stumbled across a NoClassDefFoundError and fixed it by adding a jar file.

I’d say on my list of 10 Canonical Errors (the sequence of challenges I need to overcome to learn any new technology), I’m probably up through 7 now.  Getting there.

RAD, WAS, and profiles

This week and last I’ve been teaching an EJBs with RAD6 class.  Now, the installation of RAD in the training centers was done the usual way.  That means they installed it on one machine and then distributed the image to all the others.

Normally that’s fine. The problem with doing that with RAD is that installing RAD also installs the WebSphere Test Environment, which is now a fully functional version of WAS6. One of the major changes between WSAD 5 and RAD 6 is that the test environment is no longer customizable by workspace. Instead, the WAS environment remembers every app deployed to it from any workspace at any time.

That got me in trouble in Westborough for two reasons. One is that I had deployed a series of my own sample apps during the servlets and JSPs class, but when I deleted them at the end of the week, WAS thought they were still there and complained during the EJB class about missing EAR files. Lovely. Eventually, once the server started up, I was able to access the admin console and undeploy them.

The more fundamental problem, however, is that the installation of the test environment builds a node whose name is based on the server name. That meant that both last week and this week, all the machines had nodes with exactly the same name. Holy conflicts, Batman. As usual, the problem showed up when we started doing JNDI lookups, which threw ServiceUnavailableExceptions.

Last week we did some uninstalls and reinstalls of the product, which helped, but that couldn’t be the right solution. This week we didn’t have the installation files at all, so I had to find another way.

At long last, I found it. That’s what profiles are for. Creating a brand new server profile generates a new node name based on the server’s current name. Everyone was able to create a new profile, deploy apps to it, and run everything cleanly.

Boy I wish I’d known about that a long time ago.  At least now I get it.  I’m not sure if I want to recommend a different installation process or just assume that in each course we’ll create a new profile and move on.

The definitive answer is no doubt in the IBM set-up docs for their server-side classes.  I’m certified to teach them, but haven’t looked at the set-up docs yet.  I think I may have a set at home, though, that I got during the enablement class back in May.  I’ll have to check.  I seriously doubt that any training center is going to want to install RAD on each machine individually, though.

I blame the turbulence model

When I was a research scientist at United Technologies Research Center, I worked on unsteady aerodynamics in axial turbomachinery.  That’s a complicated way of saying I worked on math and computer models of noise in jet engines.

A friend of mine there was our resident mathematician.  He knew everything about everything, or at least had a book about it and could find out whatever you wanted.  I remember finding it amazing that he didn’t have a Ph.D. when I met him.  He eventually went back to UConn at night and earned his doctorate.  I think he did something about three-dimensional tetrahedral meshes, or some such.

He also was a classic SJ in the Myers-Briggs sense.  His files were always well organized and his desk was always clean, as bizarre as that sounds.  I asked him about it once, and he said, “I can only work on one thing at a time, so when I finish with something I put it away.”  Obviously there was no point discussing the issues with anyone that irrational. 😉

At any rate, he had a quote posted in his cubicle (yep, once upon a time I lived in a cube farm).  The quote was about turbulence models, and runs as follows:

“I am an old man now, and when I die and go to heaven there are two matters on which I hope for enlightenment. One is quantum electrodynamics, and the other is the turbulent motion of fluids. And about the former I am rather optimistic.”

The quote is from Horace Lamb, who wrote one of the definitive books on hydrodynamics, among other things.

The upshot is that whenever anyone in our CFD (computational fluid dynamics) group had trouble connecting our analyses to reality, we always blamed the turbulence model. After all, we knew it was an approximation at best, and not necessarily a good one, so it was always a good target.

Why do I bring this up now? Because now that I’ve moved from engineering to software, the role of turbulence models is played by networking. While some people somewhere apparently claim to understand it, I think that’s a myth.

I can say one thing for certain: I’m not one of them. I am sitting here in a hotel room in Center City Philadelphia, and for some unknown reason my Outlook client can’t manage to download my business email, except that sometimes it can. Why? Beats the heck out of me. Maybe I forgot to sacrifice a live chicken by the light of the full moon while waving a mouse cable over my head counterclockwise.

Or maybe it’s space charge effects.  My friends in physics used to blame space charge effects, because they claimed you could never really account for them and nobody really understood them anyway.  When they got tired of space charge effects, ground loops were another favorite.

In J2EE, if I have a problem, I know the first place to look is my JNDI lookups.  I hate JNDI with a real passion.  I know how it’s supposed to work and sooner or later I can get it to work, but no matter what I do I know it’s going to be the source of my troubles.

Networks, though, are often the bane of my existence. Some day, and that day may never come, I’ll really “get” them. Every time I think I do, though, I’m wrong.


On the plus side, I’ve got a very good group of students in my EJBs with RAD6 class.  The class is going very well so far, but we haven’t had to fight the WAS6 test server yet.  That’s always fun, too.

What a twist!

Okay, I’ll admit it. When I first saw the commercial for the new M. Night Shyamalan movie Lady in the Water, I, too, felt compelled to say, “What a twist!” Robot Chicken rules, and not just because it was co-created by Oz from Buffy, aka Scott Evil, aka Chris Griffin.

On a more serious note, geez, the Hibernate In Action book really is good. I mentioned it a couple of posts ago, but I had no idea. I remember when I first started looking at Hibernate months ago, I didn’t really like the book. Maybe now I’ve just learned enough to “get it”.

I still get the same feeling that I had when I read Bertrand Meyer’s Object-Oriented Software Construction: the book comes across as being written by someone almost too arrogant for words, but if you can get past all that, the content is excellent. I still remember Meyer going on for ten pages on how class names should be written with Initial_Caps_Separated_By_Underscores and how that was the only intelligent way to do it. I initially had trouble with the fact that the Hibernate book couldn’t stop trying to sell how wonderful the framework is and how theirs is the One True Way(TM) to do ORM. I guess by now I can filter that out.

Oh, and by the way, in the All Star Game this evening, the American League was losing 2-1 with two outs in the top of the ninth. A single, a double, and a triple later it’s 3-2 with Mariano Rivera coming in to close. That’s the first time I can remember looking forward to seeing Rivera come in. He got the save, of course.
Where would the Yankee dynasty have been without Rivera? Where would the Red Sox have been with Rivera?

Maybe now we’ll find out. Our Rivera is named Jonathan Papelbon, despite his blowing the save on Sunday. His ERA skyrocketed all the way up to 0.59. 🙂

What a way to enter the All Star break…

19 innings and an L.  Papelbon blows the save by giving up a homer with two outs in the 9th.  They score two runs in the 11th, only to give them back in the bottom half of the inning, missing a major baserunning error in the process.  They almost made that double play to get out of it, too.  Ginger and I went out to dinner right after that, dropped off some stuff for Xander (sleeping over at a friend’s house) and even drove around a bit before returning and the game was still going.

I knew when Rudy Seanez came in it was over, but he held together for over two innings before falling apart.  I really can’t expect more than that.

The Yankees lost, too.  We could have picked up a whole game on them.

I guess I have to be happy with how the Sox are playing, but if Foulke, Clement, and Wells are all going to be unavailable indefinitely, we need pitching.  Hopefully Theo has something in the works.

I have to admit that it sure is fun watching the Sox play stellar defense.  I can’t remember that happening in my lifetime.

It would have been nice to win that game, though.  Sigh.
