Processing XML using Java

Someday, and that day may never come, I’ll remember that in a DOM tree the text value of a node is stored in its first child, not in the node itself. It’s one of the things I always emphasize to my students, but manage to forget when I have to do the actual processing.

Last week in my Rensselaer class I thought I’d show the students how to access Amazon.com’s web service and then display a simple result. I already have a developer token from there. I even have an Associate’s ID that I’ll use someday when I get around to building my own book recommendations site on top of Amazon. I’ve only been toying with that idea for about a year and a half now, though, so maybe I should wait a bit longer. Sigh.

Anyway, I used the REST approach to access the Amazon web service. That means I built up a giant URL with all the Amazon ECS (eCommerce Service) parameters appended, used it as an argument to a URL constructor, opened the connection, and even got back the response. It went something like:

String asin = request.getParameter("asin");
StringBuffer buffer = new StringBuffer();
buffer.append(BASE_URL).append("&");
buffer.append(SUBSCRIPTION_ID).append("&");
buffer.append(ASSOCIATES_ID).append("&");
buffer.append(LOOKUP).append("&");
buffer.append("ItemId=").append(asin).append("&");
buffer.append(RESPONSE);

String urlString = buffer.toString();

try {
    URL url = new URL(urlString);
    URLConnection conn = url.openConnection();

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    // getInputStream() is what actually sends the request
    Document doc = builder.parse(conn.getInputStream());
    Book b = parseBook(doc);

    request.setAttribute("book", b);
    request.getRequestDispatcher("/display.jsp").forward(request, response);
} catch (Exception e) {  // wimp out and just catch them all, like Pokemon
    e.printStackTrace();
}

I’d already defined constants for the base URL, etc. Opening the URLConnection and asking for its input stream sends the request and returns an XML response, in this case the MEDIUM response group from Amazon. I thought parsing it would be no problem.
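
One thing the snippet glosses over is escaping. Appending raw parameter values works for an ASIN, but anything containing a space or an ampersand would break the URL. Here’s a minimal sketch of the same build-a-big-URL idea with every name and value run through URLEncoder. The RestUrlSketch class, the example.com base URL, and the ResponseGroup parameter name are placeholders made up for illustration, not the actual constants I used.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds a REST-style URL from a base and a parameter map.
final class RestUrlSketch {

    static String buildUrl(String baseUrl, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(baseUrl);
        // Start the query string with '?' unless the base already has one
        char separator = baseUrl.indexOf('?') >= 0 ? '&' : '?';
        for (Map.Entry<String, String> entry : params.entrySet()) {
            sb.append(separator)
              .append(encode(entry.getKey()))
              .append('=')
              .append(encode(entry.getValue()));
            separator = '&';
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("ItemId", "12 34&5");        // deliberately awkward value
        params.put("ResponseGroup", "Medium");  // assumed parameter name
        System.out.println(buildUrl("http://example.com/service", params));
        // prints http://example.com/service?ItemId=12+34%265&ResponseGroup=Medium
    }
}
```

A LinkedHashMap keeps the parameters in insertion order, which makes the resulting URL easier to compare against what the service documentation shows.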

After messing it up for a while, I went back to basics. It’s easy enough to print an XML response to the console, thanks to the coolness of the Java default TransformerFactory:

TransformerFactory factory = TransformerFactory.newInstance();
Transformer xform = factory.newTransformer();

// that’s the default xform; use a stylesheet to get a real one
xform.transform(new DOMSource(doc), new StreamResult(System.out));

and voila, an XML tree is printed to the console in nicely formatted form. With that I was able to prove that I was indeed getting back the proper response tree.
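
If the output ever comes out as one long line instead, the same default transformer can be asked to indent. This is a minimal sketch, assuming the transformer that ships with the JDK; the PrettyPrintSketch class and the sample XML are invented for illustration, and the exact amount of indentation varies between implementations.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class: serializes a DOM tree with indentation switched on.
final class PrettyPrintSketch {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static String toIndentedString(Document doc) {
        try {
            // No stylesheet argument, so this is the identity transform
            Transformer xform = TransformerFactory.newInstance().newTransformer();
            xform.setOutputProperty(OutputKeys.INDENT, "yes");
            xform.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            xform.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<Items><Item><Title>Sample Title</Title></Item></Items>");
        System.out.println(toIndentedString(doc));
    }
}
```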

All that’s left is to parse the resulting document. At first I went with the ever-popular doc.getElementsByTagName(String), but I noticed that the resulting document had a default namespace defined.

I needed that when I tried this experiment before, because last time I wrote an XSLT stylesheet to transform the tree into HTML directly.

I’ve decided, however, that the XSLT approach isn’t really what I want. I don’t like building XHTML documents using XSLT templates. Plus, I can’t handle individual elements that way. I think the right OO approach is to treat the XML source as just a database of a different kind. In other words, build a BookDAO class to extract the book elements from the XML file, instantiate a Book, and return it.

There was no need to keep hitting Amazon’s web service each time I’m developing, though, so I copied the output to a text file and started trying to extract the required info.

That turned out not to be as simple as I anticipated. First came the namespace issue referred to above. Despite the presence of the default namespace, I had more success with doc.getElementsByTagName(String) than with doc.getElementsByTagNameNS(String, String). Most likely that’s because I didn’t explicitly set “namespace awareness” on my DocumentBuilderFactory, and it is off by default. At any rate, I finally was able to access an element through the NodeList.
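
For the record, here is a small sketch of what turning on namespace awareness looks like. The namespace URI and the sample XML are made up, and NamespaceLookupSketch is just a name chosen for the example. With the factory set to be namespace aware, getElementsByTagNameNS finds the element in the default namespace. Without it, I’d expect the NS lookup to come back empty with the JDK’s bundled parser, though the sketch only prints the count rather than promising a value.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class comparing lookups with and without namespace awareness.
final class NamespaceLookupSketch {

    // Invented namespace and document, standing in for the real response
    static final String NS = "http://example.com/hypothetical-ns";
    static final String XML =
            "<Response xmlns=\"" + NS + "\">"
          + "<Item><Title>Sample Title</Title></Item>"
          + "</Response>";

    static Document parse(String xml, boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);  // false unless set explicitly
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document aware = parse(XML, true);
        Document plain = parse(XML, false);

        System.out.println("aware, NS lookup:    "
                + aware.getElementsByTagNameNS(NS, "Title").getLength());
        System.out.println("plain, NS lookup:    "
                + plain.getElementsByTagNameNS(NS, "Title").getLength());
        System.out.println("plain, plain lookup: "
                + plain.getElementsByTagName("Title").getLength());
    }
}
```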

Quick aside: life would be a lot easier if Amazon would add “id” attributes to its elements. With the explosion of interest in Ajax and the subsequent necessity to process XML using JavaScript, maybe they’ll change it.

Anyway, using getElementsByTagName(String) returned a NodeList, from which I extracted a Node, and then kept getting null for the result of node.getNodeValue().

Well, duh. For the zillionth time, the text of a node in a DOM tree is stored not in the node, but in the value of its first (text) child.

In other words:

private String extractValue(Document doc, String s) {
    NodeList list = doc.getElementsByTagName(s);
    Node n = list.item(0);
    return n.getFirstChild().getNodeValue();
}
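
To convince myself one more time, here is a self-contained sketch of the same point. The NodeTextSketch class and the sample XML are invented for the demo. It shows getNodeValue() returning null on the element, the first-child approach returning the text, and a variant of extractValue that returns null instead of blowing up when the element is missing or empty. getTextContent() is a standard DOM Level 3 method that does the same job in one call.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: where the text of a DOM element actually lives.
final class NodeTextSketch {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Null-safe variant: returns null if the tag is absent or has no children
    static String firstChildText(Document doc, String tag) {
        NodeList list = doc.getElementsByTagName(tag);
        if (list.getLength() == 0) {
            return null;
        }
        Node first = list.item(0).getFirstChild();
        return first == null ? null : first.getNodeValue();
    }

    public static void main(String[] args) {
        Document doc = parse("<Item><Title>Sample Title</Title><Empty/></Item>");
        Node title = doc.getElementsByTagName("Title").item(0);

        System.out.println(title.getNodeValue());                  // null: elements have no node value
        System.out.println(title.getFirstChild().getNodeValue());  // Sample Title
        System.out.println(title.getTextContent());                // Sample Title
        System.out.println(firstChildText(doc, "Empty"));          // null: no text child
    }
}
```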

Ain’t nuthin’ to it. If I ever get around to switching to JDOM, then life will no doubt be much easier. There’s always the Apache Commons projects like Betwixt or Digester, or even BeanUtils. They’re in the queue, right after Hibernate (still making progress), Spring (on its way), JSF (working on it), and Tapestry (some day), not to mention Ruby on Rails, ….

Sometimes it all works out

I’ve had a couple of airline adventures this week.  My class in Philadelphia ended early last Friday, so I went to the airport expecting to spend hours in the US Airways club.  Not a bad thing, incidentally, since I had a lot of work to do.  As it turned out, though, I was able to get a standby seat on a 1:45 pm flight that was currently boarding.  Since my original flight was set to leave at 8 pm, that was quite a savings.

I did manage to board.  It was a middle seat in the back, but I figured it was okay because the flight is only about 45 minutes.  Then, just before the doors closed, the weather report detected lightning within a three mile radius of the airport.  Everything shut down.

Apparently that’s a new OSHA regulation.  Even though the skies were only overcast, we had to sit there waiting for a grounds crew.  That all seems reasonable, of course, but the problem was that the showers must have been hovering right on the edge of the three-mile radius because they kept saying, “now we can leave,” and then ten minutes later, “wait, no we can’t.”  This went on for three (!) hours.

We did eventually take off.  I got into Hartford about three hours later than I expected, but there were a couple of very silver linings.  First, I still arrived much earlier than I would have on my original 8 pm flight.  Second, when I glanced at the monitors in Bradley airport, I noticed that my later flight had been cancelled!

Whew, that was close.

The other eventful time happened yesterday when I was ready to fly to Chicago (the class is in Naperville, actually).  I arrived at Bradley at about 1 pm for a 2:07 pm flight on United.  The plane was actually from a United partner and was one of those small commuter planes.  When I arrived at the airport I discovered the flight was delayed until 4:30 pm.  I did the usual thing of staying in the small US Airways club at Bradley (one of the best investments I’ve ever made — I renewed it while I was waiting).

The flight kept getting later.  First it was 4:52, then 5:07, then 5:35.  I asked and was told weather was not an issue, but they didn’t know any details.  As it turned out, there was another flight to Chicago on a much bigger plane leaving at 5:30.  They offered the passengers a chance to get on standby for that flight, but I decided against it.

My flight eventually boarded at 5:52.  When I got on, however, I realized that most of the passengers had taken the 5:30 flight and therefore my flight was nearly empty.  The plane was small, too, but not so small that it didn’t have two rows of first class seats.  I told the flight attendant that I had planned to upgrade, and she let me move into first. 🙂  I then had a very pleasant trip to O’Hare, only about 4 1/2 hours later than I expected.

Airline travel is worse than almost any other form of transportation, but sometimes it works out.

Hibernate clues

I get it now.  In order to use Hibernate with Derby/Cloudscape, I need to use the “identity” generator:

<class name="Location" table="LOCATIONS" schema="EARTHLINGS">
    <id name="id" type="integer">
        <generator class="identity"/>
    </id>
    <property name="address" column="STREET_ADDRESS" />
    … other properties …
</class>

This makes sense, of course, since in the db build script I have

create table locations (
    id integer generated always as identity,
    street_address varchar(30),
    …  other column defs …
);

Now I can use this “earthlings” schema dreamed up by Capstone Courseware in my Hibernate class.

I also stumbled across a NoClassDefFoundError and fixed it by adding a jar file.

I’d say on my list of 10 Canonical Errors (the sequence of challenges I need to overcome to learn any new technology), I’m probably up through 7 now.  Getting there.

RAD, WAS, and profiles

This week and last I’ve been teaching an EJBs with RAD6 class.  Now, the installation of RAD in the training centers was done the usual way.  That means they installed it on one machine and then distributed the image to all the others.

Normally that’s fine.  The problem with doing that with RAD is that installing RAD also installs the WebSphere Test Environment, which now is a fully functional version of WAS6.  One of the major changes between WSAD 5 and RAD 6 is that the test environment is no longer customizable per workspace.  Instead, the WAS environment remembers every app deployed to it from any workspace at any time.

That got me in trouble in Westborough for two reasons.  One is that I had deployed a series of my own sample apps during the servlets and JSPs class, but when I deleted them at the end of the week, WAS thought they were still there and complained during the EJB class about missing EAR files.  Lovely.  Eventually, when the server started up I was able to access the admin console and undeploy them.

The more fundamental problem, however, is that the installation of the test environment builds a node whose name is based on the server name.  That meant that both last week and this week, all the machines had nodes with the exact same name.  Holy conflicts, Batman.  As usual, the problem showed up when we started trying to do JNDI lookups, which threw ServiceUnavailable exceptions.

Last week we did some uninstalls and reinstalls of the product, which helped, but couldn’t be the right solution.  This week we didn’t have the installation files at all, so I had to find another way.

At long last, I found it.  That’s what profiles are for.  Creating a brand new server profile generates a new name based on the current name of the server.  Everyone was able to create a new profile, deploy apps to it, and run everything cleanly.

Boy I wish I’d known about that a long time ago.  At least now I get it.  I’m not sure if I want to recommend a different installation process or just assume that in each course we’ll create a new profile and move on.

The definitive answer is no doubt in the IBM set-up docs for their server-side classes.  I’m certified to teach them, but haven’t looked at the set-up docs yet.  I think I may have a set at home, though, that I got during the enablement class back in May.  I’ll have to check.  I seriously doubt that any training center is going to want to install RAD on each machine individually, though.

I blame the turbulence model

When I was a research scientist at United Technologies Research Center, I worked on unsteady aerodynamics in axial turbomachinery.  That’s a complicated way of saying I worked on math and computer models of noise in jet engines.

A friend of mine there was our resident mathematician.  He knew everything about everything, or at least had a book about it and could find out whatever you wanted.  I remember finding it amazing that he didn’t have a Ph.D. when I met him.  He eventually went back to UConn at night and earned his doctorate.  I think he did something about three-dimensional tetrahedral meshes, or some such.

He also was a classic SJ in the Myers-Briggs sense.  His files were always well organized and his desk was always clean, as bizarre as that sounds.  I asked him about it once, and he said, “I can only work on one thing at a time, so when I finish with something I put it away.”  Obviously there was no point discussing the issues with anyone that irrational. 😉

At any rate, he had a quote posted in his cubicle (yep, once upon a time I lived in a cube farm).  The quote was about turbulence models, and runs as follows:

“I am an old man now, and when I die and go to heaven there are two matters on which I hope for enlightenment. One is quantum electrodynamics, and the other is the turbulent motion of fluids. And about the former I am rather optimistic.”

The quote is from Horace Lamb, who wrote one of the definitive books on hydrodynamics, among other things.

The upshot is that whenever anyone in our CFD (computational fluid dynamics) group ever had trouble connecting our analyses to reality, we always blamed the turbulence model.  After all, we knew it was an approximation at best, and not necessarily a good one, so it was always a good target.

Why do I bring this up now?  Because since I moved from engineering to software, the role of turbulence models has been played by networking.  While apparently some people somewhere claim to understand it, I think that’s a myth.

I can say one thing for certain — I’m not one of them.  I am sitting here in a hotel room in center city Philadelphia, and for some unknown reason my Outlook client can’t manage to download my business email, except that sometimes it can.  Why?  Beats the heck out of me.  Maybe I forgot to sacrifice a live chicken by the light of the full moon while waving a mouse cable over my head counterclockwise.

Or maybe it’s space charge effects.  My friends in physics used to blame space charge effects, because they claimed you could never really account for them and nobody really understood them anyway.  When they got tired of space charge effects, ground loops were another favorite.

In J2EE, if I have a problem, I know the first place to look is my JNDI lookups.  I hate JNDI with a real passion.  I know how it’s supposed to work and sooner or later I can get it to work, but no matter what I do I know it’s going to be the source of my troubles.

Networks, though, are often the bane of my existence.  Some day, and that day may never come, I’ll really “get” them.  Every time I do, though, I’m wrong.

Sigh.

On the plus side, I’ve got a very good group of students in my EJBs with RAD6 class.  The class is going very well so far, but we haven’t had to fight the WAS6 test server yet.  That’s always fun, too.

What a twist!

Okay, I’ll admit it. When I first saw the commercial for the new M. Night Shyamalan movie Lady in the Water, I, too, felt compelled to say, “What a twist!”  Robot Chicken rules, and not just because it was created by Oz from Buffy, aka Scott Evil, aka Chris Griffin.

On a more serious note, geez, the Hibernate in Action book really is good. I mentioned it a couple of posts ago, but I had no idea. I remember when I first started looking at Hibernate months ago, I didn’t really like the book. Maybe now I’ve just learned enough to “get it”.

I still get the same feeling that I had when I read Bertrand Meyer’s Object-Oriented Software Construction: the book comes across as being written by someone almost too arrogant for words, but if you could get past all that, the content was excellent. I still remember Meyer going on for ten pages on how class names should be written with Initial_Caps_Separated_By_Underscores and how that was the only intelligent way to do it. I initially had trouble with the fact that the Hibernate book couldn’t stop trying to sell how wonderful the framework is and how theirs is the One True Way(TM) to do ORM. I guess by now I can filter that out.

Oh, and by the way, in the All Star Game this evening, the American League was losing 2-1 with two outs in the top of the ninth. A single, a double, and a triple later it’s 3-2 with Mariano Rivera coming in to close. That’s the first time I can remember looking forward to seeing Rivera come in. He got the save, of course.

Where would the Yankee dynasty have been without Rivera? Where would the Red Sox have been with Rivera?

Maybe now we’ll find out. Our Rivera is named Jonathan Papelbon, despite his blowing the save on Sunday. His ERA skyrocketed all the way up to 0.59. 🙂

What a way to enter the All Star break…

19 innings and an L.  Papelbon blows the save by giving up a homer with two outs in the 9th.  They score two runs in the 11th, only to give them back in the bottom half of the inning, missing a major baserunning error in the process.  They almost made that double play to get out of it, too.  Ginger and I went out to dinner right after that, dropped off some stuff for Xander (sleeping over at a friend’s house) and even drove around a bit before returning, and the game was still going.

I knew when Rudy Seanez came in it was over, but he held together for over two innings before falling apart.  I really can’t expect more than that.

The Yankees lost, too.  We could have picked up a whole game on them.

I guess I have to be happy with how the Sox are playing, but if Foulke, Clement, and Wells are all going to be unavailable indefinitely, we need pitching.  Hopefully Theo has something in the works.

I have to admit that it sure is fun watching the Sox play stellar defense.  I can’t remember that happening in my lifetime.

It would have been nice to win that game, though.  Sigh.

EJB’s in RAD6

This week I’m back in Westborough, MA, teaching EJB’s in RAD6.  EJB’s have a rather nasty learning curve.  Here that’s made worse by the fact that the students only have the previous two training classes as experience.

(This is the same group I taught Intro Java in May and Servlets and JSP’s in June.  The companion group in Philadelphia I’ll see again next week.)

Now that I’ve been spending all this time with Hibernate, entity beans look downright primitive.  I understand that EJB 3.0 completely revises them and makes them very Hibernate-like, but I haven’t had a chance to really dig into that.  Not to mention that most companies I deal with use WebSphere as their app server, which doesn’t even understand J2SE 5, much less Java EE 5.  Some day that will change, I guess.  In the meantime I’ll use JBoss for experimentation.

Last week I downloaded Eclipse 3.2 and ran the update wizard to install all the Callisto plug-ins.  The product took forever to start up.  Then, when I tried to make a simple Hello World servlet, the whole thing crashed for reasons unknown.  I’d hate to think they’ve turned Eclipse into RAD, with all the attendant issues, but we’ll see about that, too.

Speaking of Hibernate, I wish I’d realized earlier how valuable the Hibernate in Action book is.  Since it dealt with Hibernate 2 and the field has moved to at least Hibernate 3, I was afraid the material would be too dated to be useful.  That, as they say, turned out not to be the case.  Many of the answers I need are in there.

Once again, I wish I had the Matrix capability of absorbing an entire book in seconds.  It would be worth the metal plug in the back of my head.  It would even be worth it that it would pretty much put me out of business.  I have about a dozen books I need to grok sooner rather than later, and that would make it all so easy. 🙂

Even though the second edition (now called Java Persistence with Hibernate) is on the way, it won’t be available until November.  I wish it was part of Manning’s early access program, because I could really use it now.  Instead, I just bought the HiA ebook, since I already have the hard copy.  Now I can carry it along wherever I go.

I feel like I’m getting much closer to really understanding Hibernate.  I still have a long way to go, though.

At least one Hibernate resolution

Okay, one of the issues I identified is my fault. I can use JavaBeans that don’t have id fields as long as I leave the “name” attribute out of the <id> element in the mapping document. The reason I was throwing an exception is that I was still using the same test cases from before, and one of them tried to retrieve an element by id. Duh.

Okay, at least that makes sense.

Hibernate Challenges

Hibernate is proving to be a bit more challenging than I originally anticipated. The toy problems seem to work just fine, but I have a real (if very small) database schema and a real (if very small) set of Java classes mapped to it already. Trying to insert Hibernate in between is raising a lot of issues.

  • According to the Hibernate documentation, the <id> declaration in the mapping document has an optional name attribute. The text says “if the name attribute is missing, it is assumed that the class has no identifier property.” The Location class I’m using doesn’t have an identifier property, so I left name out of <id>.

No such luck. My test threw a HibernateException anyway, with the informative message: “The class has no identifier property.” Well, duh. Somehow I thought it was going to work anyway.

  • Apache Derby is proving to be quite awkward. Will really likes Derby, but I don’t think it likes me. I got tired of wrestling with the embedded driver. That driver is great as far as it goes, but doesn’t allow me to view the database inside MyEclipse in between tests. Even if I close the connection in between, the driver doesn’t recognize that the database is available.
  • I did eventually get the networked driver to work, but that wasn’t simple, either. I downloaded the Eclipse plug-in, but the client jar file that came with the plug-in apparently didn’t have the org.apache.derby.jdbc.ClientDriver class in it. I had to download the full version of Derby (admittedly not very large, but I didn’t know I was going to have to do that) in order to get the proper jar files.
  • Then I made an error. I put in what I thought was the proper URL for the networked driver and started the network server. The server seemed to come up okay (though in the plug-in I had to click an OK button to get the modal dialog to go away), but the test program kept throwing a “No suitable driver” exception. It took quite some time to realize I’d dropped a backslash from my URL in the hibernate.cfg.xml file. Once I fixed that, I could use the driver again.
  • The really annoying problem, however, is that while I can read from the database now, I can’t write to it using Derby. I keep running into the problem of trying to set the primary key and it doesn’t like that. In the create SQL script, the id attribute is set to “generated always as identity”, which I thought was fine, but it’s causing me problems.
  • In the Java classes, at least one of them has a setter that does validation before setting the attribute. The Job class has attributes minimumSalary and maximumSalary. The setMinimumSalary() method tests to see that the supplied argument is less than or equal to the maximum. That sounds fine, but apparently Hibernate is calling setMinimumSalary() before it calls setMaximumSalary() for each row. The first is therefore throwing an exception. I managed to fix that by using the attribute access="field" inside the <property> for minimumSalary.
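
The setter-ordering problem in the last bullet is easy to reproduce without Hibernate at all. What follows is a plain-Java sketch of the idea, not Hibernate’s actual code: the Job class is a stripped-down guess at the one described above, and AccessOrderSketch with its two load methods is invented for the demo. Populating through the setters in the wrong order trips the validation, while writing the field directly (roughly what field access amounts to) skips it.

```java
import java.lang.reflect.Field;

// Stripped-down stand-in for the Job class described in the post
final class Job {
    private double minimumSalary;
    private double maximumSalary;

    public void setMinimumSalary(double minimumSalary) {
        // Validation that assumes maximumSalary has already been set
        if (minimumSalary > maximumSalary) {
            throw new IllegalArgumentException("minimum exceeds maximum");
        }
        this.minimumSalary = minimumSalary;
    }

    public void setMaximumSalary(double maximumSalary) {
        this.maximumSalary = maximumSalary;
    }

    public double getMinimumSalary() { return minimumSalary; }
    public double getMaximumSalary() { return maximumSalary; }
}

// Hypothetical demo class: property access versus field access
final class AccessOrderSketch {

    // Mimics populating a row through setters, minimum first
    static Job loadViaSetters(double min, double max) {
        Job job = new Job();
        job.setMinimumSalary(min);  // maximumSalary is still 0.0 here
        job.setMaximumSalary(max);
        return job;
    }

    // Mimics field access for minimumSalary: the setter never runs
    static Job loadMinViaField(double min, double max) {
        try {
            Job job = new Job();
            Field field = Job.class.getDeclaredField("minimumSalary");
            field.setAccessible(true);
            field.setDouble(job, min);
            job.setMaximumSalary(max);
            return job;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        try {
            loadViaSetters(30000, 60000);
        } catch (IllegalArgumentException e) {
            System.out.println("setter order problem: " + e.getMessage());
        }
        System.out.println(loadMinViaField(30000, 60000).getMinimumSalary());
    }
}
```

The trade-off is that field access bypasses the validation entirely, so a bad row in the database would load without complaint.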

So, I’m working on it. Later I’ll look back on this as a valuable exercise, since it’s much closer to the real way this would be used in industry rather than the standard problems. I’ll be happier when it’s all working, though.