Groovyness with Excel and XML

Today in class one of the students mentioned that they need to read data from an Excel spreadsheet supplied by one of their clients and transform the data into XML adhering to their own schema.

I’ve thought about similar problems for some time and looked at the various Java APIs for accessing Excel. I spent a fair amount of time working with the POI project at Apache, which is a poor substitute but at least worked.

On the XML side, the Java libraries have gotten better, but working with XML in Java is rarely fun. I know the Apache group has built a few helper projects to make it easier, but I haven’t used them that much. In class, students don’t really want to talk about other projects; they want to know what’s in the standard libraries.

In short, I know I could write the necessary code to take data out of Excel and write it out to XML, but it would be long and awkward. It certainly wouldn’t be much fun.

Now, though, I’m spending a lot of time with Groovy. I’m working my way through the book Groovy in Action (Manning), which has jumped to the top of my favorite technical books list. I’m still learning, but I knew there was a Groovy library for accessing Excel, and I knew Groovy had a “builder” for outputting XML. I just needed to see how to write the actual code. I set up a sample Excel spreadsheet with a few rows of data and went to work.

Here’s the result. It’s about 25 lines of code all told. In other words, it’s almost trivial. I’m amazed.

package com.kousenit

import org.codehaus.groovy.scriptom.ActiveXProxy

def addresses = new File('addresses.xls').canonicalPath
def xls = new ActiveXProxy('Excel.Application')

// get the workbooks object
def workbooks = xls.Workbooks
def workbook = workbooks.Open(addresses)

// select the active sheet
def sheet = workbook.ActiveSheet

// get the XML builder ready
def builder = new groovy.xml.MarkupBuilder()
builder.people {

    // walk down the rows from row 2, stopping at the first empty ID cell
    for (row in 2..1000) {
        def ID = sheet.Range("A${row}").Value.value
        if (!ID) break

        // use the builder to write out each person
        person (id: ID) {
            name {
                firstName sheet.Range("B${row}").Value.value
                lastName sheet.Range("C${row}").Value.value
            }
            address {
                street sheet.Range("D${row}").Value.value
                city sheet.Range("E${row}").Value.value
                state sheet.Range("F${row}").Value.value
                zip sheet.Range("G${row}").Value.value
            }
        }
    }
}

// close the workbook without asking for saving the file
workbook.Close(false, null, false)
// quits excel
xls.Quit()
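
The builder half of the script can be tried on its own, without Excel or Scriptom. Here's a minimal sketch, assuming a hard-coded list of maps in place of the spreadsheet (the rows variable and its sample values are made up for illustration). The builder writes to a StringWriter so the result can be inspected afterwards.

```groovy
import groovy.xml.MarkupBuilder

// Hypothetical sample data standing in for the spreadsheet rows
def rows = [
    [id: 1, firstName: 'Fred', lastName: 'Flintstone',
     street: '301 Cobblestone Way', city: 'Bedrock', state: 'CT', zip: '06001'],
    [id: 2, firstName: 'Barney', lastName: 'Rubble',
     street: '303 Cobblestone Way', city: 'Bedrock', state: 'CT', zip: '06001']
]

// Send the markup to a StringWriter instead of standard output
def writer = new StringWriter()
def builder = new MarkupBuilder(writer)

builder.people {
    for (r in rows) {
        // each map becomes one <person> element with an id attribute
        person(id: r.id) {
            name {
                firstName r.firstName
                lastName r.lastName
            }
            address {
                street r.street
                city r.city
                state r.state
                zip r.zip
            }
        }
    }
}

def xml = writer.toString()
println xml
```

Every method call the builder doesn't recognize (people, person, name, and so on) becomes an element of the same name, which is why the code ends up shaped like the XML it produces.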

I’d call that a successful experiment. It certainly was a happy one. I know I’ll do more in the future. I’d bet that somebody with more experience could show me how to condense that even further.

Groovy is just plain fun, and I haven’t felt that way about Java for a long, long time.

Brush with (semi-)greatness

This week I’m teaching an XML class in NYC.  It’s actually a basic XML class along with some XML schema training, in order to help the client work with data coming from external sources.  I’ll know more when the class starts tomorrow, but I expect to work with fairly sophisticated schemas.

Since the class is in New York and I hate driving through New York (despite my recent activity there), I decided to take the train from New Haven.  NH is about a 40 minute drive for me, but I can pick up the Acela there, which is clean, fast, and about a thousand times more comfortable than any plane I’ve been on in the last year.

My hotel is just a block away from Madison Square Garden.  I decided that it might be fun to go to a Knicks game.  I’ve never been to one.  Heck, I’ve only been to one NBA game at all, and that was years ago.  All I remember is sitting in the nose-bleed section at a Philadelphia 76ers game way back in the 80s, though it might have been the 70s.

I only needed one ticket, so I asked for the best seat available.  The result was that I blew my entertainment budget for the next six months (which is sad, because I already spent that last month, but so be it) and wound up with a floor seat about six rows behind the Knicks’ bench.  Isiah Thomas himself blocked my view of the free-throw line.

As it turned out, the guy next to me was Isiah’s nephew.  (I’d mention his name here but I can’t remember it. :()  He continually interpreted Isiah’s signals for me, i.e., “now they’re gonna go full court press followed by a 3-2 zone,” or “this next play will be a screen for Marbury.”  It was really fun that way.  I was just glad I hadn’t said anything about Isiah before I realized who I was sitting with.

As a fan of Bill Simmons, I’m abundantly aware of Isiah’s staggering weaknesses as a coach and especially as a GM.  When the guy next to me claimed the Knicks would be in the finals in three years, I managed not to say anything, but just barely.

The Knicks had a 3 point lead near the end, then gave up a 3, then took a shot that was clearly goal-tended but wasn’t called, then gave up another 3.  Finally, with less than ten seconds to go and a 3 point deficit, Marbury decided to drive to the basket (??) and his shot was blocked, effectively ending the game.  The crowd went home unhappy, but I had fun.

The SOA bandwagon

Just a quick post this morning from sunny Dover, Delaware. I’m doing an XML class this week with a brief introduction to web services.

Web services are hot, but mostly because the buzzword “service oriented architecture” is hot. I can understand the motivation: high level IT executives see all these systems they’ve spent so many millions of dollars on and wonder why they can’t all work together.

(Insert your own, “can’t we all just get along?” joke here.)

A web service is generally interpreted these days as an XML API for a system. Wrap them all in XML APIs and suddenly you’ve achieved integration through the sophisticated use of text files. Of course, the devil is always in the details. Integration isn’t always so easy, and a true service oriented architecture is more than just an XML wrapper — to get the real benefit you should create common baseline services that everyone can access, and ultimately the individual systems themselves dissolve into assemblies of services. That’s not nearly as likely to happen.

In analogy with the heady days of the late 90s when we were “web enabling” everything, I like to call this phase “web service enabling.” Just as back then some systems were a lot easier to web enable than others, so today the costs and benefits of web service enabling systems vary widely based on their original designs.

It’s an interesting topic, though. We’ll see how it plays out, given the growing developer antipathy towards XML (favoring JSON and “convention over configuration” instead).

Focusing on what’s important

At the No Fluff, Just Stuff conference I attended last week, I managed to talk to a couple of the presenters and quietly ask about the rates they charge.

That’s always a dicey subject, of course, but it’s very hard to get good information about that.  Software trainers don’t have a union, or anything like that.  We also tend to be a pretty independent lot.  Also, software developers are almost always highly opinionated, and trainers can be even worse since they have a soapbox to stand on.  It’s easy to see how egos can bump into each other when we get together.   The result is that my “market research” consists of talking to a few, trusted individuals and then negotiating with my favorite training companies as a subcontractor.

Those can be unreliable sources of information, though.  Individuals can exaggerate and claim that they get a particular rate when they rarely see it.  Training companies want to minimize their costs, so they may claim they can find someone else for a lower price.

My original approach was to talk to a good friend who was a trainer and adopt his price as mine.  It also fit my budget, which is based on invoicing a certain minimum amount per month.  Based on that amount, I know how many days I need to teach.  I’ve learned over the past couple of years what my upper limit is before I start to get really tired and my quality starts to suffer.

For me, though, I still get many more requests than I can honor.  Without any marketing at all, my schedule can get booked solid for months.  It’s probably inevitable that I start wondering whether I’m undercharging for my services.

At the NFJS conference the presenters are among the leaders in the development industry.  They consequently are in serious demand for their services, but mostly in the form of contracts.  Most of them also do training, though, and they charge a premium for it.

Comparing my rate to theirs, though, is not so easy.  I’m a trainer first and a developer second, though writing code is very important to me.  I can’t imagine I’d ever be happy without being in a classroom occasionally, for reasons I’ve detailed here many times.  Since I spend much more time and effort teaching than writing code, it’s also very unlikely that I’ll ever come up with some fundamental framework that everyone adopts, so I’m unlikely to be the sole or original source of some highly sought-after technology.

Still, it’s hard not to get greedy.  I spoke to a couple of presenters at the conference, as I said above, and what they said gave me the feeling I was significantly underpaid.

Now, I’m not a presenter at NFJS.  I don’t have major book publications to my name.

(Aside: I once read that Bruce Eckel, author of Thinking In Java — now in 5th edition, but whose first edition was my first Java book — said that the book didn’t make him a lot of money.  Instead, being the author of that book helped him increase his rates considerably, and that made him a lot of money.)

I don’t run a consulting firm with lots of employees.  I also don’t run a training firm with lots of employees.  I occasionally work directly with a client instead of as a subcontractor to training companies, and that’s both much more lucrative and much more work.

Greed is a great motivator, though.  So is jealousy.  I don’t like either one, but it’s hard to ignore them.  I suppose they’re acceptable if I use them as motivation to work harder and improve myself.  I’m now thinking I should get more involved in book projects, for example, and become a better developer by doing more project work.

In the meantime, though, I prodded my clients by asking for a slightly higher rate.  One client accepted without a second thought.  Another pushed back, and I compromised.  A third said fine, charge anything I want, but that it will affect what work is offered to me later.

I also had a chance to talk to one of my friends at a training company and discovered that one of those presenters at the conference was lying to me, or at least exaggerating.

The upshot of all of this is that I have to periodically remind myself what’s important to me in this business.  I want to work with clients I like.  I want to work on state-of-the-art technologies using state-of-the-art tools (I’m never going to be a Microsoft Word trainer, for example).  I want to make money for people I like.  I want to help students do things they couldn’t do before.   I’d also like to make a million dollars, but only if I can still do all the above.

Money isn’t the most important thing, not by a long shot.  I need to have enough that it’s not an overriding issue, and there’s a certain amount of pride involved, too.  I believe I am a very good trainer and I’m always working to get better.  I think that’s worth a certain amount of reward and a certain amount of respect, and respect in this culture is often given in dollar amounts.  Still, the key is to do what I want to do with the people I want to work with, and that’s awfully valuable.  I spent years and years in other jobs where I didn’t get any of that.

My route to making money will be to learn what’s both popular and enjoyable and cutting edge and do a lot of that.  That’s why these days I teach a lot of Hibernate, Spring, JSF, and Ajax classes.  I also happen to like EJB3 and think it’s going to be very big.

In the meantime, I’m going to spend this afternoon digging into  Dierk Konig’s Groovy in Action book some more and get ready for next week’s class.

Persistence providers are not all alike

At the NFJS conference I attended over the weekend, I wound up going to two sessions by Mark Richards.  His topic was the JPA specification as part of the overall EJB3 spec.

As I’ve mentioned here, I’m quite interested in that.  Will and I just finished an introductory EJB3 course for Capstone where I wrote the bulk of the JPA stuff.

(Actually, I took advantage of Will shamefully.  I can’t believe how much he ultimately wrote.  Still, he seems okay with it.  I still feel like I got lucky.)

I sat in on part of the “Introduction to Java Persistence API” and the bulk of the “Advanced Java Persistence API” talks.  I was already familiar with the material, but there’s always more to learn.  Plus, I had awkward questions to ask. 🙂

After the introductory talk, I approached Mark and asked him two questions.  It helps to know that Mark is a “Certified Senior IT Architect” at IBM.

My first question: what the heck is IBM’s problem?  In other words, why don’t they have an application server that supports Java EE 5, or at least JPA and EJB3?  What are they waiting for?  I mean, Sun is already there (of course) and their Sun Java Application Server (Glassfish) is actually usable.  JBoss is pretty much ready to go.  WebLogic is in their last betas.  Where is WebSphere?  Don’t they care at all?

You may say that it was an unfair question, and you’d be right.  It’s just that I have no one else to ask it to.  Or, stated a bit more honestly, I have no one else in any position of influence to vent my frustrations to.

He started off by saying that IBM was moving “deliberately” because they don’t own a persistence provider, the way Oracle has TopLink or JBoss uses Hibernate.  Now, I’m basically a friendly person, but I knew that was nonsense and said so.  He backed down and admitted there were actually “lots of reasons.”  I let it go at that.

A clue can probably be found in the excellent podcasts made by the Java Posse.  In podcast #106 they had an interview with an IBMer, and he basically said that IBM is not a technology-driven company.  Instead, they’ve made the business decision to wait to ask their clients to upgrade to a new version, even implying that the demand for Java EE 5 is not there in the marketplace.

I find that highly questionable.  IBM developers are at the heart of many of the major open source projects we use today.  IBM even donated Eclipse, for crying out loud, and Eclipse is basically the Emacs of our generation.  IBM tries to be on the forefront of technology.

That said, RAD6 is a mess and from what I’ve gathered, RAD7 is worse.  The Eclipse part is great, but there are huge numbers of bugs and problems in the products, not to mention that they suck up all the memory on your machine and go looking for more much like the aliens in Independence Day.  I think they’re going slowly because they think they’ve still got the market under control and that they don’t want to jump to the new version quickly.  From what I gather, the version of WebSphere that will support Java EE 5 (WAS 7?) will not be out until at least the middle of 2008.  Only time will tell if that decision costs them market share.

Anyway, that’s not Mark’s fault.  It was amusing having him sit next to Brian Goetz from Sun at the last Birds of a Feather session and watch them disagree on fundamental issues.

(Aside: Brian Goetz is a huge name in multithreaded programming.  He wrote the Addison-Wesley book on  threading (Java Concurrency in Practice), and, in a much-discussed move, actually joined Sun about a year ago as most of the best developers were leaving.  One of the best lines of the conference was when he admitted that people said to him, “That’s the first time I’ve seen a rat jump ONTO a sinking ship.” :))

I don’t blame Mark for not having a decent answer — I suspect it’s more a marketing issue than anything else.

The other question I asked him about was the strange unidirectional one-to-many behavior I discussed here in an earlier post.  That’s where the “one” class has a collection of the “many” type, but the “many” class doesn’t have an attribute of “one” type.

Trivial example: an Order may have a collection of Product instances, but the Product doesn’t have a reference to the Order.

Now, the database implementation doesn’t care whether the relationship is bidirectional or not.  The PRODUCT table is going to have a foreign key to the ORDER table, because there’s no way to know how many columns you’d need in the ORDER table to do it the other way around.  If the association is bidirectional, then there’s no problem.  The Product class has an attribute of type Order and adds a @ManyToOne annotation on it, while the Order class has a collection-of-Product attribute called “products”, on which it adds a @OneToMany(mappedBy="order") annotation (mappedBy names the attribute on the Product side that points back to the Order).  Everybody’s happy.

Except that it’s wrong.  Why make the Product know about the Order?  And what happens if you forget to set the Order attribute of the Product?  Do you get referential integrity issues or worse?  Add to that the fact that it’s just ugly and you see there’s an issue.
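
For reference, here's a minimal sketch of that bidirectional mapping, written in Groovy to match my earlier example (the annotations are the same in Java). It assumes the JPA API jar is available, here pulled in with @Grab. The table name, the column name, and the addProduct helper are all made up for illustration. A helper like that is one common way to avoid forgetting to set the Order on the Product.

```groovy
@Grab('javax.persistence:javax.persistence-api:2.2')
import javax.persistence.*

@Entity
@Table(name = 'ORDERS')   // assumed name; ORDER itself is a reserved word in SQL
class Order {
    @Id @GeneratedValue
    Long id

    // Inverse side: mappedBy names the field in Product that holds the foreign key
    @OneToMany(mappedBy = 'order')
    List<Product> products = []

    // Hypothetical helper that keeps both sides of the association in sync
    void addProduct(Product product) {
        products << product
        product.order = this
    }
}

@Entity
class Product {
    @Id @GeneratedValue
    Long id

    // Owning side: this column is the foreign key back to the order
    @ManyToOne
    @JoinColumn(name = 'ORDER_ID')
    Order order
}

// Plain object check, no database involved
def order = new Order()
def product = new Product()
order.addProduct(product)
assert product.order.is(order)
```

The sketch only checks the in-memory object graph. Whether a provider complains when the Product side is left unset is exactly the kind of thing I'd want to test against a real database.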

The problem gets much worse, however, if the relationship is unidirectional on the collection side.  The JPA specification states that in a unidirectional association like that, the database implementation should use a link table between the two entity tables.  But nobody does it that way, for good and sound reasons.

Therefore, I asked Mark Richards about it.  Why did the spec recommend that?  Once he realized I wasn’t asking about how to map it, but rather disagreeing with the recommendation in the spec, he lowered his voice and became a conspirator.  “It’s that way,” he said, “because Sun wanted it that way.”

He claimed that this issue was roundly debated during the JSR specification meetings and the debates weren’t terribly friendly, either, but that Sun ultimately made a decision and therefore this is what we have.

Hmm.  It’s hard to know how much truth is in that.  I wasn’t there, and don’t know anybody who was.  It’s so in character for someone from IBM to blame Sun for any problems that it could be true or not.

(Obvious example of an awkward IBM/Sun relationship: why name their flagship editor  platform Eclipse, anyway?  Is that supposed to say something about its relationship to Sun?  Inquiring minds want to know, but of course IBM claims it was all a coincidence.)

So the answer to that is also essentially, that’s what the spec says and we have to live with it.

In the talks themselves, Mark took great pains to try to show how the same code could use either TopLink or Hibernate as its persistence provider.  He switched back and forth many times and showed how it all worked.

Except when it didn’t.  He demonstrated several cases where Hibernate violated the spec in significant ways.  He told a story about asking a member of the Hibernate core team about a particular issue, only to receive an earful about how they knew better than the spec and weren’t going to change their product for something they didn’t agree with.  It came across as arrogance worthy of the Rails team, which apparently flows from Gavin King on down.

I have no idea if that’s actually true or not, either, but it’s not the first time I’ve heard it.

Finally, I get to the reason for today’s post.  As part of our EJB3 materials, we implemented a system where a Proposal has both public Comments and professional Comments.  In other words, we had two one-to-many relationships between Proposal and Comment, and both were unidirectional.  To illustrate the issues with the spec, we decided to show how to implement the public comments using a link table (as the spec suggests) and the professional comments by making the relationship bidirectional.  So far, so good.

The problem came when we tried to do the cascade delete.  Deleting a Proposal ought to delete both kinds of Comments.  We set CascadeType.ALL on both relationships and hoped for the best.
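
Roughly, the setup looks like the sketch below. This is not the actual course code: the field names and the link table name are assumptions, and it's written in Groovy with the JPA API pulled in through @Grab. The public comments go through a link table and the professional comments use a back-reference on the Comment.

```groovy
@Grab('javax.persistence:javax.persistence-api:2.2')
import javax.persistence.*

@Entity
class Proposal {
    @Id @GeneratedValue
    Long id

    // Public comments: mapped through a link table, as the spec suggests
    // for a one-to-many with no back-reference
    @OneToMany(cascade = CascadeType.ALL)
    @JoinTable(name = 'PROPOSAL_PUBLIC_COMMENT')   // assumed table name
    List<Comment> publicComments = []

    // Professional comments: bidirectional, foreign key lives in the comment table
    @OneToMany(mappedBy = 'proposal', cascade = CascadeType.ALL)
    List<Comment> professionalComments = []
}

@Entity
class Comment {
    @Id @GeneratedValue
    Long id

    String text

    // Back-reference used only by the professional comments mapping
    @ManyToOne
    Proposal proposal
}

// Show the cascade setting on the link-table side
def publicField = Proposal.getDeclaredField('publicComments')
println publicField.getAnnotation(OneToMany).cascade()
```

With CascadeType.ALL on both collections, removing a Proposal through the entity manager ought to remove the comments in both lists, regardless of how each one is stored.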

What happened?  TopLink deleted the professional comments without a problem (the bidirectional version), but failed with a foreign key violation when trying to delete the public comments (through the link table).

I decided to re-write the test to use Hibernate, and, lo and behold, that worked like a charm.  Go figure.

So what’s the conclusion?  I’m not trying to change the spec.  I’m an instructor and developer who has to use what’s available and show others how to deal with it.  Frankly, it’s an interesting demonstration to see one provider work and the other fail.  I guess I was just surprised, given the build-up, which one succeeded.

I have other, more philosophical comments to make about those presentations and the conference in general, but that’ll wait until another post.

Just to leave on an up note, however, one of the best lines of the conference was when Neal Ford asked everyone, “do you think it would have been easier to introduce Groovy into your organization if the language had been called Enterprise Business Execution Language?”


20 days and counting…

In honor of the first Yankees/Red Sox game of the season (okay, pre-season), let me just remind everyone that opening day is only 20 days away. Sweet.

Just to prove to myself that my son (Xander, age 14) doesn’t read my blog, I’ll let you in on a little secret. I’ve told him that we’re going out together on April 15. He doesn’t know where. Well, as an official member of Red Sox Nation (a Monster member, no less), I was able to acquire a pair of front row tickets on the Green Monster in Fenway Park.

I’ve always said to my wife that someday before I die, I was going to get monster seats. That someday is April 15. Hey, why wait, even if I did have to take out a second mortgage to afford them?

(That was an exaggeration, if only a small one.)

Don’t tell anybody about this. If the boy comes to me tomorrow and knows about the game, then somebody here said something. 😉

Playing Games with Martin Fowler

It’s been a very busy week.  Will Provost (owner of Capstone Courseware) and I just finished putting together a two-day Introduction to EJB 3.0 course, which took a lot of work, especially because I was teaching a Spring class at the time.  That meant late nights and challenges for both of us.  I think we both hit the limit of our endurance, but at least it’s finished (except for the instructor guide, but that’s coming).

This weekend, however, I’m attending my second No Fluff, Just Stuff conference here in Danvers, MA.  This edition of NFJS is formally called the New England Software Symposium, which sounds much more formal than it is.

Part of the reason that I actually pay to go to this conference is the No Fluff Just Stuff philosophy.  There aren’t any vendors or marketing people here.  There’s a cap on the number of attendees at around 250.  It’s held over the weekend to provide the minimum disruption to work.  The conference basically consists of the best developers in the world talking about their favorite technologies.

This year I planned to spend some time with Scott Davis.  He was the co-author of the JBoss at Work book that I enjoyed so much last year.  I even used it as the required text in my Developing Enterprise Applications class at Rensselaer last summer.  When I attended the conference last year I got to meet him and enjoyed his company.  Then I found out a few months ago that he’s big in the Groovy and Grails worlds (he runs the aboutGroovy web site), which just shows what a small world this is.  I knew I wanted to talk to him about all this.

It turned out he’s running the conference this time around.  He’s also giving several presentations.  The one I attended today was on mocking web services in order to test client apps.  He’s also big on RESTful web services and had a talk that included discussing the Yahoo!, Amazon, and Google APIs.  I didn’t get to go to that one, however.  I was busy attending Neal Ford’s talk on implementing SOA, which was very interesting.

Neal works for ThoughtWorks, which is quite a famous company, at least from my point of view.  The chief scientist at ThoughtWorks is the very famous Martin Fowler, who has a fantastic blog/wiki (he calls it a bliki) and is huge in the agile community.  His UML Distilled book is everybody’s first UML book, and his Patterns of Enterprise Application Architecture should be required reading for every serious developer.

Neal Ford also gave the keynote this evening, where he talked about Polyglot Programming and made a strong case that Groovy will be the Next Big Thing.  Okay, maybe I enjoyed his talk so much because I agreed with so much of it, but he really did do a good job.

Then the fun happened.  He and Scott got a few people together over in the bar, where we were joined by none other than Martin Fowler himself.

Frankly, for me that registers as a brush with greatness.  It took a lot of self-control for me not to go all fan-boy on him.  I did tell him I’ve been a great admirer of his for years and have recommended his books to hundreds of students.  He tolerated that but I could see he wasn’t wild about the adulation, so I backed off.  He and his wife (he lives in the area so she came along) brought a few British board games to the bar and everybody sat around playing them while consuming various liquid refreshments.

I knew this was going to be an interesting conference, but I never imagined I’d be sitting in a hotel bar around midnight playing board games with Martin Fowler.

I’m really impressed by many of the presenters I’ve met.  They all seem so accomplished and professional.  I’d like to do the same if possible.  I often feel I have to work hard to avoid what I call the “instructor trap,” which is to know a lot but not to actually have done much.  These guys have all done so much it’s dazzling.  I feel a certain reflected glory just by hanging around them.

(Yeah, I’ll get over this hero worship soon enough, but I might as well have fun with it while it’s going on.)

I am rather looking forward to telling a group of future students about the night I spent playing games with Martin Fowler and his wife, though. 🙂
