Getting a list of Grails plugins programmatically

In September, I’m very happy to be giving a couple of presentations at the No Fluff, Just Stuff conference in the Boston area.  One of my presentations is a review of the various available Grails plugins.  To prepare for that, I thought I’d create a Grails application that acted as a survey, so people could rate the plugins they like.

One task is to get a list of available Grails plugins.  I wanted to do that programmatically, too, because I’d like to update the list automatically using the Quartz plugin (of course).

How do you get a list of available plugins?  My first thought was to do the HTML equivalent of screen scraping at the main plugin site, http://grails.org/Plugins .  At that site everything is nicely divided into categories, along with links to descriptions and more.

Screen scraping HTML is not fun, though.  I’ve done it before, when necessary, but it’s not very robust and tends to run into problems.  Many of those problems have to do with the fact that HTML is a mess.  Most web sites are filled with HTML that isn’t even well-formed, making processing it programmatically a real pain.

GinA, however, mentioned HttpUnit as an easy way to access a web page.  Since it’s a regular old Java library, that meant I could use it with Groovy.  Therefore, my first attempt was:


import com.meterware.httpunit.WebConversation

def baseUrl = 'http://grails.org/Plugins'

def wc = new WebConversation()
def resp = wc.getResponse(baseUrl)

Unfortunately, I’m already in trouble even at that point.  If I run that, I get a massive exception stack trace, which shows that the included Neko DOM parser choked on the embedded Prototype JavaScript library.

While I was debating what to do about that (I really didn’t want to just open the URL, get the text, and start having Fun With Regular Expressions), I noticed a blog post by Isak Rickyanto, from Jakarta, Indonesia.

(A Java developer from Java.  How cool is that?  Or should I say, “how Groovy?” :))

Isak points out that there is a list of Grails plugins at http://svn.codehaus.org/grails-plugins/ .  As a Subversion repository listing, it’s not full of JavaScript.  Even better, every plugin is listed as a simple link in an unordered list.

I therefore modified my script to look like this:


import com.meterware.httpunit.WebConversation

def baseUrl = 'http://svn.codehaus.org/grails-plugins/'

def wc = new WebConversation()
def resp = wc.getResponse(baseUrl)
def pluginNames = []
resp.links.each { link ->
    if (link.text =~ /^grails/) {
        def name = link.text - 'grails-' - '/'
        pluginNames << name
    }
}
println pluginNames

Here I’m taking advantage of the fact that the WebResponse class (returned from getResponse(url)) has a method called getLinks().  Since there was one link that had the name “.plugin-meta”, I decided to use a trivial regular expression to filter down to the links definitely associated with plugins.  The WebLink.getText() method then returned the text of the link, which gave values of the form

grails-XXX/

for each plugin.  One of the things I love about Groovy is that I can then just subtract out the characters I don’t want, which is how I added the actual plugin names to a list.
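To make the subtraction trick concrete, here is a minimal sketch that runs without HttpUnit. The link texts are made-up sample values standing in for what getLinks() returned (an assumption for illustration); only the filtering and the subtraction mirror the script above.

```groovy
// Sample link texts, assumed for illustration (not fetched from the repository)
def linkTexts = ['..', '.plugin-meta/', 'grails-acegi/', 'grails-audit-logging/']

// Keep only the links whose text starts with "grails"
def pluginLinks = linkTexts.findAll { it =~ /^grails/ }

// String minus removes the first occurrence of its argument
def pluginNames = pluginLinks.collect { it - 'grails-' - '/' }

println pluginNames
```

Because minus only removes the first occurrence, a plugin whose name itself contained "grails-" would keep the later occurrences intact.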

Unfortunately, while that’s part of what I want, that isn’t everything I want.  I’d like the version numbers and the descriptions, too, if possible.  I could go digging into the various directories and look for patterns, but a different idea occurred to me.

I finally remembered that the way I normally find out what plugins are available is to run the

grails list-plugins

command and look at the output.  You’ve probably seen it.  It gives an output like

Welcome to Grails 1.0.3 - http://grails.org/
Licensed under Apache Standard License 2.0
Grails home is set to: c:\grails-1.0.3

Base Directory: c:\
Note: No plugin scripts found
Running script c:\grails-1.0.3\scripts\ListPlugins.groovy
Environment set to development

Plug-ins available in the Grails repository are listed below:
-------------------------------------------------------------

acegi               <0.3>            --  Grails Spring Security 2.0 Plugin
aop                 <no releases>    --  No description available
audit-logging       <0.4>            --  adds hibernate audit logging and onChange event handlers ...
authentication      <1.0>            --  Simple, extensible authentication services with signup ....
autorest            <no releases>    --  No description available

etc.  So if I could get this output, I could break each line into the pieces I want with simple String processing.

How can I do that?  In the spirit of reducing it to a problem already solved, I realized I just wanted to execute that command programmatically and capture the output.  One way to do that is to take advantage of Groovy’s ability to run command line scripts (GinA covers this, of course, but so does Scott Davis’s most excellent Groovy Recipes book).  Here’s the result:


def names = []
def out = "cmd /c grails list-plugins".execute().text
out.split("\n").each { line ->
    if (line =~ /<.*>/) {
        def spaceSplit = line.split()
        def tokenSplit = line.split('--')
        def name = spaceSplit[0]
        def version = (line =~ /<(.*?)>/)[0][1]   // text inside <...>, so "no releases" survives intact
        def description = tokenSplit[-1].trim()
        names << name
    }
}

Basically I’m executing the list-plugins command at a command prompt under Windows (sorry, but that’s still my life), splitting the output at the newlines (for some odd reason, using eachLine directly kept giving me errors), and processing each line individually.  The lines listing plugins are the ones with version numbers in angle brackets (like <0.3>), and the descriptions come after two dashes.  It seemed easiest to just split the lines both ways in order to get the data I wanted.
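As an illustration of that line-by-line processing, the sketch below runs against a few lines copied from the sample output rather than a live grails process. Two things here are my own variations, not part of the script above: splitting on /\r?\n/ so that Windows line endings don’t leave a stray carriage return, and pulling the version out with a small regular expression so that "<no releases>" isn’t cut at the space. The map layout is just for the example.

```groovy
// Lines copied from the sample output, joined with Windows line endings
def out = 'Environment set to development\r\n' +
          'acegi               <0.3>            --  Grails Spring Security 2.0 Plugin\r\n' +
          'aop                 <no releases>    --  No description available\r\n'

def plugins = []
out.split(/\r?\n/).each { line ->
    if (line =~ /<.*>/) {
        def name = line.split()[0]                     // first whitespace-delimited token
        def version = (line =~ /<(.*?)>/)[0][1]        // text between the angle brackets
        def description = line.split('--')[-1].trim()  // everything after the last "--"
        plugins << [name: name, version: version, description: description]
    }
}

plugins.each { println it }
```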

I ran this script and the other script together to see if I got the same output.  Here’s the result:


println "From 'grails list-plugins': " + names
println "From svn repo: " + pluginNames
println "Difference: " + (pluginNames - names)

From 'grails list-plugins': ["acegi", "aop", "audit-logging", ..., "yui"]
From svn repo: ["acegi", "aop", "audit-logging", ..., "yui"]
Difference: ["extended-data-binding"]

Why the difference? From the list-plugins output, here’s the line for “extended-data-binding”:


ext-ui              <no releases>    --  No description available
extended-data-binding<0.2>            --  This plugin extends Grails' data binding ...

Yup, the name ran into the version number format.  Sigh. Of course, the other problem with this is that at the moment it’s dependent on my own system configuration (Windows, with the grails command in the path), which can’t be a good thing.
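One way to “parse the output better” is a single regular expression that doesn’t rely on whitespace between the name and the version. This is only a sketch: the pattern is my own guess at the line format based on the two lines shown above, and it assumes plugin names never contain a space or a "<".

```groovy
// Name: run of characters that are neither whitespace nor '<';
// version: text inside <...>; then "--" and the description
def pattern = /^([^\s<]+)\s*<(.*?)>\s+--\s+(.*)$/

def lines = [
    'ext-ui              <no releases>    --  No description available',
    "extended-data-binding<0.2>            --  This plugin extends Grails' data binding ..."
]

def parsed = []
lines.each { line ->
    def m = line =~ pattern
    if (m.matches()) {
        parsed << [name: m.group(1), version: m.group(2), description: m.group(3)]
    }
}

parsed.each { println "${it.name} / ${it.version}" }
```

Since \s* allows zero spaces before the "<", the long name and its version come apart cleanly even though the padding ran out.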

Finally, after all this work, I suddenly realized that I already have the script used to list the plugins.  As with all the other Grails commands, it’s a Gant script in the <GRAILS_HOME>\scripts directory called, obviously enough, ListPlugins.groovy.  According to the documentation at the top, it was written by Sergey Nebolsin for version 0.5.5.

What Sergey does is to go to a slightly different URL and then parse the results as XML.  His script accesses

DEFAULT_PLUGIN_DIST = "http://plugins.grails.org"

instead of the SVN repo location listed above, but if you go there, they look remarkably alike.  I wouldn’t be surprised if http://plugins.grails.org is simply an alias for the SVN repository.

Note that the script also creates a cached version of the plugin list, called plugins-list.xml, which is kept in the

"${userHome}/.grails/${grailsVersion}/plugins/"

directory.  That’s completely understandable, but a lousy location on a Windows box.  I never go to my so-called “user home” directory, so it would never occur to me to look there for information.

His script checks to see if that file is missing or out of date.  If it’s necessary to update it, he opens a URL and starts processing:


def remoteRevision = 0
new URL(DEFAULT_PLUGIN_DIST).withReader { Reader reader ->
    def line = reader.readLine()

...

    // for each plugin directory under Grails Plugins SVN in form of 'grails-*'
    while(line=reader.readLine()) {
        line.eachMatch(/<li><a href="grails-(.+?)">/) {
            // extract plugin name
           def pluginName = it[1][0..-2]

           // collect information about plugin
           buildPluginInfo(pluginsList, pluginName)
        }

etc.

So, in effect, he’s screen scraping the SVN page; he’s just doing a better job of it than I was.
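To see what that pattern pulls out, here is a small sketch that applies the same regular expression to a hard-coded listing. The HTML lines are an assumed sample modeled on the pattern in the excerpt, not a copy of the real page, and I’m using a plain java.util.regex matcher loop rather than eachMatch.

```groovy
// Assumed sample of the repository listing (not fetched from the server)
def html = '''<ul>
<li><a href="../">..</a></li>
<li><a href="grails-acegi/">grails-acegi/</a></li>
<li><a href="grails-audit-logging/">grails-audit-logging/</a></li>
</ul>'''

def names = []
html.split('\n').each { line ->
    def m = line =~ /<li><a href="grails-(.+?)">/
    while (m.find()) {
        // group(1) is e.g. "acegi/"; the [0..-2] range drops the trailing slash,
        // which appears to be what it[1][0..-2] does in the excerpt
        names << m.group(1)[0..-2]
    }
}

println names
```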

Incidentally, the line in his script that led to my parsing problems is line 86:

plugins << "${pluginLine.padRight(20, " ")}${versionLine.padRight(16, " ")} --  ${title}"

I could bump up the padding by two (the name is 21 characters long, so a width of 21 would still leave no space before the version), or learn to parse the output better. 🙂 I expect the “right” answer, though, is to do what Sergey did, pretty much. Still, if all I have to do is add a little padding, it’s awfully tempting to just “reuse” Sergey’s existing script.
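The padding problem is easy to reproduce in isolation. The sketch below only exercises padRight with the widths from that line; the "<0.2>" version string comes from the output shown earlier, and the trailing "..." stands in for the description.

```groovy
def shortName = 'acegi'
def longName  = 'extended-data-binding'   // 21 characters, one more than the pad width

// padRight never truncates: a string longer than the width comes back unchanged
def padded = [shortName, longName].collect { it.padRight(20, ' ') }
println padded*.length()

// Rebuild the start of the output line the same way the script does
def line = "${longName.padRight(20, ' ')}${'<0.2>'.padRight(16, ' ')} --  ..."
println line

// A width of 22 is the smallest that leaves a space after this name
def wider = longName.padRight(22, ' ')
println "[${wider}]"
```

Any fixed width only moves the problem to the next plugin with a longer name, which is another argument for parsing the plugin list the way Sergey does.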

In an upcoming post, I’ll talk about how I used the RichUI plugin to apply a “star rating” to each entry so that people could vote. I don’t have the site ready yet, though. I’ll be sure to mention it when I do.

2 responses to “Getting a list of Grails plugins programmatically”

  1. Have a more careful look at GinA again, it surely doesn’t refer to HttpUnit but to HtmlUnit.

  2. Hi Marc,

    Yes, I did find that after the fact, and I’ve used it since. I wished I’d noticed it sooner. Still, processing by hand probably wasn’t a bad exercise for me.

    Thanks for your comment,

    Ken
