Monday, May 11, 2009

Clapham: A railroad diagram generator

I don't work with the Oracle database very much anymore, and one thing I miss is their server documentation. I still have my old copy of the Oracle 7.3 SQL Language Reference, and sometimes I reach for it when the SQL:2008 standard has fuddled my brain and I want to be reassured that SQL can be simple, powerful and trustworthy. The calming effect is partly due to the authoritative tone, but mostly it comes from the railroad diagrams describing the syntax of each command, which seem to say 'Don't worry'.

For example, here is the railroad diagram for Oracle 10.2's CREATE TABLE statement:

[Diagram: Oracle 10.2 CREATE TABLE syntax]

Yes, railroad diagrams. You can easily get lost in something as large as the SQL language, with its hundreds of commands, keywords and unexpected clauses, and railroad diagrams are the map.

When it came to writing our documentation for SQLstream, we of course wanted to include railroad diagrams to illustrate our dialect of SQL. It's possible to construct the diagrams by hand, but it's tedious and error-prone, and it's difficult to make the diagrams look consistent. Unbelievably, we couldn't find a tool to generate them, so we ended up drawing them by hand.

Now that I have a little breathing room after the release of SQLstream 2.0, I have taken a couple of days to write an open-source railroad diagram generator. I've released it on SourceForge and named it Clapham, after the South London district that is home to the most complicated railway junction you ever saw.

This has been a nice return to old-school open source, with its mantras "release early, release often" and "don't whine: contribute". The diagrams aren't yet as pretty as Oracle's, but we're getting there. Even though this is the very first release and the project is barely alpha, it has already generated diagrams for LucidDB's not inconsiderable SQL grammar.

More details at the home page, and you can download release clapham-0.1.003 from SourceForge. Contributions welcome, of course.

15 comments:

rpbouman said...

Hi Julian!

Great! I'll check it out.

Now I'm just curious: did you ever check out ANTLR and its IDE, ANTLR Works (http://www.antlr.org/works/index.html)? ANTLR Works also creates these railroad diagrams for whatever grammar you happen to be editing at the time. It's on the fly and almost instantaneous: you click the grammar rule of interest and, bwam, there's the diagram.

Now, for documentation purposes, I can imagine you might sometimes want to simplify the diagrams a bit, as the grammar may contain some constructs that make parsing easier or faster, but I usually found that the ANTLR Works diagrams were pretty much what I wanted to see.

kind regards,

Roland

Arjen Lentz said...

Cool!
I tend to like (E)BNF but I realise that a more visual approach can be handy for users, and I know that many people actually find BNF hard to read.

Unknown said...

Pretty sweet, Julian! Great work.

Ross Patterson said...

IBM's BookMaster product (think "HTML authoring for mainframes circa 1985") had (and still has, I believe) a markup scheme for generating railroad diagrams, rather than trying to interpret one or another form of grammar. Gary Richtmeyer's B2H ("BookMaster 2 HTML") program (http://www.vm.ibm.com/download/packages/descript.cgi?b2h) includes a parser for the BookMaster syntax markup that generates HTML rather than images. I'm of two minds about that: I'm a text guy at heart, but images are nice too.

Ross

Julian Hyde said...

Roland,

I'm not too familiar with ANTLR. I switched from JavaCUP to JavaCC a few years ago, and I have been happy with it. I'll take a look and see if there is any code I can leverage.

I can't tell from the site whether they do the things you would need to generate documentation: HTML-friendly formats (PNG images and HTML Map elements), batch mode, upper limit on the width of an image.

If there are ANTLR users out there, we could add a parser to read .g files. If any other features are missing, please let me know and I'll add them to the roadmap.

Julian
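For readers unfamiliar with the "HTML Map elements" mentioned above: the idea is to publish each diagram as a PNG and overlay a client-side image map, so that clicking a nonterminal box jumps to that rule's own page. Here is a minimal Python sketch, assuming the generator already knows each box's pixel rectangle; the function name `image_map`, the hotspot tuple layout and the file names in the usage lines are hypothetical, not Clapham's actual output.

```python
from html import escape


def image_map(map_name, hotspots):
    """Build a client-side HTML image map for one diagram image.

    hotspots: iterable of (rule_name, x, y, width, height, href) tuples,
    where (x, y) is the top-left corner of a nonterminal box in pixels.
    This tuple layout is a hypothetical convention for this sketch.
    """
    lines = [f'<map name="{escape(map_name)}">']
    for rule, x, y, width, height, href in hotspots:
        # shape="rect" takes coords as left,top,right,bottom
        coords = f"{x},{y},{x + width},{y + height}"
        lines.append(
            f'  <area shape="rect" coords="{coords}" '
            f'href="{escape(href)}" alt="{escape(rule)}">'
        )
    lines.append("</map>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical files: select.png is the diagram, column_list.html the linked rule.
    print('<img src="select.png" usemap="#select" alt="SELECT syntax">')
    print(image_map("select", [("column_list", 70, 10, 100, 24, "column_list.html")]))
```

The `usemap` attribute on the `<img>` must name the map with a leading `#`, which is how the browser ties the PNG to its clickable regions.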

rpbouman said...

Hi Julian!

"I can't tell from the site whether they do the things you would need to generate documentation: HTML-friendly formats (PNG images and HTML Map elements), batch mode, upper limit on the width of an image."

Some of that is there, some isn't. You can right-click an individual rule diagram and export it to either EPS or PNG.

You can also export all grammar rules using

File > Export All Rules

From there you can again choose between EPS and PNG.

So, unfortunately, there's no control over the width, and no HTML image maps. That'd be a great feature though.

Arjen Lentz said...

@julian SVG is the way to go for the web: it's vector-based and scalable, and it can be converted to bitmaps (PNG, etc.) or PDF.

thecarpy said...

SVG is fine, but not for HTML Help, which hasn't finished dying yet...

Julian Hyde said...

I know that SVG support on the client is sketchy, so we definitely need to support other formats. Clapham uses Batik to create an internal SVG image, then generates PNG images from SVG. We could generate other formats from SVG too.

John Russell said...

Just a follow-up to Ross's comment. Yes, BookMaster had a nice set of tags for railroad diagrams. Similar tags made it into Docbook; however, I'm not aware of any open-source path from Docbook to railroad diagrams. IBM generated railroad diagrams from structured documents (using the in-house IBMIDDOC DTD). I don't know how that all evolved after 1999.

IMHO, the best thing that BookMaster had that never made it into Docbook, DITA, et al., is the tagging to produce Gantt charts. It was very useful to be able to manipulate schedule dates by twiddling attributes in text files. I haven't seen that capability in any form since IBM switched from BookMaster to SGML and then XML.

Jeff said...

Regarding the history of syntax charts in manuals, I located some of the old manuals in which I first used them.

The earliest of my books using charts (that I could find) was the Intel PL/M-86 User's Guide for 8086-Based Development Systems (© 1980), with over two dozen charts.

Also copyrighted/published that year was my 2920 Signal Processing Applications Software/Compiler User's Guide, with 32 charts.

The INTELLEC® Series III Microcomputer Development System Programmer's Reference Manual (© 1980) has 13 of my syntax charts and a brief introduction to their use in the Reader's Guide section up front.

The MCS-86 Macro Assembly Language Reference Manual (about 360 pages) that I wrote also used syntax charts extensively; probably 2-3 dozen. Published in the early '80s ...

Then in 1985 I published a 4-volume set on UX-BASIC: Tutorial, User's Guide, Language Reference (over 400 pages) and a full set of syntax charts that ran 32 pages plus three pages of definitions for elements in the charts.

Tom Copeland said...

Awesome, Julian! I blogged about it, along with some suggestions for how JavaCC could make this easier; perhaps we can write a straight BNF output format for JJDoc.

Tom Copeland said...

Also, FWIW, Eric Dahlstrom saw this and tweeted "that svg output is not very pretty, suggest using either svg paths or rects directly, and to make sure text is placed correctly". Probably the "use paths or rects" means more to you than it does to me :-)
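One plausible reading of the "use paths or rects" suggestion is to emit the SVG primitives yourself: a `<rect>` for each box, a `<text>` centred inside it, and a `<path>` for the track. The Python sketch below does that for the simplest railroad construct, a plain sequence of terminals. It is an illustration under assumptions, not how Clapham or Batik draws diagrams: the function `sequence_svg` and the layout constants are hypothetical, text width is estimated from the character count, and choices, loops and optional branches are left out.

```python
from xml.sax.saxutils import escape

# Illustrative layout constants (assumptions, not Clapham's values).
CHAR_WIDTH = 8   # rough pixel width of one character
BOX_HEIGHT = 24  # height of each terminal box
GAP = 20         # length of track between boxes
PAD = 10         # outer margin, and horizontal padding inside a box


def sequence_svg(terminals):
    """Draw terminals left to right, joined by one horizontal track.

    Boxes are plain <rect> elements with centred <text>; the whole track
    is a single <path> whose sub-paths skip over each box.
    """
    mid_y = PAD + BOX_HEIGHT // 2
    x = 0
    track = [f"M {x} {mid_y}"]
    shapes = []
    for name in terminals:
        x += GAP
        track.append(f"H {x}")  # run the track up to the box's left edge
        width = CHAR_WIDTH * len(name) + 2 * PAD
        shapes.append(
            f'<rect x="{x}" y="{PAD}" width="{width}" height="{BOX_HEIGHT}" '
            f'rx="10" fill="none" stroke="black"/>'
        )
        shapes.append(
            f'<text x="{x + width // 2}" y="{mid_y}" text-anchor="middle" '
            f'dominant-baseline="central">{escape(name)}</text>'
        )
        x += width
        track.append(f"M {x} {mid_y}")  # lift the pen and resume after the box
    x += GAP
    track.append(f"H {x}")  # final stub leaving the last box
    height = 2 * PAD + BOX_HEIGHT
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{x}" height="{height}">'
        f'<path d="{" ".join(track)}" fill="none" stroke="black"/>'
        + "".join(shapes)
        + "</svg>"
    )


if __name__ == "__main__":
    print(sequence_svg(["CREATE", "TABLE"]))
```

Computing every coordinate from the same running `x` keeps each track segment flush with the box edges. A fuller generator would likely lay out choices and loops the same way, with each sub-diagram reporting its width and height to its parent.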

Tom Copeland said...

Just a final note - I've added a BNFGenerator class to JJDoc so it's much easier to go from a JavaCC grammar to a Clapham-generated SVG diagram, details are here. Fun stuff!

Julian Hyde said...

Two exciting news items.

1. Edgar Espina (developer of ANTLR IDE) has been contributing to Clapham recently, and has included Clapham in ANTLR IDE. He has fixed the layout bugs and added new input languages and output formats.

2. I have changed Clapham's license from GPL to Simplified BSD. This should make it easier to embed Clapham in other projects.