Friday, April 10, 2009

Formatting MDX as plain text

When Mondrian tools output MDX results as text, such as in the cmdRunner utility, we've been using the same old crappy format for years. For example, the query
select
  crossjoin(
    {[Time].[1997].[Q1], [Time].[1997].[Q2].[4]},
    {[Measures].[Unit Sales], [Measures].[Store Sales]}) on 0,
  {[USA].[CA].[Los Angeles],
   [USA].[WA].[Seattle],
  [USA].[CA].[San Francisco]} on 1
FROM [Sales]
is formatted as
Axis #0:
{}
Axis #1:
{[Time].[1997].[Q1], [Measures].[Unit Sales]}
{[Time].[1997].[Q1], [Measures].[Store Sales]}
{[Time].[1997].[Q2].[4], [Measures].[Unit Sales]}
{[Time].[1997].[Q2].[4], [Measures].[Store Sales]}
Axis #2:
{[Store].[All Stores].[USA].[CA].[Los Angeles]}
{[Store].[All Stores].[USA].[WA].[Seattle]}
{[Store].[All Stores].[USA].[CA].[San Francisco]}
Row #0: 6,373
Row #0: 13,736.97
Row #0: 1,865
Row #0: 3,917.49
Row #1: 6,098
Row #1: 12,760.64
Row #1: 2,121
Row #1: 4,444.06
Row #2: 439
Row #2: 936.51
Row #2: 149
Row #2: 327.33
I've just checked in an alternative formatter that makes the result look more like a pivot table. The same query would come out like this:
                     1997       1997        1997        1997
                     Q1         Q1          Q2         Q2
                                            4          4
                     Unit Sales Store Sales Unit Sales Store Sales
=== == ============= ========== =========== ========== ===========
USA CA Los Angeles   6,373      13,736.97   1,865      3,917.49
USA WA Seattle      6,098      12,760.64   2,121      4,444.06
USA CA San Francisco 439       936.51      149        327.33
Two questions:

1. Should we move this code into the olap4j code base? (It would seem to make sense because it doesn't require any mondrian internals to do the job, and the processing requires a 'grid model' similar to query models already part of olap4j. But I don't want to 'dump' code that is not generally useful.)

2. What do people feel is the ideal format for formatting MDX results as text? As a starting point, another couple of possible formats are below.

"Oracle" format
                     1997
                     Q1                     Q2
                                            4
                     Unit Sales Store Sales Unit Sales Store Sales
=== == ============= ========== =========== ========== ===========
USA CA Los Angeles        6,373   13,736.97      1,865    3,917.49
    WA Seattle            6,098   12,760.64      2,121    4,444.06
    CA San Francisco        439      936.51        149      327.33
"MySQL" format
|                          | 1997                                                |
|                          | Q1                       | Q2                      |
|                         |                          | 4                        |
|                         | Unit Sales | Store Sales | Unit Sales | Store Sales |
+-----+----+---------------+------------+-------------+------------+-------------+
| USA | CA | Los Angeles   |      6,373 |   13,736.97 |      1,865 |   3,917.49 |
|     | WA | Seattle       |      6,098 |   12,760.64 |     2,121 |    4,444.06 |
|     | CA | San Francisco |        439 |     936.51 |        149 |      327.33 |

7 comments:

Julian Hyde said...

By the way, if line-wrapping prevents you from seeing the tables properly, try reducing your browser's font size a bit.

rpbouman said...

Hi!

re #1, I just glanced through the spec (http://olap4j.svn.sourceforge.net/viewvc/olap4j/trunk/doc/olap4j_fs.html) and I didn't seen anything relating to text output formats of the CellSet, CellSetMetadata and friends (is this the level I should be thinking of?). So for that reason I would not mix it with the olap4jcode because it would not be clear to what part of the spec it relates.

That said, I think this is incredibly useful stuff!!! So it'd be really nice if you could configure mondrian to pick a particular formatter.

re #2, well, It seems that it would be reaaly nice if it'd be easy for other tools to do something with the text result. It seems that what you call "MySQL format" is most suitable for that, as you can easily parse the result by scanning for the pipe and dash characters.

(Many people will perhaps disagree, but I can imagine that for some purposes, a scriptable command-line style client that provides this kind of output is exactly the right tool for the job - not everybody is capable if writing their own formatter directly in java whereas there are plenty tools to deal with csv-like text data)

just my 2ct

Calin said...

Hi Julian,

I think the code belongs in olap4j, it's not Mondrian specific, it's useful also when connecting from a non-Mondrian olap4j client.

And I like the Mysql output better but both are very nice!

Thanks!

Calin Medianu

Julian Hyde said...

Roland,

CellSet would be the obvious place to put this, because you're formatting a CellSet after all, but I agree, it would also absolutely the wrong place to put it. (People often overdo the object-oriented programming: just because functionality relates to a particular class, it doesn't mean that the code should be glued onto that class.) CellSet is one of the core workhorse interfaces in olap4j (analogous to JDBC's ResultSet) so it should not be cluttered with extra functionality.

But even if it is not absolutely essential, it is useful to a lot of people, so I would like to package it with olap4j somehow. The org.olap4j.query package relates to forming and processing queries from an end-user's perspective, so I might add a Formatter class there.

Interesting point about the text result being parseable. I would not make this a goal of the rectangular output. I would create another format that is easily parseable and somewhat readable. In my opinion, the traditional format meets that goal; suggestions for other formats are welcome.

Julian

Luc Boudreau said...

It could be useful to include this code in Olap4j in a separate package, but instead of limiting this new package to a simple 'text display' function, why not create one to support MDX tables?

Let me explain...

I thought about that a few weeks ago while working on Pentaho Analysis Tool. As far as I know, there are no 'java wide' open source table model to work with multidimensional tables. Thomas Morgner might have coded one actually for his Report Designer, but last time I checked (versions < 2.0) there was no such thing.

It could be a great feature to have a common API for multidimensional tables. There is definitely a need for a multidimensional equivalent of Java's basic TableModel.

Insights?

Julian Hyde said...
This comment has been removed by the author.
Julian Hyde said...

Luc,

I totally agree. Laying out MDX results as grids is not trivial, so it should be common code.

I've moved the code from inner classes of mondrian's TestContext class to org.olap4j.query.CellSetFormatter and a couple of sub-classes. The RectangularCellSetFormatter is capable of quite complex formatting, including cellsets of which have any number of axes (not just the usual 2) and where some of the axes may be empty. Both of the styles I called 'MySQL' and 'Oracle' are supported.

The classes are marked 'experimental', which will give you or anyone else interested free reign to refactor into a multidimensional equivalent of Java's TableModel.

I put them in the org.olap4j.query package because a programmer-friendly model for displaying MDX results seems very closely related to the programmer-friendly model for forming queries.

Julian