Tuesday, May 31, 2011

Scripted plug-ins in LucidDB and Mondrian

I saw a demo last week of scripted user-defined functions in LucidDB, and was inspired this weekend to add them to Mondrian.

Kevin Secretan of DynamoBI has just contributed some extensions to LucidDB to allow you to call script code (such as JavaScript or Python) in any place where you can have a user-defined function, procedure, or transform. This feature builds on a JVM feature introduced in Java 1.6, scripting engines.

Scripted functions may be a little slower than Java user-defined functions, but what they lose in performance they more than make up in flexibility. Writing user-defined functions in Java has always been laborious: you need to write a Java class, compile it, put it in a jar, put the jar on the server's class path, and restart the server. Each time you find a bug, you need to repeat that process, and that can easily take a number of minutes each cycle. Because scripted functions are compiled on the fly, you can cycle faster, and spend more of your valuable time working on the actual application.

I am speaking about LucidDB (and SQLstream) here, but the same problems exist for Mondrian plug-ins. Scripting is an opportunity to radically speed up development of application extensions, because everything can be done in the schema file. (Or via the workbench... but that part isn't implemented yet.)

Mondrian has several plug-in types, all today implemented using a Java SPI. I chose to make scriptable those plug-ins that are defined in a mondrian schema file: user-defined function, member formatter, property formatter, and cell formatter. A small syntax change to the schema file allowed you to chose whether to implement these plug-ins by specifying the name of a Java class (as before) or an inline script.

As an example, here is the factorial function defined in JavaScript:

<UserDefinedFunction name="Factorial">
  <Script language="JavaScript">
    function getParameterTypes() {
      return new Array(new mondrian.olap.type.NumericType());
    }
    function getReturnType(parameterTypes) {
      return new mondrian.olap.type.NumericType();
    }
    function execute(evaluator, arguments) {
      var n = arguments[0].evaluateScalar(evaluator);
      return factorial(n);
    }
    function factorial(n) {
      return n <= 1 ? 1 : n * factorial(n - 1);
    }
  </Script>
</UserDefinedFunction>

A user-defined function ironically requires several functions in order to provide the metadata needed by the MDX type system. The member, property and cell formatters are simpler. They require just one function, so mondrian dispenses with the function header, and requires just the 'return' expression inside the Script element. For example, here is a member formatter:

<Level name="name" column="column">
  <MemberFormatter>
    <Script language="JavaScript">
      return member.getName().toUpperCase();
    </Script>
  </MemberFormatter>
</Level>

You can of course write multiple statements, if you wish. Since JavaScript is embedded in the JVM, your code can call back into Java methods, and use the full runtime Java library.

There are examples of cell formatters and property formatters in the latest schema guide.

If you are concerned about performance, you could always translate this code back to a Java UDF when it is fully debugged. However, you might be pleasantly surprised by the performance of JavaScript: I was able to invoke a script function about 20,000 times per second. And I hear that there is a Janino "scripting engine" that compiles Java code into bytecode on the fly. In principle, it should be as fast as a real Java UDF.

I'd love to hear about Janino, or in fact any other scripting engine, with the Mondrian or LucidDB scripted functions.

By the way, you can expect to see scripted functions in a release of SQLstream not too far in the future. The Eigenbase project makes it easy to propagate features between projects, and this feature is too good not to share.

5 comments:

Daniel Lemire said...

Brainstorming: what about an equivalent to olap4j in JavaScript?

Julian Hyde said...

I would love to do an equivalent ot olap4j in JavaScript. Let's call it 'olap4js'. I've already spoken to Roland Bouman (who wrote xmla4js) and James Dixon (who is Pentaho's chief geek, but more importantly, in this context, wrote olap4j's query model) and they both think it is a good idea.

I think it could be a community effort. My contribution -- if the other olap4js contributors think it makes sense -- would be to translate (or otherwise hack) the olap4j test suite so that it runs on olap4js. Then it's a small matter of coding-until-the-test-suite-passes.

I should write a blog post about it. Roland, James, Would you like to do a blog post explaining what you see as the advantages of olap4js?

Daniel Lemire said...

I say you should write a blog post about it.

Julian Hyde said...

Thought you'd say that! ;)

Raimonds said...

I created JavaScript user defined functions and formatter support in mondrian-olap JRuby gem, see example

But then I thought that user defined functions in CoffeeScript would be even nicer. As I do precompilation to JavaScript in this case I can even do parameters and return type definition much simpler. See example

As a result Factorial sample is looking much nicer :)

parameters: ["Numeric"]
returns: "Numeric"
execute: (n) -> if n <= 1 then 1 else n * @execute(n - 1)