Monday, May 05, 2014

Table macros

Table macros are a new Optiq feature (since release 0.6) that combine the efficiency of tables with the flexibility of functions.

Optiq offers a convenient model for presenting data from multiple external sources via a single, efficient SQL interface. Using adapters, you create a schema for each external source, and a table for each data set within a source.

But sometimes the external data source does not consist of a fixed number of data sets, known ahead of time. Consider, for example, Optiq’s web adapter, optiq-web, which makes any HTML table in any web page appear as a SQL table. Today you can create an Optiq model and define within it several tables.

Optiq-web’s home page shows an example where you can create a schema with tables “Cities” and “States” (based on the Wikipedia pages List of states and territories of the United States and List of United States cities by population) and execute a query to find out the proportion of the California’s population that live in cities:

SELECT COUNT(*) "City Count",

  SUM(100 * c."Population" / s."Population") "Pct State Population"

FROM "Cities" c, "States" s

WHERE c."State" = s."State" AND s."State" = 'California';
But what if you want to query a URL that isn’t in the schema? A table macro will allow you to do this:
SELECT * FROM TABLE(
  web(‘http://en.wikipedia.org/wiki/List_of_countries_by_population’));
web is a function that returns a table. That is, a Table object, which is the definition of a table. In Optiq, a table definition doesn’t need to be assigned a name and put inside a schema, although most do; this is a free-floating table. A table just needs to be able to describe its columns, and to be able to convert itself to relational algebra. Optiq invokes it while the query is being planned.

Here is the WebTableMacro class:

public class WebTableMacro {
  public Table eval(String url) {
    Map operands = new HashMap();
    operands.put(“url”, url);
    return new WebTable(operands, null);

  }

}
And here is how you define a WEB function based upon it in your JSON model:
{
  version: '1.0',
  defaultSchema: ‘ADHOC’,
  schemas: [
    {

      name: 'ADHOC',
      functions: [
        {

          name: ‘WEB’,
          className: 'com.example.WebTableMacro'
        }

      ]
    }
  ]


}
Table macros are a special kind of table function. They are defined in the same in the model, and invoked in the same way from a SQL statement. A table function can be used at prepare time if (a) its arguments are constants, and (b) the table it returns implements TranslatableTable. If it fails either of those tests, it will be invoked at runtime; it will still produce results, but will have missed out on the advantages of being part of the query optimization process.

What kind of advantages can the optimization process being? Suppose a web page that produces a table supports URL parameters to filter on a particular column and sort on another. We could write planner rules that push take a FilterRel or SortRel on top of a WebTableScan and convert them into a scan with extra URL parameters. A table that came from the web function would be able to participate in that process.

The name ‘table macros’ is inspired by Lisp macros — functions that are invoked at compile time rather than run time. Macros are an extremely powerful feature in Lisp and I hope they will prove to be a powerful addition to SQL. But to SQL users, a more familiar name might be ‘parameterized views’.

Views and table macros are both expanded to relational algebra before the query is optimized. Views are specified in SQL, whereas table macros invoke user code (it takes some logic to handle those parameters). Under the covers, Optiq’s views are implemented using table macros. (They always have been — we’ve only just got around to making table macros a public feature.)

To sum up. Table macros are powerful new Optiq feature that extend the reach of Optiq to data sources that have not been pre-configured into an Optiq model. They are a generalization of SQL views, and share with views the efficiency of expanding relational expressions at query compilation time, where they can be optimized. Table macros will help bring a SQL interface to yet more forms of data.

6 comments:

Vladimir Dolzhenko said...

The example seems a bit outdated:
eval method has to return a TranslatableTable instance to be a proper macro function.

There is a lack of documentation and examples - the most valuable is test cases. Is there any plans to cover the common cases in a wiki or document it somehow ?

Julian Hyde said...

It seems reasonable for it to be other sub-types of Table. I'm not sure why the return type was changed (although Vladimir probably had a good reason). I've logged https://issues.apache.org/jira/browse/CALCITE-584 to track this.

Yes, we need more documentation. A good place would be https://github.com/apache/incubator-calcite/blob/master/doc/TUTORIAL.md.

Vladimir Dolzhenko said...

Another question:

Is it possible to pass through a Map value like

TABLE(MYMACRO(MAP["key", "value"]))

finally it goes to SqlUserDefinedTableMacro.convertArguments but it expects to get just a literal (not a collection or map) and fails.

I got an advice from Vladimir Sitnokov to use a TableFunction - but that's just another way.

Julian Hyde said...

It is reasonable to want to pass in any constant expression, but Calcite does not currently support it. Please log a jira case.

In your example you'd have to replace double-quoted string with single-quoted strings. Double-quoted strings are identifiers (e.g. column names) and are not constant.

Vladimir Dolzhenko said...

sorry, it was a typo - in my test case it is single-quote.

Juan said...

Hi Julian.
Thank you very much for rendering Mondrian free!
Much appreciated.