Optiq offers a convenient model for presenting data from multiple external sources via a single, efficient SQL interface. Using adapters, you create a schema for each external source, and a table for each data set within a source.
But sometimes the external data source does not consist of a fixed number of data sets, known ahead of time. Consider, for example, Optiq’s web adapter, optiq-web, which makes any HTML table in any web page appear as a SQL table. Today you can create an Optiq model and define within it several tables.
Optiq-web’s home page shows an example where you can create a schema with tables “Cities” and “States” (based on the Wikipedia pages List of states and territories of the United States and List of United States cities by population) and execute a query to find out the proportion of the California’s population that live in cities:
SELECT COUNT(*) "City Count",But what if you want to query a URL that isn’t in the schema? A table macro will allow you to do this:
SUM(100 * c."Population" / s."Population") "Pct State Population"
FROM "Cities" c, "States" s
WHERE c."State" = s."State" AND s."State" = 'California';
SELECT * FROM TABLE(
web(‘http://en.wikipedia.org/wiki/List_of_countries_by_population’));
web
is a function that returns a table. That is, a Table
object, which is the definition of a table. In Optiq, a table definition doesn’t need to be assigned a name and put inside a schema, although most do; this is a free-floating table. A table just needs to be able to describe its columns, and to be able to convert itself to relational algebra. Optiq invokes it while the query is being planned.
Here is the WebTableMacro
class:
public class WebTableMacro {And here is how you define a WEB function based upon it in your JSON model:
public Table eval(String url) {
Mapoperands = new HashMap ();
operands.put(“url”, url);
return new WebTable(operands, null);
}
}
{Table macros are a special kind of table function. They are defined in the same in the model, and invoked in the same way from a SQL statement. A table function can be used at prepare time if (a) its arguments are constants, and (b) the table it returns implements
version: '1.0',
defaultSchema: ‘ADHOC’,
schemas: [
{
name: 'ADHOC',
functions: [
{
name: ‘WEB’,
className: 'com.example.WebTableMacro'
}
]
}
]
}
TranslatableTable
. If it fails either of those tests, it will be invoked at runtime; it will still produce results, but will have missed out on the advantages of being part of the query optimization process.
What kind of advantages can the optimization process being? Suppose a web page that produces a table supports URL parameters to filter on a particular column and sort on another. We could write planner rules that push take a FilterRel
or SortRel
on top of a WebTableScan
and convert them into a scan with extra URL parameters. A table that came from the web
function would be able to participate in that process.
The name ‘table macros’ is inspired by Lisp macros — functions that are invoked at compile time rather than run time. Macros are an extremely powerful feature in Lisp and I hope they will prove to be a powerful addition to SQL. But to SQL users, a more familiar name might be ‘parameterized views’.
Views and table macros are both expanded to relational algebra before the query is optimized. Views are specified in SQL, whereas table macros invoke user code (it takes some logic to handle those parameters). Under the covers, Optiq’s views are implemented using table macros. (They always have been — we’ve only just got around to making table macros a public feature.)
To sum up. Table macros are powerful new Optiq feature that extend the reach of Optiq to data sources that have not been pre-configured into an Optiq model. They are a generalization of SQL views, and share with views the efficiency of expanding relational expressions at query compilation time, where they can be optimized. Table macros will help bring a SQL interface to yet more forms of data.
6 comments:
The example seems a bit outdated:
eval method has to return a TranslatableTable instance to be a proper macro function.
There is a lack of documentation and examples - the most valuable is test cases. Is there any plans to cover the common cases in a wiki or document it somehow ?
It seems reasonable for it to be other sub-types of Table. I'm not sure why the return type was changed (although Vladimir probably had a good reason). I've logged https://issues.apache.org/jira/browse/CALCITE-584 to track this.
Yes, we need more documentation. A good place would be https://github.com/apache/incubator-calcite/blob/master/doc/TUTORIAL.md.
Another question:
Is it possible to pass through a Map value like
TABLE(MYMACRO(MAP["key", "value"]))
finally it goes to SqlUserDefinedTableMacro.convertArguments but it expects to get just a literal (not a collection or map) and fails.
I got an advice from Vladimir Sitnokov to use a TableFunction - but that's just another way.
It is reasonable to want to pass in any constant expression, but Calcite does not currently support it. Please log a jira case.
In your example you'd have to replace double-quoted string with single-quoted strings. Double-quoted strings are identifiers (e.g. column names) and are not constant.
sorry, it was a typo - in my test case it is single-quote.
Hi Julian.
Thank you very much for rendering Mondrian free!
Much appreciated.
Post a Comment