Julian Hyde on Streaming Data, Open Source OLAP. And stuff.: Pesky quoted identifiers in SQL

Sunday, June 10, 2012

Pesky quoted identifiers in SQL

The SQL that Mondrian generates is, until now, different than the SQL that most people would write by hand. Most people don't use spaces or punctuation in table and column names, and don't enclose identifiers in quotation marks when writing SQL DDL, DML or queries. Mondrian, on the other hand, religiously quotes every identifier, whether it needs it or not.

The two styles are not compatible because on many databases (Oracle is one example) unquoted identifiers are implicitly converted to upper-case. If you use lower-case table and column names in Mondrian's schema, they will not match the upper-case identifiers created during DDL.

For instance, if you create a table in Oracle using

CREATE TABLE emp ( empno INTEGER, ename VARCHAR2(30), deptno INTEGER);

then Oracle creates a table called EMP with columns EMPNO, ENAME and DEPTNO. When you query it using

SELECT ename FROM emp WHERE deptno = 20;

the effect is as if you had written

SELECT ENAME FROM EMP WHERE DEPTNO = 20;

Now, if you'd told Mondrian that the table was called "emp", Mondrian tries to be helpful. It generates the query

SELECT "ename" FROM "emp" WHERE "deptno" = 20;

Of course, there is no table called "emp", only one called "EMP", so on case-sensitive databases such as Oracle this causes an error. You then need to go back to your schema and change

<Table name="emp"/>

<Table name="EMP"/>

and all other table and column names in your schema. Yuck!

There is now a simpler way. The Schema XML element has a quoteSql attribute:

<Schema name='FoodMart' metamodelVersion='4.0' quoteSql='false'>

If you set quoteSql='false', Mondrian will not quote identifiers when generating SQL. (Actually, it will still quote them if they contain spaces and such. But we recommend that if you use quoteSql='false', you use sensible table names containing only alphanumeric characters and '_'.)

More details can be found in MONDRIAN-887. It is only fixed in the lagunitas branch (i.e., mondrian-4.0 alpha), and only in new-style schemas (not mondrian-3 style schemas automatically upgraded). Give it a try and let me know how it works for you.

3 comments:

Marc said...: I've run into this issue as well. I'm a little OCD about having my machine-generated SQL be at close to what a human would write as possible, so I try to always eliminate quotes whenever I can.

One trick I use is to check the case of the identifiers I see and if it matches the driver's storesUpperCaseIdentifiers/storesLowerCaseIdentifiers/storesMixedCaseIdentifiers metadata value (and assuming it isn't a SQL keyword), then I add it as an unquoted string to the generated SQL.; 6/10/2012 4:28 PM
Dan said...: The DB world sure has got itself into a mess with this situation. Our oltp application quotes everything on create, so all the tables in Oracle are lower case. So if you dont quote then it doesnt find the objects, as oracle assumes uppercase! arrgh!; 6/11/2012 12:30 AM
Julian Hyde said...: Dan,

Mondrian by default works in exactly the same way as your OLTP application. Now at least you can make it work in a different way. If you want.

Marc,

Yeah, I'm going to have to get my hands dirty with those storesUpperCaseIdentifiers etc. JDBC methods, for the JDBC stats provider. If the column is called "empNo", I will need to know whether the column would be called "EMPNO", "empno" or "empNo" in the catalog. I wonder how many JDBC drivers are actually correct in those methods. We're just about to find out.

Julian; 6/11/2012 11:19 AM