Building a SQL interface isn't easy, because MongoDB's data model is such a long way from SQL's model. Here are some of the challenges:
- MongoDB doesn't have a schema. Each database has a number of named 'collections', which are the nearest thing to a SQL table, but each row in a collection can have a completely different set of columns.
- In MongoDB, data can be nested. Each row consists of a number of fields, and each field can be a scalar value, null, a record, or an array of records.
- MongoDB supports a number of relational operations, but doesn't use the same terminology as SQL: the
find
method supports the equivalent ofSELECT
andWHERE
, while theaggregate
method supports the equivalent ofSELECT
,WHERE
,GROUP BY
,HAVING
andORDER BY
. - For efficiency, it's really important to push as much of the processing down to MongoDB's query engine, without the user having to re-write their SQL.
- But MongoDB doesn't support anything equivalent to JOIN.
- MongoDB can't access external data.
I decided to tackle this using Optiq. Optiq already has a SQL parser and a powerful query optimizer that is powered by rewrite rules. Building on Optiq's core rules, I can add rules that map tables onto MongoDB collections, and relational operations onto MongoDB's
find
and aggregate
operators.What I produced is a effectively a JDBC driver for MongoDB. Behind it is a hybrid query-processing engine that pushes as much of the query processing down to MongoDB, and does whatever is left (such as joins) in the client.
Let's give it a try. First, install MongoDB, and import MongoDB's zipcode data set:
Log into MongoDB to check it's there:
Now let's see the same data via SQL. Download and install Optiq:
Optiq comes with a sample model in JSON format, and the sqlline SQL shell. Connect using the mongo-zips-model.json Optiq model, and use sqlline's
!tables
command to list the available tables.
$ ./sqlline sqlline> !connect jdbc:optiq:model=mongodb/target/test-classes/mongo-zips-model.json admin admin Connecting to jdbc:optiq:model=mongodb/target/test-classes/mongo-zips-model.json Connected to: Optiq (version 0.4.13) Driver: Optiq JDBC Driver (version 0.4.13) Autocommit status: true Transaction isolation: TRANSACTION_REPEATABLE_READ sqlline> !tables +------------+--------------+-----------------+---------------+ | TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | +------------+--------------+-----------------+---------------+ | null | mongo_raw | zips | TABLE | | null | mongo_raw | system.indexes | TABLE | | null | mongo | ZIPS | VIEW | | null | metadata | COLUMNS | SYSTEM_TABLE | | null | metadata | TABLES | SYSTEM_TABLE | +------------+--------------+-----------------+---------------+Each collection in MongoDB appears here as a table. There are also the
COLUMNS
and TABLES
system tables provided by Optiq, and a view called ZIPS
defined in mongo-zips-model.json
.Let's try a simple query. How many zip codes in America?
Now a more complex one. How many states have a city called Springfield?
Let's use the SQL
EXPLAIN
command to see how the query is implemented.
sqlline> !set outputformat csv
sqlline> EXPLAIN PLAN FOR
. . . .> SELECT count(DISTINCT state) AS c FROM zips WHERE city = 'SPRINGFIELD';
'PLAN'
'EnumerableAggregateRel(group=[{}], C=[COUNT($0)])
EnumerableAggregateRel(group=[{0}])
EnumerableCalcRel(expr#0..4=[{inputs}], expr#5=['SPRINGFIELD'], expr#6=[=($t0, $t5)], STATE=[$t3], $condition=[$t6])
MongoToEnumerableConverter
MongoTableScan(table=[[mongo_raw, zips]], ops=[[<{city: 1, state: 1, _id: 1}, {$project ...}>]])
'
1 row selected (0.115 seconds)
The last line of the plan shows that Optiq calls MongoDB's find operator asking for the "city", "state" and "_id" fields. The first three lines of the plan show that the filter and aggregation are implemented using in Optiq's built-in operators, but we're working on pushing them down to MongoDB.
Finally, quit sqlline.
Optiq and its MongoDB adapter shown here are available on github. If you are interested in writing your own adapter, check out optiq-csv, a sample adapter for Optiq that makes CSV files appear as tables. It has own tutorial on writing adapters.
Check back at this blog over the next few months, and I'll show how to write views and advanced queries using Optiq, and how to use Optiq's other adapters.
23 comments:
Impressive work!
We're on a similar wavelength. Check out http://www.mongosql.com
Hi Julian
Great stuff, but I coudn't make it work.
See what I got - any help is welcome. Thanks! Enzo
sqlline> !connect jdbc:optiq:model=mongodb/target/test-classes/mongo-zips-model.json admin admin
java.lang.RuntimeException: Error instantiating JsonCustomSchema(name=mongo_raw)
at net.hydromatic.optiq.model.ModelHandler.visit(ModelHandler.java:110)
at net.hydromatic.optiq.model.JsonCustomSchema.accept(JsonCustomSchema.java:37)
at net.hydromatic.optiq.model.ModelHandler.visit(ModelHandler.java:71)
at net.hydromatic.optiq.model.ModelHandler.(ModelHandler.java:63)
at net.hydromatic.optiq.jdbc.Driver$1.onConnectionInit(Driver.java:61)
at net.hydromatic.optiq.jdbc.UnregisteredDriver.connect(UnregisteredDriver.java:127)
at sqlline.SqlLine$DatabaseConnection.connect(SqlLine.java:4650)
at sqlline.SqlLine$DatabaseConnection.getConnection(SqlLine.java:4701)
at sqlline.SqlLine$Commands.connect(SqlLine.java:3942)
at sqlline.SqlLine$Commands.connect(SqlLine.java:3851)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at sqlline.SqlLine$ReflectiveCommandHandler.execute(SqlLine.java:2810)
at sqlline.SqlLine.dispatch(SqlLine.java:817)
at sqlline.SqlLine.begin(SqlLine.java:699)
at sqlline.SqlLine.mainWithInputRedirection(SqlLine.java:441)
at sqlline.SqlLine.main(SqlLine.java:424)
Caused by: java.lang.ClassNotFoundException: net.hydromatic.optiq.impl.mongodb.MongoSchemaFactory
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:186)
at net.hydromatic.optiq.model.ModelHandler.visit(ModelHandler.java:100)
... 18 more
0: jdbc:optiq:model=mongodb/target/test-class> !quit
Connection is already closed.
Oops. The recent changes in directory structure broke things. I've fixed them now. Pull the latest, remove .classpath.txt, and run sqlline again.
Note that the model path has changed. The line is now
!connect jdbc:optiq:model=mongodb/target/test-classes/mongo-zips-model.json admin admin
Hi..
Can this be used to connect to mongodb://localhost/databasename ??
Can it directly connect to mongoDB instead of a model file ?
What is the sqlline command ?
Can this be used to connect to mongodb://localhost/databasename ??
Yes. You can specify the host and database name in the model file.
The link to the model file was broken in the post. It is now https://github.com/julianhyde/optiq/blob/master/mongodb/src/test/resources/mongo-zips-model.json, and I have corrected the post.
Can it directly connect to mongoDB instead of a model file ?
Not at present, but it makes sense. I have logged https://github.com/julianhyde/optiq/issues/109.
What is the sqlline command ?
It is described in the post. "./sqlline" from the shell, followed by "!connect" from within sqlline.
I was trying this example on windows with Cygwin, however sqlline doesn't seem too work, I enter in the shell normally but then none of the commands do anything at all. It just gets stuck.
Any idea why?
Sounds like an issue with sqlline on Windows/cygwin. Can you log it, please.
Using the latest optiq (commit 5a4126 or later) you should be able to connect from both cygwin (using sqlline) and native windows (using sqlline.bat). If you still have errors, please log an issue.
Hi Julian,
Can we perform OLAP analysis using Mondrain OLAP4j after converting the collections on MongoDb to Tables
Yes. optiq-on-MongoDB is just a regular JDBC data source as far as Mondrian is concerned.
If I had a different JSON file I wanted to query, how would I go about generating a model file for it? Im interested in using optiq to query mongo databases and need a way to adapt optiq to a variety of JSON files.
If the data is outside MongoDB, you could use optiq-csv, to which we recently added support for JSON files.
But within MongoDB, I suggest that you load each file into a different MongoDB collection in the same MongoDB database. From Optiq each MongoDB collection automatically appears as a SQL table with a column called _MAP. (In the example, that table is called "mongo_raw"."zips".) You can create views on those tables, similar to the "mongo"."zips" view in the example, or you can query the raw table.
So does that mean there is a command I can execute via optiq-csv to generate a model around a .json file? I checked through the tutorial and readme and was unable to find a way to access data without having a model file built by hand first.
If all those .json files are in the same directory, optiq-csv's schema adapter will make them automatically appear in a schema as tables.
Building views onto those tables with columns of particular names and types is a manual task. You will need to edit the model file by hand.
If you have in mind a tool that would help further, log a jira case describing it.
I think this example may have lost functionality, when trying to select loc[0] or loc[1] the two floats return ass null in all rows
Please, Can you help me with the setup of optiq on my machine, i have a running mongoDB on it...............?
Krish,
Ask for help on the optiq dev list, http://mail-archives.apache.org/mod_mbox/incubator-optiq-dev/.
Hello,
it seems to problem with Mongo 2.6
0: jdbc:optiq:model=mongodb/target/test-class> !tables
+-----------+-------------+------------+------------+---------+----------+------------+-----------+---------------------------+----------------+
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS | TYPE_CAT | TYPE_SCHEM | TYPE_NAME | SELF_REFERENCING_COL_NAME | REF_GENERATION |
+-----------+-------------+------------+------------+---------+----------+------------+-----------+---------------------------+----------------+
com.mongodb.MongoException: not authorized for query on admin.system.namespaces
at com.mongodb.MongoException.parse(MongoException.java:82)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:292)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:273)
at com.mongodb.DB.getCollectionNames(DB.java:399)
at net.hydromatic.optiq.impl.mongodb.MongoSchema.getTableMap(MongoSchema.java:56)
at net.hydromatic.optiq.impl.AbstractSchema.getTableNames(AbstractSchema.java:79)
at net.hydromatic.optiq.jdbc.OptiqSchema$3.build(OptiqSchema.java:94)
at net.hydromatic.optiq.jdbc.OptiqSchema$3.build(OptiqSchema.java:91)
at net.hydromatic.optiq.jdbc.OptiqSchema$AbstractCached.get(OptiqSchema.java:664)
at net.hydromatic.optiq.jdbc.OptiqSchema.getTableNames(OptiqSchema.java:309)
at net.hydromatic.optiq.jdbc.MetaImpl.tables(MetaImpl.java:366)
at net.hydromatic.optiq.jdbc.MetaImpl.tables(MetaImpl.java:397)
at net.hydromatic.optiq.jdbc.MetaImpl$5.apply(MetaImpl.java:253)
at net.hydromatic.optiq.jdbc.MetaImpl$5.apply(MetaImpl.java:251)
at net.hydromatic.linq4j.EnumerableDefaults$8$1.moveNext(EnumerableDefaults.java:1468)
at net.hydromatic.linq4j.EnumerableDefaults$15$1.moveNext(EnumerableDefaults.java:2033)
at net.hydromatic.optiq.runtime.EnumeratorCursor.next(EnumeratorCursor.java:47)
at net.hydromatic.avatica.AvaticaResultSet.next(AvaticaResultSet.java:187)
at sqlline.IncrementalRows.hasNext(IncrementalRows.java:62)
at sqlline.TableOutputFormat.print(TableOutputFormat.java:33)
at sqlline.SqlLine.print(SqlLine.java:1653)
at sqlline.Commands.metadata(Commands.java:199)
at sqlline.Commands.tables(Commands.java:332)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:36)
at sqlline.SqlLine.dispatch(SqlLine.java:804)
at sqlline.SqlLine.begin(SqlLine.java:681)
at sqlline.SqlLine.start(SqlLine.java:398)
at sqlline.SqlLine.main(SqlLine.java:292)
Julian just picked this up, looking for an efficient SQL interface in to my mongo db to fit in with an existing SQL reporting mechanism - question. Do I need to build the table models for all my collections, or does the driver do this. It was not clear from the readme whether the models where simply for views and custom tables only
BillOrm,
The driver automatically builds table models for all collections. In the example, mongo_raw.zips one such table. Because MongoDB does not record columns, these tables just have one column called _MAP.
Hi Julian
Can u please suggest me a way to connect mongodb and saiku using optiq...I'm unable to understand how optiq is working and help to analyse mongo data
Post a Comment