Building a SQL interface isn't easy, because MongoDB's data model is such a long way from SQL's model. Here are some of the challenges:
- MongoDB doesn't have a schema. Each database has a number of named 'collections', which are the nearest thing to a SQL table, but each row in a collection can have a completely different set of columns.
- In MongoDB, data can be nested. Each row consists of a number of fields, and each field can be a scalar value, null, a record, or an array of records.
- MongoDB supports a number of relational operations, but doesn't use the same terminology as SQL: the
findmethod supports the equivalent of
WHERE, while the
aggregatemethod supports the equivalent of
- For efficiency, it's really important to push as much of the processing down to MongoDB's query engine, without the user having to re-write their SQL.
- But MongoDB doesn't support anything equivalent to JOIN.
- MongoDB can't access external data.
I decided to tackle this using Optiq. Optiq already has a SQL parser and a powerful query optimizer that is powered by rewrite rules. Building on Optiq's core rules, I can add rules that map tables onto MongoDB collections, and relational operations onto MongoDB's
What I produced is a effectively a JDBC driver for MongoDB. Behind it is a hybrid query-processing engine that pushes as much of the query processing down to MongoDB, and does whatever is left (such as joins) in the client.
Let's give it a try. First, install MongoDB, and import MongoDB's zipcode data set:
Log into MongoDB to check it's there:
Now let's see the same data via SQL. Download and install Optiq:
Optiq comes with a sample model in JSON format, and the sqlline SQL shell. Connect using the mongo-zips-model.json Optiq model, and use sqlline's
!tablescommand to list the available tables.
Each collection in MongoDB appears here as a table. There are also the ./sqlline sqlline> !connect jdbc:optiq:model mongodb/target/test-classes/mongo-zips-model.json admin admin Connecting to jdbc:optiq:model mongodb/target/test-classes/mongo-zips-model.json Connected to: Optiq version 0.4.13 Driver: Optiq JDBC Driver version 0.4.13 Autocommit status: Transaction isolation: TRANSACTION_REPEATABLE_READ sqlline> !tables +------------+--------------+-----------------+---------------+ | TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | +------------+--------------+-----------------+---------------+ | null | mongo_raw | zips | TABLE | | null | mongo_raw | system.indexes | TABLE | | null | mongo | ZIPS | VIEW | | null | metadata | COLUMNS | SYSTEM_TABLE | | null | metadata | TABLES | SYSTEM_TABLE | +------------+--------------+-----------------+---------------+
TABLESsystem tables provided by Optiq, and a view called
Let's try a simple query. How many zip codes in America?
Now a more complex one. How many states have a city called Springfield?
Let's use the SQL
EXPLAINcommand to see how the query is implemented.
The last line of the plan shows that Optiq calls MongoDB's find operator asking for the "city", "state" and "_id" fields. The first three lines of the plan show that the filter and aggregation are implemented using in Optiq's built-in operators, but we're working on pushing them down to MongoDB.
Finally, quit sqlline.
Optiq and its MongoDB adapter shown here are available on github. If you are interested in writing your own adapter, check out optiq-csv, a sample adapter for Optiq that makes CSV files appear as tables. It has own tutorial on writing adapters.
Check back at this blog over the next few months, and I'll show how to write views and advanced queries using Optiq, and how to use Optiq's other adapters.