Monday, April 23, 2012

A first look at linq4j

This is a sneak peek of an exciting new data management technology. linq4j (short for "Language-Integrated Query for Java") is inspired by Microsoft's LINQ technology, which until now has been available only on the .NET platform, and adapts it for Java. (It also builds upon ideas I had in my earlier Saffron project.)



I launched the linq4j project less than a week ago, but already you can do select, filter, join and groupBy operations on in-memory and SQL data.

In this demo, I write and execute sample code against the working system, and explain the differences between the key interfaces Iterable, Enumerable, and Queryable.
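
To make the distinction concrete, here is a minimal sketch. It is not the linq4j source: `MiniEnumerable`, `Pred` and `Fn` are hypothetical stand-ins invented for illustration, and the operators are evaluated eagerly for brevity. The idea it tries to show is that a plain Iterable can only be looped over, whereas an Enumerable-style interface layers chainable query operators on top of it. A Queryable, as in LINQ, goes one step further and keeps the query as an expression that a provider can translate (for example, to SQL); that part is not modeled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class EnumerableSketch {
    /** Hypothetical one-method predicate, in the spirit of Predicate1. */
    interface Pred<T> { boolean apply(T t); }

    /** Hypothetical one-method function, in the spirit of Function1. */
    interface Fn<T, R> { R apply(T t); }

    /** Simplified stand-in for an Enumerable: an Iterable plus query operators. */
    static class MiniEnumerable<T> implements Iterable<T> {
        private final Iterable<T> source;

        MiniEnumerable(Iterable<T> source) { this.source = source; }

        public Iterator<T> iterator() { return source.iterator(); }

        /** Keeps only the elements that satisfy the predicate (eager, for brevity). */
        MiniEnumerable<T> where(Pred<T> predicate) {
            List<T> out = new ArrayList<T>();
            for (T t : source) {
                if (predicate.apply(t)) out.add(t);
            }
            return new MiniEnumerable<T>(out);
        }

        /** Maps each element to a new value (eager, for brevity). */
        <R> MiniEnumerable<R> select(Fn<T, R> fn) {
            List<R> out = new ArrayList<R>();
            for (T t : source) out.add(fn.apply(t));
            return new MiniEnumerable<R>(out);
        }

        /** Copies the elements into a list. */
        List<T> toList() {
            List<T> out = new ArrayList<T>();
            for (T t : source) out.add(t);
            return out;
        }
    }

    /** Names longer than three characters, upper-cased. */
    static List<String> longNamesUpper(Iterable<String> names) {
        return new MiniEnumerable<String>(names)
            .where(new Pred<String>() {
                public boolean apply(String s) { return s.length() > 3; }
            })
            .select(new Fn<String, String>() {
                public String apply(String s) { return s.toUpperCase(); }
            })
            .toList();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Fred", "Bill", "Eric", "Jo");
        // With a bare Iterable, filtering means writing the loop by hand.
        for (String s : names) {
            if (s.length() > 3) System.out.println(s.toUpperCase());
        }
        // With an Enumerable-style wrapper, the same query is a chain of operators.
        System.out.println(longNamesUpper(names));
    }
}
```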

For those of you who want to get a closer look at the real code, here's one of the queries shown in the demo:
DatabaseProvider provider =
    new DatabaseProvider(Helper.MYSQL_DATA_SOURCE);
provider.emps
    .where(
        new Predicate1<Employee>() {
            public boolean apply(Employee v1) {
                return v1.manager;
            }
        })
    .join(
        provider.depts,
        new Function1<Employee, Integer>() {
            public Integer apply(Employee a0) {
                return a0.deptno;
            }
        },
        new Function1<Department, Integer>() {
            public Integer apply(Department a0) {
                return a0.deptno;
            }
        },
        new Function2<Employee, Department, String>() {
            public String apply(Employee v1,
                                Department v2) {
                return v1.name + " works in " + v2.name;
            }
        }
    )
    .foreach(
        new Function1<String, Void>() {
            public Void apply(String a0) {
                System.out.println(a0);
                return null;
            }
        }
    );
and here is its (not yet implemented) sugared syntax:
List<String> strings =
    from emp in provider.emps,
        join dept in provider.depts on emp.deptno == dept.deptno
    where emp.manager
    orderBy emp.name
    select emp.name + " works in " + dept.name;
For more information, visit the linq4j project's home page.

10 comments:

Marc said...

Looks pretty nice. Verbosity is an issue, though. I've always found all the various query builder APIs in Java to be so verbose that it makes it difficult to understand the query at a glance.

Have you considered how linq4j might work with Java 8's lambdas? Designing with lambdas in mind might make the query composition terser without having to resort to the extra pre/post-processing step that I assume will be needed for your "sugared syntax" example to work.

The most recent summary of how Lambda will be implemented seems to be http://cr.openjdk.java.net/~briangoetz/lambda/lambda-state-4.html

Julian Hyde said...

I think that lambdas would make a big improvement to the conciseness. (It was not a coincidence that LINQ was introduced to C# at about the same time as lambdas.)

The API I am designing should interoperate perfectly with lambdas, when they arrive. (And I notice that my IDE, IntelliJ, has a form of code-folding that makes it look as if lambdas are already here.)
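
As a rough illustration of why an API built on one-method function interfaces should accept lambdas unchanged: in Java 8, a lambda can stand in wherever an interface with a single abstract method is expected. The sketch below is not linq4j code. `Predicate1` is redeclared locally as a simplified stand-in, and `Employee`, `sampleEmps` and `namesWhere` are hypothetical helpers made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LambdaSketch {
    /** Simplified local stand-in for a one-method predicate interface. */
    interface Predicate1<T> { boolean apply(T v1); }

    /** Hypothetical employee record, mirroring the fields used in the post. */
    static class Employee {
        final String name;
        final boolean manager;

        Employee(String name, boolean manager) {
            this.name = name;
            this.manager = manager;
        }
    }

    /** Hypothetical sample data. */
    static List<Employee> sampleEmps() {
        return Arrays.asList(
            new Employee("Fred", true),
            new Employee("Bill", false),
            new Employee("Eric", true));
    }

    /** Hypothetical helper: names of the employees that match the predicate. */
    static List<String> namesWhere(List<Employee> emps, Predicate1<Employee> p) {
        List<String> out = new ArrayList<>();
        for (Employee e : emps) {
            if (p.apply(e)) out.add(e.name);
        }
        return out;
    }

    /** Anonymous-class form, as in the post. */
    static List<String> managersVerbose() {
        return namesWhere(sampleEmps(), new Predicate1<Employee>() {
            public boolean apply(Employee v1) {
                return v1.manager;
            }
        });
    }

    /** Lambda form: same interface and same helper, with far less ceremony. */
    static List<String> managersTerse() {
        return namesWhere(sampleEmps(), v1 -> v1.manager);
    }

    public static void main(String[] args) {
        System.out.println(managersVerbose());
        System.out.println(managersTerse());
    }
}
```

Nothing in `namesWhere` had to change to accept the lambda; only the call site got shorter.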

Compiler support for the sugared syntax would improve conciseness and usability much more. I think a lot of people in the Java world would have loved to have had LINQ years ago, but thought it was too much work. The API, the core providers (e.g. the SQL provider) and compiler support are all of about the same magnitude in terms of effort. My hope is that if I solve the first two, then the compiler work is a reasonable effort/risk/benefit equation.

The only problem being that Oracle now runs Java, and they have zero interest in any data standard that they didn't invent.

rossjudson said...

I think you may find xtend to be quite worthwhile, in the context of linq4j. At first glance much of what you want to do can be expressed with it -- by this I mean that you could create the linq facility with it, and make it available easily to Java code. Within Eclipse xtend generates Java code, on the fly, to match your xtend source code.

Anonymous said...

Programmatic queries like LINQ are just like JPA: difficult, and hard to read as conditional logic becomes more complex. I think it is a no-go, a waste of time.

Julian Hyde said...

Anonymous,

Of course you're entitled to your opinion. (But if you'd like to have a civilized discussion, please use your real name next time.)

LINQ goes beyond programmatic queries, with the sugared syntax I describe in the last paragraph of the blog post. I believe that queries in that syntax have much in common with SQL syntax (in particular they grow well as logic becomes more complex). And unlike SQL queries embedded in strings, they are validated at compile time.

Then there are the possibilities of doing real, serious query optimization against SQL and non-relational data sources.

I agree with your comments about JPA. Believe me, if linq4j was just another Java persistence layer I wouldn't be building it. I really think that linq4j can advance the state of the art in both persistence layers and database query languages.

Julian

Paco Soberón said...

Just a question about orthography... shouldn't it be 'Queriable' instead of 'Queryable'? I mean, the same as 'reliable' for 'rely', 'appliable' for 'apply', and so on.

Julian Hyde said...

Paco,

You may be right. I can't find either "queriable" or "queryable" in a real dictionary. However, Microsoft used the term "Queryable" in LINQ, and I wanted linq4j to be as close as possible to the Microsoft API, so the question is moot.

This wouldn't be the first time that a mis-spelling was enshrined in computer science: see HTTP referer.

Julian

shailendra said...

Hello Julian,

I am Shailendra.
I am reading your linq4j API, and I want to use it.

Can you please tell me about the following?

In MySQL:

MySQL FIND_IN_SET() returns the position of a string if it is present (as a substring) within a list of strings. The string list is itself a string containing substrings separated by the ‘,’ (comma) character.

Do you implement any method in your API that works like the find_in_set() function?

Please reply ASAP.

Thanks in advance!

Julian Hyde said...

Shailendra,

When you access a collection using select or where you can pass in your own expression. So you can write your own implementation of find_in_set and use it on your data.
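
A minimal sketch of what that could look like, under stated assumptions: `findInSet`, `where` and `tagged` are hypothetical helpers written for this example, and `Predicate1` is redeclared locally as a simplified stand-in rather than taken from linq4j. The helper returns the 1-based position of a string in a comma-separated list, or 0 if it is absent. Unlike MySQL, which returns NULL for NULL arguments, this version returns 0, and it compares strings exactly rather than according to a collation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FindInSetSketch {
    /** Simplified local stand-in for a one-method predicate interface. */
    interface Predicate1<T> { boolean apply(T v1); }

    /**
     * 1-based position of needle in a comma-separated list, or 0 if absent.
     * Simplification: null arguments yield 0 (MySQL would yield NULL).
     */
    static int findInSet(String needle, String commaList) {
        if (needle == null || commaList == null || commaList.isEmpty()) {
            return 0;
        }
        // A limit of -1 keeps trailing empty items, so positions stay stable.
        String[] items = commaList.split(",", -1);
        for (int i = 0; i < items.length; i++) {
            if (items[i].equals(needle)) {
                return i + 1;
            }
        }
        return 0;
    }

    /** Hypothetical where-style filter that accepts a user-supplied predicate. */
    static <T> List<T> where(Iterable<T> source, Predicate1<T> predicate) {
        List<T> out = new ArrayList<T>();
        for (T t : source) {
            if (predicate.apply(t)) out.add(t);
        }
        return out;
    }

    /** Rows whose comma-separated value contains the given tag. */
    static List<String> tagged(List<String> rows, final String tag) {
        return where(rows, new Predicate1<String>() {
            public boolean apply(String v1) {
                return findInSet(tag, v1) > 0;
            }
        });
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("red,green", "blue", "green,blue");
        System.out.println(tagged(rows, "green"));
    }
}
```

Because the predicate is ordinary Java, a filter like this runs in memory on the client; whether a SQL provider could push it down to the database's own FIND_IN_SET is a separate question that this sketch does not address.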

Unknown said...

Hi Julian,

The sugared syntax of the generated SQL will be implemented in a dialect-specific way, right?
I mean, the syntax for a declaration and a WHILE expression in the BigQuery dialect would be something like this:


DECLARE heads BOOL;
DECLARE heads_count INT64 DEFAULT 0;

and

WHILE boolean_expression DO
sql_statement_list
END WHILE;

respectively.

Can you elaborate on this more?