Tuesday, February 16, 2010

Improved collections classes for Mondrian's query execution process

Mondrian calculations work predominantly over lists of members and tuples. Internally, Mondrian represents lists of members as List&lt;Member&gt;, and represents lists of tuples as List&lt;Member[]&gt;. (And similarly for iterators over members and tuples.)

There are two problems with this. First, the representation of tuple lists requires an array to be allocated for each element of the list. Allocations cost time and memory. (Granted, we could allocate temporary arrays only when tuples are accessed, which would cost only time. But according to my latest round of profiling, the effort of allocating lots of small arrays is significant.)

Second, the code to deal with members and tuples has to be different. The most extreme example of this is found in the implementation of CrossJoin. There are over 30 inner classes in CrossJoinFunDef.java to deal with the permutations of iterator vs. list, mutable vs. immutable, and tuple vs. member.

In short, the standard Java List and Iterator classes are not serving us well. I think it's appropriate to introduce classes/interfaces that handle members and tuples more uniformly, and that can store, access, and iterate over collections without creating lots of small arrays.

Here are some collection classes that I think would serve the purpose:
interface TupleList {
    // number of tuples in the list
    int size();
    // number of members in each tuple (1 for a plain member list)
    int arity();
    // returns member #ordinal of tuple #index
    Member getMember(int index, int ordinal);
    TupleIterator tupleIterator();
}

interface TupleIterator {
    int arity();
    boolean hasNext();
    // writes the members of the next tuple into given array
    void next(Member[] members);
    // appends members of the next tuple to given list
    void next(List<Member> members);
}
If arity = 1 (i.e. if the list is just a collection of members) then TupleList could easily be implemented using java.util.ArrayList.

For other arities, a list of tuples could be represented by storing the members end-to-end in a single backing list. For instance, the list of two 3-tuples {(A1, B1, C1), (A2, B2, C2)} would be held as {A1, B1, C1, A2, B2, C2}, and getMember(index, ordinal) would read element index * arity + ordinal of the backing list.
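
To make this concrete, here is a minimal sketch of such a flat-backed implementation. The class name ArrayTupleList and all of its details are illustrative only, not a committed design:

import java.util.ArrayList;
import java.util.List;

// Tuple #index occupies positions [index * arity, (index + 1) * arity)
// of a single backing list; no per-tuple arrays are ever allocated.
class ArrayTupleList implements TupleList {
    private final int arity;
    private final List<Member> members = new ArrayList<Member>();

    ArrayTupleList(int arity) {
        this.arity = arity;
    }

    public int size() {
        return members.size() / arity;
    }

    public int arity() {
        return arity;
    }

    public Member getMember(int index, int ordinal) {
        return members.get(index * arity + ordinal);
    }

    // Appends a tuple by copying its members into the backing list.
    public void add(Member[] tuple) {
        for (Member member : tuple) {
            members.add(member);
        }
    }

    public TupleIterator tupleIterator() {
        return new TupleIterator() {
            private int i = 0;

            public int arity() {
                return arity;
            }

            public boolean hasNext() {
                return i < size();
            }

            public void next(Member[] tuple) {
                for (int j = 0; j < arity; j++) {
                    tuple[j] = getMember(i, j);
                }
                ++i;
            }

            public void next(List<Member> tuple) {
                for (int j = 0; j < arity; j++) {
                    tuple.add(getMember(i, j));
                }
                ++i;
            }
        };
    }
}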

Introducing these would require quite a few code changes, mostly in the mondrian.olap.fun package, which is where the built-in functions are implemented. There should be no changes to the user API or olap4j.

I am still debating whether this change makes sense. Usually this kind of penny-pinching architectural change doesn't pay off. But some of them pay off big. I've learned at Oracle, Broadbase, and SQLstream that for high-performance data processing you shouldn't be doing any memory allocations in an inner loop that is executed once per row. That isn't quite practical in Java, but it's a goal to strive for. And in today's CPU architectures, where memory is slow and last-level cache is fast, it pays to keep data contiguous.

If you are a Mondrian developer, I'd be interested to hear what you think about this proposed change.

9 comments:

Anonymous said...

Hi Julian,

In my experience of optimizing software, memory allocations, especially in inner loops, are a big consumer of time.

My experience is with C++ optimization, but I assume Java should benefit from this even more.

While it is better to avoid allocations/deallocations altogether in tight loops, sometimes it is helpful to allocate big portions of memory before the inner loop. The fewer allocations needed, the better.

I always optimize memory allocations first, before doing any other optimization work.

Kind regards,
Bart Pappyn

Unknown said...

+1 on using specialized lists, maps, iterators etc.

In Kettle 2.x we used (Array)List to store rows of data and moved to stupid-simple Object[] in 3.0. That gave, in certain cases, a ten-fold performance increase. Like Bart stated, it was the innermost loop: a very small performance increase on a single-field basis resulted in a massive performance increase at the transformation level.

Another example would be the modified class we created to do caching for specific cases. Using a standard Map (HashMap, Hashtable) would consume over 3 times more memory and be a lot slower for, say, Long/Long pairs (the foreign-key use case). That is simply because you don't need to wrap everything in extra classes. We moved this to our own LongHashIndex class with very good results.

http://source.pentaho.org/viewvc/svnkettleroot/Kettle/trunk/src-core/org/pentaho/di/core/hash/LongHashIndex.java?view=markup
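
Boiled down to a sketch, the general technique looks something like this (a much-simplified illustration, not the actual Kettle class):

// Much-simplified sketch of an open-addressing long -> long hash index.
// Keys and values live in parallel primitive arrays, so puts and gets
// allocate nothing: no boxed Longs, no Entry objects for the garbage
// collector to chase. (No resizing or deletion; assumes the table is
// sized so that it never fills up.)
class LongLongIndex {
    private static final long EMPTY = Long.MIN_VALUE; // reserved sentinel key
    private final long[] keys;
    private final long[] values;

    LongLongIndex(int capacity) {
        keys = new long[capacity];
        values = new long[capacity];
        java.util.Arrays.fill(keys, EMPTY);
    }

    // Hash the key and probe linearly until we find it or an empty slot.
    private int slotOf(long key) {
        int i = (((int) (key ^ (key >>> 32))) & 0x7fffffff) % keys.length;
        while (keys[i] != EMPTY && keys[i] != key) {
            i = (i + 1) % keys.length;
        }
        return i;
    }

    void put(long key, long value) {
        int i = slotOf(key);
        keys[i] = key;
        values[i] = value;
    }

    // Returns EMPTY if the key is absent.
    long get(long key) {
        int i = slotOf(key);
        return keys[i] == key ? values[i] : EMPTY;
    }
}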

Memory consumption is not the only thing that killed the performance of Kettle 2.x, either. Because a lot of the collection classes wrap everything in other classes (key/value entries, links, etc.) you get massive object allocation, and massive garbage collection afterwards. In the cases where we had huge numbers of short-lived objects, the garbage collector went crazy and ate a lot of time.

IMHO doing it the "Java" way by sticking to the standard collections bites you because you are in effect exposing the weaknesses of the JRE. It makes total sense to try and wiggle your way out of these situations ;-)

Cheers,
Matt

Julian Hyde said...

Matt & Bart,

Thanks for the support. I'm glad I'm not the only one fighting against the Java runtime.

Actually I'm still a big fan of Java collections... for the 98% of my code that is not performance-critical. I'm sure I'll find a way to add adapters to make my data structures look like Java collections when they need to.

Matt, I spotted a slight performance bug. LongHashIndexEntry has a constructor parameter 'Long key' and a member 'long key', and the constructor is only ever called with a primitive 'long' value. So whenever you insert an entry into the hash table, the key value is boxed to a Long and immediately unboxed, for no reason. Change the parameter to 'long key' and I think you'll save a memory allocation per insert.
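
For anyone following along, the pattern looks roughly like this (my reconstruction to illustrate the point, not the actual source):

// Illustrative reconstruction of the boxing bug, not the actual source.
class Entry {
    private final long key; // primitive member

    Entry(Long key) {   // boxed parameter: each call site wraps its
        this.key = key; // long in a Long, which is unboxed right here
    }
}

// Since every caller passes a primitive long, declaring the parameter as
//     Entry(long key) { this.key = key; }
// removes one object allocation per insert.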

I'll consider using that class. I have also been looking at apache-commons-primitives; they have some useful collections implemented using primitive values, not objects, and they are wise to issues like map iterators creating an Entry at each step. Unfortunately the project doesn't seem to have been very active recently.

Julian

Eric McDermid said...

I can't really comment on how much the specific optimizations you're looking at will change things, but I do think that in some cases it is definitely worth it to find or build collections better optimized for Mondrian's needs. The standard JDK implementations seem to have been designed for what the developers thought would be typical use, which doesn't always match what Mondrian actually does.

An example of this is the performance of ArrayList.removeAll(), as documented by Amin Ahmad.

One place this is of use is in the makeIterable() methods of some of the IterCalcs defined in FilterFunDef. They use the pattern of creating an ArrayList, then iterating over it, removing items one at a time. An alternative implementation is to create a new HashSet, add the items to be removed to it, then remove them all at once using removeAll().

With the standard version of ArrayList the choice is pretty much a tossup, and if anything the removeAll() approach is actually slightly slower. With a version optimized along the lines of Amin's suggestions, however, removeAll() is significantly faster, in some cases cutting overall query execution time in half.

(FYI, the reason this change hasn't been added to the Mondrian mainline yet is that, due to the need to access private members, one can't just subclass ArrayList to override removeAll(). Thus both Amin's version and the slightly different one I inherited are based on copies of Sun's source, presenting license-compatibility issues -- and almost certainly "tainting" my knowledge so that I can't do a true clean-room implementation of my own.)

Julian Hyde said...

Eric,

Are you seeing performance problems with removeAll? Mondrian only uses it in one place: MinusStarPredicate.values(Collection).

I'm more concerned with clear(). ArrayList.clear() is O(n) because it zeroes out the entries in the backing array to make the referenced objects available for garbage collection. That's wasted effort if we are using the same ArrayList repeatedly as scratch space. We could maybe have a TemporaryArrayList whose clear() doesn't bother, and is therefore O(1).
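
A sketch of what I have in mind (hypothetical; the name TemporaryArrayList and the details are just for illustration):

import java.util.AbstractList;

// Hypothetical scratch list: clear() just resets the logical size and
// leaves stale references in the backing array. Safe only for private
// scratch space that is refilled before being read again; note that the
// stale references do delay garbage collection of what they point to.
class TemporaryArrayList<E> extends AbstractList<E> {
    private Object[] elements = new Object[16];
    private int size;

    public boolean add(E e) {
        if (size == elements.length) {
            Object[] bigger = new Object[size * 2];
            System.arraycopy(elements, 0, bigger, 0, size);
            elements = bigger;
        }
        elements[size++] = e;
        return true;
    }

    @SuppressWarnings("unchecked")
    public E get(int index) {
        return (E) elements[index];
    }

    public int size() {
        return size;
    }

    public void clear() {
        size = 0; // O(1): no zeroing of the backing array
    }
}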

Since you bring up the performance of the Filter function applied to iterables, I'd like to ask a broader question: what do you all think are the strengths and weaknesses of the current evaluation system based on mondrian.calc.Calc and its subclasses? Some issues that spring to mind:

1. Is the dual implementation strategy, epitomized by Calc subclasses MemberListCalc, TupleListCalc, MemberIterCalc, TupleIterCalc, useful? Could we do all processing on iterators, and build lists only when we need them?

2. Are the so-called "high-cardinality" collections useful? They make iterators look like lists, but give an error if you try to backtrack more than a few entries or to restart a scan from the beginning. In my opinion they are neither fish nor fowl, and we'd be better off moving to straight iterators for collections too large to fit into memory.

3. Mutable lists. In some circumstances a mutable list is useful (think Filter), but if the natural implementation of an operator returns an immutable list (e.g. getting the list of a member's children from the cache) there is an extra expense in making a mutable copy. The current system allows the caller of an operator to choose, at expression-compilation time, whether they want a mutable or immutable list. It works nicely, but it is extra complexity; is that complexity justified?

4. Evaluator.evaluate() returns an Object. This causes boxing/unboxing of int and double values. It also means that an operator, say '+', doesn't know at compile time whether it is going to use int or double operations, so it has to do 'instanceof' tests to find out, as in the sketch below. One advantage of using Object values is the easy representation of null, empty, and error values.
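
For illustration, the kind of dispatch I mean (a sketch of the problem, not the actual Mondrian code):

// Sketch of the instanceof dispatch forced on '+' when its operands
// arrive as Object values (illustrative, not Mondrian's actual code).
class PlusCalc {
    Object plus(Object left, Object right) {
        if (left == null || right == null) {
            return null; // Object values make null/empty easy to represent
        }
        if (left instanceof Double || right instanceof Double) {
            // unbox, add, then box a fresh Double
            return ((Number) left).doubleValue() + ((Number) right).doubleValue();
        }
        // unbox, add, then box a fresh Integer
        return ((Number) left).intValue() + ((Number) right).intValue();
    }
}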

Let me know what you think.

Julian

Eric McDermid said...

Julian,

I'm not currently seeing an issue with ArrayList.removeAll(). As you point out, Mondrian rarely uses it.

There are, however, places where Mondrian iterates through a list, removing individual members from that list one at a time:

members = new ArrayList<Member>(members);
Iterator<Member> it = members.iterator();
while (it.hasNext()) {
    Member member = it.next();
    evaluator2.setContext(member);
    if (!bcalc.evaluateBoolean(evaluator2)) {
        it.remove();
    }
}

An alternate but equally valid approach would be the following:

members = new ArrayList<Member>(members);
Iterator<Member> it = members.iterator();
Collection<Member> toRemove = new HashSet<Member>();
while (it.hasNext()) {
    Member member = it.next();
    evaluator2.setContext(member);
    if (!bcalc.evaluateBoolean(evaluator2)) {
        toRemove.add(member);
    }
}
members.removeAll(toRemove);

(Apologies for the loss of formatting. I can't seem to find a tag to preserve it that is also permitted in blogger comments.)

With the standard JDK implementation of ArrayList, the first approach is preferable, since the extra overhead in the second results in slightly lower performance.

With a custom version of ArrayList that has a more optimal implementation of removeAll(), however, the two techniques no longer have equivalent performance, making the second approach a much better choice.

Julian Hyde said...

I agree, the first approach is sub-optimal. On an ArrayList it has O(n^2) worst-case running time, because each remove() shifts every subsequent element down by one; the worst case occurs if, say, every other element is eliminated by the filter condition.

Your second approach doesn't work, because MDX 'sets' can contain multiple copies of the same member.

Probably better to rebuild the list with the members that you DO want. Or use an iterator. :)
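
Sketched against your snippet, the rebuild approach would be:

// One pass, no in-place removal; O(n) regardless of how many
// members the filter eliminates.
List<Member> result = new ArrayList<Member>(members.size());
for (Member member : members) {
    evaluator2.setContext(member);
    if (bcalc.evaluateBoolean(evaluator2)) {
        result.add(member); // keep only the members that pass
    }
}
members = result;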

Blogger doesn't make it easy to write code in blog posts. The only approach I've found for formatting code is to insert &nbsp; entities, like this:

a line
  indented
    further indented

Eric McDermid said...

Hrm... I must be missing a key concept, then. I'd have assumed that in this case (the code snippet is from FilterFunDef) multiple copies of the same member would each still evaluate to the same boolean result.

It does pass the Mondrian regression suite, not that that's a guarantee of anything.

Julian Hyde said...

I stand corrected. Your 2nd approach would work for Filter. It still seems strange to remove objects by identity; it's more natural -- and usually more efficient -- to reference them by position. Your approach wouldn't work for the Tail function, for example.