Friday, February 04, 2011

Scalable caching in Mondrian

Wouldn't it be great if Mondrian's cache could be shared between several Mondrian instances, use memory outside the JVM or even across several machines, and scale as the data size or computation effort increases? That is the vision of Pentaho's "enterprise cache" initiative.

Mondrian cell-caching architecture, including pluggable external cache.

Luc Boudreau has been leading this effort, has just checked in the first revision of the new mondrian.rolap.agg.SegmentCache interface, and has written a blog post describing how it will work. (Note: This SPI is likely to change before we release it.)

Pluggable caching will be in Mondrian release 3.3, probably Q2 or Q3 this year. The community edition will include the SPI and a default implementation that uses JVM memory. Of course the community will be able to contribute alternative implementations. In the enterprise edition of Mondrian 3.3, there will be a scalable, highly manageable implementation based on something like Terracotta BigMemory, Ehcache or JBoss Infinispan.
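To make the idea of "an SPI plus a default JVM-memory implementation" concrete, here is a minimal sketch in Java. It is not the real mondrian.rolap.agg.SegmentCache interface, which is still likely to change. The names PluggableCache, InMemoryCache and CacheLoader are hypothetical, invented for illustration. The sketch assumes the implementation is chosen by class name, falling back to the in-memory default when none is configured.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical SPI for illustration only. This is NOT the real
// mondrian.rolap.agg.SegmentCache interface.
interface PluggableCache<K, V> {
    Optional<V> get(K key);
    boolean put(K key, V value);
    boolean remove(K key);
}

// Default implementation that keeps entries in JVM heap memory.
// ConcurrentHashMap rejects null keys and values, so callers must not pass them.
final class InMemoryCache<K, V> implements PluggableCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();

    @Override
    public Optional<V> get(K key) {
        return Optional.ofNullable(entries.get(key));
    }

    @Override
    public boolean put(K key, V value) {
        entries.put(key, value);
        return true;
    }

    @Override
    public boolean remove(K key) {
        // True only if an entry was actually present.
        return entries.remove(key) != null;
    }
}

// Hypothetical loader: instantiates the configured class by name,
// or falls back to the in-memory default when no name is given.
final class CacheLoader {
    private CacheLoader() {
    }

    static <K, V> PluggableCache<K, V> load(String className) {
        if (className == null || className.isEmpty()) {
            return new InMemoryCache<>();
        }
        try {
            Object instance =
                Class.forName(className).getDeclaredConstructor().newInstance();
            @SuppressWarnings("unchecked")
            PluggableCache<K, V> cache = (PluggableCache<K, V>) instance;
            return cache;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                "Cannot load cache implementation: " + className, e);
        }
    }
}
```

An alternative implementation backed by an external store would implement the same interface and be selected by passing its class name to the loader, so calling code would not need to change.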

In future releases, you can expect to see further work in this area: maybe alternative implementations of the caching SPI, and certainly tuning of Mondrian's caching and evaluation strategies, as we apply Mondrian to some of the biggest data sets out there.

7 comments:

Dipak said...

Hi Julian,

I would like to know about Mondrian's cache configuration settings. Is there any documentation on configuring EHCache in Mondrian?

Chang Lim said...

Hi Julian,

I am on the 3.4.1 community edition. Does the community edition provide the "Scalable caching in Mondrian" feature? In the article, you mentioned that the community edition provides a default implementation that uses JVM memory. What are the differences between JVM memory and, say, Memcached? Are there any instructions on how to get things running in the community edition?

Thanks in advance!
Chang

Julian Hyde said...

Check out CDC (Community Distributed Cache). It is based on Hazelcast and works quite nicely.

Don't expect detailed instructions on writing your own caching provider. See the javadoc on SegmentCache, and read the code.

Chang Lim said...

Hi Julian,

Using Pentaho Server and the CTools installer, I was able to configure Pentaho to use the Community Distributed Cache (CDC). This is really cool.

Now, I am trying to do a similar setup for just Mondrian without the use of the Pentaho Server. I have the Mondrain.war in my JBoss app server. I edited mondrain.properties and set "mondrian.rolap.SegmentCache=pt.webdetails.cdc.mondrian.SegmentCacheHazelcast" and have the CDC and related jar files in the class path. When JBoss is started and the Mondrian server is initializing, I get the following errors. I have the log4j debug turned on for "pt.webdetails" and there are no logs from the CDC classes. Any idea what configuration/setup I am missing? Does the standalone Mondrain.war support Custom SegmentCache SPI?

=================================
2012-10-19 22:31:31,757 DEBUG [mondrian.rolap.agg.SegmentCacheWorker] Segment cache initialized: mondrian.rolap.cache.MemorySegmentCache
2012-10-19 22:31:31,757 DEBUG [mondrian.rolap.agg.SegmentCacheWorker] Starting cache instance: pt.webdetails.cdc.mondrian.SegmentCacheHazelcast
. . .
2012-10-19 22:31:31,773 DEBUG [mondrian.rolap.agg.SegmentCacheWorker] Segment cache initialized: pt.webdetails.cdc.mondrian.SegmentCacheHazelcast
. . .
2012-10-19 22:31:31,773 ERROR [org.jboss.ejb.plugins.LogInterceptor] Unexpected Error in method: public abstract java.lang.Object com.vitria.component.server.beans.Administration.invokeService(java.lang.String,java.lang.String,java.lang.Object[]) throws com.vitria.component.api.ComponentException,java.rmi.RemoteException
java.lang.ExceptionInInitializerError
at mondrian.olap.MondrianServer.forId(MondrianServer.java:77)
at mondrian.olap.DriverManager.getConnection(DriverManager.java:98)
at mondrian.olap.DriverManager.getConnection(DriverManager.java:68)
. . .
. . .
Caused by: java.lang.NullPointerException
at pt.webdetails.cdc.mondrian.SegmentCacheHazelcast.getCache(SegmentCacheHazelcast.java:30)
at pt.webdetails.cdc.mondrian.SegmentCacheHazelcast.addListener(SegmentCacheHazelcast.java:71)
at mondrian.rolap.agg.SegmentCacheManager.<init>(SegmentCacheManager.java:273)
at mondrian.rolap.agg.AggregationManager.<init>(AggregationManager.java:58)
at mondrian.server.MondrianServerImpl.<init>(MondrianServerImpl.java:172)
at mondrian.server.MondrianServerRegistry.createWithRepository(MondrianServerRegistry.java:184)
at mondrian.server.MondrianServerRegistry.<init>(MondrianServerRegistry.java:48)
at mondrian.server.MondrianServerRegistry.<clinit>(MondrianServerRegistry.java:33)
=================================

Thanks,
Chang

Julian Hyde said...

> Does the standalone Mondrain.war support Custom SegmentCache SPI?

Yes. (Though it helps if you spell mondrian.war correctly.)

That said, this blog is not the place to debug CDC issues. Try the CDC forum.

ctscubedeveloper said...

Hi Julian,
We have implemented 6 cubes on one server. If 6 concurrent users query these 6 cubes, will the Mondrian cache be shared across all 6 cubes at the same time, or will the requests be served sequentially?

Julian Hyde said...

ctscubedeveloper,

Yes, Mondrian should populate and access the cache in parallel.