Comments on Julian Hyde on Streaming Data, Open Source OLAP. And stuff.: OLAP change notification, and the CellSetListener API
Julian Hyde (http://www.blogger.com/profile/17816795169191026372), feed updated 2022-03-27T08:59:33.430-07:00

Antoine CHAMBILLE (https://www.blogger.com/profile/12077093427393105217), 2010-07-21T18:22:34.210-07:00

Julian, we took a first look at the new olap4j real-time interfaces. We obviously need to spend more time on them and possibly implement them in an extension of the XMLA driver implementation. But they look fine, and are actually more ambitious than what we currently do. We currently support updates of cells within the current cell set, but when the cell set boundaries change (for example, when a new member appears on an axis) we re-execute the query.

I will get back to you next week when I return from my current business trip to Hong Kong.

Daniel, ActivePivot is commercial software and I cannot freely share code in the way you mentioned. But we are very interested in working on practical projects and evaluations with academic partners. I am based in Paris and already have partnerships with the French engineering schools Supélec and Télécom Paris. QuartetFS also has offices in London, New York and Singapore.

Please send me an email at my QuartetFS address if you want to talk about that (ach@quartetfs.com).

By the way, my name is Antoine ;) I don't remember why "Papa" was my registered nickname in Blogger.

Daniel Lemire (https://www.blogger.com/profile/01566622051558391310), 2010-07-19T11:22:12.539-07:00

"The olap4j API is checked in already."

Yes. Which is why I think there is enough meat for an academic project.

Julian Hyde, 2010-07-19T11:00:34.594-07:00

Daniel,

The olap4j API is checked in already.

I was thinking of a simple (brute force) implementation in Mondrian (at first). For each statement being watched, store the results on the server side. Each time there is a notification from the CacheControl API, re-execute these statements, compare the results, and notify the client if there is a difference. For extra credit, compute that difference.

Plenty of ways to improve this implementation... each of which could be a paper...

Julian
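
The brute-force scheme described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: none of the names below (`WatchedQuerySketch`, `watch`, `onCacheFlush`, `Listener`) come from olap4j or Mondrian. A watched statement is modeled as a plain `Supplier` returning cell values keyed by ordinal, and `onCacheFlush` stands in for whatever hook a CacheControl notification would trigger.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical sketch: none of these types belong to olap4j or Mondrian.
class WatchedQuerySketch {
    /** Receives changed cells: ordinal -> new value, or null if the cell disappeared. */
    interface Listener {
        void cellsChanged(Map<Integer, Double> changedCells);
    }

    /** One watched statement: how to re-run it, whom to notify, and the last result seen. */
    private static final class Watch {
        final Supplier<Map<Integer, Double>> query;
        final Listener listener;
        Map<Integer, Double> lastResult;

        Watch(Supplier<Map<Integer, Double>> query, Listener listener) {
            this.query = query;
            this.listener = listener;
            // Store a copy of the initial result on the "server side".
            this.lastResult = new HashMap<>(query.get());
        }
    }

    private final List<Watch> watches = new ArrayList<>();

    void watch(Supplier<Map<Integer, Double>> query, Listener listener) {
        watches.add(new Watch(query, listener));
    }

    /** Stand-in for the cache notification: re-execute every watched query and compare. */
    void onCacheFlush() {
        for (Watch w : watches) {
            Map<Integer, Double> fresh = new HashMap<>(w.query.get());
            Map<Integer, Double> changed = diff(w.lastResult, fresh);
            w.lastResult = fresh;
            if (!changed.isEmpty()) {
                w.listener.cellsChanged(changed); // notify only when something differs
            }
        }
    }

    /** The "extra credit" step: compute which cells differ between two results. */
    static Map<Integer, Double> diff(Map<Integer, Double> before, Map<Integer, Double> after) {
        Map<Integer, Double> changed = new HashMap<>();
        for (Map.Entry<Integer, Double> e : after.entrySet()) {
            if (!Objects.equals(before.get(e.getKey()), e.getValue())) {
                changed.put(e.getKey(), e.getValue()); // new or modified cell
            }
        }
        for (Integer ordinal : before.keySet()) {
            if (!after.containsKey(ordinal)) {
                changed.put(ordinal, null); // cell no longer present
            }
        }
        return changed;
    }
}
```

In this sketch every notification re-runs every watched statement, so the cost grows with the number of watches. That is the obvious place where the improvements mentioned above would start.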

Daniel Lemire, 2010-07-19T10:52:29.040-07:00

@Papa (and @Julian)

I've convinced a student to start working on real-time OLAP in the context of the proposed API upgrade.

Any chance you might share prototype code? Or anything at all?

I don't think she'll have time to produce a full prototype, but she might be able to run some experiments and produce some feedback.

Without any software, her work might be a bit more conceptual and maybe less useful.

Julian Hyde, 2010-07-19T10:46:31.303-07:00

@Papa:

Thanks for the kind words. I'd be grateful if you could carefully review the proposed olap4j API. Since you have already implemented this support, you probably have a good idea of what works.

Also, can you drop me an email? I'd like to see if there are any other ways we can work together.

Julian

Antoine CHAMBILLE, 2010-07-19T03:29:08.081-07:00

I find this post rather exciting, as real-time OLAP was the initial decision for our ActivePivot software (http://www.quartetfs.com/activepivot).

We materialize low-level aggregates in main memory and index them with compressed bitmaps, also in main memory, so that higher-level aggregates can be computed in milliseconds. The result is a set of fully incremental data structures that support a high throughput of transactional updates and can be queried multidimensionally. A subscription engine is built on top of that to register MDX queries and push updated cells to subscribers.

We implement MDX and XMLA but had to develop a proprietary push protocol, along with a custom web front end to show those green/red blinking cells.
Now it looks like we may incorporate this layer behind the olap4j interfaces, a library used by several of our customers.

It is good to see a significant step towards real-time OLAP.

I will also use this comment as an opportunity to greet and thank you, Julian, for the continuous innovation and excellence of the Mondrian project, and Daniel for your inspiring work on bitmap indexes.

Daniel Lemire, 2010-06-18T12:04:30.780-07:00

"I do not agree that static data and materialization are part of the definition of OLAP."

Agreed, but Jim Gray's data cube is all about materialization (go back and check his original data cube paper).

While this is not Mondrian's business, you can probably get a lot of work done just by keeping the fact table in RAM, without permanently pre-materializing a lot of views. QlikView seems to be doing it well.

Julian Hyde, 2010-06-18T10:37:23.405-07:00

Speaking as the designer of the olap4j API, I don't need to avoid the performance pitfalls; I just provide an API so that ingenious OLAP engine designers can provide the latest data.

But speaking as the designer of the Mondrian engine, here's how I'd do it. By the way, I do not agree that static data and materialization are part of the definition of OLAP. OLAP is a multidimensional view of data, and short query response times. Static data, materialization, and for that matter star schemas, have been a pragmatic best practice in achieving those goals, but I don't see them as essential.

Materialization has a high cost if it occurs on disk. If there are N materialized views on top of each row in a fact table, and they are stored on disk, then updating that row will incur N+1 disk block writes.

If you batch up updates over a period of time, then updates for multiple rows will tend to hit the same disk blocks, and you can save some effort.

But the story is radically different if the materialization occurs in memory. Memory writes do not have a significant cost (unless you are processing hundreds of thousands of updates per second), so you do not need to worry about keeping N, the number of aggregates, small. You can have lots of materializations of the fact table, at different granularities, and the main cost is memory.

Results from an OLAP engine are always materialized in memory at least once; namely, the cell from which the OLAP engine generates its result. If the transactional system notifies the OLAP engine of each transaction, the OLAP engine can modify its cache and all aggregations in it.

This picture seems to work nicely for MOLAP, where all data is stored in multidimensional format, and less well for ROLAP, where the primary source of data is relational and data is stored in multidimensional format, in memory, only fleetingly.

But it is not relational format that makes materialized aggregations start to perform poorly; it is storage on disk. If a MOLAP engine has more data in multidimensional arrays than it has memory, then updating those arrays will require a write to disk, and the only efficient way to do that is to batch up many updates into the same disk write. That implies that what is on disk has to lag behind the latest transactional state.

Putting all this together, we arrive at an architecture where the OLAP engine has a cache that is up to date with the transactional system, and contains multiple in-memory aggregates at different granularities. If it is a ROLAP system, like Mondrian, then the data ultimately comes to rest in the relational database, but the relational database will lag behind the latest transactional state. The OLAP engine serves as a view for the application, serving up the most up-to-date data, because it knows which parts of the system are up to date.

Daniel Lemire, 2010-06-18T05:27:08.663-07:00

Typically, OLAP assumes static data and uses materialization to speed up results. One might say that's almost part of the definition of OLAP.

So, how do you avoid performance pitfalls?
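
To make the in-memory argument in Julian's reply above more concrete, here is a minimal Java sketch of several aggregates at different granularities kept up to date on every transaction. Everything in it is an assumption made for illustration: the fact columns (`product`, `region`, `amount`), the class name `InMemoryAggregatesSketch` and its methods are hypothetical and do not describe how Mondrian or any other engine is implemented.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the fact columns and all names are illustrative only.
class InMemoryAggregatesSketch {
    // One map per granularity; each holds a running SUM(amount).
    private final Map<String, Double> byProductAndRegion = new HashMap<>();
    private final Map<String, Double> byProduct = new HashMap<>();
    private final Map<String, Double> byRegion = new HashMap<>();
    private double grandTotal;

    /** Called once per transaction: each aggregate costs one in-memory write, no disk I/O. */
    void onTransaction(String product, String region, double amount) {
        byProductAndRegion.merge(product + "|" + region, amount, Double::sum);
        byProduct.merge(product, amount, Double::sum);
        byRegion.merge(region, amount, Double::sum);
        grandTotal += amount;
    }

    double total() {
        return grandTotal;
    }

    double totalForProduct(String product) {
        return byProduct.getOrDefault(product, 0.0);
    }

    double totalForRegion(String region) {
        return byRegion.getOrDefault(region, 0.0);
    }

    double totalFor(String product, String region) {
        return byProductAndRegion.getOrDefault(product + "|" + region, 0.0);
    }
}
```

Adding another granularity here means one more map and one more `merge` call per transaction, which is the sense in which the number of aggregates stops being the main concern once they live in memory. The sketch only covers a sum; measures that cannot be updated by simple addition would need more bookkeeping.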