<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-5672165237896126100</id><updated>2012-01-16T09:51:40.082-08:00</updated><category term='pentaho mondrian gartner'/><category term='vista fail'/><category term='itsa traffic gps sensors'/><category term='sql standard extensions streaming'/><category term='linux thread scheduler'/><category term='streaming web rss twitter'/><category term='jpivot trend arrows mondrian'/><category term='mondrian architecture olap4j xmla'/><category term='ubuntu intrepid ibex fennel eigenbase'/><category term='olap4j yellowfin'/><category term='sql oracle db2 nosql &quot;one size fits all&quot;'/><category term='mondrian mdx etl pentaho analyzer'/><category term='mondrian olap4j eigenbase luciddb pentaho beer'/><category term='olap query optimization mondrian native SQL'/><category term='olap4j palo &quot;sap bw&quot; ssas mondrian jedox pentaho'/><category term='pentaho analyzer jpivot pat open source olap viewer'/><category term='baby sebastian beer camra'/><category term='mondrian explain plan mdx profiling'/><category term='sqlstream mondrian stream sql olap etl'/><category term='pivot olap4j gwt'/><category term='mondrian mdx formatting olap4j'/><category term='mondrian luciddb sqlstream javascript udf'/><category term='cwm mondrian metadata xmi mof'/><category term='pentaho puc aggregate designer mondrian summary table'/><category term='openmrs mondrian kettle pentaho'/><category term='data privacy realtime'/><category term='ggro hawkwatch'/><category term='sqlstream realtime bi'/><category term='scalable olap ehcache terracotta jboss infinispan'/><category term='mondrian 4.0 xml schema unit 
test'/><category term='lemon pie chart'/><category term='mondrian kettle pentaho oem training'/><category term='sqlstream twitter'/><category term='ggro raptor migration'/><category term='mondrian jdbc dialect compatibility contributions'/><category term='mondrian mdx parsing'/><category term='mondrian olap writeback splash olap4j palo jpalo'/><category term='olap4j java olap analysis api'/><category term='openmrs pri bbc'/><category term='olap4j beta production'/><category term='gigaom structureconf nosql bigdata sqlstream'/><category term='virtualization etl cdc olap streaming sql esp'/><category term='mondrian physical schema bnf xsd clapham'/><category term='webinar mysql mondrian continuous data integration'/><category term='oracle openworld sqlstream aeturnum mpp smp streaming sql etl'/><category term='xmla microsoft olap4j &quot;native xml web services&quot; &quot;sql server 2008&quot;'/><category term='pat pentaho analysis tool olap4j slice dice jpivot'/><category term='gis olap'/><category term='sqlstream event-driven marketing customer experience'/><category term='olap4j streaming notification'/><category term='recycling'/><category term='maven ivy mondrian'/><category term='ggro raptor hawk hill rss'/><category term='streaming sql'/><category term='sqlstream monitoring tivoli'/><category term='mondrian cache calculated members'/><category term='mondrian crosstab'/><category term='sqlstream signal processing rabbitmq amqp seismic'/><category term='real-time analytics event stream processing esp'/><category term='arithmetic journalism'/><category term='sqlstream rss atom twitter feed realtime'/><category term='mondrian jpivot pentaho analysis howto'/><category term='oracle simba mdx olap olap4j standardization'/><category term='olap4j gwt slice dice olap mdx'/><category term='mondrian high cardinality dimension'/><category term='organic community baked box'/><category term='sdforum mondrian'/><category term='streaming sql social media rss twitter 
friendfeed facebook'/><category term='streaming sql database cep'/><category term='open source hudson jenkins oracle'/><category term='bnf javacc parser grammar generator'/><category term='system design'/><category term='goto t.rex attack'/><category term='mondrian mdx writeback'/><category term='caucus colorado democracy'/><category term='mozilla firefox 3.5 sqlstream'/><category term='efficient primitive java collections janino'/><category term='xmla javascript open source bi ajax'/><category term='kettle mondrian workbench'/><category term='twitter realtime web streaming sql'/><category term='limerick'/><category term='OSBOOTCAMP'/><category term='open source BI survey COSS'/><category term='welsh road sign auto-reply'/><title type='text'>Julian Hyde on Streaming Data, Open Source OLAP. And stuff.</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><link rel='next' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default?start-index=101&amp;max-results=100'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><generator version='7.00' 
uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>120</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-70633547855380786</id><published>2012-01-14T16:05:00.000-08:00</published><updated>2012-01-16T09:50:28.309-08:00</updated><title type='text'>Changes to Mondrian's caching architecture</title><content type='html'>&lt;br /&gt;&lt;div class="p1"&gt;I checked in some architectural changes to Mondrian's cache this week.&lt;/div&gt;&lt;div class="p1"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="p1"&gt;First the executive summary:&lt;/div&gt;&lt;div class="p2"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="p1"&gt;1. Mondrian should do the same thing as it did before, but scale up better to more concurrent queries and more cores.&lt;/div&gt;&lt;div class="p2"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="p1"&gt;2. Since this is a fairly significant change in the architecture, I'd appreciate if you kicked the tires, to make sure I didn't break anything.&lt;/div&gt;&lt;div class="p2"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="p1"&gt;Now the longer version.&lt;/div&gt;&lt;div class="p2"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="p1"&gt;Since we &lt;a href="http://julianhyde.blogspot.com/2011/02/scalable-caching-in-mondrian.html" target="_blank"&gt;introduced external caches in Mondrian 3.3&lt;/a&gt;, we were aware that we were putting a strain on the caching architecture. The caching architecture has needed modernization for a while, but external caches made it worse. First, a call to an external cache can take a significant amount of time: depending on the cache, it might do a network I/O, and so take several orders of magnitude longer than a memory access. 
Second, we introduced external caching and in-cache rollup, and for both of these we had to beef up the in-memory indexes needed to organize the cache segments.&lt;/div&gt;&lt;div class="p2"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="p1"&gt;Previously we'd used a critical section approach: any thread that wanted to access an object in the cache locked out the entire cache. As the cache data structures became more complex, those operations were taking longer. To improve scalability, we adopted a radically different architectural pattern, called the &lt;a href="http://en.wikipedia.org/wiki/Actor_model" target="_blank"&gt;Actor Model&lt;/a&gt;. Basically, one thread, called the Cache Manager, is dedicated to looking after the cache index. Any query thread that wants to find a segment in the cache, or to add a segment to the cache, or create a segment by rolling up existing segments, or flush the cache, sends a message to the Cache Manager.&lt;/div&gt;&lt;div class="p2"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="p1"&gt;Ironically, the cache manager does not get segments from external caches. As I said earlier, external cache accesses can take a while, and the cache manager is super-busy. The cache manager tells the client the segment key to ask the external cache for, and the client does the asking. When a client gets a segment, it stores it in its private storage (good for the duration of a query) so it doesn't need to ask the cache manager again. Since a segment can contain thousands of cells, even large queries typically only make a few requests to the cache manager.&lt;/div&gt;&lt;div class="p2"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="p1"&gt;The external cache isn't just slow; it is also porous. It can have a segment one minute, and forget it the next. The Mondrian query thread that gets the cache miss will tell the cache manager to remove the segment from its index (so Mondrian doesn't ask for it again), and formulate an alternative strategy to find it. 
Maybe the required cell exists in another cached segment; maybe it can be obtained by rolling up other segments in cache (but they, too, could have gone missing without notice). If all else fails, we can generate SQL to populate the required segment from the database (a fact table, or if possible, an aggregate table).&amp;nbsp;&lt;/div&gt;&lt;div class="p2"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="p1"&gt;Since the cache manager is too busy to talk to the external cache, it is certainly too busy to execute SQL statements. From the cache manager's perspective, SQL queries take an eternity (several million CPU cycles each), so it farms out SQL queries to a pool of worker threads. The cache manager marks that segment as 'loading'. If another query thread asks the cache manager for a cell that would be in that segment, it receives a&amp;nbsp;&lt;a href="http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Future.html" target="_blank"&gt;Future&lt;/a&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.com/api/mondrian/rolap/agg/SegmentBody.html" target="_blank"&gt;SegmentBody&lt;/a&gt;&amp;gt; that will be&amp;nbsp;populated as soon as the segment arrives.&amp;nbsp;When that segment returns, the query thread pushes the segment into the cache, and tells the cache manager to update the state of that segment from 'loading' to 'ready'.&lt;/div&gt;&lt;div class="p2"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="p1"&gt;The Actor Model is a radically different architecture. First, let's look at the benefits. Since one thread is managing an entire subsystem, you can just remove all locking. This is liberating. Within the subsystem, you can code things very simply, rather than perverting your data structures for thread-safety. 
You don't even need to use concurrency-safe data structures like &lt;a href="http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/CopyOnWriteArrayList.html" target="_blank"&gt;CopyOnWriteArrayList&lt;/a&gt;; you can just use the fastest data structure that does the job. Once you remove concurrency controls such as 'synchronized' blocks, and access from only one thread, the data structure becomes miraculously faster. How can that be? The data structure now resides in the thread's cache, and when you removed the concurrency controls, you were also removing memory barriers that forced changes to be written through L1 and L2 cache to RAM, which is &lt;a href="http://julianhyde.blogspot.com/2010/11/numbers-everyone-should-know.html" target="_blank"&gt;up to 200 times slower&lt;/a&gt;.&lt;/div&gt;&lt;div class="p2"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="p1"&gt;Migrating to the Actor Model wasn't without its challenges. First of all, you need to decide which data structures and actions should be owned by the actor. I believe we got that one right. I found that most of the same things needed to be done, but by different threads than previously; so the task was mainly about moving code around. We needed to refine the data structures that were passed between "query", "cache manager" and "worker" threads, to make sure that they were immutable. If, for instance, you want the query thread to find other useful work to do while it is waiting for a segment, it shouldn't be modifying a data structure that it put into the cache manager's request queue.&amp;nbsp;In a future blog post,&amp;nbsp;I'll describe in more detail the&amp;nbsp;challenges &amp;amp; benefits of migrating one component of a complex software system to the Actor Model.&lt;/div&gt;&lt;div class="p2"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="p1"&gt;Not all caches are equal. 
Some, like &lt;a href="http://www.jboss.org/infinispan" target="_blank"&gt;JBoss Infinispan&lt;/a&gt;, are able to share cache items (in our case, segments containing cell values) between nodes in a cluster, and to use redundancy to ensure that cache items are never lost. Infinispan calls itself a "data grid", which at first I dismissed as mere marketing, but I became convinced that it is genuinely a different kind of beast than a regular cache. To support data grids, we added hooks so that a cache can tell Mondrian about segments that have been added to other nodes in a cluster. This way, Mondrian becomes a genuine cluster. If I execute query X on node 1, it will put segments into the data grid that will make the query you are about to submit, query Y on node 2, execute faster.&lt;/div&gt;&lt;div class="p2"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="p1"&gt;As you can tell by the enthusiastic length of this post, I am very excited about this change to Mondrian's architecture. Outwardly, Mondrian executes the same MDX queries the same as it ever did. 
But the internal engine can scale better when running on a modern CPU with many cores; due to the external caches, the cache behaves much more predictably; and you can create clusters of Mondrian nodes that share their work and memory.&lt;/div&gt;&lt;div class="p2"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="p1"&gt;The changes will be released soon as Mondrian version &lt;strike&gt;3.3.1&lt;/strike&gt;&amp;nbsp;3.4, but you can help by downloading from the main line (or from CI), kicking the tires, and letting us know if you find any problems.&lt;br /&gt;&lt;br /&gt;[Edited 2012/1/16, to fix version number.]&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-70633547855380786?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/70633547855380786/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=70633547855380786' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/70633547855380786'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/70633547855380786'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2012/01/changes-to-mondrians-caching.html' title='Changes to Mondrian&apos;s caching architecture'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' 
src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-5740272191525842954</id><published>2012-01-04T11:22:00.000-08:00</published><updated>2012-01-05T01:18:03.404-08:00</updated><title type='text'>olap4j moves to Apache License</title><content type='html'>&lt;br /&gt;&lt;div class="p1"&gt;We've decided to change &lt;a href="http://www.olap4j.org/"&gt;olap4j&lt;/a&gt;'s license to the &lt;a href="http://www.apache.org/licenses/LICENSE-2.0.html"&gt;Apache Software License, Version 2.0&lt;/a&gt; (ASL).&lt;/div&gt;&lt;div class="p2"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="p1"&gt;I hope to find time to write a post describing in more detail the rationale for the license change and the choice of license. For now, suffice it to say that the license change will be good for olap4j's adoption and long-term success, and good for its users. ASL is a very permissive license, and is arguably &lt;a href="http://www.openlogic.com/news/press/05.16.11.php"&gt;becoming the standard license for open source enterprise software&lt;/a&gt;.&lt;/div&gt;&lt;div class="p2"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="p1"&gt;The committers have agreed to the license change, and we will be making the changes to the code base over the next few days and making a point release under the new license shortly.&lt;/div&gt;&lt;div class="p2"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="p1"&gt;Happy new year everyone.&lt;/div&gt;&lt;div class="p2"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="p1"&gt;Julian&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-5740272191525842954?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://julianhyde.blogspot.com/feeds/5740272191525842954/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=5740272191525842954' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/5740272191525842954'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/5740272191525842954'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2012/01/olap4j-moves-to-apache-license.html' title='olap4j moves to Apache License'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-5520527643985077935</id><published>2011-08-24T10:40:00.000-07:00</published><updated>2011-08-24T10:40:29.458-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mondrian mdx parsing'/><title type='text'>How Mondrian names hierarchies</title><content type='html'>        &lt;br /&gt;&lt;div class="p1"&gt;You may or may not be aware of the property &lt;a href="http://mondrian.pentaho.com/api/mondrian/olap/MondrianProperties.html#SsasCompatibleNaming"&gt;mondrian.olap.SsasCompatibleNaming&lt;/a&gt;. 
It controls the naming of elements, in particular how Mondrian names hierarchies when there are multiple hierarchies in the same dimension.&lt;/div&gt;&lt;div class="p2"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="p2"&gt;Let's suppose that there is a dimension called 'Time', and it contains hierarchies called 'Time' and 'Weekly'.&lt;/div&gt;&lt;div class="p2"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="p1"&gt;If&amp;nbsp;SsasCompatibleNaming&amp;nbsp;is false, the dimension and the first hierarchy will both be called '[Time]', and the other hierarchy will be called '[Time.Weekly]'.&lt;/div&gt;&lt;div class="p1"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="p1"&gt;If&amp;nbsp;SsasCompatibleNaming&amp;nbsp;is true, the dimension will be called '[Time]', the first hierarchy will be called '[Time].[Time]', and the other hierarchy will be called '[Time].[Weekly]'.&lt;/div&gt;&lt;div class="p2"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="p1"&gt;As you can see, SsasCompatibleNaming makes life simpler, if slightly more verbose, because it gives each element a distinct name. There are knock-on effects, beyond the naming of hierarchies. The most subtle and confusing effect is in the naming of levels when the dimension, hierarchy and level all have the same name. If SsasCompatibleNaming is false, then [Gender].[Gender].Members is asking for the members of the gender &lt;i&gt;level&lt;/i&gt;, and yields two members. If SsasCompatibleNaming is true, then [Gender].[Gender].Members is asking for the members of the gender &lt;i&gt;hierarchy&lt;/i&gt;, and yields three members (all, F and M).&lt;/div&gt;&lt;div class="p2"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="p1"&gt;Usually, however, Mondrian is forgiving in how it resolves names, and if elements have different names, it will usually find the element you intend.&lt;/div&gt;&lt;div class="p2"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="p1"&gt;The default value is false. 
However, that leads to naming behavior which is not compatible with other MDX implementations, in particular Microsoft SQL Server Analysis Services (versions 2005 and later).&lt;/div&gt;&lt;div class="p2"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="p1"&gt;From mondrian-4 onwards, the property will be set to true. (You won't be able to set it to false.) This makes sense, because in mondrian-4, with attribute-hierarchies, there will typically be several hierarchies in each dimension. We will really need to get our naming straight.&lt;/div&gt;&lt;div class="p2"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="p1"&gt;What do we recommend?&amp;nbsp;If you are using Pentaho Analyzer, Saiku or JPivot today, we recommend that you use the default value, false. But if you are writing your own MDX (or have built your own client), try setting the value to true. The new naming convention actually makes more sense, and moving to it now will minimize the disruption when you move to mondrian-4.&lt;/div&gt;&lt;div class="p1"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="p1"&gt;I am just about to check in a change that uses a new and better name-resolution algorithm. It will be more forgiving and more standards-compliant in how it resolves the names of calculated members. However, it might break compatibility, so it will only be enabled if SsasCompatibleNaming is true.&lt;/div&gt;&lt;div class="p1"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="p1"&gt;Are you using this property today? 
Let us know how it's working for you.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-5520527643985077935?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/5520527643985077935/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=5520527643985077935' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/5520527643985077935'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/5520527643985077935'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2011/08/how-mondrian-names-hierarchies.html' title='How Mondrian names hierarchies'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-8770738448624096027</id><published>2011-07-22T10:51:00.000-07:00</published><updated>2011-07-22T10:51:08.674-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sqlstream signal processing rabbitmq amqp seismic'/><title type='text'>Real-Time Seismic Monitoring</title><content type='html'>Marc Berkowitz wrote a &lt;a href="http://www.sqlstream.com/blog/2011/07/real-time-seismic-monitoring-in-the-cloud-with-sqlstream/"&gt;blog post describing an application of SQLstream to power a seismic monitoring project&lt;/a&gt; that is a collaboration 
between several leading research institutions.&lt;br /&gt;&lt;br /&gt;The project is interesting in several respects:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;The project involves signal processing. Unlike the "event-processing" applications that we see most often at SQLstream, events arrive at a regular rate (generally 40 readings every second, per sensor). In signal processing, events are more likely to be processed using complex mathematical formulas (such as &lt;a href="http://en.wikipedia.org/wiki/Fourier_transform"&gt;Fourier transforms&lt;/a&gt;) than by boolean logic (event A happened, then event B happened). Using SQLstream's user-defined function framework, we were easily able to accommodate this form of processing.&lt;/li&gt;&lt;li&gt;It illustrates how a stream-computing "fabric" can be created, connecting multiple SQLstream processing nodes using &lt;a href="http://www.rabbitmq.com/"&gt;RabbitMQ&lt;/a&gt;.&lt;/li&gt;&lt;li&gt;One of the reasons for building a distributed system was to allow an agile approach. Researchers can easily deploy new algorithms without affecting the performance or correctness of other algorithms running in the cloud.&lt;/li&gt;&lt;li&gt;Another goal of the distributed system was performance and scalability. Nodes can easily be added to accommodate greater numbers of sensors. The system is not &lt;a href="http://en.wikipedia.org/wiki/Embarrassingly_parallel"&gt;embarrassingly parallel&lt;/a&gt;, but we were still able to parallelize the solution effectively.&lt;/li&gt;&lt;li&gt;Lastly, the system needs to be both continuous and real-time. "Continuous" meaning that data is processed as it arrives; a smoother, more predictable and more efficient mode of operation than ETL. "Real-time" because some of the potential outputs of the system, such as tsunami alerts, need to be delivered as soon as possible in order to be useful.&lt;/li&gt;&lt;/ul&gt;In all, a very interesting case study of what SQLstream is capable of. 
Marc plans to make follow-up posts describing the solution in more detail, so &lt;a href="http://www.sqlstream.com/blog/"&gt;stay tuned&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-8770738448624096027?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/8770738448624096027/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=8770738448624096027' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/8770738448624096027'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/8770738448624096027'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2011/07/real-time-seismic-monitoring.html' title='Real-Time Seismic Monitoring'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-3104531790622340755</id><published>2011-06-09T10:25:00.000-07:00</published><updated>2011-06-09T10:25:05.433-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='olap4j yellowfin'/><title type='text'>Yellowfin BI release 5.2 moves to olap4j</title><content type='html'>According to &lt;a href="http://www.newsmaker.com.au/news/9379"&gt;their press release&lt;/a&gt;, Yellowfin BI version 5.2 "includes a significant OLAP overhaul, with the introduction of 
OLAP4j and support for PALO, BW as well as enhanced connectivity for SQL Server 2005+".&lt;br /&gt;&lt;br /&gt;Nice to see &lt;a href="http://www.olap4j.org"&gt;olap4j&lt;/a&gt; gaining wider adoption. Though not too surprising, given the connectivity options that it opens up. And bear in mind that because olap4j is open source, for every product that mentions olap4j in a press release, there may be dozens or hundreds of others that are using it and not talking about it publicly.&lt;br /&gt;&lt;br /&gt;Increased adoption is good, whether or not vendors choose to announce it. We know that if vendors run into issues, they will log them and someone will fix them. It makes olap4j better for everyone.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-3104531790622340755?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/3104531790622340755/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=3104531790622340755' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/3104531790622340755'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/3104531790622340755'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2011/06/yellowfin-bi-release-52-moves-to-olap4j.html' title='Yellowfin BI release 5.2 moves to olap4j'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' 
src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-8339921548425500424</id><published>2011-06-03T12:58:00.000-07:00</published><updated>2011-06-04T00:07:02.438-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='efficient primitive java collections janino'/><title type='text'>Roll your own high-performance Java collections classes</title><content type='html'>The Java collections framework is great. You can create maps, sets, lists with various element types, various performance characteristics (e.g. if you want O(1) insert, use a linked list), iterate over them, and you can decorate them to give them other behaviors.&lt;br /&gt;&lt;br /&gt;But suppose that you want to create a high-performance, memory-efficient immutable list of integers? You'd write&lt;br /&gt;&lt;br /&gt;&lt;code&gt;List&amp;lt;Integer&amp;gt; list =&lt;br /&gt;&amp;nbsp;&amp;nbsp;Collections.unmodifiableList(&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;new ArrayList&amp;lt;Integer&amp;gt;(&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Arrays.asList(1000, 1001, 1002)));&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;There will be 6 objects allocated in the JVM: three Integer objects, an array Object[3] to hold the Integers, an ArrayList, and an UnmodifiableRandomAccessList. Not to mention the Arrays.ArrayList and Integer[3] used to construct the list and quickly thrown away.&lt;br /&gt;&lt;br /&gt;The resulting list is no longer high-performance. A call to, say, 'int n = list.get(2)' requires 3 method calls (UnmodifiableRandomAccessList.get, ArrayList.get, Integer.intValue) and 3 indirections. And the sheer number of objects created reduces the chance that a given stretch of code will be able to operate solely from the contents of L1 cache.&lt;br /&gt;&lt;br /&gt;So, what next? 
Should I write my own class, like this?&lt;br /&gt;&lt;br /&gt;&lt;code&gt;public class UnmodifiableNativeIntArrayList&lt;br /&gt;&amp;nbsp;&amp;nbsp;implements List&amp;lt;Integer&amp;gt;&lt;br /&gt;{&lt;br /&gt;&amp;nbsp;&amp;nbsp;...&lt;br /&gt;}&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;Well, maybe. But there are rather a lot of variations to cover, and each one needs to be hand-coded and tested.&lt;br /&gt;&lt;br /&gt;Do I use library code? I searched and turned up &lt;a href="http://commons.apache.org/primitives/"&gt;Apache Commons Primitives&lt;/a&gt;, &lt;a href="http://pcj.sourceforge.net/"&gt;Primitive Collections for Java (PCJ)&lt;/a&gt;, and &lt;a href="http://trove.starlight-systems.com/"&gt;GNU Trove (trove4j)&lt;/a&gt;. Of these, only GNU Trove is still active.&lt;br /&gt;&lt;br /&gt;None of the libraries supports features such as maps with two or more keys, unmodifiable collections, synchronized collections, flat collections (similar to &lt;a href="http://commons.apache.org/collections/api-3.2/org/apache/commons/collections/map/Flat3Map.html"&gt;Apache Flat3Map&lt;/a&gt;). It's not surprising that they don't: each combination of features would require its own class, so the size of the jar file would grow exponentially.&lt;br /&gt;&lt;br /&gt;So, I'd like to propose an alternate approach. You configure a factory, specifying the precise kind of collection you would like, and the factory generates the collection class in bytecode. You can use the factory to quickly create as many instances of the collection as you wish. 
The collection implements the Java collections interfaces, plus additional interfaces that allow you to efficiently access the collection without boxing/unboxing.&lt;br /&gt;&lt;br /&gt;The above example would be written as follows:&lt;br /&gt;&lt;br /&gt;&lt;code&gt;// Initialize the factory when the program is loaded.&lt;br /&gt;// Then the bytecode gets generated just once.&lt;br /&gt;static final Factory factory =&lt;br /&gt;&amp;nbsp;&amp;nbsp;new FactoryBuilder()&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.list()&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.elementType(Integer.TYPE)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.modifiable(false)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.factory();&lt;br /&gt;&lt;br /&gt;int[] ints = {1000, 1001, 1002};&lt;br /&gt;IntList list = factory.createIntList(ints);&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;Variants are expressed as FactoryBuilder methods:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;FactoryBuilder FactoryBuilder.list()&lt;/li&gt;&lt;li&gt;FactoryBuilder FactoryBuilder.map()&lt;/li&gt;&lt;li&gt;FactoryBuilder FactoryBuilder.set()&lt;/li&gt;&lt;li&gt;FactoryBuilder FactoryBuilder.keyType(Class...) (for maps only)&lt;/li&gt;&lt;li&gt;FactoryBuilder FactoryBuilder.valueType(Class...) (for maps only)&lt;/li&gt;&lt;li&gt;FactoryBuilder FactoryBuilder.elementType(Class...) (for list and set only)&lt;/li&gt;&lt;li&gt;FactoryBuilder FactoryBuilder.sorted(boolean) (cf. the difference between Set and SortedSet)&lt;/li&gt;&lt;li&gt;FactoryBuilder FactoryBuilder.deterministic(boolean) (cf. the difference between HashMap and LinkedHashMap)&lt;/li&gt;&lt;li&gt;FactoryBuilder FactoryBuilder.modifiable(boolean)&lt;/li&gt;&lt;li&gt;FactoryBuilder FactoryBuilder.fixedSize(boolean) (cf. the difference between Flat3Map and Map)&lt;/li&gt;&lt;li&gt;FactoryBuilder FactoryBuilder.synchronized(boolean)&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;And so forth. Additional variants could be added as the project evolved. 
Templates could be fine-tuned for particular combinations of variants.&lt;br /&gt;&lt;br /&gt;The projects I mentioned above clearly use a template system, and we could use and extend those templates. The Janino compiler can easily convert the generated Java code into bytecode. And the JVM would be able to apply JIT (just-in-time compilation) to these classes; in fact, these classes would be more amenable to compilation, because they would be compact and final.&lt;br /&gt;&lt;br /&gt;The existing projects have invested a lot of effort designing high-performance collections. I'd like to build on that work; this project could even be an extension to those projects.&lt;br /&gt;&lt;br /&gt;I'd like to hear from you if you're interested in working with me on this.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-8339921548425500424?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/8339921548425500424/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=8339921548425500424' title='12 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/8339921548425500424'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/8339921548425500424'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2011/06/roll-your-own-high-performance-java.html' title='Roll your own high-performance Java collections classes'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' 
src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>12</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-3831657708509285388</id><published>2011-06-01T16:26:00.000-07:00</published><updated>2011-06-01T16:26:08.160-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mondrian high cardinality dimension'/><title type='text'>Removing Mondrian's 'high cardinality dimension' feature</title><content type='html'>I would like to remove the 'high cardinality dimension' feature in mondrian 4.0.&lt;br /&gt;&lt;br /&gt;To specify that a dimension is high-cardinality, you set the &lt;a href="http://mondrian.pentaho.com/documentation/xml_schema.php#CubeDimension"&gt;highCardinality attribute of the Dimension element&lt;/a&gt; to true. This will cause mondrian to scan over the dimension, rather than trying to load all of the children of a given parent member into memory.&lt;br /&gt;&lt;br /&gt;The goal is a worthy one, but the implementation &amp;mdash; making iterators look like lists &amp;mdash; has a number of architectural problems: it duplicates code; because it allows backtracking for a fixed amount, it works with small dimensions but unpredictably fails with larger ones; and because lists are based on iterators, re-starting an iteration multiple times (e.g. from within a crossjoin) can re-execute complex SQL statements.&lt;br /&gt;&lt;br /&gt;There are other architectural features designed to help with large dimensions. Many functions can operate in an 'iterable' mode (except that here the iterators are explicit). And for many of the most data-intensive operators, such as crossjoin, filter, semijoin (non-empty), and topcount, we can push down the operator to SQL, and thereby reduce the number of records coming out of the RDBMS.&lt;br /&gt;&lt;br /&gt;It's always hard to remove a feature. 
But over the years we have seen numerous inconsistencies, and if we removed this feature in mondrian 4.0, we could better focus our resources.&lt;br /&gt;&lt;br /&gt;If you are using this feature and getting a significant performance benefit, I would like to hear from you. I would like to understand your use case, and either direct you to another feature that solves the problem, or try to develop an alternative solution in mondrian 4.0. The best place to make comments about these use cases is on the Jira case &lt;a href="http://jira.pentaho.com/browse/MONDRIAN-949"&gt;MONDRIAN-949&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-3831657708509285388?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/3831657708509285388/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=3831657708509285388' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/3831657708509285388'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/3831657708509285388'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2011/06/removing-mondrians-high-cardinality.html' title='Removing Mondrian&apos;s &apos;high cardinality dimension&apos; feature'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' 
src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-2669826520389763611</id><published>2011-05-31T00:03:00.000-07:00</published><updated>2011-05-31T16:11:03.111-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mondrian luciddb sqlstream javascript udf'/><title type='text'>Scripted plug-ins in LucidDB and Mondrian</title><content type='html'>I saw a demo last week of scripted user-defined functions in &lt;a href="http://www.luciddb.org/"&gt;LucidDB&lt;/a&gt;, and was inspired this weekend to add them to &lt;a href="http://mondrian.pentaho.com/"&gt;Mondrian&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Kevin Secretan of DynamoBI &lt;a href="http://www.thejach.com/view/2011/5/can_your_sql_database_do_this"&gt;has just contributed&lt;/a&gt; some extensions to LucidDB to allow you to call script code (such as JavaScript or Python) in any place where you can have a user-defined function, procedure, or transform. This feature builds on a &lt;a href="http://www.jcp.org/en/jsr/detail?id=223"&gt;JVM feature introduced in Java 1.6&lt;/a&gt;, &lt;a href="http://download.oracle.com/javase/6/docs/technotes/guides/scripting/programmer_guide/index.html"&gt;scripting engines&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Scripted functions may be a little slower than Java user-defined functions, but what they lose in performance they more than make up in flexibility. Writing user-defined functions in Java has always been laborious: you need to write a Java class, compile it, put it in a jar, put the jar on the server's class path, and restart the server. Each time you find a bug, you need to repeat that process, and that can easily take a number of minutes each cycle. 
Because scripted functions are compiled on the fly, you can cycle faster, and spend more of your valuable time working on the actual application.&lt;br /&gt;&lt;br /&gt;I am speaking about LucidDB (and SQLstream) here, but the same problems exist for Mondrian plug-ins. Scripting is an opportunity to radically speed up development of application extensions, because everything can be done in the schema file. (Or via the workbench... but that part isn't implemented yet.)&lt;br /&gt;&lt;br /&gt;Mondrian has several plug-in types, all today implemented using a Java SPI. I chose to make scriptable those plug-ins that are defined in a mondrian schema file: user-defined function, member formatter, property formatter, and cell formatter. A small syntax change to the schema file allows you to choose whether to implement these plug-ins by specifying the name of a Java class (as before) or an inline script.&lt;br /&gt;&lt;br /&gt;As an example, here is the factorial function defined in JavaScript:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;b&gt;&amp;lt;UserDefinedFunction name="Factorial"&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;lt;Script language="JavaScript"&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;function getParameterTypes() {&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return new Array(new mondrian.olap.type.NumericType());&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;}&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;function getReturnType(parameterTypes) {&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return new mondrian.olap.type.NumericType();&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;}&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;function execute(evaluator, arguments) {&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;var n = arguments[0].evaluateScalar(evaluator);&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return 
factorial(n);&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;}&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;function factorial(n) {&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return n &amp;lt;= 1 ? 1 : n * factorial(n - 1);&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;}&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;lt;/Script&amp;gt;&lt;br /&gt;&amp;lt;/UserDefinedFunction&amp;gt;&lt;/b&gt;&lt;/pre&gt;&lt;br /&gt;A user-defined function ironically requires several functions in order to provide the metadata needed by the MDX type system. The member, property and cell formatters are simpler. They require just one function, so mondrian dispenses with the function header, and requires just the 'return' expression inside the Script element. For example, here is a member formatter:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;b&gt;&amp;lt;Level name="name" column="column"&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;lt;MemberFormatter&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;Script language="JavaScript"&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return member.getName().toUpperCase();&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;/Script&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;lt;/MemberFormatter&amp;gt;&lt;br /&gt;&amp;lt;/Level&amp;gt;&lt;/b&gt;&lt;/pre&gt;&lt;br /&gt;You can of course write multiple statements, if you wish. Since JavaScript is embedded in the JVM, your code can call back into Java methods, and use the full runtime Java library.&lt;br /&gt;&lt;br /&gt;There are examples of cell formatters and property formatters in the latest &lt;a href="http://p4webhost.eigenbase.org:8080/open/mondrian/doc/schema.html"&gt;schema guide&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;If you are concerned about performance, you could always translate this code back to a Java UDF when it is fully debugged. 
However, you might be pleasantly surprised by the performance of JavaScript: I was able to invoke a script function about 20,000 times per second. And I hear that there is a &lt;a href="http://docs.codehaus.org/display/JANINO/Home"&gt;Janino&lt;/a&gt; "scripting engine" that compiles Java code into bytecode on the fly. In principle, it should be as fast as a real Java UDF.&lt;br /&gt;&lt;br /&gt;I'd love to hear from anyone who tries Janino, or in fact any other scripting engine, with the Mondrian or LucidDB scripted functions.&lt;br /&gt;&lt;br /&gt;By the way, you can expect to see scripted functions in a release of SQLstream not too far in the future. The Eigenbase project makes it easy to propagate features between projects, and this feature is too good not to share.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-2669826520389763611?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/2669826520389763611/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=2669826520389763611' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/2669826520389763611'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/2669826520389763611'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2011/05/scripted-plug-ins-in-luciddb-and.html' title='Scripted plug-ins in LucidDB and Mondrian'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' 
src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-9217801446574600968</id><published>2011-04-12T02:21:00.000-07:00</published><updated>2011-04-12T03:05:59.583-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='olap4j java olap analysis api'/><title type='text'>olap4j version 1.0 released</title><content type='html'>Today &lt;a href="http://www.pentaho.com/news/releases/pentaho-announces-a-new-era-in-open-standards-for-analytics/"&gt;we launched version 1.0 of olap4j&lt;/a&gt;, the open standard API for accessing analytic databases.&lt;br /&gt;&lt;br /&gt;It's worth mentioning that version 1.0 is a big deal for an open source project. The tag implies maturity and stability, both of which are true for olap4j. The project is over 4 years old, has two robust driver implementations, and many applications in production.&lt;br /&gt;&lt;br /&gt;The &lt;a href="http://mondrian.pentaho.com/api/mondrian/olap4j/MondrianOlap4jDriver.html"&gt;olap4j driver for Mondrian&lt;/a&gt; has been the official way to access Mondrian since version 3.0, and the &lt;a href="http://www.olap4j.org/api/org/olap4j/driver/xmla/XmlaOlap4jDriver.html"&gt;olap4j driver for XML/A&lt;/a&gt; allows access to many XML/A-compliant analytic engines, including &lt;a href="http://www.microsoft.com/sqlserver/2008/en/us/analysis-services.aspx"&gt;Microsoft SQL Server Analysis Services&lt;/a&gt;, &lt;a href="http://mondrian.pentaho.com"&gt;Mondrian&lt;/a&gt;, &lt;a href="http://www.jedox.com/en/products/Palo-Suite/palo-olap-server.html"&gt;Palo&lt;/a&gt;, and &lt;a href="http://en.wikipedia.org/wiki/SAP_NetWeaver_Business_Intelligence"&gt;SAP BW&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;olap4j was created to address the lack of an open standard API for Java access to OLAP servers. 
Microsoft had created APIs for the Windows platform (&lt;a href="http://en.wikipedia.org/wiki/OLE_DB_for_OLAP"&gt;OLE DB for OLAP&lt;/a&gt;, and later &lt;a href="http://msdn.microsoft.com/en-us/library/ms123483.aspx"&gt;ADOMD.NET&lt;/a&gt;) and for web services (&lt;a href="http://news.xmlforanalysis.com/what-is-xmla.html"&gt;XML for Analysis&lt;/a&gt;) and in due course other vendors adopted those APIs as standards, but on Java, the main platform for enterprise applications, you were always tied to the API provided by your OLAP server vendor.&lt;br /&gt;&lt;br /&gt;There had been previous attempts to create Java APIs for OLAP, but they foundered because the main vendors could not &amp;mdash; or would not &amp;mdash; overcome the technical differences between their products. Since OLAP is concerned with constructing dynamic queries to assist an end-user in interactively exploring a data set, most vendors constructed queries using a complex proprietary API to "build" a query using a sequence of transforms.&lt;br /&gt;&lt;br /&gt;Relational database APIs such as &lt;a href="http://en.wikipedia.org/wiki/Open_Database_Connectivity"&gt;ODBC&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Java_Database_Connectivity"&gt;JDBC&lt;/a&gt; take a different approach: the query is a string in the SQL language. This allowed the APIs to be simpler, because the semantics of the query language need to be understood by the SQL parser and validator on the server, not by the API itself. And it has allowed the query language to be standardized without affecting the API too much. But the OLAP vendors maintained that such a simplifying approach could not be applied to OLAP. &lt;br /&gt;&lt;br /&gt;Microsoft started to prove them wrong when in 1998 they launched &lt;a href="http://en.wikipedia.org/wiki/Microsoft_Analysis_Services"&gt;SQL Server OLAP Services&lt;/a&gt;, the OLE DB for OLAP API, and the MDX query language. 
This was the first time (to my knowledge) that an OLAP vendor had built its API around a query language as opposed to a set of transforms. MDX played a major role in the success of XML/A: a web services API would have been much harder to use if the queries had been built using an object model. Other vendors started to adopt OLE DB for OLAP and XML/A, leaving a void on the one platform Microsoft had no interest in: Java.&lt;br /&gt;&lt;br /&gt;Those of us in the open source world felt that void most acutely. Open source projects are organized into discrete components, each speaking a standard API, and able to replace a proprietary component by being better, cheaper, faster. If there are no standard APIs, the product stacks sprawl across many components, from client-side to server-side, all made by the same vendor; there is nowhere for open source to get a foothold, and the customer has no choice but to accept the whole hog sold by the vendor.&lt;br /&gt;&lt;br /&gt;To redress this, I decided to create a new API. The software would be developed as an open source project, but perhaps more importantly, the specification would be created using an open standards process. As a result, the participants in olap4j read as a who's who of open source BI. Barry Klawans, then chief architect of &lt;a href="http://www.jaspersoft.com"&gt;JasperSoft&lt;/a&gt;, co-authored the original draft; Pentaho's chief geek, &lt;a href="http://jamesdixon.wordpress.com/"&gt;James Dixon&lt;/a&gt;, authored the query model; &lt;a href="http://devdonkey.blogspot.com/"&gt;Luc Boudreau&lt;/a&gt;, first with the University of Montreal, then with SQL Power, and now at Pentaho, is the XMLA driver's most active committer and co-leads the project; Paul Stoellberger and Tom Barber have proven and showcased olap4j by developing the first graphical client, Saiku. Paul has also got the XMLA driver working against SAP BW. 
And we've worked closely with Palo developers: Michael Raue worked with us on the spec, and &lt;a href="http://twitter.com/vmalic"&gt;Vladislav Malicevic&lt;/a&gt; has gotten the XMLA driver working against Palo.&lt;br /&gt;&lt;br /&gt;I knew that to be successful, olap4j needed to be simple and familiar, so I mandated that it would be an extension to JDBC and would use MDX as its query language. The other participants in the specification process took it from there.&lt;br /&gt;&lt;br /&gt;Because olap4j is an extension to JDBC, any developer who has accessed databases from Java can easily pick it up. And it can leverage standard JDBC services such as connection pools and driver managers.&lt;br /&gt;&lt;br /&gt;Microsoft had proven that an API could be built around the MDX language; there were differences between servers, but these would be mostly in the dialect of MDX supported; just about any server could support the basic metamodel of catalogs, cubes, dimensions, and measures. Some clients would want to build their own queries, and parse existing MDX queries; for these, we added a &lt;a href="http://www.olap4j.org/api/org/olap4j/query/Query.html"&gt;query model&lt;/a&gt; and an &lt;a href="http://www.olap4j.org/api/org/olap4j/mdx/parser/MdxParser.html"&gt;MDX parser&lt;/a&gt; to olap4j. Use of the query model and MDX parser is optional: if you have an MDX query string, you can just execute it.&lt;br /&gt;&lt;br /&gt;We have recently added more advanced features such as &lt;a href="http://julianhyde.blogspot.com/2009/06/cell-writeback-in-mondrian.html"&gt;scenarios (write-back)&lt;/a&gt; and &lt;a href="http://julianhyde.blogspot.com/2010/06/olap-change-notification-and.html"&gt;notifications&lt;/a&gt;. These features are still experimental (unlike the rest of the API, they may change post-1.0) and are optional for any olap4j provider. But we hope to see more providers implementing them, and clients making use of them. 
And we hope to see more features added to olap4j in future versions.&lt;br /&gt;&lt;br /&gt;The goal of olap4j was to foster development of analytic clients, servers, and integrated analytic apps by providing an open standard for connectivity. That goal has been realized. There is a native driver for mondrian and an XMLA driver that works against Microsoft SQL Server Analysis Services, SAP BW, and Jedox Palo. There are several clients, both open and closed source: several components in Pentaho's own suite, the &lt;a href="http://code.google.com/p/pentaho-cdf/"&gt;Community Dashboard Framework (CDF)&lt;/a&gt;, &lt;a href="http://www.analytical-labs.com/"&gt;Saiku&lt;/a&gt;, &lt;a href="http://code.google.com/p/adans/"&gt;ADANS&lt;/a&gt;, &lt;a href="http://www.sqlpower.ca/page/wabit"&gt;SQL Power Wabit&lt;/a&gt;, and more.&lt;br /&gt;&lt;br /&gt;People are using olap4j in ways that I couldn't imagine when I started the project four years ago. That's the exciting thing when an open source project becomes successful and starts to gain momentum: you can expect the unexpected.&lt;br /&gt;&lt;br /&gt;Thank you to everyone who helped us get to this milestone.&lt;br /&gt;&lt;br /&gt;Visit &lt;a href="http://www.olap4j.org"&gt;www.olap4j.org&lt;/a&gt;, and download release 1.0 of the specification and the software.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-9217801446574600968?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/9217801446574600968/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=9217801446574600968' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/9217801446574600968'/><link 
rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/9217801446574600968'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2011/04/olap4j-version-10-released.html' title='olap4j version 1.0 released'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-4134355498819032849</id><published>2011-02-04T13:13:00.000-08:00</published><updated>2011-02-04T13:20:45.738-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='scalable olap ehcache terracotta jboss infinispan'/><title type='text'>Scalable caching in Mondrian</title><content type='html'>Wouldn't it be great if &lt;a href="http://mondrian.pentaho.org/"&gt;Mondrian&lt;/a&gt;'s cache could be shared between several Mondrian instances, use memory outside the JVM or even across several machines, and scale as the data size or computation effort increases? 
That is the vision of Pentaho's "enterprise cache" initiative.&lt;br /&gt;&lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_qNJXxQcuKDM/TUtk3p9so8I/AAAAAAAAAEI/lHcS8yzIq0k/s400/MondrianSegmentCacheSPI.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="191" src="http://1.bp.blogspot.com/_qNJXxQcuKDM/TUtk3p9so8I/AAAAAAAAAEI/lHcS8yzIq0k/s400/MondrianSegmentCacheSPI.png" width="400" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Mondrian cell-caching architecture, including pluggable external cache.&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;Luc Boudreau, who has been leading this effort, has just checked in the first revision of the new &lt;a href="http://mondrian.pentaho.com/headapi/mondrian/rolap/agg/SegmentCache.html"&gt;mondrian.rolap.agg.SegmentCache&lt;/a&gt;&amp;nbsp;interface, and has &lt;a href="http://devdonkey.blogspot.com/2011/02/mondrian-spi-segmentcache.html"&gt;written a blog post describing how it will work&lt;/a&gt;. (Note: This &lt;a href="http://en.wikipedia.org/wiki/Service_provider_interface"&gt;SPI&lt;/a&gt; is likely to change before we release it.)&lt;br /&gt;&lt;br /&gt;Pluggable caching will be in Mondrian release 3.3, probably Q2 or Q3 this year. The community edition will include the SPI and a default implementation that uses JVM memory. Of course the community will be able to contribute alternative implementations. 
In the enterprise edition of Mondrian 3.3, there will be a scalable, highly manageable implementation based on something like &lt;a href="http://www.terracotta.org/bigmemory"&gt;Terracotta BigMemory&lt;/a&gt;, &lt;a href="http://ehcache.org/"&gt;ehCache&lt;/a&gt; or &lt;a href="http://www.jboss.org/infinispan"&gt;JBoss Infinispan&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;In future releases, you can expect to see further work in this area: maybe alternative implementations of the caching SPI, and certainly tuning of Mondrian's caching and evaluation strategies, as we apply Mondrian to some of the biggest data sets out there.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-4134355498819032849?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/4134355498819032849/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=4134355498819032849' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/4134355498819032849'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/4134355498819032849'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2011/02/scalable-caching-in-mondrian.html' title='Scalable caching in Mondrian'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://1.bp.blogspot.com/_qNJXxQcuKDM/TUtk3p9so8I/AAAAAAAAAEI/lHcS8yzIq0k/s72-c/MondrianSegmentCacheSPI.png' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-784789599159827941</id><published>2011-01-25T23:42:00.000-08:00</published><updated>2011-01-25T23:42:57.317-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='open source hudson jenkins oracle'/><title type='text'>Oracle, Hudson and Jenkins</title><content type='html'>&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://hudson-ci.org/images/logo_oracle_small.gif" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"&gt;&lt;img border="0" src="http://hudson-ci.org/images/logo_oracle_small.gif" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;a href="http://hudson-ci.org/images/butler.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="96" src="http://hudson-ci.org/images/butler.png" style="cursor: move;" width="96" /&gt;&lt;/a&gt;I've been following the furore about the &lt;a href="http://hudson-ci.org/"&gt;Hudson open source project&lt;/a&gt; with some interest and amusement. &lt;a href="http://www.oracle.com/"&gt;Oracle&lt;/a&gt; owns the trademark on the name Hudson (because the original developer worked for Sun at the time the project was created) and the community is spooked by the possibility that Oracle will enforce its trademark rights in future.&lt;br /&gt;&lt;br /&gt;Trademark rights are indeed a big deal for an open source project, just as they are for a commercial product. An open source project builds its brand by several years of high-quality releases and effective support in its community. 
Whoever owns the trademark of a project controls that brand.&lt;br /&gt;&lt;br /&gt;Here is &lt;a href="http://hudson-ci.org/docs/process_summary.html"&gt;Oracle's proposal for the future of the project&lt;/a&gt;, and the &lt;a href="http://kohsuke.org/2011/01/24/on-oracle-proposal-about-hudson/"&gt;response of one of the project's lead developers&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;It's an interesting study in the fragile dynamics of an open source project's community. Oracle clearly don't understand how fragile the power balance is. The community is spooked; not so much by Oracle's ability to enforce its trademark (they claim they would never do that) but by their presumption that they have more of a say in the project than anyone else.&lt;br /&gt;&lt;br /&gt;My two cents? Oracle are not evil, but they are being naive and are coming across as complete dicks. If I were a member of the active Hudson community (I'm a happy user of Hudson, since it powers &lt;a href="http://ci.pentaho.com/"&gt;Pentaho's continuous integration site&lt;/a&gt;, but I wouldn't say that makes me a community member) I'd certainly give my +1 to fork and change the name of the project to Jenkins. 
There's little reason not to.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://hudson-ci.org/images/butler.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;br /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-784789599159827941?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/784789599159827941/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=784789599159827941' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/784789599159827941'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/784789599159827941'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2011/01/oracle-hudson-and-jenkins.html' title='Oracle, Hudson and Jenkins'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-8204475663657274020</id><published>2011-01-06T15:08:00.000-08:00</published><updated>2011-01-06T15:24:22.688-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='data privacy realtime'/><title type='text'>"Just another big pile of data"</title><content type='html'>Jeff Jonas writes about the &lt;a 
href="http://jeffjonas.typepad.com/jeff_jonas/2010/12/big-data-flows-vs-wicked-leaks-.html"&gt;challenges of managing data privacy&lt;/a&gt; when the data concerned is Big Data. He advocates taking a real-time approach to auditing user behavior:&lt;br /&gt;&lt;blockquote&gt;&lt;strong&gt;Real-time active audits.&lt;/strong&gt;&amp;nbsp; It is now going to be essential that user activity be more rigorously analyzed, in real-time, for inappropriate behavior.&amp;nbsp; Audit logs have actually been part of the problem – just another big pile of data – evidence of misuse hiding in plain sight against the backdrop of millions and millions of benign audit records.&lt;/blockquote&gt;I must say, it hadn't occurred to me that privacy management could be seen as a real-time data problem. But he's right that large data sets, when not acted upon immediately, can become part of the problem. In his words, "just another big pile of data".&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-8204475663657274020?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/8204475663657274020/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=8204475663657274020' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/8204475663657274020'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/8204475663657274020'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2011/01/just-another-big-pile-of-data.html' title='&quot;Just another big pile of data&quot;'/><author><name>Julian 
Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-1037821703076693964</id><published>2010-12-19T21:26:00.000-08:00</published><updated>2010-12-19T21:26:17.786-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='linux thread scheduler'/><title type='text'>An experiment with the Linux scheduler</title><content type='html'>I was curious to see how the Linux scheduler would manifest from a program's perspective, so today I did an experiment.&lt;br /&gt;&lt;br /&gt;I wrote a single-threaded program running a simple loop. All the loop does is to compute the number of milliseconds since the last iteration, and store the result in a histogram. We are not so much interested in the performance of the loop (it does about a million iterations per second) but in the variations in the intervals between loop iterations. 
These variations are presumably caused by the Linux scheduler.&lt;br /&gt;&lt;br /&gt;Here are the numbers I achieved, and the same numbers in a chart (with a logarithmic y-axis).&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_BVv0WTpeWTs/TQ7ngj1xKVI/AAAAAAAAAEw/dphV-tXC2hM/s1600/chart" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="172" src="http://4.bp.blogspot.com/_BVv0WTpeWTs/TQ7ngj1xKVI/AAAAAAAAAEw/dphV-tXC2hM/s320/chart" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;div&gt;&lt;table&gt;&lt;tbody&gt;&lt;tr&gt;&lt;th&gt;Interval (milliseconds)&lt;/th&gt;&lt;th&gt;Frequency&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;0&lt;/td&gt;    &lt;td align="right"&gt;450,080,302&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;1&lt;/td&gt;    &lt;td align="right"&gt;909,044&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;2&lt;/td&gt;    &lt;td align="right"&gt;4,642&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;3&lt;/td&gt;    &lt;td align="right"&gt;1,696&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;4&lt;/td&gt;    &lt;td align="right"&gt;853&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;5&lt;/td&gt;    &lt;td align="right"&gt;561&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;6&lt;/td&gt;    &lt;td align="right"&gt;557&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;7&lt;/td&gt;    &lt;td align="right"&gt;335&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;8&lt;/td&gt;    &lt;td align="right"&gt;1,098&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;9&lt;/td&gt;    &lt;td align="right"&gt;152&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;10&lt;/td&gt;    &lt;td align="right"&gt;86&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;11&lt;/td&gt;    &lt;td align="right"&gt;52&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;12&lt;/td&gt;    &lt;td align="right"&gt;98&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;13&lt;/td&gt;    &lt;td 
align="right"&gt;17&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;14&lt;/td&gt;    &lt;td align="right"&gt;13&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;15&lt;/td&gt;    &lt;td align="right"&gt;6&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;16&lt;/td&gt;    &lt;td align="right"&gt;21&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;17&lt;/td&gt;    &lt;td align="right"&gt;5&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;19&lt;/td&gt;    &lt;td align="right"&gt;3&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;20&lt;/td&gt;    &lt;td align="right"&gt;1&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;21&lt;/td&gt;    &lt;td align="right"&gt;2&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;22&lt;/td&gt;    &lt;td align="right"&gt;2&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;23&lt;/td&gt;    &lt;td align="right"&gt;0&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;24&lt;/td&gt;    &lt;td align="right"&gt;2&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;25&lt;/td&gt;    &lt;td align="right"&gt;0&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;26&lt;/td&gt;    &lt;td align="right"&gt;0&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;27&lt;/td&gt;    &lt;td align="right"&gt;0&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;28&lt;/td&gt;    &lt;td align="right"&gt;1&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;29&lt;/td&gt;    &lt;td align="right"&gt;0&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt; &lt;/table&gt;&lt;/div&gt;&lt;br /&gt;The vast majority of iterations occur zero milliseconds after the previous iteration. No surprise there; Java's clock granularity, 1 millisecond, is coarse enough to execute over a million instructions.&lt;br /&gt;&lt;br /&gt;If the thread was never interrupted, one would expect the loop to tick forward 1 millisecond 1,000 times per second, or about 500,000 times in all. It actually ticks 909,044 times, so interrupts are at work: about 400,000 of them.&lt;br /&gt;&lt;br /&gt;Longer delays also occur: 2, 3, 4, up to 28 milliseconds, occurring with exponentially decreasing frequency. 
Only 8 delays of 20 milliseconds or longer occur in the 7.5 minute run. The chart shows the exponential decay clearly. The chart plots log-frequency, and the trend line is indeed straight from 2 milliseconds onwards, so it is accurate to characterize the decay as exponential.&lt;br /&gt;&lt;br /&gt;The one surprising thing: significant bumps at 8, 12 and 16 milliseconds. Although the trend of the line is pretty consistently down, each of those interval durations has distinctly more occurrences than the previous interval. Does anyone know anything about the Linux scheduler that might explain this?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-1037821703076693964?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/1037821703076693964/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=1037821703076693964' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/1037821703076693964'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/1037821703076693964'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2010/12/experiment-with-linux-scheduler.html' title='An experiment with the Linux scheduler'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://4.bp.blogspot.com/_BVv0WTpeWTs/TQ7ngj1xKVI/AAAAAAAAAEw/dphV-tXC2hM/s72-c/chart' height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-7768829724068073683</id><published>2010-11-22T12:00:00.000-08:00</published><updated>2010-11-24T11:26:41.746-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mondrian architecture olap4j xmla'/><title type='text'>Architectural shuffling in mondrian's XMLA and olap4j servers</title><content type='html'>As a software architect, some of my most interesting work doesn't deliver any additional functionality to end-users, but reorganizes the architecture to make great things possible in future. Since mondrian is an open source project, those great things will, likely as not, be dreamt up by someone else; my job as leader of the &lt;a href="http://mondrian.pentaho.com"&gt;mondrian&lt;/a&gt; project is to reorganize things to make that possible.&lt;br /&gt;&lt;br /&gt;Case in point, my &lt;a href="http://perforce.eigenbase.org:8080/@md=d&amp;amp;cd=//&amp;amp;c=D2C@/13929?ac=10"&gt;recent check in, change 13929&lt;/a&gt;. It contains three new pieces of functionality.&lt;br /&gt;&lt;h4&gt;Make mondrian's XMLA server run off the olap4j API&lt;/h4&gt;Mondrian's legacy API (&lt;a href="http://mondrian.pentaho.com/headapi/mondrian/olap/Connection.html"&gt;mondrian.olap.Connection&lt;/a&gt;, etc.) has been deprecated for some time; olap4j is the official API by which applications should speak to mondrian. Mondrian's XMLA server, that takes incoming SOAP requests over HTTP to execute queries or retrieve metadata, processes them using mondrian, and returns the results as SOAP or JSON over HTTP, has not used the olap4j API until now.&lt;br /&gt;&lt;br /&gt;As part of this change, I converted the &lt;a href="http://mondrian.pentaho.com/headapi/mondrian/xmla/XmlaHandler.html"&gt;XMLA server&lt;/a&gt; to use olap4j. 
In the process, I achieved some beneficial side effects. First, I discovered and fixed a few bugs in mondrian's olap4j driver; this will make the olap4j driver more stable for everyone.&lt;br /&gt;&lt;br /&gt;Second, I discovered a few essential pieces of metadata that the olap4j API does not return. I have not yet extended olap4j to include them: that may happen as we move towards olap4j 1.1 or olap4j 2.0, if they make sense for other olap4j stakeholders. I created the&amp;nbsp;&lt;a href="http://mondrian.pentaho.com/headapi/mondrian/xmla/XmlaHandler.XmlaExtra.html"&gt;XmlaExtra&lt;/a&gt; interface as a loophole, to allow the XMLA server to reach mondrian's legacy API; this interface serves to document what's missing from olap4j.&lt;br /&gt;&lt;br /&gt;Third, and most exciting, the XMLA server should now run against any olap4j driver. It needs to be repackaged a bit &amp;mdash; it still lives within the mondrian codebase, in the &lt;a href="http://mondrian.pentaho.com/headapi/mondrian/xmla/package-summary.html"&gt;mondrian.xmla&lt;/a&gt; package &amp;mdash; but if you are developing an olap4j driver, contact me, and we can consider spinning it out.&lt;br /&gt;&lt;h4&gt;Make mondrian into a real server &amp;mdash; for those who want one&lt;/h4&gt;You'll notice that I tend to refer to mondrian as an OLAP engine.&amp;nbsp;I've always hesitated to call it a 'server', because a server has an independent existence (its own process id, for instance), configuration information, and services such as authentication.&lt;br /&gt;&lt;br /&gt;This is no accident: I deliberately architected mondrian as an engine, so that it could be embedded into another application or server and inherit those services from that application. That's why you need to tell mondrian the URI of the catalog, the JDBC information of the data warehouse, and the role that you would like mondrian to use to execute queries. 
It has no concept of users and passwords, because it assumes that the enclosing application is performing authentication, then mapping authenticated users to roles.&lt;br /&gt;&lt;br /&gt;This architecture makes as much sense now as it did when I started, and it isn't going to change. Core mondrian will remain an engine. But the XMLA server, as its name suggests, performs some of the functions that one associates with a server. In particular, it reads a datasources.xml file that contains the name, catalog URI, and JDBC information of multiple catalogs. My idea was to create an alternate olap4j driver, &lt;a href="http://mondrian.pentaho.com/headapi/mondrian/olap4j/MondrianOlap4jEngineDriver.html"&gt;MondrianOlap4jEngineDriver&lt;/a&gt;, that&amp;nbsp;extends the default driver &lt;a href="http://mondrian.pentaho.com/headapi/mondrian/olap4j/MondrianOlap4jDriver.html"&gt;MondrianOlap4jDriver&lt;/a&gt;, and move the catalog functionality from the XMLA server to the new olap4j driver.&lt;br /&gt;&lt;br /&gt;The new driver is added as part of this change, but is not complete. In a later change, I will move the catalog functionality out of the XMLA server. I don't have plans to add other server features, such as mechanisms to authenticate users or map user names to roles. But I've provided the hook where this functionality can be added, and I encourage you in the mondrian community to contribute that functionality.&lt;br /&gt;&lt;h4&gt;Lock box&lt;/h4&gt;Last, I came up with an elegant (I think) solution to a problem that has been perplexing us for a while. The problem is that the JDBC API requires all parameters to be passed as strings when you are making a connection. If you are creating an olap4j connection to mondrian, and access to the underlying data warehouse is via a &lt;a href="http://download.oracle.com/javase/6/docs/api/javax/sql/DataSource.html"&gt;javax.sql.DataSource&lt;/a&gt; object, not a connect string, then you cannot pass in that DataSource. 
If you have created your own Role object to do customized access-control, you cannot pass in the object; you have to pass in the name of a role already defined in the mondrian schema (or a comma-separated list of role names).&lt;br /&gt;&lt;br /&gt;I invented a&amp;nbsp;&lt;a href="http://mondrian.pentaho.com/headapi/mondrian/util/LockBox.html"&gt;LockBox&lt;/a&gt;&amp;nbsp;class, that acts as a special kind of map that has some of the characteristics of a directory service. There is one lock box per server. If you have an object you wish to pass in, then you register it with the lock box, and the lock box gives you a string moniker to reference that object. That moniker is unique for the duration of the server, and near impossible for an unauthorized component to guess. You can pass it to other components, and they can access the object.&lt;br /&gt;&lt;br /&gt;The lock box automatically garbage collects unused objects. When an object is registered, the lock box returns an entry object to the caller that contains both the string moniker and the object itself. The entry is the key to a &lt;a href="http://download.oracle.com/javase/6/docs/api/java/util/WeakHashMap.html"&gt;WeakHashMap&lt;/a&gt;, so when the client forgets the entry, the object is eligible to be garbage-collected out of the lock box. 
This guarantees that the lock box will not fill up over time due to clients forgetting to deregister objects.&lt;br /&gt;&lt;br /&gt;LockBox does not purport to be a full directory service &amp;mdash; in particular, objects are only accessible within the same JVM &amp;mdash; but it carries out a simple purpose, efficiently and elegantly, and may be useful to other applications.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-7768829724068073683?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/7768829724068073683/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=7768829724068073683' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/7768829724068073683'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/7768829724068073683'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2010/11/architectural-shuffling-in-mondrians.html' title='Architectural shuffling in mondrian&apos;s XMLA and olap4j servers'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-3969700668946149667</id><published>2010-11-17T15:04:00.000-08:00</published><updated>2010-11-17T15:04:17.508-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='system 
design'/><title type='text'>Numbers everyone should know</title><content type='html'>Jeffrey Dean recently gave a talk "Building Software Systems at Google and Lessons Learned" at Stanford (&lt;a href="http://goo.gl/0MznW"&gt;video&lt;/a&gt;). One of his slides was the following list of numbers:&lt;br /&gt;&lt;br /&gt;&lt;table&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;L1 cache reference&lt;/td&gt;&lt;td align="right"&gt;0.5 ns&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Branch mispredict                    &lt;/td&gt;&lt;td align="right"&gt;5 ns&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;L2 cache reference                   &lt;/td&gt;&lt;td align="right"&gt;7 ns&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mutex lock/unlock                    &lt;/td&gt;&lt;td align="right"&gt;25 ns&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Main memory reference                &lt;/td&gt;&lt;td align="right"&gt;100 ns&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Compress 1K bytes w/ cheap algorithm &lt;/td&gt;&lt;td align="right"&gt;3,000 ns&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Send 2K bytes over 1 Gbps network    &lt;/td&gt;&lt;td align="right"&gt;20,000 ns&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Read 1 MB sequentially from memory   &lt;/td&gt;&lt;td align="right"&gt;250,000 ns&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Round trip within same datacenter    &lt;/td&gt;&lt;td align="right"&gt;500,000 ns&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Disk seek                           &lt;/td&gt;&lt;td align="right"&gt;10,000,000 ns&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Read 1 MB sequentially from disk    &lt;/td&gt;&lt;td align="right"&gt;20,000,000 ns&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Send packet CA-&amp;gt;Netherlands-&amp;gt;CA&lt;/td&gt;&lt;td align="right"&gt;150,000,000 ns&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;Everyone who wants to design high-performance, scalable systems should memorize these numbers. 
There are many, many lessons to be learned.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-3969700668946149667?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/3969700668946149667/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=3969700668946149667' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/3969700668946149667'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/3969700668946149667'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2010/11/numbers-everyone-should-know.html' title='Numbers everyone should know'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-3688808049615720870</id><published>2010-10-29T20:21:00.000-07:00</published><updated>2010-10-29T20:21:16.246-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='streaming sql'/><title type='text'>Concepts in Streaming SQL</title><content type='html'>SQLstream's marketing head Rick Saletta just wrote a &lt;a href="http://www.sqlstream.com/blog/2010/10/concepts-in-streaming-sql/"&gt;layman's guide to streaming SQL&lt;/a&gt;. It's short, sweet, entirely buzzword-free, and a good introduction to streaming queries. 
So I thought I'd share the whole post:&lt;br /&gt;&lt;blockquote&gt;A streaming SQL query is a continuous, standing query that executes over streaming data. Data streams are processed using familiar SQL relational operators augmented to handle time sensitive data. Streaming queries are similar to database queries in how they analyze data; they differ by operating continuously on data as they arrive and by updating results in real-time.&lt;br /&gt;&lt;br /&gt;Streaming SQL queries process dynamic, flowing data, in contrast to traditional RDBMSs, which process static, stored data with repeated single-shot queries. Streaming SQL is simple to configure using existing IT skills, dramatically reducing integration cost and complexity. Combining the intuitive power of SQL with this simplicity of configuration enables much faster implementation of business ideas, while retaining the scalability and investment protection important for business-critical systems.&lt;br /&gt;&lt;br /&gt;By processing transactions continuously, streaming SQL directly addresses the real-time business needs for low latency, high volume, and rapid integration. Complex, time-sensitive transformations and analytics, operating continuously across multiple input data sources, are simple to configure and generate streaming-analytics answers as input data arrive. Sources can include any application inputs or outputs, or any of the data feeds processed or generated within an enterprise. Examples include financial trading data, internet clickstream data, sensor data, and exception events. 
SQL can process multiple input and output streams of data, for multiple publishers and subscribers.&lt;/blockquote&gt;If you want to learn more, download the &lt;a href="http://www.sqlstream.com/Resources/ConceptsInStreamingSQL.pdf"&gt;Concepts in Streaming SQL&lt;/a&gt; white paper.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-3688808049615720870?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/3688808049615720870/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=3688808049615720870' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/3688808049615720870'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/3688808049615720870'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2010/10/concepts-in-streaming-sql.html' title='Concepts in Streaming SQL'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-2265194776603744953</id><published>2010-10-07T18:39:00.000-07:00</published><updated>2010-10-07T18:39:22.349-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mondrian mdx etl pentaho analyzer'/><title type='text'>Setting the current member of the time dimension automatically</title><content type='html'>The question came 
up today, "How do I write my report so that the current member of the Time dimension is the most recent member for which transactional data are loaded?"&lt;br /&gt;&lt;br /&gt;It's a good question, and comes up often. Let's look at some ways that you could solve it.&lt;br /&gt;&lt;h4&gt;Attempt #1: CurrentDateMember&lt;/h4&gt;One might think that 'today' would suffice (using Mondrian's &lt;a href="http://julianhyde.blogspot.com/2006/10/mondrian-22-cube-designer-and.html"&gt;CurrentDateMember&lt;/a&gt; MDX function), but since many enterprises only run the ETL process overnight, it isn't always the right answer. Some nights (gasp!) the ETL process fails, so even 'yesterday' may not be the right answer.&lt;br /&gt;&lt;h4&gt;Attempt #2: defaultMember&lt;/h4&gt;The default member of a hierarchy is its 'all' member, or if there is no 'all' member, the first member of the first level. Mondrian allows you to change the default member in the schema file using the defaultMember XML attribute. To do this for the Time hierarchy, you'd write the following:&lt;br /&gt;&lt;blockquote&gt;&amp;lt;Dimension name="Time" type="TimeDimension"&amp;gt;&lt;br/&gt;&amp;nbsp;&amp;nbsp;&amp;lt;Hierarchy defaultMember="[Time].[2010].[10].[07]" hasAll="false" primaryKey="time_id"&amp;gt;&lt;br/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;...&lt;/blockquote&gt;&lt;br /&gt;You'd have to find some way to re-generate the schema XML file each time a load was successful (or use a &lt;a href="http://mondrian.pentaho.com/api/mondrian/spi/DynamicSchemaProcessor.html"&gt;DynamicSchemaProcessor&lt;/a&gt; to generate a schema on the fly, substituting a template schema that contains a token for the default member). But I wouldn't recommend this approach. 
Default members of hierarchies don't just affect what appears on the screen; they are the default context for all MDX calculations (where the member isn't explicitly set in the formula), and so all calculations will change every time you reload your data warehouse. This probably isn't what your users want.&lt;br /&gt;&lt;h4&gt;Attempt #3: Parameter&lt;/h4&gt;Define each of your reports with a parameter that holds the initial member of the time hierarchy for that report. Use some kind of scripting (say a custom piece of JavaScript inside &lt;a href="http://code.google.com/p/pentaho-cdf/"&gt;Pentaho's CDF&lt;/a&gt;, or a Pentaho action sequence) to populate that parameter as the report is launched.&lt;br /&gt;&lt;br /&gt;This approach is on the right track, but isn't quite perfect. It will give your users what they want, but you will have to maintain a piece of script for every report you define.&lt;br /&gt;&lt;h4&gt;Attempt #4: Parameter with MDX expression as default value&lt;/h4&gt;This improves on attempt #3 by putting the expression to initialize the parameter inside the definition of the parameter. You don't need to provide a value of the parameter when you launch the report (unless you want to), and that means you don't need to write those pesky scripts.&lt;br /&gt;&lt;br /&gt;Although the question called for "the most recent [Time] member for which transactional data are loaded", I'm going to drive home the point with an example that qualifies on another dimension as well. 
This query will launch with the most recent month for which anyone in the town of Bellflower, California bought Good beer.&lt;br /&gt;&lt;blockquote&gt;select [Measures].[Unit Sales] on 0,&lt;br/&gt;&amp;nbsp;[Product].Children on 1&lt;br/&gt;from [Sales]&lt;br/&gt;where Parameter(&lt;br/&gt;&amp;nbsp;&amp;nbsp;"Time period of interest",&lt;br/&gt;&amp;nbsp;&amp;nbsp; [Time],&lt;br/&gt;&amp;nbsp;&amp;nbsp; Tail(&lt;br/&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; {&lt;br/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; [Time],&lt;br/&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; Filter(&lt;br/&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; [Time].[Month].Members,&lt;br/&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 0 &amp;lt; ([Customers].[USA].[CA].[Bellflower],&lt;br/&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; [Product].[Drink].[Alcoholic Beverages].[Beer and Wine].[Beer].[Good]))&lt;br/&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; },&lt;br/&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; 1),&lt;br/&gt;&amp;nbsp;&amp;nbsp; "Time period of interest for current analysis. By default the most recent month for which transactions exist.")&lt;/blockquote&gt;Filter evaluates the sales of Good beer in Bellflower every month and throws out months where no Good beer was sold, and Tail chooses the last. The dummy first element {[Time], ... 
} is to ensure that if the residents of Bellflower have never bought Good beer, the report still launches with a valid member of the time dimension.&lt;br /&gt;&lt;br /&gt;The results are as follows:&lt;blockquote&gt;&lt;pre&gt;| &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;| Unit Sales |&lt;br /&gt;+----------------+------------+&lt;br /&gt;| Drink &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;| &amp;nbsp; &amp;nbsp; &amp;nbsp;2,344 |&lt;br /&gt;| Food &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; | &amp;nbsp; &amp;nbsp; 18,278 |&lt;br /&gt;| Non-Consumable | &amp;nbsp; &amp;nbsp; &amp;nbsp;4,648 |&lt;br /&gt;&lt;/pre&gt;&lt;/blockquote&gt;&lt;br /&gt;and the member in the slicer is [Time].[1997].[11]. (Yes, it's a long time since the unfortunate residents of Bellflower, CA drank Good beer.) This report doesn't contain a great deal of detail, but it can be used as a starting point for a series of slice, dice and pivot operations to interactively explore the data, and the same Time member will be carried forward until the user decides to switch to another time.&lt;br /&gt;&lt;br /&gt;&lt;h4&gt;Attempt #5: Schema parameters&lt;/h4&gt;I stopped looking for a solution when I had written the above query in attempt #4, but schema parameters are potentially even better. Schema parameters are a little-known Mondrian feature that allows you to define a parameter once in a schema file, then reference it in any report written against that schema.&lt;br /&gt;&lt;br /&gt;I haven't tried it, but the solution would look something like the following. 
To define the parameter, include the following in your schema file:&lt;br /&gt;&lt;blockquote&gt;&amp;lt;Parameter defaultValue="Tail({ [Time], Filter([Time].[Month].Members, 0 &amp;amp;lt; ([Customers].[USA].[CA].[Bellflower], [Product].[Drink].[Alcoholic Beverages].[Beer and Wine].[Beer].[Good])) }, 1)" name="Time period of interest" type="Member"/&amp;gt;&lt;/blockquote&gt;and reference the parameter in an MDX query using ParamRef:&lt;br /&gt;&lt;blockquote&gt;select [Measures].[Unit Sales] on 0,&lt;br/&gt;&amp;nbsp;[Product].Children on 1&lt;br/&gt;from [Sales]&lt;br/&gt;where ParamRef("Time period of interest")&lt;/blockquote&gt;&lt;h4&gt;Other solutions?&lt;/h4&gt;As you can see, there are many ways to attack a problem using Mondrian, Pentaho and MDX. Do you know other techniques to solve this problem? Let me know.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-2265194776603744953?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/2265194776603744953/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=2265194776603744953' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/2265194776603744953'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/2265194776603744953'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2010/10/setting-current-member-of-time.html' title='Setting the current member of the time dimension automatically'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-3277113486391955441</id><published>2010-09-19T13:15:00.000-07:00</published><updated>2010-09-20T09:52:24.910-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mondrian jdbc dialect compatibility contributions'/><title type='text'>Adding support for a new database to mondrian</title><content type='html'>Mondrian supports a large number of back-end databases. So many, in fact, that we rely on contributors to add the support, and we have tried to standardize the steps to support a new database.&lt;br /&gt;&lt;br /&gt;I thought it might be worthwhile to reiterate those steps. The steps are: &lt;br /&gt;&lt;br /&gt;1. Write a dialect class. The dialect must implement the &lt;a href="http://mondrian.pentaho.com/api/mondrian/spi/Dialect.html"&gt;Dialect&lt;/a&gt; interface, and will probably be a subclass of &lt;a href="http://mondrian.pentaho.com/api/mondrian/spi/impl/JdbcDialectImpl.html"&gt;JdbcDialectImpl&lt;/a&gt;. The only prerequisites are that your database has a JDBC driver and supports SQL-92: SELECT .. FROM .. JOIN .. GROUP BY. Mondrian will glean as much information as it can, such as how your database quotes identifiers that contain mixed-case or spaces, from the JDBC driver. But you will need to override methods to provide information that the JDBC driver does not provide (e.g. how to control whether NULL values sort first or last) or if the JDBC driver lies.&lt;br /&gt;&lt;br /&gt;2. Add your dialect to the &lt;a href="http://p4web.eigenbase.org/open/mondrian-release/3.2/src/main/META-INF/services/mondrian.spi.Dialect"&gt;META-INF/services/mondrian.spi.Dialect&lt;/a&gt; file so that mondrian can find it.&lt;br /&gt;&lt;br /&gt;3. 
Modify MondrianFoodMartLoader and (optionally) bin/loadFoodMart.sh so that you can load data from demo/FoodMartData.sql into your database. (We prefer not to add a dump file for each database to the distro. The FoodMart data set is about 5MB compressed, so each dump file would bloat the size of the distro.)&lt;br /&gt;&lt;br /&gt;4. Get the test suite to pass. We strongly recommend that you focus on getting DialectTest to pass first; once the dialect is accurate, most other mondrian tests should just pass. If you need help, post your test output to the &lt;a href="http://lists.pentaho.org/mailman/listinfo/mondrian"&gt;mondrian developers email list&lt;/a&gt;; someone is likely to have seen the problem before, on another database.&lt;br /&gt;&lt;br /&gt;5. Add a section to mondrian.properties describing a typical connect string for your database. &lt;br /&gt;&lt;br /&gt;When these steps are complete, post the files in a JIRA case. I will add your database to the list of supported databases and mention it in the release notes of the next mondrian release. &lt;br /&gt;&lt;br /&gt;Then, please join the email list and stay in touch. We are not able to test all supported databases each release. When we announce a beta of a mondrian release, run the test suite against your database, and let us know if we've broken anything. 
Supporting a large array of databases is not hard, but it is even easier if we do it as a community.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-3277113486391955441?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/3277113486391955441/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=3277113486391955441' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/3277113486391955441'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/3277113486391955441'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2010/09/adding-support-for-new-database-to.html' title='Adding support for a new database to mondrian'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-3317305542365600733</id><published>2010-08-17T14:42:00.000-07:00</published><updated>2010-09-17T21:49:59.885-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mondrian olap4j eigenbase luciddb pentaho beer'/><title type='text'>September world tour</title><content type='html'>I'm going to be at several conferences over the next month or so. 
I always like to meet up with people who are using &lt;a href="http://mondrian.pentaho.com/"&gt;mondrian&lt;/a&gt;, &lt;a href="http://www.olap4j.org/"&gt;olap4j&lt;/a&gt;, &lt;a href="http://www.pentaho.com/"&gt;Pentaho&lt;/a&gt; and &lt;a href="http://www.luciddb.org/"&gt;LucidDB&lt;/a&gt; to do open source BI, so put these on your schedule.&lt;br /&gt;&lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_BVv0WTpeWTs/TGsxGm0ZB_I/AAAAAAAAAEk/rFPvY1l7rrM/s1600/P2009918_ji_IMG_0894.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="240" src="http://1.bp.blogspot.com/_BVv0WTpeWTs/TGsxGm0ZB_I/AAAAAAAAAEk/rFPvY1l7rrM/s320/P2009918_ji_IMG_0894.JPG" width="320" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Beer in Barcelona: Thomas Morgner, Matt Casters and others at Pentaho Community Meetup 2009.&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;On Wednesday September 1, there is an &lt;a href="http://www.meetup.com/San-Francisco-Eigenbase-Developers/calendar/14311008/"&gt;Eigenbase Developers Meetup&lt;/a&gt; at &lt;a href="http://www.sqlstream.com/"&gt;SQLstream&lt;/a&gt;'s offices in San Francisco. 
I predict that it will be a reunion, of sorts, for people who worked on LucidDB at &lt;a href="http://en.wikipedia.org/wiki/LucidEra"&gt;LucidEra&lt;/a&gt;, but you will also hear about new features that Nick Goodman is adding as part of &lt;a href="http://www.dynamobi.com/"&gt;DynamoBI&lt;/a&gt; (including closer integration with Pentaho), John Sichi's work on &lt;a href="http://pub.eigenbase.org/wiki/FirewaterDistributedArchitecture"&gt;Firewater&lt;/a&gt; (a project to build a scalable shared-nothing database based on LucidDB) and contributions SQLstream developers are making to Eigenbase. And there will be beer.&lt;br /&gt;&lt;br /&gt;At 9.00pm on Tuesday September 21, &lt;a href="http://twitter.com/luclemagnifique"&gt;Luc Boudreau&lt;/a&gt; is giving a talk about olap4j at &lt;a href="http://java.dzone.com/articles/javaone-2010-accepted-talks"&gt;Java One&lt;/a&gt;. I will be in the audience, on hand to answer the questions that Luc considers beneath his dignity. This is a Birds of a Feather (BoF) talk, which means that it will be less formal than a usual conference talk, you can attend even if you have not paid to attend the conference, and yes, there will be beer.&lt;br /&gt;&lt;br /&gt;Lastly, I will be attending the 3rd annual &lt;a href="http://wiki.pentaho.com/display/COM/Pentaho+Community+Gathering+-+Portugal+2010"&gt;Pentaho Community Gathering&lt;/a&gt; on September 25-26 in Lisbon, Portugal, and giving a talk about what's new in Mondrian and open-source OLAP. Meetups in previous years have been great fun, a good chance to network, and a great chance to find out about all of the ways that the Pentaho community is working with and extending the Pentaho BI suite. 
Even though Portugal is better known for producing wine, I wouldn't be surprised if at some point there was beer.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-3317305542365600733?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/3317305542365600733/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=3317305542365600733' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/3317305542365600733'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/3317305542365600733'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2010/08/september-world-tour.html' title='September world tour'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_BVv0WTpeWTs/TGsxGm0ZB_I/AAAAAAAAAEk/rFPvY1l7rrM/s72-c/P2009918_ji_IMG_0894.JPG' height='72' width='72'/><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-2314703459638413813</id><published>2010-08-02T17:58:00.000-07:00</published><updated>2011-07-20T12:09:51.359-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='olap4j palo &quot;sap bw&quot; ssas mondrian jedox pentaho'/><title type='text'>olap4j now talks to Palo and SAP BW</title><content type='html'>As 
&lt;a href="http://julianhyde.blogspot.com/2010/07/olap4j-heading-for-10.html"&gt;olap4j heads towards release 1.0&lt;/a&gt;, there are further signs that it is coming of age, in the form of drivers for the &lt;a href="http://en.wikipedia.org/wiki/Palo_(OLAP_database)"&gt;Palo MOLAP engine&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/SAP_NetWeaver_Business_Intelligence"&gt;SAP BW&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;A few days ago &lt;a href="http://twitter.com/pstoellberger/status/18983516133"&gt;Paul Stoellberger announced&lt;/a&gt; that olap4j's &lt;a href="http://www.xmla.org/"&gt;XMLA&lt;/a&gt; driver could connect to SAP BW, and posted pictures of PAT, PRD and PDI to prove it.&lt;br /&gt;&lt;br /&gt;And just a week later, Jedox CEO &lt;a href="http://www.paloinsider.com/palo/palo-talks-olap4j-finally/"&gt;Kristian Raue writes about how to connect to Palo&lt;/a&gt;. His post includes a blessedly short&amp;nbsp;Java program to do it.&amp;nbsp;Only one line of Kristian Raue's program — the connect string — would be different if the program were talking to &lt;a href="http://mondrian.pentaho.com/"&gt;Mondrian&lt;/a&gt;, &lt;a href="http://msdn.microsoft.com/en-us/library/bb522607.aspx"&gt;Microsoft SQL Server Analysis Services&lt;/a&gt; or SAP BW via XMLA.&lt;br /&gt;&lt;br /&gt;This is a success for both open standards and for open source software. Now applications built on &lt;a href="http://www.olap4j.org"&gt;olap4j&lt;/a&gt; have two open source OLAP engines — Palo and Mondrian — available to them, and can choose which is best according to the characteristics of their OLAP application.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;&lt;b&gt;Clarification, 2011/7/20&lt;/b&gt;: Palo's engine is open source, but as Christian Warden points out in a comment to this post, their XMLA server is not. 
I was therefore incorrect to give the impression that olap4j can talk to Palo as part of a 100% open source stack.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Behind those open source projects are companies who need to show a profit. Palo is backed by &lt;a href="http://www.jedox.com/"&gt;Jedox&lt;/a&gt;, and Mondrian is backed by &lt;a href="http://www.pentaho.com/"&gt;Pentaho&lt;/a&gt;. Are the business people at those companies concerned that their engineers are working with each other, or that their customers now have a choice of OLAP engines? Not at all. The move makes the open source BI ecosystem stronger, and both companies benefit.&lt;br /&gt;&lt;br /&gt;Vendors who embrace open source and open standards are effectively saying, "We have built our platform on open standards. We know that if we don't live up to your expectations, you can just walk away. So we know that we have to remain the best platform for your application."&lt;br /&gt;&lt;br /&gt;Customers love to have choices, and Pentaho and Jedox are giving customers the greatest choice of all: the choice to walk away.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-2314703459638413813?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/2314703459638413813/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=2314703459638413813' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/2314703459638413813'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/2314703459638413813'/><link rel='alternate' type='text/html' 
href='http://julianhyde.blogspot.com/2010/08/olap4j-now-talks-to-palo-and-sap-bw.html' title='olap4j now talks to Palo and SAP BW'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-1308386280460423815</id><published>2010-07-23T17:45:00.000-07:00</published><updated>2010-07-23T20:27:43.132-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mondrian 4.0 xml schema unit test'/><title type='text'>mondrian heading for 4.0</title><content type='html'>I am currently working on mondrian-4.0 and am reworking a lot of mondrian's internals, particularly in how schemas are loaded and validated, and in the mapping of levels and measures onto star schemas. Regression tests for bugs and features are always important, but they are especially important right now.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Regression tests, now more than ever&lt;/h3&gt;Usually when I am changing mondrian, if I see a piece of logic in the code, I will try to preserve that logic, and will rework it if necessary when internal APIs change. That goes for other committers too. But I don't propose to do that for mondrian-4.0: the changes to the code are quite widespread, and besides, it's a chance to clean out some of the cruft that has built up over the years.&lt;br /&gt;&lt;br /&gt;Yes, it's true. Even in mondrian's impeccably clean code &amp;mdash; almost 300,000 lines of it &amp;mdash; there are some pieces of code that we're not sure are actually used. My natural inclination as a developer is to remove that code, and see whether anything breaks. 
If that piece of code is something you contributed but didn't write a test for, then nothing will break, and your code will be on the cutting room floor of history.&lt;br /&gt;&lt;br /&gt;So, if you have contributed a feature or bug fix to mondrian over the past years or months, make sure that there is a test case checked in as part of mondrian's regression test suite. I will do my best to make sure that the test case stays working, even if the code underneath is all different. &lt;b&gt;If there isn't a test, your beloved feature may just disappear in mondrian-4.0&lt;/b&gt;.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Schema changes, and compatibility&lt;/h3&gt;In mondrian-4.0 there will be changes to how schemas are structured. A few examples:&lt;ul&gt;&lt;li&gt;Virtual cubes will be obsolete, or more precisely, any cube can have multiple groups of measures, each group based on a separate fact table.&lt;/li&gt;&lt;li&gt;Linkages between snowflake tables, currently specified using the &amp;lt;Join&amp;gt; element, will be specified using a new &amp;lt;PhysicalSchema&amp;gt; element that declares table usages, relationships, and derived columns for the whole schema.&lt;/li&gt;&lt;li&gt;The present XML grammar isn't very forgiving if you get things in the wrong order: if you define a &amp;lt;NamedSet&amp;gt; before your &amp;lt;CalculatedMember&amp;gt;s in a cube, mondrian currently ignores all calculated members. The new XML grammar will be more forgiving.&lt;/li&gt;&lt;/ul&gt;I recently decided that the changes are large enough to warrant a new XML grammar. But &lt;a href="http://wiki.pentaho.com/display/analysis/Physical+Schema+Design+Discussion"&gt;I promised&lt;/a&gt; that mondrian would be backwards compatible, and I will stand by that. There will be a converter that will recognize an old-style schema, convert it to a new-style schema in memory, and then proceed to load the new-style schema. So, old-style schemas should continue to work. 
&lt;br /&gt;&lt;br /&gt;In a few weeks I will be ready to release the specification for the new-style schemas. I would appreciate review of the new schemas. Since it is a major change, mondrian-4.0 will have a long beta phase, during which I could use your help testing both new features and backwards compatibility.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-1308386280460423815?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/1308386280460423815/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=1308386280460423815' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/1308386280460423815'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/1308386280460423815'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2010/07/mondrian-heading-for-40.html' title='mondrian heading for 4.0'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-4728627971244287150</id><published>2010-07-22T15:49:00.000-07:00</published><updated>2010-07-22T15:49:03.342-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='olap4j beta production'/><title type='text'>olap4j heading for 1.0</title><content type='html'>Luc Boudreau this week 
&lt;a href="http://blog.devdonkey.org/?p=31"&gt;announced plans to take olap4j to version 1.0&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;It is four years since I released the draft of the &lt;a href="http://www.olap4j.org/"&gt;olap4j API&lt;/a&gt;. People have joked that olap4j, like google mail, has been in "&lt;a href="http://en.wikipedia.org/wiki/Perpetual_beta"&gt;perpetual beta&lt;/a&gt;" since then. But olap4j's maturity belies its humble version number. It has been in use by production applications, is the foundation of several OLAP clients, and there are at least two drivers. (Mondrian's primary interface is its olap4j driver, and the XMLA driver has variants for Mondrian, Microsoft Analysis Services and SAP BW.)&lt;br /&gt;&lt;br /&gt;In software development (and particularly open source) culture, version 1.0 of an API is a symbolic milestone. It means that the API is stable, well tested, and will not be changed except at a major release, and then only with due consultation.&amp;nbsp;So, version 1.0 of olap4j will be something to celebrate, but before then, we need to undertake a review of what is in the API.&lt;br /&gt;&lt;br /&gt;Some parts of olap4j (such as the &lt;a href="http://www.olap4j.org/api/org/olap4j/query/Query.html"&gt;query model&lt;/a&gt;, &lt;a href="http://forums.pentaho.org/showthread.php?t=69327"&gt;advanced drill through&lt;/a&gt;, &lt;a href="http://julianhyde.blogspot.com/2009/06/cell-writeback-in-mondrian.html"&gt;cell write back&lt;/a&gt; and &lt;a href="http://julianhyde.blogspot.com/2010/06/olap-change-notification-and.html"&gt;notifications&lt;/a&gt;) are still under active development, and it is not in anyone's interests to freeze these parts of the API just yet. 
So, sections such as these will be marked 'experimental', and likely to change (with consultation of the community, as usual) in future.&lt;br /&gt;&lt;br /&gt;Whether you are an olap4j developer, part of the existing olap4j user community, or are just interested in using OLAP from within Java without being tied to a particular vendor's API, please get involved in the review process.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-4728627971244287150?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/4728627971244287150/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=4728627971244287150' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/4728627971244287150'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/4728627971244287150'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2010/07/olap4j-heading-for-10.html' title='olap4j heading for 1.0'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-4814397995264524607</id><published>2010-06-29T18:08:00.000-07:00</published><updated>2010-06-29T18:08:52.794-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sql oracle db2 nosql &quot;one size fits all&quot;'/><title 
type='text'>SQL past and future</title><content type='html'>Ken North, writing in Dr. Dobb's Journal, gives a &lt;a href="http://www.drdobbs.com/blog/archives/2010/06/database_indust.html"&gt;nice overview of the long and storied history of SQL&lt;/a&gt;. The piece helps one understand the wave of mergers among the big database vendors, and make sense of current trends in database and database-like software. And I'd like to offer my opinion about where SQL and database management systems are headed.&lt;br /&gt;&lt;br /&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;North looks into the claims that 'the database is dead' and finds that — yet again — reports of its death were greatly exaggerated:&lt;/div&gt;&lt;blockquote&gt;&lt;i&gt;Forrester Research recently estimated the total database market (licenses, support, consulting) would grow from $27 billion in 2009 to $32 billion by 2012. SQL technology is entrenched in many organizations and across millions of web sites. Perhaps that explains why, during the past decade, IBM, Oracle, Sun and SAP made billion-dollar investments in a ‘dead’ technology.&lt;/i&gt;&lt;/blockquote&gt;However, I do believe that the relational database is currently in crisis. Relational databases have been the mainstay of data management for over twenty years, but Oracle and its cohorts have no answer for "&lt;a href="http://queue.acm.org/detail.cfm?id=1563874"&gt;Big Data&lt;/a&gt;", the massive onslaught of information from the web and sensors.&lt;br /&gt;&lt;br /&gt;The &lt;a href="http://en.wikipedia.org/wiki/NoSQL"&gt;NoSQL movement&lt;/a&gt; is solving these problems by challenging some of the assumptions held by RDBMS vendors. 
At SQLstream, we regard ourselves as part of the NoSQL movement even though we are huge fans of SQL, because we are challenging the biggest assumption of them all: that you have to put data on disk before you can analyze it.&lt;br /&gt;&lt;br /&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;It's a shame that North doesn't mention streaming SQL, because it fits perfectly into the grand arc of the SQL language: adopt new problems, express them declaratively, and solve them first with special-purpose database engines and finally by adapting the architecture of the big, general-purpose database engines. This last step sometimes takes many years to happen, but it happened for transaction processing, object databases, and data warehousing, and I have no doubt that it will happen for streaming relational data.&lt;/div&gt;&lt;br /&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;One of the reasons that SQL has remained relevant is the SQL standards process; products built on one database can be run on another database and, perhaps more important, skill sets acquired on one engine can be applied to another. When the dust settles, and the big databases have learned hard architectural lessons, I think a lot of these new problems will be solved in SQL.&amp;nbsp;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;Unlike &lt;a href="http://www.cs.brown.edu/~ugur/fits_all.pdf"&gt;Mike Stonebraker&lt;/a&gt;, I do think that organizations will want to put all these different forms of data into one database management system. 
That database will of course be a facade spread over many servers, disks, data organizations and query processing engines, but will offer centralized management and allow the different forms of data to be combined. They will get their wish because the SQL language is so powerful at hiding differences in underlying data organization.&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;When the dust has settled, the SQL language will have changed and adapted yet again, and maybe there will be some new names at the top of the roster of database vendors, but we will once again be solving most of our data management problems using declarative queries beginning with the word "SELECT".&amp;nbsp;SQL is dead; long live SQL!&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-4814397995264524607?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/4814397995264524607/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=4814397995264524607' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/4814397995264524607'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/4814397995264524607'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2010/06/sql-past-and-future.html' title='SQL past and future'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-3793229659862099473</id><published>2010-06-22T21:09:00.000-07:00</published><updated>2010-06-22T21:09:44.208-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='gigaom structureconf nosql bigdata sqlstream'/><title type='text'>The Data Tsunami: SQLstream at Structure 2010</title><content type='html'>&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://wp.gigaom.com/assets/buttons/structure/attendee.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"&gt;&lt;img border="0" src="http://wp.gigaom.com/assets/buttons/structure/attendee.jpg" /&gt;&lt;/a&gt;&lt;/div&gt;At &lt;a href="http://www.sqlstream.com/"&gt;SQLstream&lt;/a&gt;, we have no doubt that streaming SQL and stream computing will be a key part of the next-generation enterprise infrastructure, but we are less certain how we fit into trends such as "&lt;a href="http://gigaom.com/2010/05/31/commercializing-big-data/"&gt;Big Data&lt;/a&gt;" and "&lt;a href="http://en.wikipedia.org/wiki/NoSQL"&gt;NoSQL&lt;/a&gt;".&lt;br /&gt;&lt;br /&gt;Taking the terms absolutely literally, we aren't either. We can't be "Big Data" because we do our damnedest to process data as fast as we receive it, in memory. I guess "Fast Data" would be a better term for what we do. 
And we can't be "NoSQL" because we're the biggest cheerleaders for industry-standard SQL you'll find anywhere outside Redwood Shores or Almaden.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;But we founded the company because we foresaw the onset of massive quantities of data, requiring efficiently-delivered low-latency results, and the impending failure (or at least faltering) of the classical database architecture to solve them. So, the forces driving us are the same forces driving Big Data and NoSQL.&lt;br /&gt;&lt;br /&gt;SQLstream CEO &lt;a href="http://events.gigaom.com/structure/10/speakers/#damian_black"&gt;Damian Black&lt;/a&gt; will be appearing on the "Big Data: Dealing with the Data Tsunami" panel at the&amp;nbsp;&lt;a href="http://events.gigaom.com/structure/10/"&gt;Structure 2010&lt;/a&gt; conference this week, so he took the occasion to write a &lt;a href="http://www.sqlstream.com/blog/2010/06/big-data-dealing-with-the-data-tsunami/"&gt;blog post to set out his views&lt;/a&gt; on the subject.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://events.gigaom.com/structure/10/files/2009/12/01.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://events.gigaom.com/structure/10/files/2009/12/01.jpg" /&gt;&lt;/a&gt;&lt;/div&gt;Structure 2010 is an excellent conference, distinguished by the quality of its speakers and its attendees, and known for its intimate atmosphere. I will be in the audience both tomorrow and Thursday finding out what's new. 
Follow me on &lt;a href="http://twitter.com/julianhyde"&gt;twitter @julianhyde&lt;/a&gt; or even better, come and introduce yourself.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-3793229659862099473?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/3793229659862099473/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=3793229659862099473' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/3793229659862099473'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/3793229659862099473'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2010/06/data-tsunami-sqlstream-at-structure.html' title='The Data Tsunami: SQLstream at Structure 2010'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-501367300976370506</id><published>2010-06-17T14:46:00.000-07:00</published><updated>2010-06-18T09:56:40.296-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='xmla microsoft olap4j &quot;native xml web services&quot; &quot;sql server 2008&quot;'/><title type='text'>Is Microsoft abandoning XML/A?</title><content type='html'>Microsoft are deprecating &lt;a href="http://technet.microsoft.com/en-us/library/cc280436.aspx"&gt;Native XML Web 
Services in SQL Server 2008&lt;/a&gt;, and if I understand the productmanagerese correctly, that means that they are abandoning &lt;a href="http://www.xmla.org/"&gt;XML for Analysis (XML/A)&lt;/a&gt; as an interface to Microsoft Analysis Services.&lt;br /&gt;&lt;br /&gt;(I may be mistaken. Can someone who is closer to Microsoft's roadmap clarify how OLAP applications on non-Windows systems are supposed to access Microsoft Analysis Services?)&lt;br /&gt;&lt;br /&gt;&lt;i&gt;&lt;b&gt;UPDATE, 2010/6/18 09:50 PDT&lt;/b&gt;: It turns out that I am mistaken, and I received several comments pointing this out. The real story is that Microsoft is deprecating native XML access to SQL Server (the relational database, not the OLAP server). I have left the rest of the blog post as I originally wrote it, but please read it in the light of the new evidence.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;But if true, that would be a worrying development for those of us who want to build interoperable OLAP applications. (In particular, those with clients on non-Windows platforms such as Linux.)&lt;br /&gt;&lt;br /&gt;It's funny; Microsoft was the leading company pushing web services back in 2000. Everyone thought that it was too verbose a protocol for passing large data sets around, but Microsoft had a big problem to solve &amp;mdash; its aging &lt;a href="http://msdn.microsoft.com/en-us/library/ms809311.aspx"&gt;DCOM&lt;/a&gt; infrastructure &amp;mdash; and pushed it through.&lt;br /&gt;&lt;br /&gt;I was at a meeting in Redmond (in 1999, if memory serves) when Microsoft launched XMLA. The SQL Server OLAP Services team demoed the interoperability by showing a Java program running on Solaris (or possibly Linux... my memory is fading...) connecting to a Microsoft OLAP Services server. They joked that they could have been fired for having a non-Windows machine in the building. 
But nevertheless they made their point: XMLA was interoperable, and that was unprecedented among OLAP applications at the time.&lt;br /&gt;&lt;br /&gt;Soon afterwards Microsoft started using &lt;a href="http://sqlblog.com/blogs/mosha/archive/2005/12/02/analysis-services-2005-protocol-xmla-over-tcp-ip.aspx"&gt;compressed XML&lt;/a&gt; for its XMLA calls and responses, thereby reducing the problem. Unfortunately their compression technology was proprietary, so the rest of us had to carry on using uncompressed SOAP calls. It gave Microsoft's drivers an unfair advantage over other drivers attempting to talk to Analysis Services.&lt;br /&gt;&lt;br /&gt;It's ironic that Microsoft should abandon a standard that they created, and which has been astoundingly successful. I suspect that they have gotten tired of maintaining it when their own drivers use more efficient, proprietary wire protocols.&lt;br /&gt;&lt;br /&gt;If Microsoft is deprecating XMLA, I doubt that it will disappear for some years to come, but it is bound to be a concern for people building applications now that they intend to be running for several years.&lt;br /&gt;&lt;br /&gt;Of course, one thing people can do to insulate themselves from the future is to build their OLAP applications in Java using &lt;a href="http://www.olap4j.org"&gt;olap4j&lt;/a&gt;. 
Whatever net protocol Microsoft adopts to replace XMLA, we will keep the &lt;a href="http://www.olap4j.org/api/org/olap4j/driver/xmla/XmlaOlap4jDriver.html"&gt;olap4j driver for Analysis Services&lt;/a&gt; working, so you shouldn't need to change your application.&lt;br /&gt;&lt;br /&gt;Likewise, if you are building your application in JavaScript, consider using Roland Bouman's excellent &lt;a href="http://code.google.com/p/xmla4js"&gt;xmla4js&lt;/a&gt; library.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-501367300976370506?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/501367300976370506/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=501367300976370506' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/501367300976370506'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/501367300976370506'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2010/06/is-microsoft-abandoning-xmla.html' title='Is Microsoft abandoning XML/A?'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-2216338124335761403</id><published>2010-06-15T23:21:00.000-07:00</published><updated>2010-06-15T23:21:11.998-07:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='olap4j streaming notification'/><title type='text'>OLAP change notification, and the CellSetListener API</title><content type='html'>There has been an interesting &lt;a href="https://sourceforge.net/projects/olap4j/forums/forum/577988/topic/3737252"&gt;design discussion on the olap4j forums&lt;/a&gt; about how an OLAP server could notify its client that the data set has changed. It is exciting because it would allow us to efficiently update OLAP displays in real-time.&lt;br /&gt;&lt;br /&gt;We came up with an API, at the center of which is &lt;a href="http://www.olap4j.org/head/api/org/olap4j/CellSetListener.html"&gt;the new interface CellSetListener&lt;/a&gt;, which I have just checked into &lt;a href="http://olap4j.svn.sourceforge.net/viewvc/olap4j?revision=319&amp;view=revision"&gt;olap4j's subversion repository&lt;/a&gt;. (The API is experimental. That means you shouldn't expect to find a working implementation just yet, or assume that the API won't change radically before it is finalized, but it does mean we are still very much open to suggestions for improvements.)&lt;br /&gt;&lt;br /&gt;Of course, OLAP notifications are a subject close to my heart, because they bring together my interests in &lt;a href="http://www.sqlstream.com"&gt;SQLstream&lt;/a&gt; and &lt;a href="http://mondrian.pentaho.org/"&gt;mondrian&lt;/a&gt;. 'Push-based' computing is challenging, because every link in the chain needs to propagate the events to the next link. In a previous post &lt;a href="http://julianhyde.blogspot.com/2008/02/streaming-sql-meets-olap.html"&gt;I described&lt;/a&gt; how SQLstream could do continuous ETL, populate fact and aggregate tables incrementally, and notify mondrian that data items in its cache were out of date.&lt;br /&gt;&lt;br /&gt;A mondrian implementation of the CellSetListener API would cause mondrian to internally re-evaluate all queries that have listeners and cover an affected area of the cache. 
If the results of those queries changed, mondrian would transmit those notifications to OLAP client applications such as &lt;a href="http://www.pentaho.com/products/analysis/"&gt;Pentaho Analyzer&lt;/a&gt; or &lt;a href="http://code.google.com/p/pentahoanalysistool/"&gt;PAT&lt;/a&gt;. The client application would then change the value of the cell on the screen, and maybe change the cell's background color momentarily to attract the user's attention.&lt;br /&gt;&lt;br /&gt;Getting data to change on the screen, in front of the end-user's eyes, within seconds of the data changing in the operational system, would be truly spectacular.&lt;br /&gt;&lt;br /&gt;There are several links in the chain to make that happen. Two of the links, SQLstream and &lt;a href="http://julianhyde.blogspot.com/2007/02/mondrian-cache-control.html"&gt;mondrian's cache control API&lt;/a&gt;, are already complete. We've just begun forging the next link.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-2216338124335761403?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/2216338124335761403/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=2216338124335761403' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/2216338124335761403'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/2216338124335761403'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2010/06/olap-change-notification-and.html' title='OLAP change notification, and the CellSetListener API'/><author><name>Julian 
Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-2334063592217190619</id><published>2010-06-01T11:27:00.000-07:00</published><updated>2010-06-01T11:27:54.944-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mondrian explain plan mdx profiling'/><title type='text'>Adding profiling and query plans to mondrian</title><content type='html'>I've long wanted to add query plans and profiling to mondrian's execution engine.&lt;br /&gt;&lt;br /&gt;I just logged &lt;a href="http://jira.pentaho.com/browse/MONDRIAN-754"&gt;jira case MONDRIAN-754&lt;/a&gt; with my ideas for how it would work, and I thought it would be good to share it as a blog post so I can gather opinions. Most of the rest of this post is taken straight from the jira case. Note that we are not committing to implement this feature in any particular release. It's just an idea we are kicking around.&lt;br /&gt;&lt;br /&gt;Mondrian currently does not help you find out where the time is spent in executing a query, except for time spent in SQL. While most Mondrian queries are SQL-heavy, it would help to see the breakdown.&lt;br /&gt;&lt;br /&gt;Specifically: (a) it would increase understanding of the engine by showing the physical plan (Calc nodes) that mondrian has chosen (including caching of results, choice of iterator versus list representation of sets), (b) it would help identify problems where a particular leaf expression that generates SQL is executed repeatedly and generates very similar SQL statements (see e.g. 
&lt;a href="http://jira.pentaho.com/browse/MONDRIAN-723"&gt;MONDRIAN-723&lt;/a&gt;); and (c) it would help developers identify MDX functions taking longer than expected, or sub-optimal plans, and thereby tune mondrian.&lt;br /&gt;&lt;br /&gt;I propose to add a profiling mode with two levels.&lt;br /&gt;&lt;br /&gt;At the lower level, mondrian would print the plan of each MDX query as it executed it:&lt;br /&gt;&lt;br /&gt;Select(cube="[Cube]" mdx="with set [Foo xxx] AS [Foo].Children select non empty [Foo xxx] on 0, [Bar] * [Baz] on 1 from [Cube] where [Gender].[M]")&lt;br /&gt;= CalculatedSets&lt;br /&gt;=== CalculatedSet(name="Foo xxx", format="iterable")&lt;br /&gt;===== Children(format="list")&lt;br /&gt;======= HierarchyExpr(uniqueName="[Foo]")&lt;br /&gt;= FilterAxis&lt;br /&gt;=== MemberExpr(uniqueName="[Gender].[M]")&lt;br /&gt;= Axes&lt;br /&gt;=== Axis(ordinal="0", nonEmpty="true")&lt;br /&gt;===== SetExpr(name="[Foo xxx]", format="iterable")&lt;br /&gt;=== Axis(ordinal="1", nonEmpty="false")&lt;br /&gt;===== CrossJoin&lt;br /&gt;======= Call(function="{}")&lt;br /&gt;========= Call(function="CURRENTMEMBER")&lt;br /&gt;=========== HierarchyExpr(uniqueName="[Bar]")&lt;br /&gt;======= Call(function="{}")&lt;br /&gt;========= Call(function="CURRENTMEMBER")&lt;br /&gt;=========== HierarchyExpr(uniqueName="[Baz]")&lt;br /&gt;&lt;br /&gt;Format. I've used leading '=' to preserve indentation in this bug report. I'd use spaces in the actual feature. Or we could use XML.&lt;br /&gt;&lt;br /&gt;There isn't much difference between the physical plan and the MDX query because this is a simple example. Note that '[Foo]' has been expanded as if the user had written '{[Foo].CurrentMember}'. Differences in more complex plans include: constant reduction; introduction of Cache operator; choice of physical format (list, mutable list, iterator); adapters to change physical format (e.g. 
copy a list to make it mutable); pushdown of non-empty and other constraints to native SQL; strategies for evaluating named sets (first time, each time).&lt;br /&gt;&lt;br /&gt;Optionally each node could contain extra static information: the type (e.g. Integer, String, Numeric, Member(hierarchy='Store'), Set(Tuple(Member(hierarchy=[Store]), Member(level=[Time].[Year])))); format (list, mutable list, iterator); list of hierarchies an expression is dependent on (important for cached expressions).&lt;br /&gt;&lt;br /&gt;With the higher level of profiling, mondrian would gather information while the plan is running: the number of times a node is executed, and the amount of time spent in that node and its children. From that we can also compute the amount of time spent in the node alone. At the end of execution, mondrian would print the plan tree again, with "count", "self" and "self+children" values attached to each node.&lt;br /&gt;&lt;br /&gt;Of course there is always an overhead to collecting profiling info. We would not recommend that people run production applications with profiling enabled. The question is always whether the numbers gathered from the profiled system are representative of the system running in its normal mode. 
Call count would be 100% accurate, and elapsed time should be within a few microseconds per call, so the profiling would serve its purpose.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-2334063592217190619?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/2334063592217190619/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=2334063592217190619' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/2334063592217190619'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/2334063592217190619'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2010/06/adding-profiling-and-query-plans-to.html' title='Adding profiling and query plans to mondrian'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-1058569689454711553</id><published>2010-05-27T14:30:00.000-07:00</published><updated>2010-05-28T16:58:03.061-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='organic community baked box'/><title type='text'>Bite Club</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_BVv0WTpeWTs/S_7m7hM6lqI/AAAAAAAAAEY/I784eE-lFXc/s1600/photo.jpg"&gt;&lt;img 
style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 320px; height: 240px;" src="http://1.bp.blogspot.com/_BVv0WTpeWTs/S_7m7hM6lqI/AAAAAAAAAEY/I784eE-lFXc/s320/photo.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5476068107128313506" /&gt;&lt;/a&gt;A couple of our friends (actually former colleagues of Pamela's from &lt;a href="http://www.organicexchange.org/"&gt;Organic Exchange&lt;/a&gt;) have started a fabulous new business. It's modeled on the "&lt;a href="http://en.wikipedia.org/wiki/Vegetable_box_scheme"&gt;veggie box&lt;/a&gt;" schemes used by &lt;a href="http://en.wikipedia.org/wiki/Community_supported_agriculture"&gt;Community Supported Agriculture&lt;/a&gt; farms, except that rather than vegetables, each week's box contains home-baked goods.&lt;br /&gt;&lt;br /&gt;We received the first box yesterday. It contained potato and spring pea pasties, a ginger and black pepper cake, millet muffins, and a dozen sesame cookies with Hawaiian sea salt.&lt;br /&gt;&lt;br /&gt;Everything was delicious. We're especially pleased with the savory items, because it's always a challenge to create simple, healthy family meals we can enjoy with our son. We're hoping that in future weeks the box will contain pot pies, quiches, samosas, and the like.&lt;br /&gt;&lt;br /&gt;Cindy and Terry don't have a web site yet. 
If you're in South Berkeley, drop me a line and I'll tell you how to sign up.&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Update&lt;/em&gt;: I was wrong; they do have a site: it's &lt;a href="http://freshbitebaking.com/bite-club"&gt;http://freshbitebaking.com/bite-club&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-1058569689454711553?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/1058569689454711553/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=1058569689454711553' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/1058569689454711553'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/1058569689454711553'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2010/05/bite-club.html' title='Bite Club'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_BVv0WTpeWTs/S_7m7hM6lqI/AAAAAAAAAEY/I784eE-lFXc/s72-c/photo.jpg' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-4351830809253325735</id><published>2010-04-30T11:26:00.000-07:00</published><updated>2010-04-30T11:43:05.364-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='itsa traffic gps 
sensors'/><title type='text'>Intelligent Transportation in Houston</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.itsa.org/UserFiles/Image/AM2010LogoPage.jpg"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 306px; height: 99px;" src="http://www.itsa.org/UserFiles/Image/AM2010LogoPage.jpg" border="0" alt="" /&gt;&lt;/a&gt;I'm off to Houston for the &lt;a href="http://www.itsa.org/annualmeeting.html"&gt;annual meeting&lt;/a&gt; of the &lt;a href="http://www.itsa.org/"&gt;Intelligent Transportation Society of America (ITSA)&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;On Monday, I will be speaking on a panel on next-generation standards and architecture for Intelligent Transportation systems. Also present on that panel will be senior members of the U.S. Department of Transportation.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.sqlstream.com"&gt;SQLstream&lt;/a&gt; is helping to simplify how traffic information is collected, analyzed, and acted upon in real time. In one application, SQLstream is gathering information from a fleet of 7,000 vehicles with GPS sensors and using that information to build a real-time picture of a state-wide freeway system. 
That information allows government agencies to manage traffic flow, respond to incidents, and plan capacity.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_BVv0WTpeWTs/S9sixh_QgGI/AAAAAAAAAEQ/MxdO4US4xQY/s1600/motorway.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 259px;" src="http://4.bp.blogspot.com/_BVv0WTpeWTs/S9sixh_QgGI/AAAAAAAAAEQ/MxdO4US4xQY/s400/motorway.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5466000807076855906" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-4351830809253325735?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/4351830809253325735/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=4351830809253325735' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/4351830809253325735'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/4351830809253325735'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2010/04/intelligent-transportation-in-houston.html' title='Intelligent Transportation in Houston'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://4.bp.blogspot.com/_BVv0WTpeWTs/S9sixh_QgGI/AAAAAAAAAEQ/MxdO4US4xQY/s72-c/motorway.jpg' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-7305307546595499126</id><published>2010-04-07T17:24:00.000-07:00</published><updated>2010-04-07T17:35:15.927-07:00</updated><title type='text'>MySQL user conference 2010</title><content type='html'>Are you at the MySQL conference this year? I'm giving a &lt;a href="http://en.oreilly.com/mysql2010/public/schedule/detail/13444"&gt;talk about SQLstream&lt;/a&gt; at 5.15pm on Wednesday 14th (a week from today), but as ever, I'll be wearing both my hats. I'm always ready &amp;amp; willing to chat about mondrian, Pentaho, streaming SQL, open source BI, and the future of intelligent life in the universe.&lt;br /&gt;&lt;br /&gt;If you're going to be at the show, let's see if we can hook up. Email me directly, or reach me on twitter &lt;a href="http://twitter.com/julianhyde"&gt;@julianhyde&lt;/a&gt;. 
If you want to try to randomly bump into me, first try the Pentaho booth, then try the bar...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-7305307546595499126?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/7305307546595499126/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=7305307546595499126' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/7305307546595499126'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/7305307546595499126'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2010/04/mysql-user-conference-2010.html' title='MySQL user conference 2010'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-5056075833326780784</id><published>2010-02-16T18:24:00.000-08:00</published><updated>2010-02-16T18:34:09.387-08:00</updated><title type='text'>Improved collections classes for Mondrian's query execution process</title><content type='html'>Mondrian calculations work predominantly over lists of members and tuples. Internally, Mondrian represents lists of members as List&amp;lt;Member&amp;gt;, and represents lists of tuples as List&amp;lt;Member[]&amp;gt;. 
(And similarly for iterators over members and tuples.)&lt;br /&gt;&lt;br /&gt;There are two problems with this. First, the representation of tuple lists requires an array to be allocated for each element of the list. Allocations cost time and memory. (Granted, we could allocate temporary arrays only when tuples are accessed, which would cost only time. But according to my latest round of profiling, the effort of allocating lots small arrays is significant.)&lt;br /&gt;&lt;br /&gt;Second, the code to deal with members and tuples has to be different. The most extreme example of this found in the implementation of CrossJoin. There are over 30 inner classes in CrossJoinFunDef.java, to deal with the permutations of iterator vs. list, mutable vs. immutable, and tuple vs. member.&lt;br /&gt;&lt;br /&gt;In short, the java standard List and Iterator classes are not serving us well. I think it's appropriate to introduce classes/interfaces that handle members and tuples more uniformly, and can store, access, and iterate over collections without lots of small arrays being created.&lt;br /&gt;&lt;br /&gt;Here are some collection classes that I think would serve the purpose:&lt;br /&gt;&lt;blockquote&gt;interface TupleList {&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;int size();&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;int arity();&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Member getMember(int index, int ordinal);&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;TupleIterator tupleIterator();&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;interface TupleIterator {&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;int arity();&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;boolean hasNext();&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// writes the members of the next tuple into given array&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;void next(Member[] members);&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// appends members of the next 
tuple to given list&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;void next(List&amp;lt;Member&amp;gt; members);&lt;br /&gt;}&lt;/blockquote&gt;If arity = 1 (i.e. if the list is just a collection of members) then TupleList could easily be implemented using java.util.ArrayList.&lt;br /&gt;&lt;br /&gt;For other arities, a list of tuples could be represented as a set of members end-to-end. For instance, the list with two 3-tuple elements {(A1, B1, C1), (A2, B2, C2)} would be held in a list {A1, B1, C1, A2, B2, C2} and getMember(index, ordinal) would read element index * arity + ordinal of the list.&lt;br /&gt;&lt;br /&gt;Introducing these would require quite a few code changes, mostly in the mondrian.olap.fun package, which is where the builtin functions are implemented. There should be no changes to the user API or olap4j.&lt;br /&gt;&lt;br /&gt;I am still debating whether this change makes sense. Usually this kind of penny-pinching architectural change doesn't pay off. But some of them pay off big. I've learned in Oracle, Broadbase, and SQLstream that for high-performance data processing you shouldn't be doing any memory allocations in an inner loop that is executed once per row. That isn't quite practical in Java, but it's a goal to strive for. 
In today's CPU architectures, where memory is slow and last-level-cache is fast, it pays to keep data contiguous.&lt;br /&gt;&lt;br /&gt;If you are a Mondrian developer, I'd be interested to hear what you think about this proposed change.&lt;/member[]&gt;&lt;/member&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-5056075833326780784?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/5056075833326780784/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=5056075833326780784' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/5056075833326780784'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/5056075833326780784'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2010/02/improved-collections-classes-for.html' title='Improved collections classes for Mondrian&apos;s query execution process'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-2398723386094007458</id><published>2010-01-19T13:20:00.000-08:00</published><updated>2010-01-19T13:28:06.439-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='streaming sql database cep'/><title type='text'>Data in Flight</title><content type='html'>An article of mine, "&lt;a 
href="http://cacm.acm.org/magazines/2010/1/55738-data-in-flight/fulltext"&gt;Data in Flight&lt;/a&gt;," is published in this month's &lt;a href="http://cacm.acm.org/"&gt;Communications of the ACM&lt;/a&gt;. In it, I took the time to explain, in layman's terms, why I think streaming database technology is a game-changer.&lt;br /&gt;&lt;br /&gt;Many pundits have latched on to the term &lt;a href="http://en.wikipedia.org/wiki/Complex_event_processing"&gt;CEP (Complex Event Processing)&lt;/a&gt; to describe this technology. CEP is a legitimate and important application, and I believe that streaming SQL is a good way to solve it, but the article tries to put a bit of space between the two concepts. Many problems benefit from the declarative, relational approach, yet their data arrives incrementally, so a streaming engine working (mainly) in memory can solve them much more efficiently than a database can. CEP is just one such application area. My article describes a few of those problems.&lt;br /&gt;&lt;br /&gt;I'm all fired up about streaming databases, just as I was when I co-founded &lt;a href="http://www.sqlstream.com"&gt;SQLstream&lt;/a&gt;. I've worked in the database field for over 20 years, and I think it's the most exciting thing to happen in databases in a generation. (Yes, it's more important than data warehousing and, cough, object databases.)&lt;br /&gt;&lt;br /&gt;Streaming SQL technology is rapidly becoming part of the standard toolkit for solving data management problems. If you're not familiar with the technology, reading the article is a good way to come up to speed. 
Enjoy!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-2398723386094007458?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/2398723386094007458/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=2398723386094007458' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/2398723386094007458'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/2398723386094007458'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2010/01/data-in-flight.html' title='Data in Flight'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-9002652898272515547</id><published>2010-01-14T12:54:00.000-08:00</published><updated>2010-01-14T13:28:48.833-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='xmla javascript open source bi ajax'/><title type='text'>xmla4js</title><content type='html'>Yesterday I attended &lt;a href="http://wiki.pentaho.com/display/COM/January+13,+2010+-+Roland+Bouman+-+OLAP+and+Analysis+for+web+applications+using+XMLA"&gt;Roland Bouman's webinar introducing xmla4js&lt;/a&gt;. Xmla4js is a library for connecting to OLAP servers in JavaScript. 
All you need is an OLAP server that speaks XMLA (and most of them do).&lt;br /&gt;&lt;br /&gt;It's classic web 2.0 technology. It does virtually nothing, yet changes everything. There is very little code (about 20K), an old-school enterprise architect would regard it as a trivial piece of protocol glue, and yet it opens the door to all kinds of mashups. Those mashups will get powerful OLAP data into the hands, and onto the screens, of the business users who care about that data.&lt;br /&gt;&lt;br /&gt;Roland is a practical guy and a great communicator, so the presentation (and the &lt;a href="http://code.google.com/p/xmla4js/"&gt;download from google code&lt;/a&gt;) includes several examples of those mashups. I urge you to take a look at the recorded webinar.&lt;br /&gt;&lt;br /&gt;A few issues came up during the webinar that are worth mulling over.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;1. Query model&lt;/h3&gt;&lt;br /&gt;One thing that is missing is a query model. A query model allows you to represent the state of the current query, apply a transformation (say sorting on column #3, or adding a hierarchy to an axis), then generate a new MDX statement to send to the OLAP server. The demo Roland showed had a rudimentary query model, but in order to do more complex analyses, that query model will run out of road very quickly.&lt;br /&gt;&lt;br /&gt;It's a problem that xmla4js shares with &lt;a href="http://www.olap4j.org"&gt;olap4j&lt;/a&gt; (as the &lt;a href="http://code.google.com/p/pentahoanalysistool/"&gt;PAT&lt;/a&gt; developers know only too well). I'd like to find a way to pool resources.&lt;br /&gt;&lt;br /&gt;We could create a query model that works both on olap4j (in Java) and on xmla4js (in JavaScript). There would be two implementations, but at least the transformations can be specified in a language-neutral way, and we could write a single test suite that could exercise both implementations.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;2. 
Cube metadata&lt;/h3&gt;&lt;br /&gt;Roland bemoaned the fact that getting the metadata for a cube (including dimensions, hierarchies, levels, measures) takes several XMLA round-trips.&lt;br /&gt;&lt;br /&gt;He has a good point. Those round-trips may make the page load several times slower. We could easily extend Mondrian's cube metadata request so that you can ask for those extra elements.&lt;br /&gt;&lt;br /&gt;If it proves useful, other XMLA engines (such as &lt;a href="http://www.jedox.com/en/about-jedox/press-ordner/press-archive/archive/Palo-Open-Source-OLAP-Server-now-supports-MDX-and-Excel-Pivot-Tables.html"&gt;PALO&lt;/a&gt;) could do the same, and heck, if the &lt;a href="http://cwebbbi.spaces.live.com/blog/cns!7B84B0F2C239489A!1294.entry"&gt;XMLA council is not asleep in their castle&lt;/a&gt;, they could add it to the next version of the XMLA spec. (Well, we can hope.)&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;3. Results in JSON&lt;/h3&gt;&lt;br /&gt;Roland pointed out that XMLA is a verbose and inconvenient data format for JavaScript to consume. The "industry standard" for that environment is &lt;a href="http://www.json.org/"&gt;JSON&lt;/a&gt;. It has a similar attributes/values/nested sets structure to XML but is easier to parse: because it is syntactically valid JavaScript you just execute the JSON code to get the value.&lt;br /&gt;&lt;br /&gt;Mondrian's XMLA servlet is written to generate elements, attributes, and nested collections of elements, and precious little of the code directly generates XML. It wouldn't be too much work to generate JSON instead. 
The JSON would have the same structure as the XMLA, sans the irritating namespaces that are necessary in XML.&lt;br /&gt;&lt;br /&gt;For example, the JSON response from MDSCHEMA_CUBES could look like this:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;{&lt;br /&gt;&amp;nbsp;&amp;nbsp;"DiscoverResponse": {&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"return": {&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"root": {&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"row": [&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"CATALOG_NAME": "FoodMart",&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"SCHEMA_NAME": "FoodMart",&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"CUBE_NAME": "HR",&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"CUBE_TYPE": "CUBE",&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"IS_DRILLTHROUGH_ENABLED": true,&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"IS_WRITE_ENABLED": false,&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"IS_LINKABLE": false,&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"IS_SQL_ENABLED": false,&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"DESCRIPTION": "FoodMart Schema - HR Cube"&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;},&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{&lt;br 
/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"CATALOG_NAME": "FoodMart",&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"SCHEMA_NAME": "FoodMart",&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"CUBE_NAME": "Sales",&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"CUBE_TYPE": "CUBE",&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"IS_DRILLTHROUGH_ENABLED": true,&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"IS_WRITE_ENABLED": false,&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"IS_LINKABLE": false,&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"IS_SQL_ENABLED": false,&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"DESCRIPTION": "FoodMart Schema - Sales Cube"&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;}&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;]&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;}&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;}&lt;br /&gt;&amp;nbsp;&amp;nbsp;}&lt;br /&gt;}&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;I'd like to hear Roland's (and the Mondrian, Pentaho, olap4j and PAT community's) take on these points. 
Thanks again Roland for an informative webinar and a great new addition to the open source BI technology stack.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-9002652898272515547?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/9002652898272515547/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=9002652898272515547' title='13 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/9002652898272515547'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/9002652898272515547'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2010/01/xmla4js.html' title='xmla4js'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>13</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-7659511255445702349</id><published>2009-12-08T12:14:00.000-08:00</published><updated>2009-12-08T12:47:28.653-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='open source BI survey COSS'/><title type='text'>Survey of open source business intelligence adoption</title><content type='html'>A &lt;a href="http://www.pentaho.com/news/releases/20091208_pentaho_dominates_osbi_adoption.php"&gt;new research report&lt;/a&gt; published by &lt;a href="http://www.b-eye-network.com/"&gt;BeyeNETWORK&lt;/a&gt; analyzes people's use of open 
source BI. It is gratifying to see Pentaho, and Pentaho projects Mondrian and Kettle, at the top of their categories. Weka, another Pentaho project, is a narrow second to R for statistics/data mining.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_BVv0WTpeWTs/Sx651-Gvi3I/AAAAAAAAAEE/_xK7EFoJNiM/s1600-h/osbi-products.png"&gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 200px; height: 122px;" src="http://3.bp.blogspot.com/_BVv0WTpeWTs/Sx651-Gvi3I/AAAAAAAAAEE/_xK7EFoJNiM/s200/osbi-products.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5412968139001400178" /&gt;&lt;/a&gt;With 1,000 respondents in small, medium and large companies, this is not a small survey. Pentaho's dominance across several categories makes &lt;a href="http://www.itbusinessedge.com/cm/blogs/vizard/open-source-vendor-stakes-claim-to-bi-market-leadership/?cs=37916"&gt;Jaspersoft's recent claim to be the most widely deployed BI software&lt;/a&gt; pretty difficult to believe.&lt;br /&gt;&lt;br /&gt;Pentaho's strategy has been to build a suite of best-of-breed open source components, foster those components and their communities, and integrate them into a design-time and run-time platform. This report shows that this strategy is paying off.&lt;br /&gt;&lt;br /&gt;The paper covers the bad stuff (performance problem areas, barriers to adoption) as well as the good, and surveys the information sources that people found useful in troubleshooting and getting to successful deployments. 
If you are evaluating open source BI products, the paper is well worth a read.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-7659511255445702349?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/7659511255445702349/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=7659511255445702349' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/7659511255445702349'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/7659511255445702349'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2009/12/survey-of-open-source-business.html' title='Survey of open source business intelligence adoption'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_BVv0WTpeWTs/Sx651-Gvi3I/AAAAAAAAAEE/_xK7EFoJNiM/s72-c/osbi-products.png' height='72' width='72'/><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-1464049853761794320</id><published>2009-10-05T17:37:00.000-07:00</published><updated>2009-10-06T09:24:18.255-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='pentaho analyzer jpivot pat open source olap viewer'/><title type='text'>Pentaho Analyzer</title><content type='html'>Pentaho today &lt;a 
href="http://www.pentaho.com/news/releases/20091005_pentaho_announces_strategic_technology_acquisition.php"&gt;announced a new OLAP viewer&lt;/a&gt;, called Pentaho Analyzer Enterprise Edition, based on LucidEra's ClearView component.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_BVv0WTpeWTs/SsqXldTlnwI/AAAAAAAAAD0/Pga8sxxd-zs/s1600-h/analyzer_table.png"&gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 320px; height: 256px;" src="http://3.bp.blogspot.com/_BVv0WTpeWTs/SsqXldTlnwI/AAAAAAAAAD0/Pga8sxxd-zs/s320/analyzer_table.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5389286573879959298" /&gt;&lt;/a&gt; This is great news for Pentaho customers, the community, and the BI world at large. While &lt;a href="http://www.pentaho.com/products/analysis/"&gt;Pentaho Analysis (Mondrian)&lt;/a&gt; is one of its strongest components, the current OLAP viewer (based on &lt;a href="http://jpivot.sourceforge.net/"&gt;JPivot&lt;/a&gt;) has been one of its weakest.&lt;br /&gt;&lt;br /&gt;The new viewer puts Pentaho at the top of the heap, in competition with best-of-breed OLAP viewers. It is designed to be intuitive for business users (yes, those people who don't speak MDX!), is built using the latest web technologies, and integrates seamlessly with Mondrian and the rest of the Pentaho suite.&lt;br /&gt;&lt;br /&gt;It is going to revolutionize the experience of using OLAP within the Pentaho suite.&lt;br /&gt;&lt;br /&gt;Naturally, &lt;a href="http://www.tholis.com/news/pentaho-quo-vadis-/"&gt;there are concerns&lt;/a&gt;. First, the new viewer is only part of Pentaho's Enterprise Edition (EE) suite. If Pentaho is committed to open source BI, why not release it open source? Second, what will happen to &lt;a href="http://code.google.com/p/pentahoanalysistool/"&gt;Pentaho Analysis Tool (PAT)&lt;/a&gt;, the successor to JPivot being developed by the Pentaho community? 
I'd like to take the opportunity to answer these concerns, because I think this is news that everyone should be celebrating.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span"  style="font-size:large;"&gt;&lt;b&gt;Why is the new Analyzer not open source?&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;There's been a lot of talk about open source business models, 'open core', good and evil, and all that. Releasing ClearView as part of Enterprise Edition is perfectly in sync with Pentaho's business model and with my intuitions about what makes sense for open source. Here's my rationale.&lt;br /&gt;&lt;br /&gt;If you release a piece of software open source out of sheer, 'I love the world!' altruism, you won't necessarily see much benefit. Pentaho is a for-profit business, and they are savvy about leveraging the benefits of open source software. And let's not kid ourselves, there are considerable downsides to releasing something open source. Your competitors can pick up the software and incorporate your hard work into their suite. And your customers may decide that the free version is so good that they aren't going to give you any of their money.&lt;br /&gt;&lt;br /&gt;Open source allows you to bring a component to a wider audience, an audience that will test, document and improve the component, and will support each other on the forums. Only the Community Edition (CE) components get that boost. Therefore, Pentaho's strategy is to release the core functionality in CE. That means the high-performance core of the system, the code paths that get run trillions of times an hour, and that means all the components that are necessary to build a functional and useful BI application.&lt;br /&gt;&lt;br /&gt;In particular, people ask me whether there is a high-performance 'Mondrian on steroids' in EE. No there isn't. None of us want to maintain alternative code-paths, because the extra complexity would slow down future development. 
If I were to create a performance optimization in EE, the community would probably replicate that optimization in CE within a few weeks. Improving the core Mondrian system for everyone brings more people into the community, and that brings more people to EE.&lt;br /&gt;&lt;br /&gt;And by the way, this doesn't just apply to the Pentaho Analysis part of the suite. Pentaho adds major new functionality to the suite each release, and most of that goes into open source components.&lt;br /&gt;&lt;br /&gt;So, what's left to go into EE? Bells and whistles, things that make the product easier to use, easier to manage, and things that make your boss want to reach for his or her checkbook. And of course support, releases that are certified and indemnified, and more regular. I don't think that's a bad deal, however you look at it.&lt;br /&gt;&lt;br /&gt;It also helps if the components are delivered under a business-friendly license like &lt;a href="http://www.gnu.org/copyleft/lesser.html"&gt;LGPL&lt;/a&gt; or &lt;a href="http://www.eclipse.org/legal/epl-v10.html"&gt;EPL&lt;/a&gt;. Otherwise you will not attract contributions from OEM vendors, who are the companies with the skills to extend components as complex as Mondrian or Pentaho Data Integration (Kettle). Once again, Pentaho is taking a risk by using business-friendly licenses, because there is always a chance that Pentaho's competitors will scoop up the fruits of its labors. 
(As in fact &lt;a href="http://www.jaspersoft.com/jasperanalysis"&gt;they do&lt;/a&gt;.)&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_BVv0WTpeWTs/SsqX_Vuif1I/AAAAAAAAAD8/X3VrSW3gIRo/s1600-h/analyzer_chart.png"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 320px; height: 256px;" src="http://4.bp.blogspot.com/_BVv0WTpeWTs/SsqX_Vuif1I/AAAAAAAAAD8/X3VrSW3gIRo/s320/analyzer_chart.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5389287018522115922" /&gt;&lt;/a&gt;But Pentaho's faith in the open source process pays off. ClearView is proof of that. If Mondrian had not been available under a business-friendly open source license, LucidEra would probably have written ClearView on top of another vendor's engine, and Pentaho would not have been able to use it. Incidentally, LucidEra has contributed many important enhancements to Mondrian in areas of both performance and functionality over the past three years. This has improved Mondrian for everyone, and we know that ClearView performs very well against Mondrian.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span"  style="font-size:large;"&gt;&lt;b&gt;What will happen to PAT?&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;To restate what I said above, there is a network effect when you make a component open source. The more people that use a component, the more people are going to contribute to it. We want as many people to use Mondrian as possible, and in particular we want the right people to use it (the people who are going to make major improvements).&lt;br /&gt;&lt;br /&gt;So, for Mondrian's continuing health as an open source component, we need the Community Edition of Mondrian to be good enough to build business applications on. For that, we need to make PAT successful.&lt;br /&gt;&lt;br /&gt;I personally have been laying the groundwork for PAT for a number of years. 
I spearheaded the &lt;a href="http://www.olap4j.org/"&gt;olap4j&lt;/a&gt; API, knowing that the community would be more likely to write the next generation OLAP viewer if it was guaranteed to be portable across OLAP engines. Then I kicked off the halogen project, a collaboration between Pentaho developers and the community to build a viewer using olap4j and GWT. Pentaho developers contributed code and user interface design to that project, even working in their spare time when the current Pentaho sprint used up all of their 'official' cycles. And the PAT project used the halogen code, and the knowledge of the halogen developers, as a starting point.&lt;br /&gt;&lt;br /&gt;It's not healthy to have too close a relationship between an OLAP server and viewer. There should always be room for competition, an opportunity to use a new viewer or (gasp!) different OLAP server if the 'standard' one isn't ideal. I created olap4j with competition in mind, and the experiment seems to be working: PAT can run against Mondrian's native interface, Mondrian's XMLA server, and against SQL Server Analysis Services via XMLA.&lt;br /&gt;&lt;br /&gt;I want to make it easier to build alternative front-ends on top of olap4j, so I have been encouraging PAT developers to contribute to olap4j's query model and library of transforms. I would like to see Analyzer move to olap4j internally (it currently uses Mondrian's native API), and perhaps migrate some of the logic in Analyzer to olap4j so that we can share the costs of maintaining it with the community.&lt;br /&gt;&lt;br /&gt;Lastly, as I realized at the recent community meetup in Barcelona, we have a great team, and we need to harness their energy. 
After a beer or two with PAT developers &lt;a href="http://twitpic.com/ia5go"&gt;Tom and Paul&lt;/a&gt;, and some inspiring demos from &lt;a href="http://twitpic.com/ia1oi"&gt;Pedro and Daniel&lt;/a&gt;, we hatched ideas of incorporating sparklines and writeback into PAT, and I'm sure the ideas will keep on flowing. With this much inspiration and hard work coming from the community, how can we possibly fail?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-1464049853761794320?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/1464049853761794320/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=1464049853761794320' title='20 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/1464049853761794320'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/1464049853761794320'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2009/10/pentaho-analyzer.html' title='Pentaho Analyzer'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_BVv0WTpeWTs/SsqXldTlnwI/AAAAAAAAAD0/Pga8sxxd-zs/s72-c/analyzer_table.png' height='72' 
width='72'/><thr:total>20</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-7615067644962095988</id><published>2009-08-17T04:11:00.000-07:00</published><updated>2009-08-17T05:57:28.653-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='streaming sql social media rss twitter friendfeed facebook'/><title type='text'>What API should Facebook and FriendFeed use to publish the social stream?</title><content type='html'>&lt;a href="http://arstechnica.com/web/news/2009/08/stream-resistance-is-futile-facebook-assimilates-friendfeed.ars"&gt;Ars Technica reports that&lt;/a&gt; "&lt;i&gt;social networking giant Facebook has acquired FriendFeed. This deal reflects Facebook's growing fixation on the social stream, but it's hard to see how the two services will be merged. [...]&lt;br /&gt;&lt;br /&gt;[Facebook's] powerful but esoteric SQL-like query system all add up to a steep learning curve. By comparison, FriendFeed has a simple and elegant API that exposes a lot of information and is much more accommodating to developers.&lt;/i&gt;"&lt;br /&gt;&lt;br /&gt;It seems to me that streaming SQL is the correct solution to this problem. Not a SQL-like language, not an API (although you of course have to use an API to execute queries and get the results), and not just traditional SQL on finite relations, but SQL where streams are a first-class construct.&lt;br /&gt;&lt;br /&gt;I'm not a big believer in 'SQL-like' languages; they give SQL a bad name. Someone once said that the C programming language combines all the power of assembly language with all the ease-of-use of assembly language. 
The same could be said for 'SQL-like' languages: they tend to offer only the limited capabilities of a fixed API, yet you have to learn a new language to use them.&lt;br /&gt;&lt;br /&gt;Full SQL is difficult to implement because it must be possible to combine the relational operators (join, filter, union, project, and so forth), and other language features such as types and built-in operators, in any combination. Implementors often give up on this (what language designers call &lt;a href="http://en.wikipedia.org/wiki/Orthogonal#Computer_science"&gt;orthogonality&lt;/a&gt;), and what they end up with is termed a SQL-like language. The full power of SQL only accrues when the implementor has implemented the whole language, and achieved orthogonality.&lt;br /&gt;&lt;br /&gt;Nor can the problem be solved particularly easily or efficiently using regular SQL, because every query is going to be of the form 'tell me what has changed since I last ran the query'. That kind of activity throws a conventional database into a tailspin.&lt;br /&gt;&lt;br /&gt;So, streaming SQL could solve this problem. 
Has anyone tried it?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-7615067644962095988?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/7615067644962095988/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=7615067644962095988' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/7615067644962095988'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/7615067644962095988'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2009/08/what-api-should-facebook-and-friendfeed.html' title='What API should Facebook and FriendFeed use to publish the social stream?'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-5656456503582779922</id><published>2009-07-29T00:12:00.000-07:00</published><updated>2009-07-29T02:18:10.878-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='twitter realtime web streaming sql'/><title type='text'>Twitter makes the realtime web look more like the old web</title><content type='html'>Twitter has a &lt;a href="http://search.twitter.com"&gt;new home page&lt;/a&gt;, in the time-honored style of a search engine home page. 
Claire Cain Miller &lt;a href="http://bits.blogs.nytimes.com/2009/07/28/twitter-plays-up-search-with-new-home-page/"&gt;writes in the New York Times&lt;/a&gt;:&lt;br /&gt;&lt;blockquote&gt;&lt;i&gt;It has become a cliché that first-time visitors to Twitter respond with some version of: "I don’t get it." [The new home page] tries to solve that problem.&lt;/i&gt;&lt;/blockquote&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_BVv0WTpeWTs/SnAFD_hMacI/AAAAAAAAADc/00URfIVDIyU/s1600-h/twitter.png"&gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 320px; height: 206px;" src="http://2.bp.blogspot.com/_BVv0WTpeWTs/SnAFD_hMacI/AAAAAAAAADc/00URfIVDIyU/s320/twitter.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5363792722346666434" /&gt;&lt;/a&gt;That problem is worth solving, but the home page is also an interesting sign of the melding of the old and the new.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Old web, new web&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;The old web is that vast repository of content, ranked by how many people reference that content, and navigated by search engines such as Google. The new web is populated by dynamic content, where what happened in the last minute is much more important than what happened yesterday.&lt;br /&gt;&lt;br /&gt;The new, real-time web has been a wild frontier. There's a cachet to being a Twitter user; you're among pioneers, one of the elite who 'get it', not one of the ordinary folks. That's a problem for Twitter, because they need those millions of ordinary folks first to 'get it', then to get something useful out of it, come back, and start spending their click-through dollars.&lt;br /&gt;&lt;br /&gt;But harnessing the power of the real-time web is no easy problem. First of all, streaming content is a new paradigm. 
Facebook are doing well at introducing a lot of people to the idea of the ever-changing home page; Twitter's minimalist concept needs more getting used to, but the search engine front-end to the stream of chatter will surely help.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Second, you need different tools to convert the noise in Twitter and other social media feeds into information. A search engine is not going to cut it. The new tools cannot work on the streaming data alone; they have to combine the new data with old, organize the data, and cluster the data with other data that is similar based on subject matter, geographical proximity, or proximity of users in the social network. The stream of content hurtling past our eyes looks like chatter, just noise, until we rank it, look for trends, and put it into historical context.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Old analytics, new analytics&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;I find it interesting because at &lt;a href="http://www.sqlstream.com"&gt;SQLstream&lt;/a&gt; we are dealing with very similar problems for enterprise data. Business users would like to see the full spectrum of data, from right now to the distant past, but when making decisions, they want more recent data to carry more weight; they also want to take into account similarity of subject matter, geographical proximity, and the structure of social networks.&lt;br /&gt;&lt;br /&gt;Traditional analytic solutions use data warehouses, analogous to the old, static web and its search engine guardians. A data warehouse treats all data equally, regardless of its age. 
There is so much data that it has to be stored on disk, and it takes several hours to organize that data, so while a typical data warehouse will contain data from five years ago until close of business yesterday, the most important data — what happened today — hasn't reached the data warehouse yet.&lt;br /&gt;&lt;br /&gt;SQLstream melds the old (the data warehouse) with the new (streaming events and transactions arriving over the wire), presenting a unified view via the SQL query language. We say that we "query the future", meaning that you can place standing queries that react when events of interest occur. These queries cache their working sets in memory, so the response time is a few milliseconds, and throughput tens or hundreds of thousands of records per second.&lt;br /&gt;&lt;br /&gt;The &lt;a href="http://julianhyde.blogspot.com/2008/02/streaming-sql-meets-olap.html"&gt;data sources that SQLstream can handle&lt;/a&gt; are diverse. Some of the data comes from traditional sources, like corporate transaction processing systems. Some sources are often considered too high-volume to process in a data warehouse, such as click-stream and system monitoring data. And there are &lt;a href="http://www.intelligententerprise.com/blog/archives/2008/12/bi_on_content_f.html"&gt;new sources like Twitter, social media, Atom and RSS feeds&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The problems of the real-time web and the real-time enterprise are surprisingly similar. Without tools to filter, aggregate, rank, and provide historical context, all of these data sources just look like noise and have little apparent value. 
At SQLstream, we are providing the tools to convert streams into valuable information.&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-5656456503582779922?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/5656456503582779922/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=5656456503582779922' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/5656456503582779922'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/5656456503582779922'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2009/07/twitter-makes-realtime-web-look-more.html' title='Twitter makes the realtime web look more like the old web'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_BVv0WTpeWTs/SnAFD_hMacI/AAAAAAAAADc/00URfIVDIyU/s72-c/twitter.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-4528706684314092922</id><published>2009-07-26T10:11:00.000-07:00</published><updated>2009-07-29T19:40:24.924-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='limerick'/><title type='text'>An unfortunate fellow named Hyde</title><content type='html'>A limerick featuring 
my family name and ending with a pun. What could be better?&lt;br /&gt;&lt;br /&gt;An unfortunate fellow named Hyde&lt;br /&gt;fell down an outhouse and died.&lt;br /&gt;By mischance, his brother&lt;br /&gt;fell down another.&lt;br /&gt;And now they're interred side-by-side.&lt;br /&gt;&lt;br /&gt;(Reportedly due to Johnny Carson.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-4528706684314092922?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/4528706684314092922/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=4528706684314092922' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/4528706684314092922'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/4528706684314092922'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2009/07/unfortunate-fellow-named-hyde.html' title='An unfortunate fellow named Hyde'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-4237213732088786678</id><published>2009-07-23T12:45:00.000-07:00</published><updated>2009-07-23T12:54:54.085-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mondrian jpivot pentaho analysis howto'/><title type='text'>Introduction to Pentaho 
Analysis</title><content type='html'>Joshua Tolley has written a nice &lt;a href="https://wiki.csinitiative.com/display/tri/Pentaho+Analysis+-+OLAP+How-To"&gt;step-by-step guide to using Pentaho Analysis&lt;/a&gt;. The tutorial is from the end-user's point of view, which of course is the most important perspective.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://wiki.csinitiative.com/download/attachments/2425482/OLAPnavigator.png?version=1&amp;modificationDate=1247768365022"&gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 247px; height: 405px;" src="https://wiki.csinitiative.com/download/attachments/2425482/OLAPnavigator.png?version=1&amp;modificationDate=1247768365022" border="0" alt="" /&gt;&lt;/a&gt;If you're interested in the back-end stuff, a couple of weeks ago Joshua also wrote a nice &lt;a href="http://blog.endpoint.com/2009/07/mdx.html"&gt;MDX primer&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-4237213732088786678?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/4237213732088786678/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=4237213732088786678' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/4237213732088786678'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/4237213732088786678'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2009/07/introduction-to-pentaho-analysis.html' title='Introduction to Pentaho Analysis'/><author><name>Julian 
Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-128806514313762280</id><published>2009-07-10T17:45:00.000-07:00</published><updated>2009-07-10T18:00:01.425-07:00</updated><title type='text'>Functional dependency optimizations in Mondrian</title><content type='html'>Eric McDermid just checked in a nice new Mondrian feature that optimizes the SQL generated for MySQL. It takes advantage of the fact that in MySQL, if some of your GROUP BY columns are unique, you can leave the other columns out of the GROUP BY clause, and MySQL does less work.&lt;br /&gt;&lt;br /&gt;In some cases, a lot less work. MySQL implements GROUP BY by sorting, and since this reduces the volume of data being sorted, Eric reports significant performance improvements. Unfortunately it only works on MySQL, since MySQL is the only database I know of which has this feature.&lt;br /&gt;&lt;br /&gt;See the latest &lt;a href="http://p4webhost.eigenbase.org:8080/open/mondrian/doc/schema.html#Functional_dependency_optimizations"&gt;schema documentation&lt;/a&gt; for more details.&lt;br /&gt;&lt;br /&gt;I'll note that we reserve the right to change the syntax a little in future versions. In mondrian-4.0 we're adding physical schemas, which will include much more information about tables and relationships, so it would make sense to declare unique keys along with that. 
But rest assured that even if we do change the syntax, the feature will still be present.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-128806514313762280?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/128806514313762280/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=128806514313762280' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/128806514313762280'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/128806514313762280'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2009/07/functional-dependency-optimizations-in.html' title='Functional dependency optimizations in Mondrian'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-6307425355657845641</id><published>2009-06-30T16:04:00.000-07:00</published><updated>2009-06-30T16:46:02.870-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mozilla firefox 3.5 sqlstream'/><title type='text'>SQLstream powers Firefox 3.5 realtime downloads monitor</title><content type='html'>Mozilla launched &lt;a href="http://www.mozilla.com/firefox"&gt;Firefox 3.5&lt;/a&gt; today, and with it, a neat applet, powered by &lt;a href="http://www.sqlstream.com"&gt;SQLstream&lt;/a&gt;, 
to monitor downloads in real time.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_BVv0WTpeWTs/SkqjjCsFmtI/AAAAAAAAADM/UvqG-XUOhHA/s1600-h/heatmap.jpg"&gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 320px; height: 316px;" src="http://2.bp.blogspot.com/_BVv0WTpeWTs/SkqjjCsFmtI/AAAAAAAAADM/UvqG-XUOhHA/s320/heatmap.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5353270929495792338" /&gt;&lt;/a&gt;You can see the results at &lt;a href="http://downloadstats.mozilla.com"&gt;Mozilla's download stats page&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;A few weeks ago, &lt;a href="http://www.creativereview.co.uk/cr-blog/2009/june/apple-hyperwall"&gt;Apple's Hyperwall&lt;/a&gt; was awe-inspiring as a piece of visual art, but it was less impressive as a piece of real-time data integration, because the data was &lt;a href="http://www.appleinsider.com/articles/09/06/09/apple_stuns_wwdc_crowd_with_pulsating_app_store_hyperwall.html"&gt;delayed five minutes&lt;/a&gt; from the App Store.&lt;br /&gt;&lt;br /&gt;SQLstream gathers data from Mozilla's download centers around the world, assigns each record a latitude and longitude, and summarizes the information in a continuously executing SQL query. Data is read with sub-second latencies, and then aggregated (using SQLstream's streaming GROUP BY operator) into summary records each describing a second of activity.&lt;br /&gt;&lt;br /&gt;A server-side Java program reads the data using JDBC, serializes it as JSON, and transmits it to all connected web clients. Clients render the charts using the Canvas tag, newly introduced in &lt;a href="http://en.wikipedia.org/wiki/HTML_5"&gt;HTML 5&lt;/a&gt;. The results are very impressive visually, but to a back-end guy like myself, the plumbing is impressive too.&lt;br /&gt;&lt;br /&gt;The amazing thing is that SQLstream makes this so easy. 
Our official company blurb talks about "shortening data integration projects from months to weeks", but this project took just a couple of days of work.&lt;br /&gt;&lt;br /&gt;By the way, don't try to view the page in Microsoft's Internet Explorer. Ten years ago, Internet Explorer led the charge to enhance the capabilities of the web browser, introducing dynamic HTML (DHTML), XML handling in the browser, ActiveX controls and other capabilities, but those days are over. With HTML 5 there is a renaissance in web standards; Firefox is leading the pack, with other 'modern' browsers such as &lt;a href="http://apple.com/safari"&gt;Safari&lt;/a&gt;, &lt;a href="http://www.opera.com"&gt;Opera&lt;/a&gt; and &lt;a href="http://www.google.com/chrome"&gt;Chrome&lt;/a&gt; not far behind.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-6307425355657845641?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/6307425355657845641/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=6307425355657845641' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/6307425355657845641'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/6307425355657845641'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2009/06/sqlstream-powers-firefox-35-realtime.html' title='SQLstream powers Firefox 3.5 realtime downloads monitor'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' 
src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_BVv0WTpeWTs/SkqjjCsFmtI/AAAAAAAAADM/UvqG-XUOhHA/s72-c/heatmap.jpg' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-3633192392818034961</id><published>2009-06-28T13:24:00.000-07:00</published><updated>2009-06-28T13:40:18.394-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='vista fail'/><title type='text'>Black screen, white pointer on Vista</title><content type='html'>Last night I had a problem where Vista gives me a black screen and white pointer. You can move the pointer around, but you can't do anything useful (except, as noted below, press the Shift key 5 times). I tried restarting in safe mode, and I got a black screen with 'Safe mode' in each corner of the screen, but otherwise the same experience.&lt;br /&gt;&lt;br /&gt;I had a huge sinking feeling. I've had this problem twice before in the last twelve months. On the other two occasions, Dell technical support asked the usual questions for an hour or so, then told me to re-install Vista. An operating system that needs to be re-installed every 6 months is not a productive operating system, even if the operating system is great in between times. Which Vista isn't, anyway.&lt;br /&gt;&lt;br /&gt;Luckily, this time I found &lt;a href="http://social.technet.microsoft.com/forums/en-US/itprovistadesktopui/thread/193b7008-ce4b-4d03-acc3-b8d7ffe610d5/"&gt;this post at Microsoft TechNet&lt;/a&gt;. 
The problem was exactly as described in the post, and so was the cause (corrupted Windows event log files) and the solution (rename the directory, or delete the event log files).&lt;br /&gt;&lt;br /&gt;I'm pleased to say I discovered the same hack that they did: press the Shift key five times, which gives you the 'Do you want to turn on Sticky Keys?' dialog. (Yes, this is literally the ONLY meaningful interaction you can have with what is obviously an instance of Vista which is functioning but just not listening.) Then you click on the 'Go to the Ease of Access Center to disable the keyboard shortcut' link, and because this brings up an Explorer window, you can then type in the address bar to launch all kinds of other commands.&lt;br /&gt;&lt;br /&gt;Thanks to &lt;a href="http://social.technet.microsoft.com/Profile/en-US/?user=towz"&gt;towz&lt;/a&gt; and others who posted to that forum, proving that even for Microsoft products, the crowd can sometimes provide better technical support than the professionals.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-3633192392818034961?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/3633192392818034961/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=3633192392818034961' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/3633192392818034961'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/3633192392818034961'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2009/06/black-screen-white-pointer-on-vista.html' title='Black screen, white pointer on 
Vista'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-5469642486750116035</id><published>2009-06-17T17:35:00.000-07:00</published><updated>2009-06-17T17:57:19.390-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='gis olap'/><title type='text'>Spatial OLAP using GeoMondrian</title><content type='html'>I received an email from Thierry Badard of Laval University, Québec:&lt;br /&gt;&lt;blockquote&gt;After the release of a new version of its open source spatial ETL tool, &lt;a href="http://www.geokettle.org/"&gt;GeoKettle&lt;/a&gt; yesterday (please see &lt;a href="http://geosoa.scg.ulaval.ca/en/index.php?module=announce&amp;amp;ANN_user_op=view&amp;amp;ANN_id=12"&gt;the announcement&lt;/a&gt; for more details), the &lt;a href="http://geosoa.scg.ulaval.ca/"&gt;GeoSOA research group at Laval University, Quebec, Canada&lt;/a&gt; is proud to announce the availibility as new open source projects of &lt;a href="http://www.geo-mondrian.org/"&gt;GeoMondrian&lt;/a&gt;, the first implementation of a Spatial OLAP (SOLAP) server and &lt;a href="http://www.spatialytics.org/"&gt;Spatialytics&lt;/a&gt;, a lightweight cartographic component which enables navigation in SOLAP data cubes.&lt;/blockquote&gt;&lt;blockquote&gt;GeoKettle, GeoMondrian and Spatialytics are components of the complete geospatial BI (Business Intelligence) software stack developed by the GeoSOA research group.&lt;br /&gt;&lt;/blockquote&gt;For some screenshots of the project, see &lt;a href="http://www.geobi.org/2009/03/georeport-pentaho-cdf-integration-has.html"&gt;Fabio D'Ovidio's blog&lt;/a&gt;.&lt;br /&gt;&lt;br 
/&gt;&lt;div&gt;This is the kind of news that makes me proud to have gotten involved in open source. I'm not an expert on spatial software, so I could never have written a spatial OLAP engine; I didn't even realize the need existed. I'm delighted that people who are experts in the field could build upon my efforts and those of all the other people who have contributed to &lt;a href="http://mondrian.pentaho.org"&gt;Mondrian&lt;/a&gt; and &lt;a href="http://jpivot.sourceforge.net"&gt;JPivot&lt;/a&gt; over the years.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;div&gt;&lt;div&gt;I hear that they are also involved in a &lt;a href="http://www.geobi.org/2009/06/geolap-news-from-gsoc-2009.html"&gt;Google Summer of Code (GSoC) project to integrate with Pentaho Community Dashboard Framework (CDF)&lt;/a&gt;. That makes a lot of sense.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;And of course I would be happy to receive contributions back to Mondrian if it makes it easier for them to maintain the code base.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I wish the GeoMondrian, GeoKettle and Spatialytics projects every success, and look forward to them bringing BI to a new audience.&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-5469642486750116035?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/5469642486750116035/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=5469642486750116035' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/5469642486750116035'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/5672165237896126100/posts/default/5469642486750116035'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2009/06/spatial-olap-using-geomondrian.html' title='Spatial OLAP using GeoMondrian'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-2859338247682311979</id><published>2009-06-11T18:18:00.000-07:00</published><updated>2009-06-22T10:18:41.110-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mondrian mdx writeback'/><title type='text'>Cell writeback in Mondrian</title><content type='html'>&lt;div&gt;Writeback is a feature that allows you to modify OLAP cell values and see the effects ripple through the data set, automatically modifying child and parent cells, and also cells derived using calculations. This allows you to perform 'what if' analysis and applications such as budgeting.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I have added experimental support for writeback to Mondrian.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In Mondrian's case, the term 'writeback' is a bit misleading. In a ROLAP system such as Mondrian, writing back to the database would be difficult, since values are stored in a fact table but we allow cells of any granularity to be modified. One modified cell might contain thousands of fact table rows. So, we don't write cells back to the database, but just retain the modified cells in memory, and propagate the modifications to related cells.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Here's how to use the experimental writeback support. 
Some of the details may change later as we make the feature more usable.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;First, enable writeback for your cube. Create a dimension called 'Scenario', and a measure called 'Atomic Cell Count':&lt;/div&gt;&lt;blockquote&gt;&lt;pre&gt;&lt;div&gt;&amp;lt;Cube name='Sales'&amp;gt;&lt;/div&gt;&lt;div&gt;    &amp;lt;Dimension name='Scenario' foreignKey='time_id'&amp;gt;&lt;/div&gt;&lt;div&gt;        &amp;lt;Hierarchy primaryKey='time_id' hasAll='true'&amp;gt;&lt;/div&gt;&lt;div&gt;            &amp;lt;InlineTable alias='_dummy'&amp;gt;&lt;/div&gt;&lt;div&gt;                &amp;lt;ColumnDefs&amp;gt;&lt;/div&gt;&lt;div&gt;                    &amp;lt;ColumnDef name='foo' type='Numeric'/&amp;gt;&lt;/div&gt;&lt;div&gt;                &amp;lt;/ColumnDefs&amp;gt;&lt;/div&gt;&lt;div&gt;                &amp;lt;Rows/&amp;gt;&lt;/div&gt;&lt;div&gt;            &amp;lt;/InlineTable&amp;gt;&lt;/div&gt;&lt;div&gt;            &amp;lt;Level name='Scenario' column='foo'/&amp;gt;&lt;/div&gt;&lt;div&gt;        &amp;lt;/Hierarchy&amp;gt;&lt;/div&gt;&lt;div&gt;    &amp;lt;/Dimension&amp;gt;&lt;/div&gt;&lt;div&gt;    &amp;lt;!-- Other dimensions... --&amp;gt;&lt;/div&gt;&lt;div&gt;    &amp;lt;Measure name='Atomic Cell Count' aggregator='count'/&amp;gt;&lt;/div&gt;&lt;div&gt;    &amp;lt;!-- Other measures... --&amp;gt;&lt;/div&gt;&lt;div&gt;&amp;lt;/Cube&amp;gt;&lt;/div&gt;&lt;/pre&gt;&lt;/blockquote&gt;&lt;div&gt;(Yes, this is a lot of crud to add to your cube definition, and it's temporary. In future, we will let you flag a cube as 'writeback enabled', and a [Scenario] dimension and [Atomic Cell Count] measure will be created automatically. 
Also, we will make it easier for you to create dimensions that have only calculated members, without resorting to inline tables.)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Next, create a Scenario:&lt;/div&gt;&lt;blockquote&gt;&lt;pre&gt;&lt;div&gt;Connection connection;&lt;/div&gt;&lt;div&gt;Scenario scenario = connection.createScenario();&lt;/div&gt;&lt;div&gt;int scenarioId = scenario.getId();&lt;/div&gt;&lt;/pre&gt;&lt;/blockquote&gt;&lt;div&gt;(The Scenario API will soon move to &lt;a href="http://www.olap4j.org/"&gt;olap4j&lt;/a&gt;: before mondrian-4.0, I hope. This includes the class &lt;code&gt;mondrian.olap.Scenario&lt;/code&gt;, the method &lt;code&gt;mondrian.olap.Cell.setValue()&lt;/code&gt;, and the method &lt;code&gt;mondrian.olap.Connection.createScenario()&lt;/code&gt;. It will be optional for an olap4j driver to support writeback, but Mondrian's olap4j driver will, of course.)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Write a query that uses the scenario. 
Assuming that &lt;code&gt;scenarioId&lt;/code&gt; above was 1, the query&lt;/div&gt;&lt;blockquote&gt;&lt;pre&gt;SELECT [Measures].[Unit Sales] ON COLUMNS,&lt;div&gt;    {[Product],&lt;/div&gt;&lt;div&gt;     [Product].Children,&lt;/div&gt;&lt;div&gt;     [Product].[Drink].Children} ON ROWS&lt;/div&gt;&lt;div&gt;FROM [Sales]&lt;/div&gt;&lt;div&gt;WHERE [Scenario].[1]&lt;/div&gt;&lt;/pre&gt;&lt;/blockquote&gt;&lt;div&gt;returns&lt;/div&gt;&lt;blockquote&gt;&lt;pre&gt;&lt;div&gt;[Product]                 [Unit Sales]&lt;/div&gt;&lt;div&gt;========================= ============&lt;/div&gt;&lt;div&gt;(All)                          266,773&lt;/div&gt;&lt;div&gt; + Drink                        24,597&lt;/div&gt;&lt;div&gt; +--+ Alcoholic Beverages        6,838&lt;/div&gt;&lt;div&gt; +--+ Beverages                 13,573&lt;/div&gt;&lt;div&gt; +--+ Dairy                      4,186&lt;/div&gt;&lt;div&gt; + Food                        191,940&lt;/div&gt;&lt;div&gt; + Non-Consumable               50,236&lt;/div&gt;&lt;/pre&gt;&lt;/blockquote&gt;&lt;div&gt;Choose one of the cells returned from the query and modify its value. 
For example, let's reduce the sales of Drink by 1,000 from 24,597 to 23,597:&lt;/div&gt;&lt;blockquote&gt;&lt;pre&gt;&lt;div&gt;Result result = connection.executeQuery(...);&lt;/div&gt;&lt;div&gt;Cell cell = result.getCell(new int[] {0, 1});&lt;/div&gt;&lt;div&gt;cell.setValue(23597, AllocationPolicy.EQUAL_ALLOCATION);&lt;/div&gt;&lt;/pre&gt;&lt;/blockquote&gt;&lt;div&gt;Execute the query again, and it returns&lt;/div&gt;&lt;blockquote&gt;&lt;pre&gt;&lt;div&gt;[Product]                 [Unit Sales]&lt;/div&gt;&lt;div&gt;========================= ============&lt;/div&gt;&lt;div&gt;(All)                          265,773&lt;/div&gt;&lt;div&gt; + Drink                        23,597&lt;/div&gt;&lt;div&gt; +--+ Alcoholic Beverages        6,563&lt;/div&gt;&lt;div&gt; +--+ Beverages                 12,990&lt;/div&gt;&lt;div&gt; +--+ Dairy                      4,043&lt;/div&gt;&lt;div&gt; + Food                        191,940&lt;/div&gt;&lt;div&gt; + Non-Consumable               50,236&lt;/div&gt;&lt;/pre&gt;&lt;/blockquote&gt;&lt;div&gt;The value for Drink is 23,597, as expected, and the values of its children have been correspondingly reduced.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;How the value is allocated to the children (and in fact all descendants) is decided by the allocation policy. In this case, we specified EQUAL_ALLOCATION, which means that all atomic cells under the modified cell are given the same value.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;An atomic cell is the finest-grained value that can be viewed multidimensionally; for this cube, it is an instance of a particular customer buying a particular product, on a particular promotion, on a particular day, in a particular store. That makes for an awful lot of atomic cells, but there may be fewer atomic cells than fact table rows. If the fact table does not have a primary key on (customer, product, time, promotion, store), some cells may have more than one fact table row. 
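&lt;br /&gt;&lt;br /&gt;As a rough sketch (not taken from the Mondrian test suite), you could get a feel for how many cells sit beneath each member by placing the count measure alongside [Unit Sales]. This assumes the measure is named [Atomic Cell Count], as described above, and that scenario 1 exists; because the measure uses the count aggregator, it presumably counts fact table rows, which is at best an upper bound on the number of atomic cells.&lt;blockquote&gt;&lt;pre&gt;SELECT {[Measures].[Unit Sales],&lt;br /&gt;     [Measures].[Atomic Cell Count]} ON COLUMNS,&lt;br /&gt;    {[Product].[Drink],&lt;br /&gt;     [Product].[Drink].Children} ON ROWS&lt;br /&gt;FROM [Sales]&lt;br /&gt;WHERE [Scenario].[1]&lt;/pre&gt;&lt;/blockquote&gt;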
&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;If instead we had written&lt;/div&gt;&lt;blockquote&gt;&lt;pre&gt;&lt;div&gt;cell.setValue(23597, AllocationPolicy.EQUAL_INCREMENT);&lt;/div&gt;&lt;/pre&gt;&lt;/blockquote&gt;&lt;div&gt;the query would have returned&lt;/div&gt;&lt;blockquote&gt;&lt;pre&gt;&lt;div&gt;[Product]                 [Unit Sales]&lt;/div&gt;&lt;div&gt;========================= ============&lt;/div&gt;&lt;div&gt;(All)                          265,773&lt;/div&gt;&lt;div&gt; + Drink                        23,597&lt;/div&gt;&lt;div&gt; +--+ Alcoholic Beverages        6,560&lt;/div&gt;&lt;div&gt; +--+ Beverages                 13,022&lt;/div&gt;&lt;div&gt; +--+ Dairy                      4,015&lt;/div&gt;&lt;div&gt; + Food                        191,940&lt;/div&gt;&lt;div&gt; + Non-Consumable               50,236&lt;/div&gt;&lt;/pre&gt;&lt;/blockquote&gt;&lt;div&gt;We notice that Beverages has not been reduced as much under the EQUAL_INCREMENT policy as under the EQUAL_ALLOCATION policy (13,022 versus 12,990); the average value for atomic cells of Beverages must be greater than for Drink as a whole.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Allocation policies are defined to be consistent with Analysis Services' &lt;a href="http://technet.microsoft.com/en-us/library/ms145488.aspx"&gt;UPDATE CUBE statement&lt;/a&gt;. Mondrian does not currently implement WEIGHTED_ALLOCATION or WEIGHTED_INCREMENT policies.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Treating scenarios as a dimension is an elegant and powerful idea. Using the Scenario dimension, you can easily switch from one scenario to another, or you can compare scenarios side-by-side.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Note that you can also set a connection's current scenario. This effectively becomes the default value for the Scenario dimension in that connection, so you do not need to specify Scenario in the slicer. 
However, there still needs to be an explicit scenario in the context when you call &lt;code&gt;Cell.setValue()&lt;/code&gt;. I'm not sure whether the benefit of having a scenario for a connection outweighs the confusion, and we may discontinue this feature.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Remember, this is still an experimental feature. There is some cleanup to be done, some performance tuning, and the API needs to be moved into olap4j. But most importantly, it's not useful until a user interface, such as PAT or JPivot, supports scenarios and modifying cell values.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-2859338247682311979?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/2859338247682311979/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=2859338247682311979' title='12 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/2859338247682311979'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/2859338247682311979'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2009/06/cell-writeback-in-mondrian.html' title='Cell writeback in Mondrian'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' 
src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>12</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-6088042116573288868</id><published>2009-05-26T16:12:00.000-07:00</published><updated>2009-05-26T18:17:21.793-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='webinar mysql mondrian continuous data integration'/><title type='text'>Webinar: Eliminating MySQL Bottlenecks and Replication Issues using Real-Time Queries &amp; Continuous ETL</title><content type='html'>Would you like to find out how to build a continuous ETL process integrating source systems, MySQL data warehouse, and Mondrian OLAP engine?&lt;br /&gt;&lt;br /&gt;&lt;div&gt;I'm going to be hosting a webinar tomorrow describing how to do this using SQLstream. (Basically a repeat of the webinar I gave at the MySQL conference this year, but many of you missed it.)&lt;br /&gt;&lt;br /&gt;Join me and Damian Black, CEO of SQLstream, on the webinar at 11am PDT/2pm EDT tomorrow, Wednesday 27th May. 
To register for the webinar, visit &lt;a href="https://www2.gotomeeting.com/register/668399275"&gt;https://www2.gotomeeting.com/register/668399275&lt;/a&gt;.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-6088042116573288868?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/6088042116573288868/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=6088042116573288868' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/6088042116573288868'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/6088042116573288868'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2009/05/webinar-eliminating-mysql-bottlenecks.html' title='Webinar: Eliminating MySQL Bottlenecks and Replication Issues using Real-Time Queries &amp; Continuous ETL'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-6648165930275572690</id><published>2009-05-19T17:42:00.000-07:00</published><updated>2009-05-19T18:15:10.215-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mondrian physical schema bnf xsd clapham'/><title type='text'>Explaining the structure of Mondrian schemas</title><content type='html'>There are some &lt;a 
href="http://wiki.pentaho.com/display/analysis/Physical+Schema+Design+Discussion"&gt;major schema changes coming in Mondrian 4.0&lt;/a&gt;, and I'm writing up specifications for these so that everyone knows what's coming and has a chance to influence it.&lt;br /&gt;&lt;br /&gt;But before I do that, I thought I'd try to improve how we describe the structure of XML schemas in the present release, just a bit. I have tried a couple of things. First, I created an XML skeleton that shows which elements can occur inside which other elements:&lt;br /&gt;&lt;br /&gt;&lt;blockquote style="text-indent: -20px"&gt;    &lt;code&gt;        &lt;div style="padding-left:20px"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_Schema"&gt;Schema&lt;/a&gt;&amp;gt;&lt;/div&gt;            &lt;div style="padding-left:40px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_Cube"&gt;Cube&lt;/a&gt;&amp;gt;&lt;/div&gt;                &lt;div style="padding-left:60px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_Table"&gt;Table&lt;/a&gt;&amp;gt;&lt;/div&gt;                    &lt;div style="padding-left:80px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_AggName"&gt;AggName&lt;/a&gt;&amp;gt;&lt;/div&gt;                        &lt;div style="padding-left:100px;"&gt;aggElement&lt;/div&gt;                    &lt;div style="padding-left:80px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_AggPattern"&gt;AggPattern&lt;/a&gt;&amp;gt;&lt;/div&gt;                        &lt;div style="padding-left:100px;"&gt;aggElement&lt;/div&gt;                &lt;div style="padding-left:60px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_Dimension"&gt;Dimension&lt;/a&gt;&amp;gt;&lt;/div&gt;                    &lt;div style="padding-left:80px;"&gt;&amp;lt;&lt;a 
href="http://mondrian.pentaho.org/documentation/schema.php#XML_Hierarchy"&gt;Hierarchy&lt;/a&gt;&amp;gt;&lt;/div&gt;                        &lt;div style="padding-left:100px;"&gt;relation&lt;/div&gt;                        &lt;div style="padding-left:100px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_Closure"&gt;Closure&lt;/a&gt;/&amp;gt;&lt;/div&gt;                        &lt;div style="padding-left:100px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_Level"&gt;Level&lt;/a&gt;&amp;gt;&lt;/div&gt;                            &lt;div style="padding-left:120px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_KeyExpression"&gt;KeyExpression&lt;/a&gt;&amp;gt;&lt;/div&gt;                                &lt;div style="padding-left:140px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_SQL"&gt;SQL&lt;/a&gt;/&amp;gt;&lt;/div&gt;                            &lt;div style="padding-left:120px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_NameExpression"&gt;NameExpression&lt;/a&gt;&amp;gt;&lt;/div&gt;                                &lt;div style="padding-left:140px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_SQL"&gt;SQL&lt;/a&gt;/&amp;gt;&lt;/div&gt;                            &lt;div style="padding-left:120px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_CaptionExpression"&gt;CaptionExpression&lt;/a&gt;&amp;gt;&lt;/div&gt;                                &lt;div style="padding-left:140px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_SQL"&gt;SQL&lt;/a&gt;/&amp;gt;&lt;/div&gt;                            &lt;div style="padding-left:120px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_OrdinalExpression"&gt;OrdinalExpression&lt;/a&gt;&amp;gt;&lt;/div&gt;                  
              &lt;div style="padding-left:140px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_SQL"&gt;SQL&lt;/a&gt;/&amp;gt;&lt;/div&gt;                            &lt;div style="padding-left:120px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_ParentExpression"&gt;ParentExpression&lt;/a&gt;&amp;gt;&lt;/div&gt;                                &lt;div style="padding-left:140px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_SQL"&gt;SQL&lt;/a&gt;/&amp;gt;&lt;/div&gt;                            &lt;div style="padding-left:120px;"&gt;&amp;lt;&lt;a  href="http://mondrian.pentaho.org/documentation/schema.php#XML_Property"&gt;Property&lt;/a&gt;&amp;gt;&lt;/div&gt;                                &lt;div style="padding-left:140px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_PropertyExpression"&gt;PropertyExpression&lt;/a&gt;&amp;gt;&lt;/div&gt;                                    &lt;div style="padding-left:160px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_SQL"&gt;SQL&lt;/a&gt;/&amp;gt;&lt;/div&gt;                &lt;div style="padding-left:60px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_DimensionUsage"&gt;DimensionUsage&lt;/a&gt;&amp;gt;&lt;/div&gt;                &lt;div style="padding-left:60px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_Measure"&gt;Measure&lt;/a&gt;&amp;gt;&lt;/div&gt;                    &lt;div style="padding-left:80px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_MeasureExpression"&gt;MeasureExpression&lt;/a&gt;&amp;gt;&lt;/div&gt;                        &lt;div style="padding-left:100px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_SQL"&gt;SQL&lt;/a&gt;/&amp;gt;&lt;/div&gt;                    &lt;div 
style="padding-left:80px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_MemberProperty"&gt;CalculatedMemberProperty&lt;/a&gt;/&amp;gt;&lt;/div&gt;                &lt;div style="padding-left:60px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_CalculatedMember"&gt;CalculatedMember&lt;/a&gt;&amp;gt;&lt;/div&gt;                    &lt;div style="padding-left:80px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_Formula"&gt;Formula&lt;/a&gt;/&amp;gt;&lt;/div&gt;                    &lt;div style="padding-left:80px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_MemberProperty"&gt;CalculatedMemberProperty&lt;/a&gt;/&amp;gt;&lt;/div&gt;                &lt;div style="padding-left:60px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_NamedSet"&gt;NamedSet&lt;/a&gt;&amp;gt;&lt;/div&gt;                    &lt;div style="padding-left:80px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_Formula"&gt;Formula&lt;/a&gt;/&amp;gt;&lt;/div&gt;            &lt;div style="padding-left:40px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_VirtualCube"&gt;VirtualCube&lt;/a&gt;&amp;gt;&lt;/div&gt;                &lt;div style="padding-left:60px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_CubeUsages"&gt;CubeUsages&lt;/a&gt;&amp;gt;&lt;/div&gt;                    &lt;div style="padding-left:80px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_CubeUsage"&gt;CubeUsage&lt;/a&gt;&amp;gt;&lt;/div&gt;                &lt;div style="padding-left:60px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_VirtualCubeDimension"&gt;VirtualCubeDimension&lt;/a&gt;&amp;gt;&lt;/div&gt;                &lt;div style="padding-left:60px;"&gt;&amp;lt;&lt;a 
href="http://mondrian.pentaho.org/documentation/schema.php#XML_VirtualCubeMeasure"&gt;VirtualCubeMeasure&lt;/a&gt;&amp;gt;&lt;/div&gt;            &lt;div style="padding-left:40px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_Role"&gt;Role&lt;/a&gt;&amp;gt;&lt;/div&gt;                &lt;div style="padding-left:60px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_SchemaGrant"&gt;SchemaGrant&lt;/a&gt;&amp;gt;&lt;/div&gt;                    &lt;div style="padding-left:80px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_CubeGrant"&gt;CubeGrant&lt;/a&gt;&amp;gt;&lt;/div&gt;                        &lt;div style="padding-left:100px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_HierarchyGrant"&gt;HierarchyGrant&lt;/a&gt;&amp;gt;&lt;/div&gt;                            &lt;div style="padding-left:120px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_MemberGrant"&gt;MemberGrant&lt;/a&gt;/&amp;gt;&lt;/div&gt;                &lt;div style="padding-left:60px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_Union"&gt;Union&lt;/a&gt;&amp;gt;&lt;/div&gt;                    &lt;div style="padding-left:80px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_RoleUsage"&gt;RoleUsage&lt;/a&gt;/&amp;gt;&lt;/div&gt;            &lt;div style="padding-left:40px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_UserDefinedFunction"&gt;UserDefinedFunction&lt;/a&gt;/&amp;gt;&lt;/div&gt;            &lt;div style="padding-left:40px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_Parameter"&gt;Parameter&lt;/a&gt;/&amp;gt;&lt;/div&gt;        &lt;br/&gt;        relation ::=&lt;br/&gt;            &lt;div style="padding-left:40px;"&gt;&amp;lt;&lt;a 
href="http://mondrian.pentaho.org/documentation/schema.php#XML_Table"&gt;Table&lt;/a&gt;&amp;gt;&lt;/div&gt;                &lt;div style="padding-left:60px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_SQL"&gt;SQL&lt;/a&gt;/&amp;gt;&lt;/div&gt;            &lt;div style="padding-left:40px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_View"&gt;View&lt;/a&gt;&amp;gt;&lt;/div&gt;                &lt;div style="padding-left:60px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_SQL"&gt;SQL&lt;/a&gt;/&amp;gt;&lt;/div&gt;            &lt;div style="padding-left:40px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_InlineTable"&gt;InlineTable&lt;/a&gt;&amp;gt;&lt;/div&gt;                &lt;div style="padding-left:60px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_ColumnDefs"&gt;ColumnDefs&lt;/a&gt;&amp;gt;&lt;/div&gt;                    &lt;div style="padding-left:80px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_ColumnDef"&gt;ColumnDef&lt;/a&gt;&amp;gt;&lt;/div&gt;                &lt;div style="padding-left:60px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_Rows"&gt;Rows&lt;/a&gt;&amp;gt;&lt;/div&gt;                    &lt;div style="padding-left:80px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_Row"&gt;Row&lt;/a&gt;&amp;gt;&lt;/div&gt;                        &lt;div style="padding-left:100px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_Value"&gt;Value&lt;/a&gt;&amp;gt;&lt;/div&gt;            &lt;div style="padding-left:40px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_Join"&gt;Join&lt;/a&gt;&amp;gt;&lt;/div&gt;                &lt;div style="padding-left:60px;"&gt;relation&lt;/div&gt;        &lt;br/&gt;        aggElement 
::=&lt;br/&gt;            &lt;div style="padding-left:40px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_AggExclude"&gt;AggExclude&lt;/a&gt;&amp;gt;&lt;/div&gt;            &lt;div style="padding-left:40px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_AggFactCount"&gt;AggFactCount&lt;/a&gt;&amp;gt;&lt;/div&gt;            &lt;div style="padding-left:40px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_AggIgnoreColumn"&gt;AggIgnoreColumn&lt;/a&gt;&amp;gt;&lt;/div&gt;            &lt;div style="padding-left:40px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_AggForeignKey"&gt;AggForeignKey&lt;/a&gt;&amp;gt;&lt;/div&gt;            &lt;div style="padding-left:40px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_AggMeasure"&gt;AggMeasure&lt;/a&gt;&amp;gt;&lt;/div&gt;            &lt;div style="padding-left:40px;"&gt;&amp;lt;&lt;a href="http://mondrian.pentaho.org/documentation/schema.php#XML_AggLevel"&gt;AggLevel&lt;/a&gt;&amp;gt;&lt;/div&gt;    &lt;/code&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;You can see the full version in the &lt;a href="http://mondrian.pentaho.org/documentation/schema.php#Schema_files"&gt;Mondrian schema guide&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;This approach shows where things are located, but it doesn't show how many of each element can belong to a particular parent element, or the order in which they are required. So, I wrote up a small &lt;a href="http://p4webhost.eigenbase.org:8080/open/mondrian/doc/schema.bnf"&gt;BNF grammar&lt;/a&gt; and used &lt;a href="http://clapham.hydromatic.net"&gt;Clapham&lt;/a&gt; to generate a &lt;a href="http://clapham.hydromatic.net/mondrian-3.1-bnf/"&gt;railroad diagram&lt;/a&gt;. 
For comparison, the &lt;a href="http://clapham.hydromatic.net/mondrian-4.0-bnf/"&gt;railroad diagram for the work-in-progress mondrian-4.0 schema is here&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-6648165930275572690?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/6648165930275572690/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=6648165930275572690' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/6648165930275572690'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/6648165930275572690'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2009/05/explaining-structure-of-mondrian.html' title='Explaining the structure of Mondrian schemas'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-6355218775713867132</id><published>2009-05-11T01:29:00.000-07:00</published><updated>2009-05-11T02:07:43.057-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='bnf javacc parser grammar generator'/><title type='text'>Clapham: A railroad diagram generator</title><content type='html'>I don't work with the Oracle database very much anymore, and one thing I miss is their server documentation. 
I still have my old copy of the Oracle 7.3 SQL Language Reference, and sometimes I reach for it when the SQL:2008 standard has fuddled my brain and I want to be reassured that SQL can be simple, powerful and trustworthy. The calming effect is partly due to the authoritative tone, but the railroad diagrams describing the syntax of each command say 'Don't worry'.&lt;br /&gt;&lt;br /&gt;For example, here is &lt;a href="http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/statements_7002.htm"&gt;Oracle 10.2's CREATE TABLE&lt;/a&gt;:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/img/relational_table.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 596px; height: 218px;" src="http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/img/relational_table.gif" alt="" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Yes, railroad diagrams. You can easily get lost in something as large as the SQL language, with its hundreds of commands, keywords and unexpected clauses, and railroad diagrams are the map.&lt;br /&gt;&lt;br /&gt;When it came to writing our documentation for &lt;a href="http://www.sqlstream.com/"&gt;SQLstream&lt;/a&gt;, we of course wanted to include railroad diagrams to illustrate our dialect of SQL. It's possible to construct the diagrams by hand, but it's tedious and error-prone, and it's difficult to get the diagrams to look consistent. Unbelievably, we couldn't find a tool to generate them, so we ended up drawing them by hand.&lt;br /&gt;&lt;br /&gt;Now that I've gotten a little breathing room after the release of SQLstream 2.0, I have taken a couple of days to write an open-source railroad diagram generator. 
I've released it on &lt;a href="http://sourceforge.net/projects/clapham"&gt;SourceForge, and named it Clapham&lt;/a&gt;, after the South London town which is home to the most complicated railway junction you ever saw.&lt;br /&gt;&lt;br /&gt;This has been a nice return to old-school open source, with its mantras "release early, release often" and "don't whine: contribute". The diagrams aren't yet as pretty as Oracle's, but we're getting there. Even though this is the very first release, and the project is barely alpha, it has already &lt;a href="http://clapham.hydromatic.net/farrago/"&gt;generated charts for LucidDB's not inconsiderable SQL grammar&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;More details at the &lt;a href="http://clapham.hydromatic.net/"&gt;home page&lt;/a&gt;, and you can &lt;a href="https://sourceforge.net/project/showfiles.php?group_id=243703&amp;amp;package_id=297002&amp;amp;release_id=681840"&gt;download release clapham-0.1.003 from SourceForge&lt;/a&gt;. Contributions welcome, of course.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-6355218775713867132?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/6355218775713867132/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=6355218775713867132' title='15 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/6355218775713867132'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/6355218775713867132'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2009/05/clapham-railroad-diagram-generator.html' title='Clapham: A railroad diagram 
generator'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>15</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-2590586374515504298</id><published>2009-04-30T10:43:00.000-07:00</published><updated>2009-04-30T11:22:19.772-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mondrian cache calculated members'/><title type='text'>How Mondrian evaluates expressions</title><content type='html'>When it comes to expression evaluation, Mondrian keeps things simple. It doesn't tend to cache the results of expressions, but calculates them each time they are evaluated. Eventually the calculation tunnels through all calculated members and ends up at an atomic cell. Atomic cells are retrieved from the database, and stored in the cell value cache, so they are only calculated once.&lt;br /&gt;&lt;br /&gt;By the way, an atomic cell is not necessarily at the lowest level of the hierarchy; Mondrian would prefer to load cells at a coarse granularity, and leave the hard work of aggregating values to the database, or even better, an aggregate table. And Mondrian does its best to retrieve atomic cells in batches. It gathers together requests for lots of cells of the same granularity and generates a single SQL statement to retrieve them all at once.&lt;br /&gt;&lt;br /&gt;Mondrian's 'keep it simple' scheme comes unstuck when a particular calculation is repeated many times over. Nick Goodman came up with a classic example of this in &lt;a href="http://jira.pentaho.org:8080/browse/MONDRIAN-552"&gt;bug MONDRIAN-552&lt;/a&gt;. 
The query is as follows:&lt;br /&gt;&lt;blockquote&gt;&lt;pre&gt;with member [Measures].[Profit Change] as&lt;br /&gt;   ([Measures].[Profit], [Time].CurrentMember)&lt;br /&gt;   - ([Measures].[Profit], [Time].PrevMember)&lt;br /&gt;member [Measures].[Running Total] as&lt;br /&gt;   ([Measures].[Profit], [Time].CurrentMember)&lt;br /&gt;   + ([Measures].[Running Total], [Time].PrevMember)&lt;br /&gt;member [Measures].[Average Daily Running Total] as&lt;br /&gt;   Avg(&lt;br /&gt;       Descendants(&lt;br /&gt;           [Time].CurrentMember, [Time.Weekly].[Day]),&lt;br /&gt;       [Measures].[Running Total])&lt;br /&gt;select&lt;br /&gt;   {[Measures].[Profit Change],&lt;br /&gt;     [Measures].[Running Total],&lt;br /&gt;     [Measures].[Average Daily Running Total]} ON COLUMNS,&lt;br /&gt;   {[Time.Weekly].[Week].Members} ON ROWS&lt;br /&gt;from [Sales]&lt;/pre&gt;&lt;/blockquote&gt;Note how [Measures].[Running Total] is recursive. The running total for week 3 is defined as the running total for week 2 plus the profit for week 3. To calculate the average running total for week 99, Mondrian computes profit for the first 99 weeks, and to calculate the average running total for week 100, Mondrian computes profit for the first 100 weeks all over again. There's lots of wasted effort: Mondrian has computed profit 50,000 times when it could have done it just 100 times and cached the results.&lt;br /&gt;&lt;br /&gt;The solution is simple: wrap the calculation for [Measures].[Running Total] in the &lt;a href="http://mondrian.pentaho.org/documentation/performance.php#Optimizing_Calculations_with_the_Expression_Cache"&gt;Cache() function&lt;/a&gt;, and Mondrian will compute each value only once.&lt;br /&gt;&lt;br /&gt;You will see that in the bug report I put forward a couple of proposals for making Mondrian better. I don't think Mondrian should automatically cache every expression, because caching costs time and memory, and most expressions are only evaluated once or twice. 
And by the way, you should use the Cache function sparingly, for the same reason.&lt;br /&gt;&lt;br /&gt;But it would be nice if Mondrian could automatically detect some cases where expression caching is desirable. The proposed 'cache' property of a calculated member would have three values: 0 (never cache), 1 (always cache) and null (Mondrian should use its best judgment). Most calculated members would leave the caching up to Mondrian, so we would need to come up with a simple, effective rule that governs caching before we implemented this feature. What do you think the rule should be?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-2590586374515504298?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/2590586374515504298/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=2590586374515504298' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/2590586374515504298'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/2590586374515504298'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2009/04/how-mondrian-evaluates-expressions.html' title='How Mondrian evaluates expressions'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' 
src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-4463970001347490698</id><published>2009-04-27T15:11:00.000-07:00</published><updated>2009-04-27T15:20:06.526-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='pat pentaho analysis tool olap4j slice dice jpivot'/><title type='text'>PAT 0.2</title><content type='html'>The PAT (Pentaho Analysis Tool) project renews my faith in open source. A team of folks from Pentaho's community have got together and are cooking up a new UI for Mondrian. Thanks to the magic of olap4j, it will work against other OLAP engines too.&lt;br /&gt;&lt;br /&gt;They just released version 0.2, and &lt;a href="http://pentahomusings.blogspot.com/2009/04/pat-02-arrives.html"&gt;Tom's release notes&lt;/a&gt; are an amusing and informative history of the project. Download it from the &lt;a href="http://code.google.com/p/pentahoanalysistool/"&gt;project home page&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-4463970001347490698?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/4463970001347490698/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=4463970001347490698' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/4463970001347490698'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/4463970001347490698'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2009/04/pat-02.html' title='PAT 
0.2'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-6537313086400326400</id><published>2009-04-10T17:31:00.000-07:00</published><updated>2009-04-11T00:51:56.798-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mondrian mdx formatting olap4j'/><title type='text'>Formatting MDX as plain text</title><content type='html'>&lt;div&gt;When Mondrian tools output MDX results as text, such as in the cmdRunner  utility, we've been using the same old crappy format for years. For example, the  query&lt;/div&gt; &lt;pre&gt;select&lt;br /&gt;  crossjoin(&lt;br /&gt;    {[Time].[1997].[Q1],  [Time].[1997].[Q2].[4]},&lt;br /&gt;    {[Measures].[Unit Sales], [Measures].[Store  Sales]}) on 0,&lt;br /&gt;  {[USA].[CA].[Los Angeles],&lt;br /&gt;   [USA].[WA].[Seattle],&lt;br /&gt;   [USA].[CA].[San Francisco]} on 1&lt;br /&gt;FROM [Sales]&lt;/pre&gt;&lt;div&gt;is formatted as&lt;/div&gt; &lt;pre&gt;Axis #0:&lt;br /&gt;{}&lt;br /&gt;Axis #1:&lt;br /&gt;{[Time].[1997].[Q1], [Measures].[Unit  Sales]}&lt;br /&gt;{[Time].[1997].[Q1], [Measures].[Store  Sales]}&lt;br /&gt;{[Time].[1997].[Q2].[4], [Measures].[Unit  Sales]}&lt;br /&gt;{[Time].[1997].[Q2].[4], [Measures].[Store Sales]}&lt;br /&gt;Axis #2:&lt;br /&gt;{[Store].[All Stores].[USA].[CA].[Los Angeles]}&lt;br /&gt;{[Store].[All  Stores].[USA].[WA].[Seattle]}&lt;br /&gt;{[Store].[All Stores].[USA].[CA].[San  Francisco]}&lt;br /&gt;Row #0: 6,373&lt;br /&gt;Row #0: 13,736.97&lt;br /&gt;Row #0: 1,865&lt;br /&gt;Row #0: 3,917.49&lt;br /&gt;Row #1: 6,098&lt;br /&gt;Row #1: 12,760.64&lt;br /&gt;Row #1: 2,121&lt;br /&gt;Row #1: 4,444.06&lt;br /&gt;Row 
#2: 439&lt;br /&gt;Row #2: 936.51&lt;br /&gt;Row #2: 149&lt;br /&gt;Row #2: 327.33&lt;/pre&gt; &lt;div&gt;I've just &lt;a href="http://p4web.eigenbase.org/@md=d&amp;amp;c=6PU@12590?ac=10"&gt;checked in&lt;/a&gt; an alternative formatter that makes the result look  more like a pivot table. The same query would come out like  this:&lt;/div&gt;&lt;pre&gt;                     1997       1997        1997        1997&lt;br /&gt;                     Q1         Q1          Q2          Q2&lt;br /&gt;                                            4           4&lt;br /&gt;                     Unit Sales Store Sales Unit Sales Store Sales&lt;br /&gt;=== == ============= ========== =========== ========== ===========&lt;br /&gt;USA CA Los Angeles   6,373      13,736.97   1,865      3,917.49&lt;br /&gt;USA WA Seattle       6,098      12,760.64   2,121      4,444.06&lt;br /&gt;USA CA San Francisco 439        936.51      149        327.33&lt;/pre&gt;&lt;div&gt;Two questions:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;1. Should we move this code into the &lt;a href="http://www.olap4j.org/"&gt;olap4j&lt;/a&gt; code base? (It would seem to make sense  because it doesn't require any mondrian internals to do the job, and the  processing requires a 'grid model' similar to query models already part of  olap4j. But I don't want to 'dump' code that is not generally useful.)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;2. What do people feel is the ideal format for formatting MDX results as  text? 
As a starting point, another couple of possible formats are below.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;"Oracle" format&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;pre&gt;                     1997&lt;br /&gt;                     Q1                     Q2&lt;br /&gt;                                            4&lt;br /&gt;                     Unit Sales Store Sales Unit Sales Store Sales&lt;br /&gt;=== == ============= ========== =========== ========== ===========&lt;br /&gt;USA CA Los Angeles        6,373   13,736.97      1,865    3,917.49&lt;br /&gt;    WA Seattle            6,098   12,760.64      2,121    4,444.06&lt;br /&gt;    CA San Francisco        439      936.51        149      327.33&lt;/pre&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;"MySQL" format&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;pre&gt;|                          | 1997                                                |&lt;br /&gt;|                          | Q1                       | Q2                       |&lt;br /&gt;|                          |                          | 4                        |&lt;br /&gt;|                          | Unit Sales | Store Sales | Unit Sales | Store Sales |&lt;br /&gt;+-----+----+---------------+------------+-------------+------------+-------------+&lt;br /&gt;| USA | CA | Los Angeles   |      6,373 |   13,736.97 |      1,865 |    3,917.49 |&lt;br /&gt;|     | WA | Seattle       |      6,098 |   12,760.64 |      2,121 |    4,444.06 |&lt;br /&gt;|     | CA | San Francisco |        439 |      936.51 |        149 |      327.33 |&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/5672165237896126100-6537313086400326400?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/6537313086400326400/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=6537313086400326400' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/6537313086400326400'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/6537313086400326400'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2009/04/formatting-mdx-as-plain-text.html' title='Formatting MDX as plain text'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-7466747269030858612</id><published>2009-04-07T23:58:00.000-07:00</published><updated>2009-04-08T01:36:27.927-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='oracle simba mdx olap olap4j standardization'/><title type='text'>The last MDX holdout folds, but true OLAP interop is still a long way off</title><content type='html'>&lt;div&gt;Oracle, the last major OLAP vendor to embrace &lt;a href="http://en.wikipedia.org/wiki/Multidimensional_Expressions"&gt;MDX&lt;/a&gt;, has finally &lt;a href="http://www.simba.com/MDX-Provider-for-Oracle-OLAP.htm"&gt;added MDX support to its server&lt;/a&gt;. 
The MDX Provider for Oracle OLAP, developed in partnership with &lt;a href="http://www.simba.com"&gt;Simba&lt;/a&gt;, implements the OLE DB for OLAP API and the MDX query language, and went beta this week.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The most obvious application of this technology, and I'm sure the initial revenue driver, will be to allow end-users to use Excel 2007 as their client for slicing and dicing.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Simba's architecture diagram shows the MDX provider loaded onto the same machine as the Excel client. It wouldn't seem technically difficult to run the MDX provider as a server, and have multiple clients connect via OLE DB for OLAP or via XML for Analysis. (Licensing may be a different matter.)&lt;/div&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.simba.com/images/MDX-Connector-for-Oracle-OLAP.gif"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 383px; height: 317px;" src="http://www.simba.com/images/MDX-Connector-for-Oracle-OLAP.gif" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;div&gt;This announcement means that now it is possible to talk MDX to every major OLAP server. (Are there any OLAP servers that do not speak MDX? I can't think of any.) The OLAP market has moved very slowly towards standardization, but this is a significant moment, even a tipping point. In a conversation five years ago, Oracle executives agreed that MDX was a fine language, but said they would not support it, because that would be to acknowledge that Microsoft was the thought-leader in the OLAP marketplace. It's that old PR strategy: deny in public, agree in private. 
And in a sense their strategy worked, because without a standard language, the OLAP market could not begin to commoditize.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;There is still a long way to go towards OLAP interoperability. Servers differ widely in their support of MDX. Unlike SQL, the MDX language is not in the hands of an independent standards organization; even the originators of the de facto standard, Microsoft, have not released a specification for MDX or &lt;a href="http://www.xmla.org"&gt;XMLA&lt;/a&gt; for several years.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;A query language is no good without an API to issue queries, and APIs only exist in Microsoft's own technologies: COM (OLE DB for OLAP), .NET (adomd.net) and web services (XMLA).&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I have been advocating &lt;a href="http://www.olap4j.org"&gt;olap4j&lt;/a&gt; as the standard API for Java-based OLAP, but it has yet to receive public backing from vendors outside the open source community. And there are no OLAP APIs for languages such as python, perl, and php.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The final point of concern is the emergence of Simba as virtually the sole supplier of MDX, OLE DB for OLAP and XMLA technology. Simba is an excellent company, who understand MDX very well, and have invested in building a technology stack. But they also benefit from a close relationship with Microsoft. (Remember those specifications for MDX and XMLA I referred to earlier? Though they have not seen public updates for several years, I'm sure those specifications still exist behind the walls of Castle Redmond, and are available to Microsoft's partners.)&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;As far as I am aware, Simba have been responsible for all of the projects in the last few years to bolt MDX support on to existing servers and applications. 
(With a sole exception: I was never able to find out where JasperSoft sourced the technology for its &lt;a href="http://www.jaspersoft.com/jaspersoft_app16.html"&gt;ODBO Connect&lt;/a&gt; product.)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;To summarize, this is a milestone moment in the development of OLAP technology, but there is still cause for concern. OLAP APIs exist only for a small number of languages, vendors show little inclination to provide true interoperability, and the key technology is provided by a small number of players.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;You can help. If  you are a user of OLAP technology it is in your interests to see the emergence of standards in the OLAP marketplace. So, please ask your vendor what they are doing about interoperability. Ask them whether there are OLAP clients, other than their own, that run on their server. And ask them for APIs to connect to their server from all of the languages you use in your organization. 
Then, we may move a little closer to the goal of OLAP for all.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-7466747269030858612?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/7466747269030858612/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=7466747269030858612' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/7466747269030858612'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/7466747269030858612'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2009/04/last-mdx-holdout-folds-but-true-olap.html' title='The last MDX holdout folds, but true OLAP interop is still a long way off'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-903810947836140325</id><published>2009-04-05T15:50:00.000-07:00</published><updated>2009-04-05T16:06:48.795-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mondrian olap writeback splash olap4j palo jpalo'/><title type='text'>A what-if scenario: musing about adding writeback capability to Mondrian</title><content type='html'>&lt;div&gt;The &lt;a href="http://www.pentaho.com/summit09/"&gt;Pentaho Partner Summit&lt;/a&gt; last week was a great chance to meet people who 
are using Mondrian, and being successful with it.&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;As always, people are thinking of using it in ways that I hadn't imagined. A couple of comments got me thinking about adding writeback support, something we'd long talked about, but never seriously considered implementing.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Writeback allows the OLAP end-user to modify cell values and see the effects ripple through their spreadsheet. As you can imagine, it is useful for doing what-if analysis, especially budgeting.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;If the modified cell is a sum of finer-grained cells, we also need to modify those finer-grained cells, and all of the totals of other dimensionalities computed from them; otherwise things just don't add up. This is hard to implement, because you sometimes need to modify a lot of cells, and even harder for ROLAP engines like Mondrian, because such engines don't store cells; they read directly from the unaggregated fact table.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;First, I went looking for existing APIs for writeback.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Microsoft offers support for writeback via the &lt;a href="http://msdn.microsoft.com/en-us/library/ms145568.aspx"&gt;UPDATE CUBE&lt;/a&gt; MDX statement. As always with Microsoft's MDX support, it's difficult to tell whether this is 'standard MDX', but the command seems to be well thought-out. The fact that it is an MDX command rather than an API call allows them to use an MDX expression as the rule by which to pro-rate changes to child cells.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I also looked at the &lt;a href="http://www.jpalo.com/en/products/palo_java_api.html"&gt;JPalo Java API&lt;/a&gt;. (I've always wanted to work more closely with &lt;a href="http://www.palo.net/"&gt;Palo&lt;/a&gt;. 
Although they're an OLAP engine, they have a different architecture (C and MOLAP) and core target audience (Excel users), and they're open source, so I see a lot of benefit to them and us if we pool resources. I invited them to join the &lt;a href="http://www.olap4j.org/"&gt;olap4j&lt;/a&gt; process early on, but they preferred to define their own Palo-specific Java API. I'm still hopeful.)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I &lt;a href="http://sourceforge.net/project/showfiles.php?group_id=209776"&gt;downloaded their most recent release from SourceForge&lt;/a&gt; and found that it was a release out of date (2.0 versus 2.5) and didn't contain the source code. There is a more up-to-date version in &lt;a href="http://jpalo.svn.sourceforge.net/viewvc/jpalo/trunk/"&gt;subversion&lt;/a&gt;. In DbConnection I found the setDataNumericSplashed method:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;blockquote&gt;&lt;div&gt; /**&lt;/div&gt;&lt;div&gt;  * Sets the given &lt;code&gt;double&lt;/code&gt; value at the specified cell.&lt;/div&gt;&lt;div&gt;  * The splashMode paramater is only important for consolidated cells and&lt;/div&gt;&lt;div&gt;  * determines how the value is scattered among the consolidated elements.&lt;/div&gt;&lt;div&gt;  * Please use the defined class constants for valid values. 
Although more&lt;/div&gt;&lt;div&gt;  * modes are currently defined only three are supported, namely:&lt;/div&gt;&lt;div&gt;  * SPLASH_MODE_DEFAULT, SPLASH_MODE_BASE_SET and SPLASH_MODE_BASE_ADD&lt;/div&gt;&lt;div&gt;  * @param cube {@link CubeInfo} representation&lt;/div&gt;&lt;div&gt;  * @param coordinates {@link ElementInfo} representations which specify the&lt;/div&gt;&lt;div&gt;  * coordinates&lt;/div&gt;&lt;div&gt;  * @param value the new value&lt;/div&gt;&lt;div&gt;  * @param splashMode the splash mode, use defined class constants&lt;/div&gt;&lt;div&gt;  */&lt;/div&gt;&lt;div&gt; public void setDataNumericSplashed(CubeInfo cube, ElementInfo[] coordinate, double value, int splashMode);&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;/blockquote&gt;&lt;div&gt;I couldn't find any more documentation than that, but 'splash mode' seems to be equivalent to Microsoft's update strategies USE_EQUAL_ALLOCATION etc.&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;There are several remaining questions. What are the right changes to the olap4j API to support writeback? Support for the UPDATE CUBE statement is the leading contender. I'd love to hear what the olap4j community — especially the folks building the &lt;a href="http://wiki.pentaho.com/display/COM/Pentaho+Analysis+Tool"&gt;Pentaho Analysis Tool&lt;/a&gt; — think of this API, and how they would expose writeback in their UI.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I presume we'll need a scheme for transaction management. End-users will want to save their work, come back another day and continue where they left off. Several end-users might be using Mondrian at the same time, and want to see their numbers, not anyone else's. 
So, I think we'll need to introduce a concept I'd call a 'scenario', which is a property of a connection and can be persisted.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;We'll need to figure out how to implement writeback within Mondrian's ROLAP-with-caching architecture. Writing to the fact table is not tenable, because the modified cells can be of a multitude of dimensionalities. Neither is writing to an aggregate table, for the same reason. The ideal would be to write to disk a minimal description of the cells the user has modified (in XML, say) and do the rest of the magic in the caching layer.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Lastly, I just need to find time to implement it.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-903810947836140325?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/903810947836140325/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=903810947836140325' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/903810947836140325'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/903810947836140325'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2009/04/what-if-scenario-musing-about-adding.html' title='A what-if scenario: musing about adding writeback capability to Mondrian'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' 
src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-3248997853782973475</id><published>2009-03-31T14:20:00.001-07:00</published><updated>2009-03-31T14:24:31.082-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mondrian crosstab'/><title type='text'>Time crosstab in Mondrian</title><content type='html'>The Mondrian/Pentaho community are smarter than I am. &lt;a href="http://forums.pentaho.org/showthread.php?p=210498"&gt;Tom Barber wanted a way to display many months on one screen&lt;/a&gt;, and &lt;a href="http://pedroalves-bi.blogspot.com/2009/03/interesting-olap-date-crosstab-question.html"&gt;Pedro Alves figured out a way to display it as a years x months crosstab&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_Yeslj5UWDbY/Sc6_apfAgLI/AAAAAAAAAAU/irXpVCRdjZU/s320/result2.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 320px; height: 185px;" src="http://3.bp.blogspot.com/_Yeslj5UWDbY/Sc6_apfAgLI/AAAAAAAAAAU/irXpVCRdjZU/s320/result2.png" border="0" alt="" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-3248997853782973475?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/3248997853782973475/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=3248997853782973475' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/3248997853782973475'/><link 
rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/3248997853782973475'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2009/03/time-crosstab-in-mondrian.html' title='Time crosstab in Mondrian'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_Yeslj5UWDbY/Sc6_apfAgLI/AAAAAAAAAAU/irXpVCRdjZU/s72-c/result2.png' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-8394031466984827213</id><published>2009-03-23T11:40:00.000-07:00</published><updated>2009-03-23T12:03:48.747-07:00</updated><title type='text'>Being an open source vendor is like being Irish</title><content type='html'>There's been a lot of flap recently about the definition of an &lt;a href="http://blogs.the451group.com/opensource/2009/02/02/define-open-source-vendor"&gt;open source vendor&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Just about every piece of software these days will have some fraction that is based on open source code, so every software company is to some degree an open source vendor. And in the process of developing this code, the company's developers will need to participate in the community of those projects, and possibly fix bugs and contribute features.&lt;br /&gt;&lt;br /&gt;So, whether you are an open source vendor, or for that matter, &lt;a href="http://blogs.opennms.org/?p=656"&gt;open core vendor&lt;/a&gt; or &lt;a href="http://blogs.the451group.com/opensource/2009/03/16/define-free-software-vendor/"&gt;free software vendor&lt;/a&gt; is a question of degree. 
That means that everyone gets to argue about where the bar is set, and it all descends into silliness.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;It reminds me of the fact that, especially at this time of year, everyone in the United States &lt;a href="http://blogs.abcnews.com/politicalpunch/2009/03/president-ob-13.html"&gt;gets to be Irish&lt;/a&gt;.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In contrast, there is a very clear &lt;a href="http://www.opensource.org/docs/definition.php"&gt;definition for open source software&lt;/a&gt;. The beauty of it is that it doesn't matter whether the software is written by an employee of an open source vendor, an anarchist student, or a Microsoft-loving independent consultant. If it has an open source license, it's open source software. As simple as that.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-8394031466984827213?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/8394031466984827213/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=8394031466984827213' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/8394031466984827213'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/8394031466984827213'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2009/03/being-open-source-vendor-is-like-being.html' title='Being an open source vendor is like being Irish'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' 
width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-6256712358193463505</id><published>2009-03-11T15:27:00.000-07:00</published><updated>2009-03-15T01:26:56.886-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mondrian kettle pentaho oem training'/><title type='text'>Pentaho Partner Summit 2009</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.pentaho.com/email/partner_summit09.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 380px; " src="http://www.pentaho.com/email/partner_summit09.png" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;Are you thinking of incorporating Mondrian, or perhaps the entire Pentaho BI Suite, in your application? The Pentaho Partner Summit, on April 2nd and 3rd in Menlo Park, CA, is a good chance to get an overview of the technology.&lt;br /&gt;&lt;br /&gt;The first day will contain an executive overview for vendors thinking of using the technology. 
It is a chance to hear me, Matt Casters (Kettle lead), Zack Urlocker (MySQL VP Marketing and general Open Source luminary) and Richard Daley (Pentaho CEO) hold forth, followed by a cocktail reception where you get to tell us what you think.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The second day will have in-depth training for OEM, reseller, and systems integrator partners who are already building apps using Pentaho technology.&lt;br /&gt;&lt;br /&gt;Read the agenda, get details, and register at &lt;a href="http://www.pentaho.com/summit09/agenda.php"&gt;http://www.pentaho.com/summit09/agenda.php&lt;/a&gt;.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-6256712358193463505?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/6256712358193463505/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=6256712358193463505' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/6256712358193463505'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/6256712358193463505'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2009/03/pentaho-partner-summit-2009.html' title='Pentaho Partner Summit 2009'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' 
src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-1652082261421917585</id><published>2009-02-27T16:47:00.000-08:00</published><updated>2009-02-27T16:58:47.456-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ggro hawkwatch'/><title type='text'>Golden Gate Raptor Observatory is seeking volunteers</title><content type='html'>Every fall, I &lt;a href="http://julianhyde.blogspot.com/2008/08/hawkwatch.html"&gt;volunteer with the Golden Gate Raptor Observatory&lt;/a&gt;, identifying and counting migrating hawks. The location is spectacular &amp;mdash; Hawk Hill overlooks the Golden Gate Bridge &amp;mdash; and the hawks even more so.&lt;br /&gt;&lt;br /&gt;If you live in the Bay Area and think this might be your thing, attend one of the informational meetings: Wed 29th or Thu 30th April from 7 - 9.30pm, or Sat 2nd May 10am - 12.30pm.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_BVv0WTpeWTs/SaiLSAkrMFI/AAAAAAAAADE/7yWyNeuobTY/s1600-h/ggro2009-flyer.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 309px; height: 400px;" src="http://2.bp.blogspot.com/_BVv0WTpeWTs/SaiLSAkrMFI/AAAAAAAAADE/7yWyNeuobTY/s400/ggro2009-flyer.png" alt="" id="BLOGGER_PHOTO_ID_5307645302363861074" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;(Can anyone identify the 3 birds at the top of the flyer? 
Bonus points for age &amp;amp; gender.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-1652082261421917585?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/1652082261421917585/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=1652082261421917585' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/1652082261421917585'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/1652082261421917585'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2009/02/golden-gate-raptor-observatory-is.html' title='Golden Gate Raptor Observatory is seeking volunteers'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_BVv0WTpeWTs/SaiLSAkrMFI/AAAAAAAAADE/7yWyNeuobTY/s72-c/ggro2009-flyer.png' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-1320772154487808157</id><published>2009-02-16T15:06:00.000-08:00</published><updated>2009-02-16T15:42:37.843-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='baby sebastian beer camra'/><title type='text'>Welcome, Sebastian Hyde!</title><content type='html'>Those of you who follow me on &lt;a 
href="http://twitter.com/julianhyde"&gt;Twitter&lt;/a&gt; and &lt;a href="http://www.facebook.com/profile.php?id=706953192"&gt;Facebook&lt;/a&gt; will know that my thoughts have been on &lt;a href="http://www.facebook.com/album.php?aid=65483&amp;amp;id=706953192"&gt;the birth of my son, Sebastian Hyde&lt;/a&gt;, and not on open-source BI or streaming SQL as much as usual.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;British fathers often celebrate their baby's arrival by "wetting the baby's head" at a local hostelry. Unlike the Christening ceremony that it is patterned after, this involves lots of beer but no baby. My friend Rhys (who, my American readers may like to note, is British but not English) devised a variation of that tradition, which he called "baby bingo", and we carried out the ceremony at &lt;a href="http://www.barclayspub.com/"&gt;Barclay's pub&lt;/a&gt; on Friday.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.expressandstar.com/2007/10/06/festival-pot-tribute-to-legend/"&gt;As my father well knows&lt;/a&gt;, beer and commemorative plaques go together. Rhys, Jacq and Sabine just presented us with an excellent framed image listing the beers that we consumed in Sebastian's honor. 
I wonder what Sebastian will make of it when he's older?&lt;/div&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_BVv0WTpeWTs/SZn5IlmKzWI/AAAAAAAAAC8/PpFDp3Zde0E/s1600-h/sebastian.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 132px;" src="http://4.bp.blogspot.com/_BVv0WTpeWTs/SZn5IlmKzWI/AAAAAAAAAC8/PpFDp3Zde0E/s400/sebastian.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5303543962131942754" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-1320772154487808157?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/1320772154487808157/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=1320772154487808157' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/1320772154487808157'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/1320772154487808157'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2009/02/welcome-sebastian-hyde.html' title='Welcome, Sebastian Hyde!'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_BVv0WTpeWTs/SZn5IlmKzWI/AAAAAAAAAC8/PpFDp3Zde0E/s72-c/sebastian.jpg' height='72' 
width='72'/><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-1718086890022712202</id><published>2009-02-07T14:53:00.000-08:00</published><updated>2009-02-07T15:55:40.083-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='arithmetic journalism'/><title type='text'>Calculus: it's so basic that even moronic headline writers should know about it</title><content type='html'>The following &lt;a href="http://www.telegraph.co.uk/scienceandtechnology/technology/facebook/4512806/Facebook-at-five-Ten-times-more-traffic-to-Twitter-website-than-Facebook-in-last-year.html"&gt;headline in the Daily Telegraph&lt;/a&gt; struck me as really odd:&lt;blockquote&gt;"Facebook at five: Ten times more traffic to Twitter website than Facebook in last year"&lt;/blockquote&gt;The actual facts in the article show that the headline was patently false:&lt;div&gt;&lt;ul&gt;&lt;li&gt;"Over the last year, traffic to Twitter [...] has increased by 1191 per cent, while traffic to Facebook has grown just 110 per cent"&lt;br /&gt;&lt;/li&gt;&lt;li&gt;"Facebook [...] received 133 times more UK internet visits than Twitter"&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;div&gt;If the headline had read "Ten times more growth in traffic to Twitter website than Facebook in last year" or "133 times more traffic to Facebook website than Twitter in last year", it would have been correct. I'm guessing that the headline writer omitted the word 'growth', presumably to save an inch of headline space, and turned the truth on its head: by a factor of a thousand.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The problem is that journalists are confusing a quantity with the time derivative of that quantity, and it bugs the heck out of me. Journalists who have a professional respect for punctuation, grammar and fact-checking seem to have a disdain for basic numeracy concepts like time derivatives. 
Do they not understand the difference, or do they think that we're too dumb to notice?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I hear financial journalists trotting out that such and such tax would "raise 15 million pounds". What, on the very morning it is introduced? No; we, the readers, are supposed to insert "per year" to compensate for the journalistic shorthand.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;And here in the U.S., the terms "deficit" and "debt" are often used synonymously in the public discourse, whereas in fact one is the derivative of the other. With record deficits and national debt looming, and astronomical numbers that we have a duty as citizens to try to comprehend, we need the help of journalists more than ever to help us make sense of the world.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-1718086890022712202?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/1718086890022712202/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=1718086890022712202' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/1718086890022712202'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/1718086890022712202'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2009/02/calculus-its-so-basic-that-even-moronic.html' title='Calculus: it&apos;s so basic that even moronic headline writers should know about it'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-8498887698576151172</id><published>2009-02-01T14:45:00.000-08:00</published><updated>2009-02-02T09:10:26.253-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sqlstream monitoring tivoli'/><title type='text'>Michael Cote on SQLstream</title><content type='html'>In their weekly podcast John M. Willis &lt;a href="http://www.redmonk.com/cote/2009/01/31/it-management-podcast-34-cloud-taxonomy-scom-realtime-data-warehousing/"&gt;discusses SQLstream&lt;/a&gt; with &lt;a href="http://www.redmonk.com/cote/about/"&gt;Redmonk's Michael Coté&lt;/a&gt;.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;[Edit: This originally read "In his weekly podcast, Redmonk's Michael Coté discusses SQLstream with John M. Willis."]&lt;br /&gt;&lt;div&gt;&lt;blockquote&gt;"Is anybody foolish enough to think that one screen is going to be able to tell you what is going on?&lt;/blockquote&gt;&lt;blockquote&gt;I've actually seen customers try [...] to do ETLs every 15 minutes  in order to look at what's goin' on, aggregating your data and doing analytics  on your data, [...] which is at least a 15-fold [improvement  over] using a single-pane-of-glass. [If you do that,] you can avert  major&lt;span&gt; &lt;/span&gt;disasters because you're watching the  trend happen.&lt;/blockquote&gt;&lt;blockquote&gt;I think that something like SQLstream, or something like  that, that is watching it as it goes [...] 
has  brilliant&lt;span&gt;  &lt;/span&gt;potential."&lt;/blockquote&gt;&lt;/div&gt;The action starts at around 1:10:00 in &lt;a href="http://media.libsyn.com/media/redmonk/itmanagement034.mp3"&gt;the podcast&lt;/a&gt;.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-8498887698576151172?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/8498887698576151172/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=8498887698576151172' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/8498887698576151172'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/8498887698576151172'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2009/02/michael-cote-on-sqlstream.html' title='Michael Cote on SQLstream'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-1275312021710682299</id><published>2009-01-30T15:32:00.000-08:00</published><updated>2009-01-30T16:10:30.105-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sqlstream twitter'/><title type='text'>Recursive event-driven demo</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" 
href="http://1.bp.blogspot.com/_BVv0WTpeWTs/SYOW1b26rjI/AAAAAAAAACc/pLsHvSEk8sQ/s1600-h/sqlstream-twitter.png"&gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 200px; height: 131px;" src="http://1.bp.blogspot.com/_BVv0WTpeWTs/SYOW1b26rjI/AAAAAAAAACc/pLsHvSEk8sQ/s200/sqlstream-twitter.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5297243431473163826" /&gt;&lt;/a&gt;I just gave a demo of &lt;a href="http://www.sqlstream.com/"&gt;SQLstream&lt;/a&gt; to &lt;a href="http://twitter.com/jadp"&gt;Joe di Paolantonio&lt;/a&gt;. I gave Joe a couple of the usual demos, including &lt;a href="http://www.intelligententerprise.com/blog/archives/2008/12/bi_on_content_f.html"&gt;the one where we read feeds from Twitter, Google news and various RSS feeds&lt;/a&gt;. Joe sent a real-time commentary as a &lt;a href="http://twitter.com/JAdP/status/1162682145"&gt;series of tweets&lt;/a&gt;, and these duly showed up in the demo.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;A bit of a stunt, but it was cool, in a geeky &lt;a href="http://en.wikipedia.org/wiki/Douglas_Hofstadter"&gt;Hofstadter&lt;/a&gt;-esque recursion kind of way. 
Then we had lunch; good Chinese food, and blissfully offline.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-1275312021710682299?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/1275312021710682299/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=1275312021710682299' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/1275312021710682299'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/1275312021710682299'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2009/01/recursive-event-driven-demo.html' title='Recursive event-driven demo'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_BVv0WTpeWTs/SYOW1b26rjI/AAAAAAAAACc/pLsHvSEk8sQ/s72-c/sqlstream-twitter.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-3174702700531509652</id><published>2009-01-29T18:19:00.000-08:00</published><updated>2009-01-29T18:25:17.103-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sqlstream event-driven marketing customer experience'/><title type='text'>Event-driven marketing</title><content type='html'>David Raab writes a &lt;a 
href="http://customerexperiencematrix.blogspot.com/2009/01/sqlstream-simplifies-event-stream.html"&gt;great piece&lt;/a&gt; on &lt;a href="http://www.sqlstream.com"&gt;SQLstream&lt;/a&gt;, its internals, and how it can be applied to event-driven marketing.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-3174702700531509652?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/3174702700531509652/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=3174702700531509652' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/3174702700531509652'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/3174702700531509652'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2009/01/event-driven-marketing.html' title='Event-driven marketing'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-7382017940354547694</id><published>2009-01-26T09:19:00.000-08:00</published><updated>2009-01-26T11:30:50.375-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sqlstream realtime bi'/><title type='text'>SQLstream 2.0</title><content type='html'>SQLstream release 2.0 &lt;a href="http://www.businesswire.com/news/google/20090126005383/en"&gt;hit the streets 
today&lt;/a&gt;. It's actually our third release, but it's the first one we've made a fuss about. Since release 1.1, we have hardened the product at customer deployments, introduced 64-bit support to allow larger working memory, and added two major SQL extensions: streaming aggregation and user-defined transforms.&lt;br /&gt;&lt;br /&gt;Streaming aggregation adds support for the GROUP BY construct to streaming SQL, and allows us to compute totals on a periodic basis, for example computing hourly subtotals. SQLstream acts as a continuously operating conduit between operational systems and the data warehouse, replacing the traditional batch-based ETL process and populating the fact table and aggregate tables simultaneously. This gives us a &lt;a href="http://julianhyde.blogspot.com/2008/02/streaming-sql-meets-olap.html"&gt;natural synergy&lt;/a&gt; with my other project, the &lt;a href="http://mondrian.pentaho.org/"&gt;Mondrian OLAP engine&lt;/a&gt;, and we have been getting good take-up among &lt;a href="http://www.pentaho.com/"&gt;Pentaho&lt;/a&gt;'s customers.&lt;br /&gt;&lt;br /&gt;Streaming aggregation builds on relational operators in SQLstream 1.1 such as joins, windowed aggregations and unions. Those relational operators allowed you to build queries such as fraud detection, looking for anomalous rows in real time. Those were 'needle in the haystack' kinds of problems, and the new features also allow you to build a high-performance 'water main' between your operational system and data warehouse.&lt;br /&gt;&lt;br /&gt;User-defined transforms allow you to define new relational operators in Java and incorporate them into streaming SQL statements. 
SQLstream 1.1 had user-defined functions, to allow you to compute scalar quantities in Java, and syntactically, user-defined transforms are simply functions that have JDBC &lt;a href="http://java.sun.com/javase/6/docs/api/java/sql/ResultSet.html"&gt;ResultSets&lt;/a&gt; and &lt;a href="http://java.sun.com/javase/6/docs/api/java/sql/PreparedStatement.html"&gt;PreparedStatements&lt;/a&gt; as parameters. If a function has a PreparedStatement as a parameter, SQLstream lets you include it in the FROM clause of a SQL statement as a data source, alongside regular streams, tables, and views. Similarly, if a function has a ResultSet as a parameter, then you can pass in a cursor based on a SELECT statement as an argument.&lt;br /&gt;&lt;br /&gt;User-defined transforms are an excellent example of our ongoing collaboration with the open-source &lt;a href="http://www.eigenbase.org/"&gt;Eigenbase project&lt;/a&gt;. User-defined transforms were originally developed for &lt;a href="http://www.luciddb.org/"&gt;LucidDB&lt;/a&gt; to operate on traditional stored relational data, with SQL:2003-compliant syntax, and we extended them to handle streaming relational data, but keeping the syntax the same. For more about user-defined transforms in LucidDB, see the &lt;a href="http://pub.eigenbase.org/wiki/LucidDbUdxJavaHowto"&gt;excellent documentation at Eigenbase&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Release 2.0 is a major milestone for SQLstream, and is the culmination of several years of development. It allows you to tackle in industry-standard SQL some application areas that previously required guile and custom coding. 
Go to &lt;a href="http://www.sqlstream.com/"&gt;www.sqlstream.com&lt;/a&gt; and see whether there is a fit with your real-time BI application.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-7382017940354547694?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/7382017940354547694/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=7382017940354547694' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/7382017940354547694'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/7382017940354547694'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2009/01/sqlstream-20.html' title='SQLstream 2.0'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-1715744444015766165</id><published>2009-01-22T10:08:00.000-08:00</published><updated>2009-01-23T12:59:11.182-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='pivot olap4j gwt'/><title type='text'>Pentaho Analysis Tool</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_BVv0WTpeWTs/SXi68fb0cNI/AAAAAAAAACU/NgZjWFo2lWk/s1600-h/PAT.png"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: 
pointer; width: 200px; height: 128px;" src="http://4.bp.blogspot.com/_BVv0WTpeWTs/SXi68fb0cNI/AAAAAAAAACU/NgZjWFo2lWk/s200/PAT.png" alt="" id="BLOGGER_PHOTO_ID_5294186910367117522" border="0" /&gt;&lt;/a&gt;Some folks are working on an &lt;a href="http://www.olap4j.org/"&gt;olap4j&lt;/a&gt;-based viewer as an alternative to &lt;a href="http://jpivot.sourceforge.net/"&gt;JPivot&lt;/a&gt;, called &lt;a href="http://code.google.com/p/pentahoanalysistool/"&gt;Pentaho Analysis Tool&lt;/a&gt;.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The key developers &lt;a href="http://pentahomusings.blogspot.com/2009/01/pentaho-analysis-tool.html"&gt;Tom&lt;/a&gt; and Luc tell me that they noticed that the &lt;a href="http://code.google.com/p/halogen/"&gt;halogen project&lt;/a&gt; hadn't changed in a few months, so they took the halogen source code (based on &lt;a href="http://code.google.com/webtoolkit/"&gt;GWT&lt;/a&gt;, by the way) and started to take it in the direction of the OLAP viewer they'd like to see.&lt;br /&gt;&lt;br /&gt;(&lt;span style="font-style: italic;"&gt;Edit&lt;/span&gt;: There are actually three key developers. I forgot to mention Paul Stöllberger. Sorry Paul!)&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;No hard feelings! I, and some other key Pentaho folks, are delighted that this project is happening, and will support it any way we can.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;It's ironic that when Pentaho seeded a project to build an olap4j-based viewer, they chose an organic, open-sourcey name 'halogen', yet these folks (none of whom works for Pentaho) chose a name that whiffs of corporate branding.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;A rose by any other name, as they say.  
And despite the name, the viewer should work on top of any olap4j data source (which today means &lt;a href="http://mondrian.pentaho.org/"&gt;Mondrian&lt;/a&gt; and any OLAP engine with an XMLA interface).&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-1715744444015766165?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/1715744444015766165/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=1715744444015766165' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/1715744444015766165'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/1715744444015766165'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2009/01/pentaho-analysis-tool.html' title='Pentaho Analysis Tool'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_BVv0WTpeWTs/SXi68fb0cNI/AAAAAAAAACU/NgZjWFo2lWk/s72-c/PAT.png' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-6156298593534142835</id><published>2009-01-22T08:35:00.000-08:00</published><updated>2009-01-22T09:00:49.575-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='pentaho mondrian gartner'/><title type='text'>Gartner 
releases Magic Quadrant for Business Intelligence</title><content type='html'>&lt;div&gt;Gartner have released their &lt;a href="http://mediaproducts.gartner.com/reprints/sas/vol5/article8/article8.html"&gt;2009 Magic Quadrant for Business Intelligence&lt;/a&gt; (via &lt;a href="http://www.dbms2.com/2009/01/22/gartners-2009-magic-quadrant-for-business-intelligence/"&gt;DBMS2&lt;/a&gt;).&lt;/div&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://mediaproducts.gartner.com/reprints/sas/vol5/article8/163529_0001.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 410px;" src="http://mediaproducts.gartner.com/reprints/sas/vol5/article8/163529_0001.png" border="0" alt="" /&gt;&lt;/a&gt;&lt;div&gt;&lt;a href="http://www.pentaho.com/"&gt;Pentaho&lt;/a&gt; has not made it onto the quadrant diagram yet (I suppose because they have not crossed the $20M revenue threshold), but earns its own paragraph in the accompanying commentary:&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;blockquote&gt;Pentaho, after just four years in existence, has put together a comprehensive open-source BI platform that includes data integration and data mining capabilities. In 2008, Pentaho was noticeably more aggressive, openly competing against traditional BI platform vendors. Like Jaspersoft, Pentaho is affordable and also offers a subscription-based model that avoids an initial large payment for the software license. Some of the significant features Pentaho introduced in 2008 include an automatic table designer that analyzes relational schemas and data patterns, performs a cost-benefit analysis of aggregation at different levels, and generates and populates those aggregate tables. 
Despite a handful of large customers, Pentaho reference survey respondents more frequently indicated that they had more departmental deployments (versus enterprisewide) and smaller data volumes compared with the other vendors.&lt;/blockquote&gt;&lt;div&gt;Nice that the aggregate table designer gets a call-out. It's very important in helping Mondrian scale to enterprise-scale data warehouses. (And besides, it was a lot of work to write!)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;The report calls out the quality of Pentaho's customer support:&lt;/div&gt;&lt;div&gt;&lt;blockquote&gt;This was in evidence in the MQ reference survey, as both Jaspersoft and particularly Pentaho scored strongly on the customer support question — higher than any of the megavendors.&lt;/blockquote&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-6156298593534142835?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/6156298593534142835/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=6156298593534142835' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/6156298593534142835'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/6156298593534142835'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2009/01/gartner-releases-magic-quadrant-for.html' title='Gartner releases Magic Quadrant for Business Intelligence'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' 
height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-2273958695666795847</id><published>2009-01-07T17:00:00.000-08:00</published><updated>2009-01-07T18:42:10.641-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='olap query optimization mondrian native SQL'/><title type='text'>Hard-won lessons in Mondrian query optimization</title><content type='html'>&lt;a href="http://mondrian.pentaho.org/"&gt;Mondrian&lt;/a&gt; is generally very smart in how it chooses to implement queries.  Over the last month or so, I have learned some lessons about how hard it can be to make Mondrian smarter.&lt;br /&gt;&lt;br /&gt;As a &lt;a href="http://en.wikipedia.org/wiki/ROLAP"&gt;ROLAP&lt;/a&gt; engine (I prefer to call it 'ROLAP with caching'), Mondrian's evaluation strategy has always been a blend of in-memory processing, caching, and native SQL execution. Naturally there is always SQL involved, because Mondrian doesn't store any of its own data, but the question is how much of the processing Mondrian pushes down to the DBMS and how much it does itself, based on data in its cache.&lt;br /&gt;&lt;br /&gt;The trend is towards native SQL execution. Data volumes are growing across the board, and Mondrian is being deployed to larger enterprises with large data sets (in some cases displacing more established, and expensive, engines). 
Mondrian cannot keep up with the growth by simply pulling more data into memory and throwing one or two more CPU cores at the problem.&lt;br /&gt;&lt;br /&gt;Luckily a new breed of database engines, including &lt;a href="http://www.asterdata.com/"&gt;Aster Data&lt;/a&gt;, &lt;a href="http://www.greenplum.com/"&gt;Greenplum&lt;/a&gt;, &lt;a href="http://www.infobright.com/"&gt;Infobright&lt;/a&gt;, &lt;a href="http://www.kickfire.com/"&gt;Kickfire&lt;/a&gt;, &lt;a href="http://www.luciddb.org/"&gt;LucidDB&lt;/a&gt;, &lt;a href="http://www.netezza.com/"&gt;Netezza&lt;/a&gt; and &lt;a href="http://www.vertica.com/"&gt;Vertica&lt;/a&gt;, are helping to solve the data problem with innovative architectures and algorithms. To exploit the power of the database engine, Mondrian's ability to generate native SQL is more important than ever.&lt;br /&gt;&lt;br /&gt;I have spent the last few weeks struggling to make Mondrian handle a particular case more efficiently. It was ultimately unsuccessful, but it was a case where defeat teaches you more than victory.&lt;br /&gt;&lt;br /&gt;Here is the actual MDX query:&lt;br /&gt;&lt;blockquote  style="font-family:courier new;"&gt;&lt;span style="font-size:85%;"&gt;WITH&lt;br /&gt;SET [COG_OQP_INT_s9] AS&lt;br /&gt;&amp;nbsp;&amp;nbsp;'CROSSJOIN({[Store Size in SQFT].[Store Sqft].MEMBERS},[COG_OQP_INT_s8])'&lt;br /&gt;SET [COG_OQP_INT_s8] AS&lt;br /&gt;&amp;nbsp;&amp;nbsp;'CROSSJOIN({[Yearly Income].[Yearly Income].MEMBERS},[COG_OQP_INT_s7])'&lt;br /&gt;SET [COG_OQP_INT_s7] AS&lt;br /&gt;&amp;nbsp;&amp;nbsp;'CROSSJOIN({[Time].[Time].MEMBERS}, [COG_OQP_INT_s6])'&lt;br /&gt;SET [COG_OQP_INT_s6] AS&lt;br /&gt;&amp;nbsp;&amp;nbsp;'CROSSJOIN({[Store].[Store Country].MEMBERS},[COG_OQP_INT_s5])'&lt;br /&gt;SET [COG_OQP_INT_s5] AS&lt;br /&gt;&amp;nbsp;&amp;nbsp;'CROSSJOIN({[Promotions].[Promotions].MEMBERS}, [COG_OQP_INT_s4])'&lt;br /&gt;SET [COG_OQP_INT_s4] AS&lt;br /&gt;&amp;nbsp;&amp;nbsp;'CROSSJOIN({[Promotion Media].[Promotion 
Media].MEMBERS},[COG_OQP_INT_s3])'&lt;br /&gt;SET [COG_OQP_INT_s3] AS&lt;br /&gt;&amp;nbsp;&amp;nbsp;'CROSSJOIN({[Store Type].[Store Type].MEMBERS}, [COG_OQP_INT_s2])'&lt;br /&gt;SET [COG_OQP_INT_s2] AS&lt;br /&gt;&amp;nbsp;&amp;nbsp;'CROSSJOIN({[Marital Status].[Marital Status].MEMBERS}, [COG_OQP_INT_s1])'&lt;br /&gt;SET [COG_OQP_INT_s1] AS&lt;br /&gt;&amp;nbsp;&amp;nbsp;'CROSSJOIN({[Gender].[Gender].MEMBERS},&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{[Education Level].[Education Level].MEMBERS})'&lt;br /&gt;SELECT {[Measures].[Unit Sales]} ON AXIS(0),&lt;br /&gt;&amp;nbsp;&amp;nbsp;NON EMPTY [COG_OQP_INT_s9] ON AXIS(1)&lt;br /&gt;FROM [Sales]&lt;br /&gt;WHERE ([Customers].[All Customers].[USA].[CA].[San Francisco].[Karen Moreland])&lt;/span&gt;&lt;/blockquote&gt;The query looks a bit fearsome, but is quite likely to occur in practice as a business user slices and dices on several attributes simultaneously. The rows axis is a CrossJoin of ten dimensions, but because of the filtering effect of the slicer (combined with NON EMPTY) the query evaluates to a single row. The goal is to make Mondrian generate a SQL statement to evaluate the axis.&lt;br /&gt;&lt;br /&gt;Each way that I tried to write the logic, I ended up making decisions that made other optimizations invalid. It was difficult to make Mondrian see the big picture: that, although named sets are not supposed to inherit the context where they are evaluated, in this case it was OK; and to recognize a complex expression (many nested CrossJoin operators, slicer, and implicit non-empty context), and convert the whole thing into a single SQL statement. For instance, in one attempt I succeeded in generating a SQL statement which evaluates very efficiently, but in so doing I had to let the non-empty context of the evaluator leak into places that it shouldn't... 
which broke quite a few existing queries, in particular queries involving calculated sets.&lt;br /&gt;&lt;br /&gt;There are several conclusions for Mondrian's architecture. One conclusion is that we need to deal with filtering non-empty tuples as part of the expression, not as a flag in the evaluator (the data structure that contains, among other things, the set of members that form the context for evaluating an expression).&lt;br /&gt;&lt;br /&gt;MDX has an operator, &lt;a href="http://msdn.microsoft.com/en-us/library/ms144936.aspx"&gt;EXISTS&lt;/a&gt;, that specifies that empty tuples should be removed from a set. With the filtering expressed that way, we can reason about queries by applying logic-preserving transformations (just the way that an RDBMS query optimizer works), which should be safer than today's ad hoc reasoning. For example, if I am a developer implementing an MDX function and the evaluator has nonEmpty=true, am I &lt;span style="font-style: italic;"&gt;required to&lt;/span&gt; eliminate empty tuples or am I merely &lt;span style="font-style: italic;"&gt;allowed to&lt;/span&gt; eliminate them? (In other words, will my caller return the wrong result if I forget to check the evaluator flag?) I often forget, so I suspect that filtering of empty tuples is performed inconsistently throughout the Mondrian code base; which is a shame, because eliminating empty tuples early can do a lot for performance.&lt;br /&gt;&lt;br /&gt;I'd also like to use the same model for native SQL generation as for other forms of expression compilation. Native SQL generation currently happens at query execution time: when the function is evaluated, it figures out whether it can possibly translate the logic (and the constraints inherited from the evaluation context) into SQL. That is currently unavoidable, because the nonEmpty flag is only available in the evaluator, at query execution time. 
And we need to do some work at query execution time, if only to plug in the keys of the members in the current context as predicates in the SQL statement. But I've seen several cases where we need to be smarter.&lt;br /&gt;&lt;br /&gt;One example is '&lt;span style="font-family:courier new;"&gt;NON EMPTY [Level].Members&lt;/span&gt;' that always gets translated into SQL even though the level only has two members and they are in cache. Cost-based optimization would help there.&lt;br /&gt;&lt;br /&gt;Another example is where there are many layers of MDX functions — say Filter on top of CrossJoin on top of Filter — and these could be rolled into a single SQL statement. The right approach is to build a SQL statement by accretion, but it is too expensive to do every time the expression is evaluated.&lt;br /&gt;&lt;br /&gt;Further, as we add more rules for recognizing MDX constructs that can turn into SQL, we will reach decision points where we have to choose whether to apply rule A or rule B. Solutions are (a) using costing to decide which rule to apply, and (b) applying both rules and seeing which ultimately generates a better outcome. Neither of these solutions is suitable for query execution time: they need an optimization stage, as part of query preparation.&lt;br /&gt;&lt;br /&gt;It's ironic, considering I've been building SQL optimizers for years (the first at &lt;a href="http://infolab.stanford.edu/infoseminar.Archive/FallY97/slides/broadbase/sld001.htm"&gt;Broadbase&lt;/a&gt;, and the second the optimizer for the &lt;a href="http://www.eigenbase.org/"&gt;Eigenbase project&lt;/a&gt;, which is used by both LucidDB and &lt;a href="http://www.sqlstream.com/"&gt;SQLstream&lt;/a&gt;) that I have avoided giving Mondrian a true query optimizer for so long. 
I know it's a lot of work to build an optimizer, and it's foolish to start before you know what problem you need to solve.&lt;br /&gt;&lt;br /&gt;Don't expect to see any changes in the short term; this kind of architectural change doesn't happen fast. My struggle over the past few weeks has been a big step towards seeing the big picture, and realizing that the considerable pain and effort of unifying Mondrian's query planning system is justified by the potential benefits in performance.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-2273958695666795847?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/2273958695666795847/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=2273958695666795847' title='10 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/2273958695666795847'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/2273958695666795847'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2009/01/hard-won-lessons-in-mondrian-query.html' title='Hard-won lessons in Mondrian query optimization'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' 
src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>10</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-2250947655583837195</id><published>2008-12-15T02:48:00.000-08:00</published><updated>2008-12-15T04:24:16.362-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='streaming web rss twitter'/><title type='text'>Streaming content feeds part 2: forging the Streaming Web</title><content type='html'>My previous blog post "&lt;a href="http://julianhyde.blogspot.com/2008/12/streaming-analytics-over-content-feeds.html"&gt;Streaming analytics over content feeds (and how content feeds could be better)&lt;/a&gt;" drew some excellent comments, so I thought I'd follow up with some more thoughts about a protocol for streaming web content, and a vision that I'll dub the "Streaming Web".&lt;br /&gt;&lt;br /&gt;To &lt;a href="http://www.blogger.com/profile/13334730102829428646"&gt;John Kalucki&lt;/a&gt;'s points first. I absolutely agree that the driver for this protocol is latency. But it is difficult to answer the question "what latency is necessary?", because we don't yet know what applications people will devise.&lt;br /&gt;&lt;br /&gt;(An illustration of how latency changes everything, from a very different business: when my wife worked for &lt;a href="http://www.nimanranch.com/"&gt;Niman Ranch&lt;/a&gt;, I was amazed to hear that they dispatch steaks via FedEx (packed in ice and insulation, and sent overnight); this would be out of the question using the USPS and a three day delivery time.)&lt;br /&gt;&lt;br /&gt;I believe that real-time web content feeds are a game changer. I call it the Streaming Web — a web where every piece of content is accessible via a URL and you can subscribe to be alerted immediately if a piece of content changes. 
Every page would become a potential feed, and there would be agents that allow us to collect and filter content we are interested in: be it a friend's photo album or the price of a plane ticket.&lt;br /&gt;&lt;br /&gt;A huge effort is required to make the Streaming Web a reality. The first steps, the web content formats such as RSS and Atom, are already in place. The next step is to introduce a protocol so that subscribers are notified of changes as soon as they happen.&lt;br /&gt;&lt;br /&gt;John says:&lt;br /&gt;&lt;blockquote style="font-style: italic;"&gt;What experience can you offer with feeds at a 50ms push latency vs a 180,000ms pull latency? If a machine is consuming the feed, not much. If a human is immediately consuming a feed, perhaps a great deal.&lt;/blockquote&gt;I agree that a human can benefit from low-latency content, although there is little benefit for content arriving faster than the human's think time — say 5,000ms. But if a computer is the consumer, ideal latencies span a broad spectrum: a mail server would operate more efficiently if it is allowed latencies in the minutes or hours, whereas an automated stock trading system needs information to arrive within 50ms.&lt;br /&gt;&lt;br /&gt;Today, not much web content is of interest to automated stock trading systems. Most web content feeds today are textual — written by humans, and consumed by humans — but I believe that once we remove the latency constraints and introduce some standard protocols, we will start to see more structured data in feeds. Also, we will see algorithms for &lt;a href="http://www.intelligententerprise.com/blog/archives/2008/11/up_next_bi_on_s.html"&gt;extracting information from textual feeds&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;As for the right protocol for the job, I am not really the best judge, so I am going to punt for now, and focus on the architecture. 
&lt;a href="http://www.blogger.com/profile/14036876973506495788"&gt;Richard Taylor&lt;/a&gt; suggests &lt;a href="http://www.xmpp.org"&gt;XMPP&lt;/a&gt;. It seems to have the right qualifications, and I'm sure that it could be made to work technically. (And I see that XMPP is already a &lt;a href="http://www.techcrunch.com/2008/05/05/twitter-can-be-liberated-heres-how/"&gt;central part of the Twitter ecosystem&lt;/a&gt;.) It comes down to power versus simplicity: the power of an established standard versus the simplicity required to reach a new audience of developers.&lt;br /&gt;&lt;br /&gt;I've been around long enough to see new approaches overturn "over-complex" existing technologies and then, in time, acquire the features that made their predecessors complicated. Take for example &lt;a href="http://en.wikipedia.org/wiki/SOAP_%28protocol%29"&gt;SOAP&lt;/a&gt; overturning &lt;a href="http://en.wikipedia.org/wiki/CORBA"&gt;CORBA&lt;/a&gt;, or PCs overturning minicomputers. I'm not going to take sides: these revolutions are part of the process of how technology moves forward. But it does seem that each revolution will only be successful if the new technology serves a new audience. And, to borrow Einstein's words, a protocol should be as simple as possible, but no simpler; otherwise, even if the technology finds its initial audience, it will not survive its growing pains.&lt;br /&gt;&lt;br /&gt;I'm not a big fan of XML as a protocol for transmitting data over a network, mainly because it is bulky, and that makes it expensive to produce and consume at high data rates. But for this protocol, I would choose XML over a binary format. If you're a developer learning a new protocol, it's a lot easier to debug your code if you can read the messages being sent over the network as text.&lt;br /&gt;&lt;br /&gt;Which brings us to the audience for this protocol. 
I do agree with &lt;a href="http://www.innoq.com/blog/st/"&gt;Stefan Tilkov&lt;/a&gt; that "[f]or the majority of use cases, [the polling] approach is vastly superior to a push model". That majority is already well served, so I'm focusing on the minority that need low latency. I think those use cases are important, and we'll all be using them if the "streaming web" thing catches on.&lt;br /&gt;&lt;br /&gt;To achieve low latency feeds, push is more efficient than high-frequency polling, but it is still more expensive than low-frequency polling, which is what people are doing today. So, if every web content aggregator and RSS reader switched to a low-latency push protocol overnight, the system would collapse.&lt;br /&gt;&lt;br /&gt;But luckily, there is no need for those millions of clients who would like to receive low-latency feed updates to connect using this new protocol. If those clients are humans, they will be happy to receive their updates via XMPP or SMS, or slower protocols like email. A single server could speak the streaming web feed protocol to various source feeds, and route the results to thousands of end users via XMPP or SMS. This approach means that each source feed is serving a modest number of downstream servers.&lt;br /&gt;&lt;br /&gt;I'd describe it as a 'wholesale' architecture. A food producer has a central depot, where it loads its goods onto the trucks of several client stores. The food company allows consumers to buy from the depot, if they are prepared to buy their goods in bulk, but most consumers opt for the convenience of visiting a local store and buying their goods in smaller quantities.&lt;br /&gt;&lt;br /&gt;(If you're Twitter, &lt;a href="http://louisgray.com/live/2008/11/twitter-planning-to-open-up-firehose-by.html"&gt;no problem is ever small&lt;/a&gt;, so that 'modest number' is probably in the tens of thousands. 
But I suppose that problem can be solved using multiple tiers of servers and fanning out streams between one tier and the next.)&lt;br /&gt;&lt;br /&gt;The next step in the evolution of the architecture would be to introduce a query language. Queries present a more convenient interface for clients, but they would also have architectural advantages. For example, using a query, a client can specify more precisely which content it is interested in. It would save CPU effort on the client and possibly the server, and bandwidth for everyone, so there would be a strong incentive to use queries rather than raw feeds.&lt;br /&gt;&lt;br /&gt;Queries would also allow feeds to be virtualized: rather than talk directly to &lt;a href="http://www.blogger.com/"&gt;blogger&lt;/a&gt; and &lt;a href="http://www.typepad.com"&gt;typepad&lt;/a&gt;, a client could talk to a third party that aggregates the content into a single feed.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.sqlstream.com"&gt;Streaming SQL&lt;/a&gt; would be a good candidate for expressing these queries, but is by no means the only choice. 
And in fact the architecture and protocol would work well enough for clients that did not use queries and wanted to consume only raw feeds.&lt;br /&gt;&lt;br /&gt;The resulting system, the Streaming Web, would enable applications yet to be imagined.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-2250947655583837195?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/2250947655583837195/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=2250947655583837195' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/2250947655583837195'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/2250947655583837195'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2008/12/streaming-content-feeds-part-2-forging.html' title='Streaming content feeds part 2: forging the Streaming Web'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-8158338202323064859</id><published>2008-12-12T14:57:00.000-08:00</published><updated>2008-12-12T19:43:41.121-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sqlstream rss atom twitter feed realtime'/><title type='text'>Streaming analytics over content feeds (and how content feeds could be 
better)</title><content type='html'>We have been experimenting with different web-based data sources for &lt;a href="http://www.sqlstream.com/"&gt;SQLstream&lt;/a&gt;. Seth Grimes saw the demo, and wrote a piece "&lt;a href="http://www.intelligententerprise.com/blog/archives/2008/12/bi_on_content_f.html"&gt;BI on Content Feeds, a.k.a. Continuous (Twitter) Transformation&lt;/a&gt;" in Intelligent Enterprise.&lt;br /&gt;&lt;br /&gt;Social networks and web content feeds such as RSS have, in a few short years, added a dynamic component to the vast static content on the web. As less-sophisticated users have become more accustomed to consuming them, these feeds have become a ubiquitous part of the web experience.&lt;br /&gt;&lt;br /&gt;Web feeds have an information content that is at present untapped. In the same way that a radical new approach — the search engine — was needed to harness the static information content of the web, a streaming analytics solution in this area becomes important sooner rather than later.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_BVv0WTpeWTs/SULwfsnQ5vI/AAAAAAAAABs/kMk1gfNWz_c/s1600-h/SQLstream-web-feed-demo.png"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer; width: 320px; height: 262px;" src="http://3.bp.blogspot.com/_BVv0WTpeWTs/SULwfsnQ5vI/AAAAAAAAABs/kMk1gfNWz_c/s320/SQLstream-web-feed-demo.png" alt="" id="BLOGGER_PHOTO_ID_5279046140574099186" border="0" alt="SQLstream Studio showing web content feeds"/&gt;&lt;/a&gt;The SQLstream prototype illustrates how several data formats (tweets from &lt;a href="http://twitter.com/"&gt;Twitter&lt;/a&gt;, &lt;a href="http://earthquake.usgs.gov/eqcenter/catalogs/feeds.html"&gt;USGS quake data&lt;/a&gt; in &lt;a href="http://en.wikipedia.org/wiki/Rss"&gt;RSS format&lt;/a&gt;, news from &lt;a href="http://www.google.com/"&gt;Google&lt;/a&gt;'s &lt;a 
href="http://en.wikipedia.org/wiki/Atom_%28standard%29"&gt;Atom &lt;/a&gt;feed, and so forth) can be integrated into SQLstream.&lt;br /&gt;&lt;br /&gt;For each data format we built an adapter that implemented the &lt;a href="http://en.wikipedia.org/wiki/SQL/MED"&gt;SQL/MED&lt;/a&gt; specification, and using these adapters we mapped each feed into SQLstream as a foreign stream. Once data is in SQL format, you can build views on top of these streams to filter, join and aggregate records.&lt;br /&gt;&lt;br /&gt;Now that we've done the hard part — getting the data feeds into a common format — there are plenty of ways to extract information from the feeds. For instance, it would be easy to find out which Twitter users are the most active over the last hour or the last seven days.&lt;br /&gt;&lt;br /&gt;Or you could pull apart messages to discover word frequencies, and write a stream that detects words that are being used more frequently than usual (similar to &lt;a href="http://www.google.com/intl/en/press/zeitgeist2008/"&gt;Google zeitgeist&lt;/a&gt; but in real time).&lt;br /&gt;&lt;br /&gt;But the prototype has some limitations: news items tend to arrive in bursts every couple of minutes, and many Twitter messages are missing. These are all limitations of the data sources with respect to latency (how soon messages arrive) and throughput (how many messages per second the system can handle), and the limitations stem from the inefficiencies of the web feed protocols.&lt;br /&gt;&lt;br /&gt;You would think that something called a 'feed' would push content to subscribers as soon as it arrives, but in fact RSS and the other feed types in the prototype use a pull protocol. 
With a pull protocol, the subscriber needs to continually &lt;a href="http://en.wikipedia.org/wiki/Polling_%28computer_science%29"&gt;poll&lt;/a&gt; the feed to get the content (typically an XML document a few kilobytes long), parse the content, and figure out what, if anything, is new since the last time it polled.&lt;br /&gt;&lt;br /&gt;This process soaks up a lot of network bandwidth and resources for both the provider and the subscriber, and the cost goes up the more frequently the subscriber polls. Typically the provider has to throttle the feed to prevent its servers from being overwhelmed. For example, Twitter updates its feed only once per minute and limits the number of tweets on the page. At times of high volume, only a small percentage of tweets make it into the feed.&lt;br /&gt;&lt;br /&gt;This may not sound that serious if the content is a Twitter conversation between friends, or a blog with one or two posts a week. But web feed protocols are becoming part of the IT infrastructure, and business users require lower latency, higher throughput and higher availability. (The existence of services like &lt;a href="http://www.gnipcentral.com/"&gt;Gnip&lt;/a&gt; is evidence of the need to control the web content chaos.)&lt;br /&gt;&lt;br /&gt;I would like to see the emergence of a genuine 'push' protocol for web-based content. It doesn't have to be particularly complicated. To illustrate what I have in mind, here is an example of a simple, stateless protocol, built using XML over HTTP, like the current feed formats. 
A subscriber sends a request &lt;blockquote&gt;&amp;lt;readRequest&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;lt;minimumRowtime&amp;gt;2008-12-04&amp;nbsp;18:00:46.000&amp;lt;/minimumRowtime&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;lt;maximumCount&amp;gt;1000&amp;lt;/maximumCount&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;lt;maximumWait&amp;gt;10s&amp;lt;/maximumWait&amp;gt;&lt;br /&gt;&amp;lt;/readRequest&amp;gt;&lt;/blockquote&gt; over HTTP, and the provider replies with a set of content records &lt;blockquote&gt;&amp;lt;rows&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;lt;row&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;rowtime&amp;gt;2008-12-04 18:00:46.217&amp;lt;/rowtime&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;category&amp;gt;U.S.&amp;lt;/category&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;title&amp;gt;Ex-FBI agent faces 30 years to life for mob hit - CNN&amp;lt;/title&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;lt;/row&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;lt;row&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;rowtime&amp;gt;2008-12-04 18:00:46.714&amp;lt;/rowtime&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;category&amp;gt;More Top Stories&amp;lt;/category&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;title&amp;gt;Bill Richardson chalks up another Cabinet job for the resume - Los Angeles Times&amp;lt;/title&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;lt;/row&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;lt;row&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;rowtime&amp;gt;2008-12-04 18:00:48.104&amp;lt;/rowtime&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;category&amp;gt;More Top Stories&amp;lt;/category&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;title&amp;gt;Showdown in Hebron as settlers evicted - Jewish Telegraphic Agency&amp;lt;/title&amp;gt;&lt;br 
/&gt;&amp;nbsp;&amp;nbsp;&amp;lt;/row&amp;gt;&lt;br /&gt;&amp;lt;/rows&amp;gt;&lt;/blockquote&gt; According to the protocol, the provider sends the results after 10 seconds, or when there are 1000 records to return, whichever occurs sooner. After it has received a result, the subscriber will typically ask for the next set of rows with a higher rowtime threshold.&lt;br /&gt;&lt;br /&gt;Even though it is simple, the protocol ensures that data flows efficiently for feeds of all data rates. For a high-volume feed, the 1000-record limit will be reached before the 10-second timeout, so latency naturally decreases. For a low-volume feed, many requests may time out and return an empty result; but the 10-second wait limits the number of requests per minute that the server has to handle.&lt;br /&gt;&lt;br /&gt;Naturally, I have in mind an even better protocol that allows subscribers to submit SQL queries, and of course every web site would have a SQLstream server behind the curtain. But seriously folks... I would be satisfied with a lot less than that. 
A simple, open protocol for streaming content syndication would unlock the web and make it the medium of choice for streaming as well as static content.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-8158338202323064859?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/8158338202323064859/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=8158338202323064859' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/8158338202323064859'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/8158338202323064859'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2008/12/streaming-analytics-over-content-feeds.html' title='Streaming analytics over content feeds (and how content feeds could be better)'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_BVv0WTpeWTs/SULwfsnQ5vI/AAAAAAAAABs/kMk1gfNWz_c/s72-c/SQLstream-web-feed-demo.png' height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-2862978729703113039</id><published>2008-11-25T11:41:00.000-08:00</published><updated>2008-11-25T12:02:08.119-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ubuntu intrepid ibex fennel 
eigenbase'/><title type='text'>Upgrading to ubuntu 8.10 Intrepid Ibex</title><content type='html'>Someone had to take the plunge. I upgraded one of my development environments, my laptop (a Dell D630 which dual-boots to Vista), to &lt;a href="http://www.ubuntu.com/"&gt;Ubuntu 8.10&lt;/a&gt; last night just to kick the tires.&lt;br /&gt;&lt;br /&gt;Since 8.04 was an LTS (long-term support) release, 8.04 users will not get automatically upgraded. You have to explicitly ask for it, as described &lt;a href="http://www.ubuntu.com/getubuntu/upgrading"&gt;on the Ubuntu site&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;(If you are surprised that 8.10 is the successor to 8.04, you need to know that Ubuntu releases occur approximately six months apart and are numbered {year}.{month}. So 8.10 was released in October 2008. As for where the {adjective} {animal} release names come from, I have never been bold enough to ask.)&lt;br /&gt;&lt;br /&gt;The upgrade went smoothly. Everything worked right out of the box. Kudos to the Ubuntu folks, yet again: I have been able to do distribution upgrades since Ubuntu 7.06.&lt;br /&gt;&lt;br /&gt;As for features, I haven't noticed anything different. I like to stay up to date, and for now I'm pleased that everything still works and looks the same. I'm sure I'll come across the good stuff in due course.&lt;br /&gt;&lt;br /&gt;Not so great for &lt;a href="http://www.eigenbase.org"&gt;Eigenbase&lt;/a&gt; developers, though. Fennel has problems building, which is not surprising considering its dependencies on C++ libraries and build tools. Ubuntu 8.10 installs libtool-2.2 (8.04 had libtool-1.5.26). I got some syntax errors in what looked like a generated bash script, possibly related to libtool.&lt;br /&gt;&lt;br /&gt;I'm not going to attempt to track down and solve the problems here. I will do that on the &lt;a href="http://n2.nabble.com/fennel-developers-f1374754.html"&gt;fennel-dev&lt;/a&gt; list in the next few weeks. 
Eigenbase developers should note that 8.10 is not yet a viable development environment; for everyone else, it's just fine.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-2862978729703113039?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/2862978729703113039/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=2862978729703113039' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/2862978729703113039'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/2862978729703113039'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2008/11/upgrading-to-ubuntu-810-intrepid-ibex.html' title='Upgrading to ubuntu 8.10 Intrepid Ibex'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-2884320683075377259</id><published>2008-11-17T08:09:00.000-08:00</published><updated>2008-11-17T08:54:32.011-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='kettle mondrian workbench'/><title type='text'>Tutorial video for Kettle and Mondrian</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" 
href="http://1.bp.blogspot.com/_BVv0WTpeWTs/SSGhmM-x7vI/AAAAAAAAABg/EA2pDpXlciQ/s1600-h/pdi-workbench-tutorial.jpg"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer; width: 200px; height: 138px;" src="http://1.bp.blogspot.com/_BVv0WTpeWTs/SSGhmM-x7vI/AAAAAAAAABg/EA2pDpXlciQ/s200/pdi-workbench-tutorial.jpg" alt="" id="BLOGGER_PHOTO_ID_5269670716691836658" border="0" /&gt;&lt;/a&gt;There's a nice video which shows how you can use &lt;a href="http://kettle.pentaho.org/"&gt;Kettle&lt;/a&gt; together with the &lt;a href="http://mondrian.pentaho.org/documentation/workbench.php"&gt;Mondrian Schema Workbench&lt;/a&gt; to populate a star schema and build cubes on top of it.&lt;br /&gt;&lt;br /&gt;See "Pentaho Data Integration and Schema Workbench Tutorial" on &lt;a href="http://www.pentaho.com/products/demos/presales_tools.php" target="_blank"&gt;the Pentaho web site&lt;/a&gt; (registration required).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-2884320683075377259?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/2884320683075377259/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=2884320683075377259' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/2884320683075377259'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/2884320683075377259'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2008/11/tutorial-video-for-kettle-and-mondrian.html' title='Tutorial video for Kettle and Mondrian'/><author><name>Julian 
Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_BVv0WTpeWTs/SSGhmM-x7vI/AAAAAAAAABg/EA2pDpXlciQ/s72-c/pdi-workbench-tutorial.jpg' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-5119892850316161035</id><published>2008-11-11T22:37:00.000-08:00</published><updated>2008-11-11T22:49:54.998-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='jpivot trend arrows mondrian'/><title type='text'>Trend arrows in JPivot</title><content type='html'>Nick Goodman has a great tip on &lt;a href="http://www.nicholasgoodman.com/bt/blog/2008/11/11/hidden-little-trend-arrows-2/"&gt;how to make trend arrows appear in JPivot cells&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.nicholasgoodman.com/bt/blog/wp-content/uploads/2008/11/200811111457.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 242px;" src="http://www.nicholasgoodman.com/bt/blog/wp-content/uploads/2008/11/200811111457.jpg" alt="" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-5119892850316161035?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/5119892850316161035/comments/default' title='Post Comments'/><link rel='replies' type='text/html' 
href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=5119892850316161035' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/5119892850316161035'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/5119892850316161035'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2008/11/trend-arrows-in-jpivot.html' title='Trend arrows in JPivot'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-8919414377082922167</id><published>2008-11-04T21:40:00.000-08:00</published><updated>2008-11-04T21:45:13.789-08:00</updated><title type='text'>To: Rest of World</title><content type='html'>Normal service has resumed.&lt;br /&gt;&lt;br /&gt;Sorry about the &lt;a href="http://en.wikipedia.org/wiki/George_W._Bush"&gt;last eight years&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-8919414377082922167?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/8919414377082922167/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=8919414377082922167' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/8919414377082922167'/><link rel='self' 
type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/8919414377082922167'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2008/11/to-rest-of-world.html' title='To: Rest of World'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-2623975210810048382</id><published>2008-11-03T14:48:00.000-08:00</published><updated>2008-11-11T22:50:52.792-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='openmrs pri bbc'/><title type='text'>OpenMRS coverage</title><content type='html'>A few months ago I wrote about &lt;a href="http://julianhyde.blogspot.com/2007/09/openmrs.html"&gt;OpenMRS&lt;/a&gt;, an excellent open-source project that is enabling hospitals and medical practices in developing countries to automate medical record-keeping, built using &lt;a href="http://www.pentaho.org/"&gt;Pentaho&lt;/a&gt; technology including &lt;a href="http://mondrian.pentaho.org/"&gt;mondrian&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;So it was good to hear a radio segment about OpenMRS on &lt;a href="http://www.theworld.org/"&gt;PRI's "The World"&lt;/a&gt; this afternoon, including an interview about how it is helping a hospital in Haiti. 
You can listen to it via &lt;a href="http://clarkboyd.wordpress.com/2008/10/07/wtp-213-openmrs-open-source-medical-record-systems-for-the-developing-world/"&gt;Clark Boyd's blog&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-2623975210810048382?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/2623975210810048382/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=2623975210810048382' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/2623975210810048382'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/2623975210810048382'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2008/11/openmrs-coverage.html' title='OpenMRS coverage'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-2586160820428099900</id><published>2008-11-02T01:50:00.000-07:00</published><updated>2008-11-02T01:55:58.071-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='welsh road sign auto-reply'/><title type='text'>Lost in translation</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://newsimg.bbc.co.uk/media/images/45162000/jpg/_45162744_-2.jpg"&gt;&lt;img style="margin: 0px auto 
10px; display: block; text-align: center; cursor: pointer; width: 416px; height: 300px;" src="http://newsimg.bbc.co.uk/media/images/45162000/jpg/_45162744_-2.jpg" alt="" border="0" /&gt;&lt;/a&gt;Someone had a &lt;a href="http://news.bbc.co.uk/2/hi/uk_news/wales/7702913.stm"&gt;bad day at the office&lt;/a&gt;. The Welsh part of this road sign reads "I am not in the office at the moment. Please send any work to be translated."&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-2586160820428099900?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/2586160820428099900/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=2586160820428099900' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/2586160820428099900'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/2586160820428099900'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2008/11/lost-in-translation.html' title='Lost in translation'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-7793345986830342</id><published>2008-10-27T16:12:00.000-07:00</published><updated>2008-10-27T17:11:53.575-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='OSBOOTCAMP'/><title 
type='text'>Academia meets Open Source</title><content type='html'>There is a natural fit between university students and researchers and the open source community. They are smart, educated, short of cash, and want to make the world a better place; and some of them, at least, have plenty of spare time.&lt;br /&gt;&lt;br /&gt;More seriously, open source projects are a great platform for software research. By starting with a mature software platform, the researchers can spend less time recreating existing functionality, and get to the new, interesting stuff faster. The findings of such projects are more applicable to the real world because the new ideas have been tested in realistic architectures and on data sets of a reasonable size. In the area of spatial (GIS) applications alone, there are &lt;a href="http://www.google.com/search?q=%28spatial+or+gis%29+mondrian"&gt;several projects&lt;/a&gt;, including the work of &lt;a href="http://portal.acm.org/citation.cfm?id=1141277.1141292&amp;amp;coll=GUIDE&amp;amp;dl=GUIDE&amp;amp;CFID=8047493&amp;amp;CFTOKEN=97746316"&gt;Joel da Silva and others at Pernambuco, Brazil&lt;/a&gt;, the &lt;a href="http://people.plan.aau.dk/%7Eenc/AGILE2007/PDF/28_PDF.pdf"&gt;GeWOlap&lt;/a&gt; project, and &lt;a href="http://geosoa.scg.ulaval.ca/"&gt;GeoMondrian&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Still, the majority of the mondrian and Pentaho communities are from industry. I would love to get more committers and active community members from academia.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://osbootcamp.org/"&gt;Open Source Boot Camp (OSBOOTCAMP)&lt;/a&gt; is a mini-conference which is trying to put that right. 
I will be participating in a &lt;a href="http://osbootcamp.org/index.php?page=oak1"&gt;panel on open-source database development at Berkeley this Thursday 30th October&lt;/a&gt;, along with open-source advocates &lt;span class="plain"&gt;Bill Maimone (Ingres), Josh Berkus (PostgreSQL), Mark Atwood (MySQL) and John Sichi (LucidDB).&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-7793345986830342?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/7793345986830342/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=7793345986830342' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/7793345986830342'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/7793345986830342'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2008/10/academia-meets-open-source.html' title='Academia meets Open Source'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-7801524386643335067</id><published>2008-10-22T03:35:00.000-07:00</published><updated>2008-10-23T05:03:01.181-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='pentaho puc aggregate designer mondrian summary table'/><title type='text'>Pentaho 2.0 brings good 
things</title><content type='html'>This week &lt;a href="http://www.pentaho.com/the_alternative/"&gt;Pentaho released version 2.0 of its BI Suite&lt;/a&gt;, and it contains two major features that the mondrian community will love.&lt;br /&gt;&lt;br /&gt;First, the Pentaho User Console, a web-based environment where end users can create, view, save, and share BI content. Content is arranged into folders, and includes operational reports created with Pentaho Reports, and dimensional analytics created with Pentaho Analysis (mondrian). Users can also create subscriptions to receive reports regularly by email. PUC is simple and elegant. I predict that it will quickly become the face of Pentaho for end users.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_BVv0WTpeWTs/SP8FLyIXrYI/AAAAAAAAABQ/KYlKC-PuSWI/s1600-h/puc.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://4.bp.blogspot.com/_BVv0WTpeWTs/SP8FLyIXrYI/AAAAAAAAABQ/KYlKC-PuSWI/s400/puc.png" alt="" id="BLOGGER_PHOTO_ID_5259928589785607554" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Second, the Pentaho Aggregate Designer, which automatically creates a set of aggregate tables to accelerate a mondrian schema. I believe that the Aggregate Designer takes mondrian to a new level of scalability. Let me explain why.&lt;br /&gt;&lt;br /&gt;There are several architectures possible for multidimensional analysis engines, and they each have their strengths and weaknesses. Mondrian's architecture is best described as 'ROLAP with caching'. A ROLAP engine, short for 'relational online analytical processing', stores its data in a relational database (RDBMS) and accesses it via SQL. 
It follows that the RDBMS does most of the heavy-duty processing, such as JOIN and GROUP BY operations, while the ROLAP engine deals with presentation, caching, and calculations that can be expressed in a multidimensional model but cannot easily be converted into SQL.&lt;br /&gt;&lt;br /&gt;The chief advantage of ROLAP is its simplicity. A ROLAP engine does not have its own storage engine: everything is in the RDBMS. In particular, the load process is simply a matter of loading the RDBMS, and if the contents of the RDBMS change you just need to &lt;a href="http://julianhyde.blogspot.com/2007/02/mondrian-cache-control.html"&gt;flush mondrian's cache&lt;/a&gt; to see the up-to-date contents. Provided that the RDBMS scales, you can scale mondrian to greater numbers of concurrent users by having multiple instances of mondrian in a farm of web servers.&lt;br /&gt;&lt;br /&gt;This great strength is also a great weakness. It means that mondrian is beholden to the RDBMS for performance. In particular, consider that first query of the day, the one that scans all 100 million rows in the fact table to generate a three-segment pie chart on the CEO's dashboard:&lt;br /&gt;&lt;blockquote&gt;&lt;span style="font-weight: bold;font-family:courier new;" &gt;SELECT&lt;br /&gt;      customer.region,&lt;br /&gt;      sum(fact.store_sales)&lt;br /&gt;FROM sales AS fact&lt;br /&gt;      JOIN customer ON fact.cust_id = customer.cust_id&lt;br /&gt;GROUP BY customer.region&lt;/span&gt;&lt;/blockquote&gt;Without aggregate tables, that query takes however long the RDBMS takes to scan 100 million rows -- perhaps one minute, perhaps ten minutes -- but the CEO is not prepared to wait that long. Aggregate tables are the answer. They contain the pre-computed results of such queries, and are declared in mondrian's schema so that mondrian knows how to generate SQL to make use of them.&lt;br /&gt;&lt;br /&gt;The problem is that aggregate tables are hard to use. 
Mondrian has &lt;a href="http://mondrian.pentaho.org/documentation/aggregate_tables.php"&gt;supported aggregate tables&lt;/a&gt; for several releases, but very few people have made effective use of them. The steps are as follows.&lt;br /&gt;&lt;br /&gt;First of all, choose an effective set of aggregate tables. The possibilities are literally exponential: in a schema with N attributes (hierarchy levels) there are 2&lt;sup&gt;N&lt;/sup&gt; possible aggregate tables. If you choose too many, you will use too much disk space and spend too long loading them every night. If you choose too few, many queries will fall through the net and end up using a full scan of the fact table. Many aggregates can be derived from other aggregates, so it is possible to economize, but there are pitfalls if you do it by hand. (In a future post, I will write further about the algorithm the Aggregate Designer uses to choose a near-optimal set of aggregate tables.)&lt;br /&gt;&lt;br /&gt;Next, create the aggregate tables in the RDBMS and add mapping elements such as &amp;lt;AggName&amp;gt; to mondrian's schema. Last, write SQL statements to populate the aggregate tables as part of your ETL process.&lt;br /&gt;&lt;br /&gt;These steps are possible by hand, but very difficult for mere mortals to get right. The Aggregate Designer automates all of these steps. Once you have chosen a mondrian schema, and a particular cube in that schema to optimize, the algorithm analyzes the data in the star schema underlying that cube, and generates a set of aggregate tables. 
If you have a particular set of aggregate tables in mind, you can create these before running the algorithm, and the algorithm will create additional aggregate tables, taking yours into account.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_BVv0WTpeWTs/SP8JNEhKGiI/AAAAAAAAABY/XipEThIHlCo/s1600-h/pad.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://4.bp.blogspot.com/_BVv0WTpeWTs/SP8JNEhKGiI/AAAAAAAAABY/XipEThIHlCo/s400/pad.png" alt="" id="BLOGGER_PHOTO_ID_5259933009947793954" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Each aggregate table is categorized according to its cost (a combination of the number of bytes on disk and the time it will take to populate) and its benefit (the effort that will be saved at run time, over a typical query load, by having the aggregate table). The Aggregate Designer displays the set of tables it has chosen as a graph: usually convex, reflecting the fact that the first tables suggested are the ones with the most favorable cost/benefit ratios.&lt;br /&gt;&lt;br /&gt;When the algorithm has run, Aggregate Designer can add the definitions of the aggregate tables into the mondrian schema. You can either create and populate the tables immediately, or save a script of 'CREATE TABLE' and 'INSERT INTO {aggregate table} SELECT ...' statements. You can even generate &lt;a href="http://kettle.pentaho.org/"&gt;Pentaho Data Integration (Kettle)&lt;/a&gt; steps to perform the ETL process.&lt;br /&gt;&lt;br /&gt;Pentaho User Console and Pentaho Aggregate Designer are both available in the Pentaho open source BI suite version 2.0. Download the suite, or check out the &lt;a href="http://demo.pentaho.com/"&gt;live demo&lt;/a&gt;. 
They are compatible with mondrian 3.0.4.11371, which is available as part of Pentaho 2.0 or for separate download.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-7801524386643335067?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/7801524386643335067/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=7801524386643335067' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/7801524386643335067'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/7801524386643335067'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2008/10/pentaho-20-brings-good-things.html' title='Pentaho 2.0 brings good things'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_BVv0WTpeWTs/SP8FLyIXrYI/AAAAAAAAABQ/KYlKC-PuSWI/s72-c/puc.png' height='72' width='72'/><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-7435741856374211320</id><published>2008-09-26T10:45:00.000-07:00</published><updated>2008-09-30T17:43:45.887-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='lemon pie chart'/><title type='text'>The only pie chart you can really trust</title><content type='html'>&lt;a onblur="try 
{parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://graphjam.files.wordpress.com/2008/09/171.gif"&gt;&lt;img style="cursor: pointer; width: 400px;" src="http://graphjam.files.wordpress.com/2008/09/171.gif" alt="" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;[From &lt;a href="http://www.logicnest.com/archives/120"&gt;logicnest&lt;/a&gt; via &lt;a href="http://infosthetics.com"&gt;infosthetics&lt;/a&gt;.]&lt;br /&gt;&lt;br /&gt;Don't worry, this isn't turning into a humor site. Normal service will resume shortly. The last couple of days, I just needed a laugh.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-7435741856374211320?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/7435741856374211320/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=7435741856374211320' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/7435741856374211320'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/7435741856374211320'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2008/09/only-pie-chart-you-can-really-trust.html' title='The only pie chart you can really trust'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' 
src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-5125399266117350913</id><published>2008-09-26T01:35:00.001-07:00</published><updated>2008-09-26T01:40:02.621-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='goto t.rex attack'/><title type='text'>GOTO considered harmful</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://imgs.xkcd.com/comics/goto.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 350px;" src="http://imgs.xkcd.com/comics/goto.png" alt="" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-5125399266117350913?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/5125399266117350913/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=5125399266117350913' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/5125399266117350913'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/5125399266117350913'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2008/09/goto-considered-harmful.html' title='GOTO considered harmful'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' 
src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-6710846193849010409</id><published>2008-09-23T10:25:00.000-07:00</published><updated>2008-09-23T10:32:14.894-07:00</updated><title type='text'>Chain blog</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_BVv0WTpeWTs/SNkm4Sjo6BI/AAAAAAAAABI/KUgB_l4bwMs/s1600-h/Pix004.jpg"&gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;" src="http://1.bp.blogspot.com/_BVv0WTpeWTs/SNkm4Sjo6BI/AAAAAAAAABI/KUgB_l4bwMs/s320/Pix004.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5249269589172873234" /&gt;&lt;/a&gt;&lt;br /&gt;From &lt;a href="http://www.nicholasgoodman.com/bt/blog/2008/09/22/off-topic-meme-me/"&gt;Mr. Goodman&lt;/a&gt; this morning. A blogger equivalent of “send this mail to 10 people you know”.&lt;br /&gt;&lt;br /&gt;1. Take a picture of yourself right now.&lt;br /&gt;2. Don’t change your clothes, don’t fix your hair... just take a picture.&lt;br /&gt;3. Post that picture with NO editing.&lt;br /&gt;4. Post these instructions with your picture.&lt;br /&gt;&lt;br /&gt;PS I'm on &lt;a href="http://www.urbandictionary.com/define.php?term=Workation"&gt;workation&lt;/a&gt; at a friend's cottage in &lt;a href="http://en.wikipedia.org/wiki/Pacific_Grove,_California"&gt;Pacific Grove&lt;/a&gt; for a few days, so I don't have my camera to hand. I had to learn how to send a picture from my phone to my PC via Bluetooth. 
Cool.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-6710846193849010409?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/6710846193849010409/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=6710846193849010409' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/6710846193849010409'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/6710846193849010409'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2008/09/chain-blog.html' title='Chain blog'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_BVv0WTpeWTs/SNkm4Sjo6BI/AAAAAAAAABI/KUgB_l4bwMs/s72-c/Pix004.jpg' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-4026734431624018875</id><published>2008-09-21T19:16:00.000-07:00</published><updated>2008-09-23T11:03:23.183-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='oracle openworld sqlstream aeturnum mpp smp streaming sql etl'/><title type='text'>Is Oracle about to embrace MPP?</title><content type='html'>Oracle's Larry Ellison has some major announcements to make during Oracle's &lt;a href="http://www.oracle.com/openworld/2008/index.html"&gt;OpenWorld 
conference&lt;/a&gt; this coming week in San Francisco. A few months ago he was promising to announce a "&lt;a href="http://seekingalpha.com/article/82717-oracle-f4q08-qtr-end-5-31-08-earnings-call-transcript?page=3"&gt;major database innovation&lt;/a&gt;", but declined to give further details, so the Oracle community has been &lt;a href="http://www.computerworld.com/action/article.do?command=viewArticleBasic&amp;amp;taxonomyName=Databases&amp;amp;articleId=9115059&amp;amp;taxonomyId=53&amp;amp;pageNumber=1"&gt;speculating furiously&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;With a keynote entitled "Extreme Performance," and product announcements coming in the areas of grid computing and database acceleration, all the indications are that Oracle is getting serious about problems that require massive scalability, massive throughput, and low latency.&lt;br /&gt;&lt;br /&gt;This is an area where Oracle has been falling behind. In Oracle's approach, which independent database analyst Curt Monash calls a "&lt;a target="new" href="http://www.dbms2.com/2007/03/06/why-oracle-and-microsoft-will-lose-in-vldb-data-warehousing/"&gt;shared everything&lt;/a&gt;" architecture, multiple servers belong to the same Oracle Real Application Cluster (RAC) and share a common pool of memory and disk storage. But this approach does not allow Oracle to be run on hundreds or thousands of servers, which is how companies such as Google are solving problems that require large amounts of storage and processing. That sort of massively parallel processing (MPP) requires a "shared nothing" architecture, and internet companies have been rolling their own architectures out of simpler components.&lt;br /&gt;&lt;br /&gt;The result is that Oracle "is way behind in the 'scale-out' world," said Paul Vallee, CEO of &lt;a href="http://www.pythian.com/"&gt;The Pythian Group&lt;/a&gt;, an Ottawa, Ontario-based database services provider. 
"MySQL is eating its lunch in terms of Internet-scaled deployments."&lt;br /&gt;&lt;br /&gt;Oracle's own experts seem to agree. In the abstract for a talk "&lt;a href="http://www28.cplan.com/cc208/session_details.jsp?isid=298681&amp;amp;ilocation_id=208-1&amp;amp;ilanguage=english"&gt;Oracle's New Database Accelerator: A Technical Overview&lt;/a&gt;", Ron Weiss writes:&lt;br /&gt;&lt;blockquote&gt;&lt;span style="font-style: italic;"&gt;"New and revolutionary solutions and methodologies are coming together to handle the exploding data volumes real-world systems are being required to store and serve up. Supporting ever-larger databases, with ever-increasing demands for getting "answers" faster, requires a new way to approach the problem."&lt;/span&gt; &lt;/blockquote&gt;Weiss's solution uses improvements to storage management, but I doubt that it would satisfy Google's requirements, or even the price/performance requirements of a medium-sized internet media company.&lt;br /&gt;&lt;br /&gt;Meanwhile, those who have adopted shared-nothing architectures are feeling the pain too. Having stitched together hundreds or thousands of databases, they now face the problem of how to populate and coordinate them. For example, internet companies' transaction rates are so high that it is not possible to load the day's data during an eight-hour nightly load window, and besides, business owners want to see data in near real time. Organizations are adopting a 'trickle ETL' process to populate the data warehouse continuously and with low latency.&lt;br /&gt;&lt;br /&gt;So data architects seem to be caught between a rock and a hard place. 
Either stick with Oracle's shared-everything (or indeed IBM DB2 or Microsoft SQL Server - they have the same approach) and live with the scalability limitations, or move to the wild frontier of shared-nothing, and be prepared to spend a lot of effort managing, populating and coordinating your farm of databases.&lt;br /&gt;&lt;br /&gt;Ironically, the answer, as Larry Ellison and his cohorts taught us thirty years ago, is in the relational model. By extending the relational model beyond stored data to include streaming data, SQL can be used to efficiently manage data flowing into and between multiple databases, as well as storage and retrieval within those databases. This creates a scalable shared-nothing system, with databases decoupled from each other, but because the data flow is managed by declarative SQL, it is as manageable as a shared-everything system such as Oracle.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.sqlstream.com/"&gt;SQLstream&lt;/a&gt; is an implementation of this new SQL, and can be applied to continuous ETL, real-time BI and monitoring problems. For example, if there are many data sources for your ETL process, and many servers to be populated, SQLstream can act as a cross-hatch, load-balancing the data, aggregating, and routing each row to the correct database engine with low latency. And because SQLstream's SQL encompasses both data at rest and &lt;a href="http://julianhyde.blogspot.com/2008/02/streaming-sql-meets-olap.html"&gt;data in flight&lt;/a&gt;, it can correlate data in the warehouse with arriving data.&lt;br /&gt;&lt;br /&gt;SQLstream is partnering with companies that are building next-generation data warehousing architectures on Oracle and on other databases. 
&lt;a href="http://www.aeturnum.com/"&gt;Aeturnum&lt;/a&gt; is an exciting new delivery partner for SQLstream with extensive expertise in data warehousing (&lt;a href="http://www.netezza.com/"&gt;Netezza&lt;/a&gt;) and business intelligence (&lt;a href="http://www.pentaho.com/"&gt;Pentaho&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;Come and see SQLstream at Oracle OpenWorld. We will be at the Aeturnum stand (2716 Moscone South) all this week.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-4026734431624018875?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/4026734431624018875/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=4026734431624018875' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/4026734431624018875'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/4026734431624018875'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2008/09/is-oracle-about-to-embrace-mpp.html' title='Is Oracle about to embrace MPP?'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-7871491263793635980</id><published>2008-09-15T23:42:00.000-07:00</published><updated>2008-09-16T00:19:44.078-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' 
term='ggro raptor migration'/><title type='text'>Fall migration kicks in...</title><content type='html'>You can't fault nature.&lt;br /&gt;&lt;br /&gt;We've been &lt;a href="http://julianhyde.blogspot.com/2008/08/hawkwatch.html"&gt;counting raptors&lt;/a&gt; for almost a month now, and the numbers have been really, really low. I've been promising everyone that the peak of raptor migration will be within a day or two of the equinox, because that's how it always goes, but I was getting worried.&lt;br /&gt;&lt;br /&gt;We weren't seeing enough birds. In particular, the &lt;a href="http://en.wikipedia.org/wiki/Accipiter"&gt;accipiters&lt;/a&gt; (the Cooper's Hawk and Sharp-shinned Hawk), which form the bulk of the equinox peak, were nowhere to be seen. The counts consisted mainly of the Turkey Vultures and Red-tailed Hawks which are ubiquitous around Hawk Hill. Had something gone wrong? Had this summer's forest fires disrupted the breeding season and delayed the migration?&lt;br /&gt;&lt;br /&gt;Look at the &lt;a href="http://ggro.org/hawkwatch/default.aspx"&gt;accipiter numbers&lt;/a&gt; for last week:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Sun 7: 2 sharpies, no coops&lt;/li&gt;&lt;li&gt;Mon 8: no birds (fog)&lt;/li&gt;&lt;li&gt;Tue 9: no sharpies, 1 coop&lt;/li&gt;&lt;li&gt;Wed 10: 1 sharpie, 1 coop&lt;/li&gt;&lt;li&gt;Thu 11: 8 sharpies, 3 coops&lt;/li&gt;&lt;li&gt;Fri 12: 5 sharpies, 3 coops&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;Then came my day on the hill, Saturday. It was a slow start, foggy at first, and overcast for most of the day, but the birds started coming. We had 40 sharpies, 8 coops. We also got a juvenile &lt;a href="http://en.wikipedia.org/wiki/Golden_Eagle"&gt;Golden Eagle&lt;/a&gt; (at 10.30am even -- conventional wisdom has it that eagles are late risers, like the thermals they soar upon), and a couple of &lt;a href="http://en.wikipedia.org/wiki/Broad-winged_Hawk"&gt;Broad-winged Hawks&lt;/a&gt;. 
(According to the books, you won't see a broadie west of Kansas, but the &lt;a href="http://en.wikipedia.org/wiki/Marin_Headlands#Wildlife"&gt;Marin Headlands&lt;/a&gt; are very effective at channeling the few we do have into a narrow stream.)&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Sat 13: 40 sharpies, 8 coops&lt;/li&gt;&lt;/ul&gt;And in the last couple of days, the trend has accelerated:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Sun 14: 111 sharpies, 29 coops&lt;/li&gt;&lt;li&gt;Mon 15: 133 sharpies, 44 coops&lt;/li&gt;&lt;/ul&gt;Yes folks, it looks like we have a fall migration after all. As sure as clockwork, the &lt;a href="http://en.wikipedia.org/wiki/Bird_migration#Timing"&gt;changing day-length&lt;/a&gt; is telling those birds to head south. &lt;a href="http://www.hydromatic.net/ggro.xml"&gt;Check the counts&lt;/a&gt; over the next week or two; you should see the spectacle continue.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;(Note that all statistics quoted are copyright of the GGRO and may not be reproduced without permission.)&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-7871491263793635980?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/7871491263793635980/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=7871491263793635980' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/7871491263793635980'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/7871491263793635980'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2008/09/fall-migration-kicks-in.html' title='Fall migration 
kicks in...'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-1422035625145420945</id><published>2008-09-03T17:27:00.000-07:00</published><updated>2008-09-04T13:59:26.665-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sql standard extensions streaming'/><title type='text'>SQL extension to allow FIRST_VALUE and LAST_VALUE in GROUP BY query</title><content type='html'>At SQLstream, we have come across an interesting query pattern that seems to be difficult to express in standard SQL (SQL:2003 or SQL:2008). It turns out to be applicable to regular SQL as well as streaming SQL, and therefore it would make sense as an extension to the SQL standard.&lt;br /&gt;&lt;br /&gt;First some background, for those of you who don't fall asleep every night reading the SQL standard. There are two kinds of aggregation in standard SQL: windowed aggregation, of the form&lt;br /&gt;&lt;blockquote&gt;&lt;span style="font-style: italic;"&gt;function&lt;/span&gt;(&lt;span style="font-style: italic;"&gt;arg&lt;/span&gt; {, &lt;span style="font-style: italic;"&gt;arg&lt;/span&gt;}) OVER &lt;span style="font-style: italic;"&gt;window-specification&lt;/span&gt;&lt;/blockquote&gt;and grouped aggregation, which is of the form&lt;br /&gt;&lt;blockquote style="font-family:courier new;"&gt;&lt;span style="font-style: italic;"&gt;function&lt;/span&gt;(&lt;span style="font-style: italic;"&gt;arg&lt;/span&gt; {, &lt;span style="font-style: italic;"&gt;arg&lt;/span&gt;})&lt;/blockquote&gt;and requires a GROUP BY clause. 
(If the GROUP BY clause is not present, 'GROUP BY ()' is assumed.)&lt;br /&gt;&lt;br /&gt;According to the standard, these two forms should never meet. It is illegal to use a windowed aggregation in a SELECT that has a GROUP BY, or to mix grouped aggregation and windowed aggregation in the same SELECT. (It's OK to use one in a sub-query and another in an enclosing query.)&lt;br /&gt;&lt;br /&gt;However, here is a very reasonable query that is difficult to express in standard SQL: Given a record of every trade on a stock exchange, give me the volume and closing price of each ticker symbol. You might try&lt;br /&gt;&lt;blockquote style="font-family: courier new;"&gt;&lt;b&gt;SELECT&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;day,&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;ticker,&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;SUM(shares) AS volume,&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;LAST_VALUE(price) AS closingPrice&lt;br /&gt;FROM Trades&lt;br /&gt;GROUP BY day, ticker&lt;/b&gt;&lt;/blockquote&gt;but this is illegal SQL. Why is it illegal? Because the LAST_VALUE function (like FIRST_VALUE and RANK) is a windowed aggregate function and is only meaningful on an ordered set.&lt;br /&gt;&lt;br /&gt;To introduce the notion of ordering, I propose that the following query should be valid:&lt;br /&gt;&lt;blockquote style="font-family: courier new;"&gt;&lt;b&gt;SELECT&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;day,&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;ticker,&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;SUM(shares) AS volume,&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;LAST_VALUE(price) OVER (ORDER BY timeOfDay) AS closingPrice&lt;br /&gt;FROM Trades&lt;br /&gt;GROUP BY day, ticker&lt;/b&gt;&lt;/blockquote&gt;With the OVER clause, LAST_VALUE is now a windowed aggregate function within the context of a GROUP BY query, which was previously illegal. 
Every windowed aggregate is applied to a window, so what is the window in this case? We want the window to contain all of the rows with the same day and ticker values, and to be sorted by timeOfDay. In other words, the window inherits the GROUP BY columns as its implicit PARTITION BY clause. It is as if the user had written&lt;br /&gt;&lt;blockquote style="font-family: courier new;"&gt;&lt;b&gt;LAST_VALUE(price) OVER (PARTITION BY day, ticker ORDER BY timeOfDay)&lt;/b&gt;&lt;/blockquote&gt;Now, if you know that I work for SQLstream, you will guess that I am motivated to make this work for streaming queries. A streaming aggregation query over the Trades stream would look like this:&lt;br /&gt;&lt;blockquote style="font-family: courier new;"&gt;&lt;b&gt;SELECT STREAM&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;day,&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;ticker,&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;SUM(shares) AS volume,&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;LAST_VALUE(price) OVER (ORDER BY timeOfDay) AS closingPrice&lt;br /&gt;FROM Trades&lt;br /&gt;GROUP BY day, ticker&lt;/b&gt;&lt;/blockquote&gt;This is identical to the traditional, non-streaming SQL above, except for the STREAM keyword that tells SQLstream that the result of the query should be a stream.&lt;br /&gt;&lt;br /&gt;In idiomatic SQLstream SQL, we would typically express the query as follows:&lt;br /&gt;&lt;blockquote style="font-family: courier new;"&gt;&lt;b&gt;SELECT STREAM&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;FLOOR(t.ROWTIME TO DAY),&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;ticker,&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;SUM(shares) AS volume,&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;LAST_VALUE(price) OVER () AS closingPrice&lt;br /&gt;FROM Trades AS t&lt;br /&gt;GROUP BY FLOOR(t.ROWTIME TO DAY), ticker&lt;/b&gt;&lt;/blockquote&gt;This form uses SQLstream's system ROWTIME column and the 'FLOOR(datetime expression TO 
time unit)' operator, and so can dispense with the day and timeOfDay columns. Also, streams are ordered by ROWTIME by default, so we can abbreviate 'OVER (ORDER BY ROWTIME)' to 'OVER ()'. This form is terser, and more typical of how the query would be written in a SQLstream application, but the previous form also works.&lt;br /&gt;&lt;br /&gt;The end result is powerful and, I think, consistent with the spirit of standard SQL.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-1422035625145420945?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/1422035625145420945/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=1422035625145420945' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/1422035625145420945'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/1422035625145420945'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2008/09/sql-extension-to-allow-firstvalue-and.html' title='SQL extension to allow FIRST_VALUE and LAST_VALUE in GROUP BY query'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-9038453523802235237</id><published>2008-08-27T12:14:00.000-07:00</published><updated>2008-08-27T12:52:08.727-07:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='virtualization etl cdc olap streaming sql esp'/><title type='text'>Database virtualization, distributed caching and streaming SQL</title><content type='html'>&lt;a href="http://www.networkworld.com/columnists/2008/082008kobelius.html"&gt;James Kobelius writes in Network World&lt;/a&gt; how the need for scalable real-time business intelligence will create a convergence of technologies centered on database virtualization:&lt;br /&gt;&lt;blockquote&gt;&lt;span style="font-style: italic;"&gt;"Real-time is the most exciting new frontier in business intelligence, and virtualization will facilitate low-latency analytics more powerfully than traditional approaches. Database virtualization will enable real-time business intelligence through a policy-driven, latency-agile, distributed-caching memory grid that permeates an infrastructure at all levels. &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;As this new approach takes hold, it will provide a convergence architecture for diverse approaches to real-time business intelligence, such as trickle-feed extract transform load (ETL), changed-data capture (CDC), event-stream processing and data federation. 
Traditionally deployed as stovepipe infrastructures, these approaches will become alternative integration patterns in a virtualized information fabric for real-time business intelligence."&lt;/span&gt;&lt;/blockquote&gt;&lt;span style="font-style: italic;"&gt;&lt;/span&gt; Kobelius makes it clear that this "virtualized information fabric" is an ambitious program that will be accomplished only over a number of years, but the underlying trends are visible now: for example, the convergence of distributed caches with databases, as evidenced by &lt;a href="http://www.oracle.com/tangosol/index.html"&gt;Oracle's acquisition of Tangosol&lt;/a&gt;, and &lt;a href="http://code.msdn.microsoft.com/velocity"&gt;Microsoft's recently announced Project Velocity&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;This envisioned system contains so many moving parts that a new paradigm will be needed to link them together. I don't think that databases are the answer. They elegantly handle stored data, but founder when dealing with change, caching, and the kind of replication problems you encounter when implementing virtualized and distributed systems. 
For example, database triggers are the standard way of managing change in a database, and are still clunky fifteen years after they were introduced; and &lt;a href="http://en.wikipedia.org/wiki/Enterprise_Information_Integration"&gt;Enterprise Information Integration (EII)&lt;/a&gt; systems were an attempt to extend the database model to handle federated data, but only work well for a prescribed set of distribution patterns.&lt;br /&gt;&lt;br /&gt;I &lt;a href="http://julianhyde.blogspot.com/2008/02/streaming-sql-meets-olap.html"&gt;wrote recently&lt;/a&gt; about how &lt;a href="http://www.sqlstream.com/"&gt;SQLstream&lt;/a&gt; can implement trickle-feed &lt;a href="http://en.wikipedia.org/wiki/Extract,_transform,_load"&gt;ETL&lt;/a&gt; and use the knowledge it gleans from the passing data to proactively manage the &lt;a href="http://mondrian.pentaho.org/"&gt;mondrian OLAP engine&lt;/a&gt;'s cache. SQLstream also has adapters to implement &lt;a href="http://en.wikipedia.org/wiki/Change_data_capture"&gt;change-data capture (CDC)&lt;/a&gt; and to manage data federation.&lt;br /&gt;&lt;br /&gt;In SQLstream, the &lt;span style="font-style: italic;"&gt;lingua franca&lt;/span&gt; for all of these integration patterns is SQL, whereas ironically, if you tried to achieve these things in Oracle or Microsoft SQL Server, you would end up writing procedural code: PL/SQL or Transact-SQL. 
Therefore streaming SQL - a variant of what Kobelius calls event-stream processing where, crucially, the event-processing language is SQL - seems the best candidate for that unifying paradigm.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-9038453523802235237?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/9038453523802235237/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=9038453523802235237' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/9038453523802235237'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/9038453523802235237'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2008/08/database-virtualization-distributed.html' title='Database virtualization, distributed caching and streaming SQL'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-5670575637908057003</id><published>2008-08-27T10:33:00.000-07:00</published><updated>2008-08-27T11:10:30.119-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ggro raptor hawk hill rss'/><title type='text'>Hawkwatch</title><content type='html'>It's that time of year when the days are getting imperceptibly shorter, birds start thinking of heading south, 
and a couple of hundred volunteer birders of the &lt;a href="http://www.ggro.org/"&gt;Golden Gate Raptor Observatory (GGRO)&lt;/a&gt; head to Hawk Hill to watch them.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.hydromatic.net/pix2004/new/P2004A01_140_4064_cropped.JPG"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 320px;" src="http://www.hydromatic.net/pix2004/new/P2004A01_140_4064_cropped.JPG" border="0" alt="Rufous-morph Red-Tailed Hawk over Hawk Hill" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Hawk Hill is at the southern tip of the &lt;a href="http://en.wikipedia.org/wiki/Marin_Headlands#Wildlife"&gt;Marin Headlands&lt;/a&gt; overlooking the Golden Gate Bridge, which naturally funnels migrating raptors from a fifty-mile stretch into less than a mile. The result is a huge concentration of raptors. During peak season — which, not coincidentally, is usually within a day or two of the autumnal equinox — you will typically see over 100 birds an hour from 12 or 13 species of raptors, including eagles, falcons, &lt;a href="http://en.wikipedia.org/wiki/Accipiter"&gt;accipiters&lt;/a&gt; and some of the rarer &lt;a href="http://en.wikipedia.org/wiki/Buteo"&gt;buteo hawks&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;It has to be seen to be believed, so if you're curious, come up to Hawk Hill and see for yourself. When you get there, on any day between late August and early December, as long as Hawk Hill isn't shrouded in fog, you'll find about a dozen volunteers with binoculars counting hawks.&lt;br /&gt;&lt;br /&gt;The GGRO has been counting and banding raptors at this site for over twenty years. I have been volunteering for 6 years, and this year I have stepped up to be day leader of the &lt;a href="http://groups.google.com/group/hawkwatchsatii"&gt;Saturday II hawkwatch team&lt;/a&gt;. 
With the depth of hawk-watching experience on the team, it's not too onerous a job. The main responsibility is to ensure that the numbers are recorded systematically — this is a scientific study, after all — and, when the fog rolls in, to tell the team that it's time to hang up the binoculars for the day.&lt;br /&gt;&lt;br /&gt;To help me keep up with the action, I built an &lt;a href="http://www.hydromatic.net/ggro.xml"&gt;RSS feed&lt;/a&gt; that contains a summary of each day's hawk watching, including a count of each species of raptor. The first week was a wash — fog every day — but we can expect to see numbers, particularly of the accipiters (Cooper's hawk and Sharp-shinned hawk), climbing rapidly over the next two or three weeks. Subscribe to that feed and you can get daily updates too; or even better, join us on the hill!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-5670575637908057003?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/5670575637908057003/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=5670575637908057003' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/5670575637908057003'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/5670575637908057003'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2008/08/hawkwatch.html' title='Hawkwatch'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' 
src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-4644143019221798711</id><published>2008-08-19T09:44:00.000-07:00</published><updated>2008-08-19T09:48:39.347-07:00</updated><title type='text'>Mondrian on TimesTen</title><content type='html'>Funny what you find while googling for error messages: apparently &lt;a href="http://forums.oracle.com/forums/thread.jspa?messageID=2340835"&gt;mondrian runs on TimesTen&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;One of the bizarre things about open source is that you have no way of knowing who is using your project, and on what platform. (Until they find something wrong, that is.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-4644143019221798711?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/4644143019221798711/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=4644143019221798711' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/4644143019221798711'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/4644143019221798711'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2008/08/mondrian-on-timesten.html' title='Mondrian on TimesTen'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' 
src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-7212230696877623691</id><published>2008-08-18T09:48:00.000-07:00</published><updated>2008-08-18T10:32:48.618-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='real-time analytics event stream processing esp'/><title type='text'>Really urgent analytics</title><content type='html'>A Forrester report entitled "&lt;a href="http://www.forrester.com/Research/Document/Excerpt/0,7211,45965,00.html"&gt;&lt;span class="greyBLURB"&gt;Really Urgent Analytics: The Sweet Spot for Real-Time Data Warehousing&lt;/span&gt;&lt;/a&gt;" makes the connection between event-stream processing (ESP) and data warehousing, and Intelligent Enterprise published a &lt;a href="http://www.intelligententerprise.com/info_centers/ent_dev/showArticle.jhtml?articleID=210101150&amp;amp;pgno=1"&gt;nice summary&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;A traditional data warehouse contained huge amounts of data but was loaded infrequently: say monthly, or nightly at best. Modern businesses demand actions at lower latencies, and data warehousing professionals have been able to tune the traditional data warehouse load process to reduce latency.&lt;br /&gt;&lt;br /&gt;But even when cranked up to the maximum, the load process cannot achieve latencies of less than a few seconds, whereas many business processes need their answers faster than that. 
And this performance comes at the cost of higher complexity, so it takes more time and effort to modify the load process to incorporate new data or ask new questions.&lt;br /&gt;&lt;br /&gt;To solve the squeeze between lower latency and increasing complexity -- and, I would mention, ever-increasing data volumes and a trend towards distributed systems -- data warehousing needs a new architectural component, and Forrester rightly point to Event Stream Processing to fill that gap. I would add that, given the skill set of data warehousing professionals, it makes a lot of sense for that Event Stream Processing to be in SQL.&lt;br /&gt;&lt;br /&gt;At &lt;a href="http://www.sqlstream.com"&gt;SQLstream&lt;/a&gt;, we saw this need four years ago, and are dedicated to solving the latency-complexity problem using SQL.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-7212230696877623691?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/7212230696877623691/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=7212230696877623691' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/7212230696877623691'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/7212230696877623691'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2008/08/really-urgent-analytics.html' title='Really urgent analytics'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' 
src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-1578489472903099001</id><published>2008-07-14T11:28:00.000-07:00</published><updated>2008-07-14T11:47:58.236-07:00</updated><title type='text'>The process of database research</title><content type='html'>Jennifer Widom just received the &lt;a href="http://www.sigmod.org/sigmodinfo/awards/#innovations"&gt;ACM SIGMOD Edgar F. Codd Innovations Award&lt;/a&gt;, and spoke about the &lt;a href="http://infoblog.stanford.edu/2008/07/database-research-principles-revealed.html"&gt;process of database research&lt;/a&gt;:&lt;br /&gt;&lt;blockquote&gt;"[I]t's imperative to think about all three of the critical components -- &lt;i&gt;data model&lt;/i&gt;, &lt;i&gt;query language&lt;/i&gt;, and&lt;i&gt; system&lt;/i&gt; -- and in that order! We in research have a rare luxury, compared to those in industry, that we can mull over a data model for a long time before we move on to think about how we'll query it, and we can nail down a solid syntax and semantics for a query language before we implement it."&lt;/blockquote&gt;I've designed languages before, and I know how hard it is to do it right, so when I was designing &lt;a href="http://www.sqlstream.com/Products/SQLstream_RAMMS_White_Paper.pdf"&gt;SQLstream's extensions to SQL&lt;/a&gt; I looked at the research, and &lt;a href="http://dbpubs.stanford.edu:8090/pub/2003-67"&gt;Jennifer's team's work&lt;/a&gt; was easily the best in the field.&lt;br /&gt;&lt;br /&gt;Some of my colleagues balked at the paper's formal approach, but it was just what we needed to build a language for combining streaming and stored relational data, and the optimizer rules and execution objects to implement it.&lt;br /&gt;&lt;br /&gt;She is correct that it is a rare luxury for industry to have a sound foundation to build next-generation technology on. 
Congratulations on the award, Jennifer, and thanks for helping to build that foundation.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-1578489472903099001?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/1578489472903099001/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=1578489472903099001' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/1578489472903099001'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/1578489472903099001'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2008/07/process-of-database-research.html' title='The process of database research'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-3379418843354762954</id><published>2008-07-07T11:56:00.000-07:00</published><updated>2008-07-07T12:05:17.054-07:00</updated><title type='text'>Mondrian, Flex and Openbravo</title><content type='html'>This guy is using &lt;a href="http://opensourceerpguru.com/2008/07/06/flex-client-side-olap-for-open-source-erp-openbravo/"&gt;Adobe Flex as an OLAP client against Openbravo ERP data&lt;/a&gt;. 
He now plans to "connect Flex to [the Mondrian] OLAP server and let the OLAP sever do all the hard work".&lt;br /&gt;&lt;br /&gt;Sounds like a great idea: keep the big data on the server side, send just the multidimensional, aggregated results over SOAP (XML for Analysis), and let Flex do what it does best: fast, rich client.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-3379418843354762954?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/3379418843354762954/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=3379418843354762954' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/3379418843354762954'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/3379418843354762954'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2008/07/mondrian-flex-and-openbravo.html' title='Mondrian, Flex and Openbravo'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-3805697812277510676</id><published>2008-06-27T15:21:00.000-07:00</published><updated>2008-06-27T15:35:55.555-07:00</updated><title type='text'>More from my mother's funeral service</title><content type='html'>Barbara Webb, the extraordinary reader from Wombourne church who worked with me to 
develop the service in accordance with Mum's wishes, has sent me the words that she spoke on the day. Barbara knew my mother well, and her words are a fine tribute.&lt;br /&gt;&lt;br /&gt;I am not a Christian, but I have a lot of time for the spirit of philosophical inquiry that Barbara brings to her religion, and indeed her outlook on life. Barbara helped us give our mother a wonderful send-off; thank you, Barbara.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;Introduction&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;This service has been built around Judy’s well-thought out wishes, to such an extent that I feel that she has invited us to celebrate her life by sharing much of what she found good in it. Even during the last weeks in Compton Hospice Judy was very much in charge.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;The manner of her illness has meant that she was able to talk with and share many thoughts with Bethan, Julian and Justin, drawing the family closer together which was, I am sure, her dearest wish. Those bonds have been strengthened and will not break, even though the Atlantic separates her sons from her daughter much of the time.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;As most of you here know, music played an enormous part in Judy’s life, especially over the last years.  That is reflected in the service sheet which contains the full details she requested. Music, I know, touched Judy at the deepest level, and in her choice of music for this service she has chosen two pieces and asked us to sit quietly as they are played.  
So let us do just that, not only remembering Judy but allowing the music to speak deeply to our hearts, freeing emotions that words simply don’t touch.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;Address&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Judy had a lot of time to decide the details of her funeral. She asked me over a year ago to co-operate with our Rector Paul. And she knew that the tone of the service would be 100% Christian because it was only a couple of months ago that, being Judy, who tackled things head on, she asked to be sidesperson at a local funeral in this church … which it so happened I took. Yes, she knew this would be a 100% Christian funeral.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Judy was a thinking person, who didn’t take what she was taught for granted, and there were aspects of the Christian faith which she found difficult. She questioned, and I believe that God will welcome her with open arms because of that. Some of us come to him as unquestioning simple believers, others seek more concrete answers, but fortunately God loves us all, just as we are.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;For her own reasons Judy struggled with the concept of resurrection, which is the bedrock of the Christian faith. In fact most faiths allow for some form of existence before, outside or beyond the span and confines of a human life, so there must be a widespread, deep-seated, instinctive belief in humankind that death is not the end but the start of something new. Our problem is, of course, that only one Man came back from the dead to prove it and tell us about it, Jesus Christ himself. Apostle Paul tackled this head on in his letter to the Corinthians. “If the dead are not raised, it follows that Christ was not raised; and if Christ was not raised, your faith has nothing to it.  
If it is for this life only that Christ has given us hope, we Christians of all people are most to be pitied.”&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;But Paul didn’t stop there.  He must have been a tremendous orator and I can hear him thundering “But the truth is, Christ was raised from the dead”. Paul believed totally in the resurrection, and so do I.  The Christian hope of life after death is the heart of our service today, and from what I have heard from her family, I believe that, as Judy drew nearer to death, her life story made sense to her, and she was finding the peace which God alone can bring.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;God doesn’t ask us to understand; he asks us to trust in him, and live our lives in hope, inspired by his love, the love he expressed in Jesus Christ, the love he lavishes on us every day.  Our job is simple; we are to accept that love and use it to love our families and all those around us, as agents for God, spreading his peace in the world.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Judy, as we say after most of our services, may you rest in peace and rise in glory.   
Amen&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-3805697812277510676?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/3805697812277510676/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=3805697812277510676' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/3805697812277510676'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/3805697812277510676'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2008/06/more-from-my-mothers-funeral-service.html' title='More from my mother&apos;s funeral service'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-4167529163232074827</id><published>2008-06-26T13:19:00.000-07:00</published><updated>2008-06-26T13:34:05.484-07:00</updated><title type='text'>Eulogy for Judy Hyde</title><content type='html'>It was my mother's funeral today, and I gave the following eulogy. 
As you will see from the &lt;a href="http://www.hydromatic.net/judy-hyde-funeral-order-of-service.pdf"&gt;order of service&lt;/a&gt;, the piece of music I was referring to, which immediately followed my address, was the Officium defunctorum (Requiem) by Tomás Luis de Victoria.&lt;br /&gt;&lt;br /&gt;Thank you all for coming.&lt;br /&gt;&lt;br /&gt;Many, many people have helped out in the months that Judy has been ill. I’d like to thank a few people in particular.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Gill and Robert Green, my brother Justin’s in-laws, for putting up with a full house for the past few months;&lt;/li&gt;&lt;li&gt;Pamela’s mother Barbara, who has been sending a card containing a few words and a pasted Snoopy cartoon, three times a week for two years;&lt;/li&gt;&lt;li&gt;Elaine, who was like a sister to my mother;&lt;/li&gt;&lt;li&gt;Adrian and Judy, quietly and selflessly helping out;&lt;/li&gt;&lt;li&gt;All of the staff at Compton Hospice.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;Lots of people here! In addition to her family, there are:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Friends from her young married life with Adrian, in particular from Dudley Kingswinford Rugby club;&lt;/li&gt;&lt;li&gt;Teachers from Westfield and Bobbington; &lt;/li&gt;&lt;li&gt;People she knows from courses at U3A (University of the Third Age) Latin, Italian, history;&lt;/li&gt;&lt;li&gt;People she knows from music courses at Bromsgrove and Wolverhampton University;&lt;/li&gt;&lt;li&gt;Members of Wombourne &amp;amp; District Choral Society;&lt;/li&gt;&lt;li&gt;Friends from her passion for Early Music, there are recorder players, harpsichord players, and for all I know, players of crumhorns, racketts, and hurdy gurdies;&lt;/li&gt;&lt;li&gt;And of course members of the church and the community in and around Wombourne, that Judy was an enthusiastic part of.&lt;/li&gt;&lt;/ul&gt;So many people! I had always thought of Judy as a quiet woman, and I think she did too. 
But that reserve was balanced by a drive to do what she loved, to get out there and play music, or do what she needed to do.&lt;br /&gt;&lt;br /&gt;I think everyone who has been with her over her struggle with cancer over the past months knows about that determination. She just kept on going, kept on doing the things she loved.&lt;br /&gt;&lt;br /&gt;I’ll give a couple more examples.&lt;br /&gt;&lt;br /&gt;When I was 5, Judy and Adrian got divorced. It must have been very hard for her, bringing up two young children, but of course we were too young to notice. We moved into a new house, and of course she was working full time to pay the mortgage. She wanted her children to have swimming lessons and piano lessons, play rugby, and join the scouts, so she needed to learn to drive. She created a mantra ‘learn to drive in 75’. She had driving lessons from a friend with her two children sitting on the back seat. She was such a nervous driver, and she devised a way to calm her nerves before her driving test: have a swift gin and tonic, and cover the smell by eating mint imperials.&lt;br /&gt;&lt;br /&gt;And by the way, she still keeps mint imperials in her car. Justin found a Tupperware container in the driver’s door the other day. Even after all her struggles to eat high calorie food and gain weight, they still had a hand-written label on the lid: “Mint imperials, bought 26th September 2007, 11 kilocalories each”.&lt;br /&gt;&lt;br /&gt;Another example of that determination. In the mid 90s, Mum announced that she was going to Uganda. The reason for that visit was my sister Bethan. We never knew that we had a sister. In those days, if you were a young unmarried mother, you weren’t given much choice. Judy’s mother’s first concern was to save the family from embarrassment. I think Mum was shaken up by this. She didn’t think she would ever see Bethan again. When Bethan contacted Mum, she and her family were doing development work in Uganda. 
I had been to Uganda as a student, and I knew that it was a difficult place to travel in, let alone for a sixty year old. OK, said Mum, and jumped on a plane to meet her daughter.&lt;br /&gt;&lt;br /&gt;In the last few years of her life, Judy had a flowering. She had always clashed wills with her mother, and she said that when her mother died, she ‘came out from under a cloud’. These last ten or fifteen years were the best times of her life. After she retired, she took a music ‘A’ level, then a music degree, and got involved in all kinds of activities, including early music, but also other studies with U3A. Her sons, Justin and I, completed our studies with good degrees, and embarked on careers in computer software that she never totally understood, but was nevertheless very proud of. We both moved to California, but the family remained in close touch. Bethan and her family became part of Judy’s family, and she saw me and my brother settle down with women she liked. Just after my wedding to Pamela in California two years ago, Justin and Caroline had Zachary, and that gave her great pleasure.&lt;br /&gt;&lt;br /&gt;I’m proud of everything that she achieved in those years. It was nice to be on the receiving end of all that love. I love you too, Mum.&lt;br /&gt;&lt;br /&gt;The final healing happened this April. Judy took my father Adrian and his partner Judy to a weekend course to Benslow in South Wales. She lived for these courses, and knew that this was going to be her last. And she was pleased to be introducing her music, which she loved so much, to new people.&lt;br /&gt;&lt;br /&gt;I’m going to leave you with a piece of music. Music was of course a passion of Judy’s life. She felt kinship with people who shared her passion for music. When she found a perfect piece of music, she could of course tell you intellectually why it was perfect. 
This particular piece, the first movement of a requiem written over 400 years ago for a Spanish princess, was the apex of polyphonic Renaissance choral music. But it’s quite simply a beautiful piece of music. That’s what she would like to leave us with. Please, sit back, reflect, and enjoy it.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-4167529163232074827?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/4167529163232074827/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=4167529163232074827' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/4167529163232074827'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/4167529163232074827'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2008/06/eulogy-for-judy-hyde.html' title='Eulogy for Judy Hyde'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-4396216680385497207</id><published>2008-06-05T11:07:00.001-07:00</published><updated>2008-06-05T11:11:30.773-07:00</updated><title type='text'>Mainz community meet up is just one week away</title><content type='html'>Just one week til the Pentaho community meet-up in Mainz, Germany, that I blogged about &lt;a 
href="http://julianhyde.blogspot.com/2008/04/mondrian-in-mainz.html"&gt;earlier&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;If you're coming, please &lt;a href="http://pentaho2008mainz.eventbrite.com/"&gt;register&lt;/a&gt;. (Just so we know how many beer glasses to have on hand!)&lt;br /&gt;&lt;br /&gt;Plenty of other information about the event &lt;a href="http://wiki.pentaho.com/display/COM/Pentaho+Community+Gathering+-+Mainz+2008"&gt;on the Pentaho wiki&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-4396216680385497207?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/4396216680385497207/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=4396216680385497207' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/4396216680385497207'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/4396216680385497207'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2008/06/mainz-community-meet-up-is-just-one.html' title='Mainz community meet up is just one week away'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-5421935830813185509</id><published>2008-05-11T13:59:00.000-07:00</published><updated>2008-05-11T14:34:32.236-07:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='maven ivy mondrian'/><title type='text'>Maven and Ivy</title><content type='html'>We experimented a few years ago using &lt;a href="http://maven.apache.org/"&gt;Apache Maven&lt;/a&gt; to manage the Java libraries that mondrian depends on. Maven looked great on paper, but it was tricky to set up, and since simplicity was the goal in the first place, we gave up.&lt;br /&gt;&lt;br /&gt;Since then Maven2 has been released. Version 2 has a new architecture and is by all accounts a great improvement. I have recently been using Maven for a project with a lot of dependencies on other projects and it helps a lot. It imposes a structure on the dependencies by forcing you to name and version your projects in a certain way, and provides a central repository to put them in. You can provide local repositories for projects only you or your project team are using.&lt;br /&gt;&lt;br /&gt;Still, Maven seems to be a one-trick pony, even though it's a great trick. Maven generates distributions, javadoc, project pages, code coverage, and so forth, but for mondrian the only job I want it to do is dependency management. I don't want to throw away the considerable investment we've made in mondrian's ant scripts. I'd rather have something that adds dependency management to my existing framework.&lt;br /&gt;&lt;br /&gt;I've just come across &lt;a href="http://ant.apache.org/ivy/"&gt;Ivy&lt;/a&gt;, and it seems to be just the ticket. Ivy aims to do dependency-management within an ant framework, and it uses Maven's repository and metadata protocols to manage its libraries. 
It's just been accepted by Apache as a sub-project of &lt;a href="http://ant.apache.org/"&gt;Ant&lt;/a&gt;, so we know that its integration with ant will continue to get better.&lt;br /&gt;&lt;br /&gt;I went through &lt;a href="http://ant.apache.org/ivy/history/latest-milestone/tutorial.html"&gt;Ivy's tutorial&lt;/a&gt; and was impressed that Ivy could bootstrap itself using just ant 1.6, JDK 1.4 (or higher) and a &lt;a href="http://ant.apache.org/ivy/history/latest-milestone/samples/build.xml"&gt;build.xml&lt;/a&gt; file. (Try it! Just download that file and type 'ant'.)&lt;br /&gt;&lt;br /&gt;So, I'll be looking to add Ivy support to mondrian in the next week or so. The big benefit will be smaller distributions. If you download a source distribution, it will no longer contain libraries such as olap4j.jar, javacup.jar, commons-pool.jar, and so forth. The build process will download these libraries the first time you build. It takes quite a lot of effort, each release, to make sure that a source distribution contains all dependencies, so we hope to save some time there. 
We'll be able to delete these libraries from our source control system -- always a strange place for libraries, I thought.&lt;br /&gt;&lt;br /&gt;And, for those of you who use mondrian with different libraries than we ship with (say you use a different version of log4j or apache commons than we do) you should be able to easily modify your dependencies and recompile the source distribution.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-5421935830813185509?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/5421935830813185509/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=5421935830813185509' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/5421935830813185509'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/5421935830813185509'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2008/05/maven-and-ivy.html' title='Maven and Ivy'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-6210717690961868221</id><published>2008-04-15T09:27:00.000-07:00</published><updated>2008-04-15T09:50:16.183-07:00</updated><title type='text'>Mondrian in Mainz</title><content type='html'>Pentaho is hosting its first &lt;a 
href="http://wiki.pentaho.org/display/COM/Pentaho+Community+Gathering+-+Mainz+2008"&gt;community meet-up in Mainz, Germany&lt;/a&gt;, on 13th and 14th June, 2008.&lt;br /&gt;&lt;br /&gt;I will be there, and so will the leaders of the other Pentaho projects: Thomas Morgner (Pentaho Reporting), Matt Casters (Kettle), Mark Hall (Weka).&lt;br /&gt;&lt;br /&gt;The format of the meeting will be along the lines of a &lt;a href="http://en.wikipedia.org/wiki/BarCamp"&gt;BarCamp&lt;/a&gt;: no PowerPoint, lots of demos, audience participation, beer/wine on hand, and fun afterwards in the form of a cruise on the river Rhine. (Mainz is in the heart of Germany's wine country, so it would be difficult &lt;span style="font-style: italic;"&gt;not&lt;/span&gt; to have fun!)&lt;br /&gt;&lt;br /&gt;Are you going to join us? What would you like to see at the meet-up? What application/technology could you demo?&lt;br /&gt;&lt;br /&gt;And by the way, if you can't wait until June, I will be giving a &lt;a href="http://en.oreilly.com/mysql2008/public/schedule/detail/997"&gt;talk entitled "Creating Interactive OLAP Applications with MySQL Enterprise and Mondrian" at the MySQL conference&lt;/a&gt; tomorrow in Santa Clara, California. Join me there... 
or find me on the conference floor/in the bar afterwards.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-6210717690961868221?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/6210717690961868221/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=6210717690961868221' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/6210717690961868221'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/6210717690961868221'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2008/04/mondrian-in-mainz.html' title='Mondrian in Mainz'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-6162397310379136982</id><published>2008-02-27T06:21:00.000-08:00</published><updated>2008-02-27T06:48:13.754-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sqlstream mondrian stream sql olap etl'/><title type='text'>Streaming SQL meets OLAP</title><content type='html'>Streaming SQL and &lt;a href="http://en.wikipedia.org/wiki/Olap"&gt;OLAP&lt;/a&gt; are two of the most interesting and powerful paradigms in data processing. OLAP is a well-established technique for analyzing large databases of historic data. 
Streaming SQL is a more recent innovation, that applies the declarative power of the SQL language to the problem of managing data in motion.&lt;br /&gt;&lt;br /&gt;So, what happens when you combine OLAP with Streaming SQL? The combination is capable of solving some business problems that can't be solved any other way. OLAP is usually hampered by conventional &lt;a href="http://en.wikipedia.org/wiki/Etl"&gt;ETL&lt;/a&gt; techniques: it is difficult to keep the data warehouse up to date, because batch-based ETL processes are only efficient when dealing with a few hours or days of data. OLAP engines excel at comparisons between time periods (say, this quarter compared to the same quarter last year) or comparable data sets (say, this brand versus that brand); when powered by a streaming SQL engine, an OLAP engine can also include the most current data in its analysis (say, this hour compared to the average for this hour of the day over the last 6 months).&lt;br /&gt;&lt;br /&gt;The highest value data in the enterprise is that which represents what is happening to the business right now. This data includes various kinds of remote procedure calls, state changes of critical systems, and all kinds of business events. This data isn't stored on disk - we call it &lt;span style="font-style: italic;"&gt;data in flight&lt;/span&gt; as opposed to conventional &lt;span style="font-style: italic;"&gt;data at rest&lt;/span&gt; - and conventional ETL has difficulty accessing it. Streaming SQL allows you to bring this data into the same format as other enterprise data, but retain the ability to analyze and act on it in real time.&lt;br /&gt;&lt;br /&gt;I'm going to look at how you could combine the &lt;a href="http://mondrian.pentaho.org"&gt;mondrian&lt;/a&gt; OLAP engine with the &lt;a href="http://www.sqlstream.com/"&gt;SQLstream &lt;/a&gt;streaming SQL engine.&lt;br /&gt;&lt;br /&gt;Mondrian requires its data to be stored in a relational database. 
To ensure high performance on a large data set, mondrian caches query results in memory, and also uses aggregate tables which have been populated with summaries of the data. Mondrian's &lt;a href="http://mondrian.pentaho.org/documentation/architecture.php"&gt;cache&lt;/a&gt; and &lt;a href="http://mondrian.pentaho.org/documentation/aggregate_tables.php"&gt;aggregate tables&lt;/a&gt; both require careful management if mondrian is to give the correct answers on a rapidly changing data set.&lt;br /&gt;&lt;br /&gt;SQLstream helps mondrian do this by providing a continuous, real-time ETL process. As we shall see, the steps are: &lt;span style="font-style: italic;"&gt;acquire&lt;/span&gt; the real-time data and expose it as a common relational format; &lt;span style="font-style: italic;"&gt;transform&lt;/span&gt; into an organization suitable for OLAP and data warehousing; &lt;span style="font-style: italic;"&gt;load&lt;/span&gt; into the data warehouse, including aggregate tables; and &lt;span style="font-style: italic;"&gt;notify&lt;/span&gt; mondrian of changes to its cache.&lt;br /&gt;&lt;br /&gt;First of all, SQLstream can help to &lt;span style="font-weight: bold; font-style: italic;"&gt;acquire&lt;/span&gt; the data. As we said earlier, traditional ETL processes are limited to reading data at rest: from databases, mainframes, and files extracted from other operational systems. Data in flight exists in other formats: messages on &lt;a href="http://en.wikipedia.org/wiki/Message_Oriented_Middleware"&gt;message-oriented middleware&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/Web_service"&gt;web service&lt;/a&gt; calls, &lt;a href="http://en.wikipedia.org/wiki/Transmission_Control_Protocol"&gt;TCP&lt;/a&gt; network packets, and so forth. 
SQLstream can &lt;a href="http://www.sqlstream.com/Products/productsTechAdapters.htm"&gt;subscribe to these sources of data&lt;/a&gt;, and tap into the traditional data warehouse sources too: it can monitor a database table and generate events as new transactions occur, and tail a log file to read rows as they are appended to the log file.&lt;br /&gt;&lt;br /&gt;One of SQLstream's core concepts is a &lt;span style="font-style: italic;"&gt;stream&lt;/span&gt;. A stream is analogous to a table in a relational database; but whereas a table contains a finite set of rows which have been inserted at some time in the past and stored on disk, a stream contains an infinite sequence of rows that arrive whenever the producer decides to send them. (SQLstream in fact supports tables too, so that you can combine historical or reference data with event data.)&lt;br /&gt;&lt;br /&gt;What streams and tables have in common is the fact that you can manipulate them using SQL queries.  Not just the simple operations like filtering and routing, but operations which combine multiple rows such as join and aggregation. You can combine rows with other rows in the same stream (often demarcated by a time window of interest), with rows from other streams, and with historical and reference data.&lt;br /&gt;&lt;br /&gt;Next, you need to &lt;span style="font-weight: bold; font-style: italic;"&gt;prepare the data&lt;/span&gt; and convert it into a form suitable for large-scale analysis. In SQLstream, you can use SQL to perform a real-time, continuous ETL process. 
For example:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;You can apply standard SQL operators to cleanse and convert the data fields.&lt;/li&gt;&lt;li&gt;You can calculate trends such as moving averages using SQLstream's windowed aggregation operations.&lt;/li&gt;&lt;li&gt;If your data warehouse schema contains &lt;a href="http://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2"&gt;slowly-changing dimensions&lt;/a&gt;, SQLstream can help the loading process by identifying transactions which represent a new member of a dimension. For example, when an order is received from an existing customer, SQLstream can find that customer's id, whereas if the customer is new, it can generate a new surrogate key value.&lt;/li&gt;&lt;li&gt;If your data warehouse schema contains aggregate tables, they need to be populated with records which represent multiple fact table records. It is often cheaper to compute these aggregate records in memory.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;On the subject of aggregate tables, note that if you have many aggregate tables and data rates are extremely high, eventually the I/O capacity of the DBMS makes it impossible to keep the aggregate tables 100% up to date. You should reduce the number or granularity of the aggregate tables, and partition each aggregate table by time, so that only one block per table is being actively written to and the active blocks of all the aggregate tables can fit into the DBMS's buffer cache.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;Loading the data warehouse&lt;/span&gt; is straightforward. SQLstream has a database adapter that makes DBMS tables appear as foreign streams; writing to these streams makes an insert, update or delete occur in the data warehouse.&lt;br /&gt;&lt;br /&gt;As data is loaded into the data warehouse, it becomes inconsistent with the state of mondrian's cache. 
Mondrian's cache is necessary for performance if mondrian has many concurrent users or if the data warehouse is so large that SQL queries take a long time, but flushing the entire cache every time there is an update negates the value of the cache.&lt;br /&gt;&lt;br /&gt;Fortunately mondrian has an &lt;a href="http://julianhyde.blogspot.com/2007/02/mondrian-cache-control.html"&gt;API to let you notify mondrian of changes that affect its cache contents&lt;/a&gt;. You can tell mondrian specifically which data changed; for example, you can say 'there was just a sale of beer in Texas', and mondrian will mark precisely these entries in the cache as invalid, so they will be re-read from the database next time an OLAP query requests them.&lt;br /&gt;&lt;br /&gt;Once again, the problem can easily be solved using a foreign stream. The foreign stream should call mondrian's cache control API for each row it receives; a SQLstream pump object ensures that every record written into the fact table is mirrored into the foreign stream and therefore mondrian's cache is kept in sync with the DBMS.&lt;br /&gt;&lt;br /&gt;In conclusion, there is a synergy between OLAP and streaming SQL techniques that allows new business problems to be solved and existing problems to be solved much more efficiently. 
SQLstream provides a platform for all manner of continuous ETL operations, and mondrian with its open-source license and extensible Java architecture is a natural fit.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-6162397310379136982?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/6162397310379136982/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=6162397310379136982' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/6162397310379136982'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/6162397310379136982'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2008/02/streaming-sql-meets-olap.html' title='Streaming SQL meets OLAP'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-2027917081251146989</id><published>2008-02-13T00:33:00.000-08:00</published><updated>2008-02-13T00:46:04.611-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='recycling'/><title type='text'>30% less bad</title><content type='html'>&lt;a href="http://www.arrowheadwater.com/"&gt;Arrowhead&lt;/a&gt;, distributors of mountain spring water, are running a high-profile TV campaign about how they are using 30% less plastic:&lt;br 
/&gt;&lt;blockquote&gt;"Less plastic. Less impact. A little natural does a lot of good&lt;small&gt;&lt;sup&gt;TM&lt;/sup&gt;&lt;/small&gt;".&lt;/blockquote&gt;They should be careful; they might give their consumers ideas about &lt;a href="http://www.msnbc.msn.com/id/5279230/"&gt;how bad those plastic bottles are&lt;/a&gt;. Who knows, some of those consumers might come up with &lt;a href="http://www.landmarkstores.co.uk/Uploads/ProductImages/ref464green.jpg"&gt;a solution that uses 100% less plastic&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-2027917081251146989?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/2027917081251146989/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=2027917081251146989' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/2027917081251146989'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/2027917081251146989'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2008/02/30-less-bad.html' title='30% less bad'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' 
src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-8918641283212314355</id><published>2008-02-10T11:03:00.000-08:00</published><updated>2008-02-10T11:42:40.691-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='caucus colorado democracy'/><title type='text'>Democracy, caucus-style</title><content type='html'>I've heard a lot about caucuses since I moved to the United States fifteen years ago, and particularly over the past few months. Caucuses are like a living fossil, recalling the days when the Founding Fathers were experimenting with this dangerous new idea called Democracy.&lt;br /&gt;&lt;br /&gt;Since I'm not a US citizen, and I live in a state (California) which does not have a caucus system, I'm not likely to experience a caucus any time soon. So I was pleased to hear about one first-hand from my friend Eric in Colorado. In Eric's words:&lt;br /&gt;&lt;blockquote&gt;&lt;i&gt;First time I've ever caucused.  Heck, first time I've ever registered with a party. First caucus for most everyone else, too.&lt;br /&gt;&lt;br /&gt;7 precincts in one elementary school cafeteria.  Good thing the fire marshal didn't show. Typical turnout is 15 to 20 people combined in all 7 precincts, but last night the turnout for my precinct alone was 49, plus a handful of observers, and the total turnout for all 7 precincts was 355 voters.&lt;br /&gt;&lt;br /&gt;As the only one who had actually read the rules, I wound up chairing. That'll teach me to read the documentation.&lt;/i&gt;&lt;/blockquote&gt;What was the experience like?&lt;br /&gt;&lt;blockquote&gt;&lt;i&gt;Well, you could simulate the experience pretty easily.&lt;br /&gt;&lt;br /&gt;Just get 40 software developers together plus 10 random people off the street and pack them in a room meant for 20.  
Make sure any available lavatory facilities are sized for eight-year-olds.&lt;br /&gt;&lt;br /&gt;To simulate the presidential poll, ask them to discuss the merits of vi versus emacs.  To ensure a proper level of passion, make sure they all believe that Microsoft will dump Word and replace it with whichever editor they pick.  When they're finally ready to vote, do it by having them gather into separate groups and walk by you so that you can count them off like sheep. Once that's done, search through seventy pages of random government forms until you find the right reporting form and fill it out.&lt;br /&gt;&lt;br /&gt;To simulate the senate poll, ask them to choose what they would like for dessert: apple pie ala mode, or an incredibly obscure dish from Mozambique which nobody has ever heard of or tasted (and which you can't even find the name of in your packet).  Counting these results should be easy, but make sure to account for people who have already left.  Search the random forms again for the proper form.&lt;br /&gt;&lt;br /&gt;For the party platform, read aloud 3 random paragraphs each from the EU constitution, War and Peace, the Federalist Papers, Marvel Superhero Comics #37, and the Unabomber manifesto.  Have them vote yea or nay on approving each for discussion at the state convention.&lt;br /&gt;&lt;br /&gt;Finally, pass a donation envelope to help do this all again next election, and adjourn the meeting (but be warned, you'll need to stick around to sort out and sign 20 more forms before you get to leave).&lt;/i&gt;&lt;/blockquote&gt;I still wish I'd been there. 
As Winston Churchill said, Democracy is the second worst form of government; the only worse form is all of the others.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-8918641283212314355?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/8918641283212314355/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=8918641283212314355' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/8918641283212314355'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/8918641283212314355'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2008/02/democracy-caucus-style.html' title='Democracy, caucus-style'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-5520317633252226864</id><published>2008-02-06T15:01:00.000-08:00</published><updated>2008-02-07T19:27:23.603-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='olap4j gwt slice dice olap mdx'/><title type='text'>A nice little OLAP viewer</title><content type='html'>Bill Seyler and Will Gorman from &lt;a href="http://www.pentaho.com/"&gt;Pentaho&lt;/a&gt; have put together a nice little OLAP viewer  in their spare time, called &lt;a 
href="http://code.google.com/p/halogen/"&gt;Halogen&lt;/a&gt;. It isn't fully baked (by a long stretch) but it shows what you can do if you pair up &lt;a href="http://code.google.com/webtoolkit/"&gt;GWT&lt;/a&gt; with &lt;a href="http://www.olap4j.org/"&gt;olap4j&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://billandlizz.com/images/Report_Screen.jpg"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; cursor: pointer; width: 320px;" src="http://billandlizz.com/images/Report_Screen.jpg" align="right" alt="Halogen viewer" border="1" /&gt;&lt;/a&gt; I think it shows off the strengths of both GWT and olap4j nicely. Both technologies have a strong portability message. Because of GWT, Halogen has really nice AJAX usability and can run in any browser. Because of olap4j, it can run against mondrian and an XMLA provider without changing a line of code. (I haven't tried it against &lt;a href="http://www.microsoft.com/sql/solutions/bi/bianalysis.mspx"&gt;Microsoft SQL Server Analysis Services&lt;/a&gt;, for instance, but it shouldn't be hard to get it working.)&lt;br /&gt;&lt;br /&gt;This isn't an official Pentaho product, more of a proof of concept with the potential to grow into an alternative to &lt;a href="http://jpivot.sourceforge.net/"&gt;JPivot&lt;/a&gt; if the community thinks it is cool and we get some momentum behind it. 
To make it easier for people to contribute, we made a point of releasing it under a commercial-friendly license, namely the &lt;a href="http://www.mozilla.org/MPL/"&gt;Mozilla Public License&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Check it out.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-5520317633252226864?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/5520317633252226864/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=5520317633252226864' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/5520317633252226864'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/5520317633252226864'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2008/02/nice-little-olap-viewer.html' title='A nice little OLAP viewer'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-4413186193899761452</id><published>2007-12-19T22:18:00.000-08:00</published><updated>2007-12-19T23:05:07.337-08:00</updated><title type='text'>Vista Service Pack 1 Release Candidate</title><content type='html'>I'm one of those folks who runs Windows on my laptop out of habit, but I'm not passionate about it. 
I run Ubuntu on my server, and am happy with that, and have considered moving to Mac OS just so I can have one of those cool, shiny Apple Powerbook notebooks.&lt;br /&gt;&lt;br /&gt;I hadn't been thinking of upgrading from XP to Vista, but I bought a new Dell laptop earlier this year, and it came with Vista, so I said 'Why not?'. Six months later, I am among the legions of people who are &lt;a href="http://www.microsoft-watch.com/content/vista/what_went_wrong_with_windows_vista.html"&gt;unimpressed with Vista&lt;/a&gt;. It's not that it's worse than XP, but Microsoft have changed a lot of things, and virtually none of them are for the better.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp0.blogger.com/_BVv0WTpeWTs/R2oQDI64_7I/AAAAAAAAAAM/a1qctG3n2AM/s1600-h/hibernate_menu.gif"&gt;&lt;img align="right" style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://bp0.blogger.com/_BVv0WTpeWTs/R2oQDI64_7I/AAAAAAAAAAM/a1qctG3n2AM/s320/hibernate_menu.gif" alt="" id="BLOGGER_PHOTO_ID_5145943170344353714" border="0" /&gt;&lt;/a&gt;For example, XP used to give you the choice whether to sleep, turn off, or restart; and if you pressed the shift key, the sleep option would change to hibernate. In Vista, the corresponding menu has no hibernate option. This is a nuisance, because I dual boot my laptop, and I use hibernate a lot.&lt;br /&gt;&lt;br /&gt;And so on. The whole thing felt more sluggish than it should, given that I'd moved to a laptop 3x more powerful; and there were other niggling things, nothing broken exactly.&lt;br /&gt;&lt;br /&gt;I thought I'd give them a chance to redeem themselves with Vista Service Pack 1 (SP1). I resisted the temptation to try SP1 Beta in September, but when they released the &lt;a href="http://technet.microsoft.com/en-us/windowsvista/bb738089.aspx"&gt;Release Candidate&lt;/a&gt;, I decided to try it.&lt;br /&gt;&lt;br /&gt;I just installed the Release Candidate (RC) of Vista SP1. 
The upgrade took about an hour, as they warned, but otherwise went smoothly. Vista and most apps work fine, but in two hours I have had problems with the usually impeccable &lt;a href="http://www.raxco.com/products/perfectdisk2k/"&gt;PerfectDisk&lt;/a&gt; (offline defrag doesn't work) and &lt;a href="http://desktop.google.com/features.html"&gt;Google Desktop&lt;/a&gt; (has crashed twice so far).&lt;br /&gt;&lt;br /&gt;Given that post-SP1 Vista isn't noticeably different (they may have shuffled the fields of the wireless connection dialog around, I'm not sure) and a couple of apps have problems, I'd recommend that you don't install SP1 until it's officially released.&lt;br /&gt;&lt;br /&gt;I can't blame Microsoft when two apps that they don't control have problems in a release which isn't production. But over the past ten years, we have grown accustomed to the new release always being better than the last. I'll have to revisit that assumption. And for future Microsoft products, I'm downgrading my rating from 'sure, let's give it a try' to 'skeptical'.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-4413186193899761452?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/4413186193899761452/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=4413186193899761452' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/4413186193899761452'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/4413186193899761452'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2007/12/vista-service-pack-1-release-candidate.html' 
title='Vista Service Pack 1 Release Candidate'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://bp0.blogger.com/_BVv0WTpeWTs/R2oQDI64_7I/AAAAAAAAAAM/a1qctG3n2AM/s72-c/hibernate_menu.gif' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-4290443096554852098</id><published>2007-11-18T18:13:00.000-08:00</published><updated>2007-11-18T18:35:24.627-08:00</updated><title type='text'>olap4j support for scrolling result sets</title><content type='html'>&lt;a href="http://www.olap4j.org/"&gt;olap4j&lt;/a&gt; version 0.9 (beta) is almost ready for release. The specification has been expanded and clarified considerably (&lt;a href="http://olap4j.svn.sourceforge.net/viewvc/*checkout*/olap4j/trunk/doc/olap4j_fs.html"&gt;latest spec&lt;/a&gt;). There is now a beta-quality &lt;a href="http://mondrian.pentaho.org/"&gt;mondrian&lt;/a&gt; driver for olap4j, and a comprehensive set of tests in the TCK (technology compatibility kit). It will be released before the end of the month.&lt;br /&gt;&lt;br /&gt;But before we release 0.9, I need to solve a dilemma regarding how olap4j should handle scrolling result sets.&lt;br /&gt;&lt;br /&gt;We have already established that olap4j will allow clients to access the positions on an axis both via a random-access list and via a bi-directional iterator. 
Hence &lt;a href="http://www.olap4j.org/api/org/olap4j/CellSetAxis.html"&gt;CellSetAxis&lt;/a&gt; implements &lt;span style="font-family:courier new;"&gt;Iterable&amp;lt;Position&amp;gt;&lt;/span&gt; and has methods&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family:courier new;"&gt;List&amp;lt;Position&amp;gt; getPositions()&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:courier new;"&gt;ListIterator&amp;lt;Position&amp;gt; iterator()&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;With these two methods, I have no doubt that client applications will find it easy to navigate a &lt;span style="font-family:courier new;"&gt;CellSet&lt;/span&gt; with concise, clear code. The issue is more whether the writer of an olap4j driver can write an efficient implementation. To do that, the client needs a clear way to signal their intended access pattern to the driver, or conversely, the driver needs to infer the client's access pattern from the methods it calls.&lt;br /&gt;&lt;br /&gt;Now, one might say, let's use a smart list which does paging and generally behaves like an iterator behind the scenes. The problem is that the driver can't infer the client's intended access pattern. The smart list would not know whether to page out previous positions (and have to ask the server for them again, at considerable cost to the server and communication cost) or to try to keep them in memory. So, I reject the smart list approach. Let's stick with the dual list and iterator, and see what the driver could infer from the client.&lt;br /&gt;&lt;br /&gt;Consider these cases:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;1. Small number of columns and rows&lt;/span&gt;. No memory or performance issues here. Client may prefer the convenience of the list, but the iterator will work fine too.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;2. Small number of columns, large number of rows&lt;/span&gt;. 
Client will probably want to use a list for columns, so that they can scan the list repeatedly, but use an iterator for rows, to signal that earlier rows can be released from memory.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;3. Large number of columns and rows&lt;/span&gt;. Client will probably want to use an iterator for both columns and rows. Certainly the number of cells will be so large that they will have to be paged, probably in blocks. The driver can infer the access pattern from the iterators and be reasonably intelligent about how to manage cell values.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;4. Unknown number of columns and rows&lt;/span&gt;. Not as common as you would think, because if the client is going to receive many thousands of rows, the application writer generally knows that this is characteristic of their application (say, managing mailing lists) and chooses an appropriate client. I suppose the client could test the water using an iterator to find the size of the axis before switching to a list access method; or the driver writer could implement a smart list which reads large blocks of positions (say 1,000) at a time but for small-to-moderate sized cell sets behaves essentially the same as a dumb list.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;Problem #1: Cell ordinals&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;In case #3, it is not possible to access cells via an integer cell ordinal computed using the formula &lt;span style="font-style: italic;"&gt;cellOrdinal == columnOrdinal + columnCount * rowOrdinal&lt;/span&gt;, because &lt;span style="font-style: italic;"&gt;columnCount&lt;/span&gt; is not known until the entire columns axis has been evaluated.&lt;br /&gt;&lt;br /&gt;I propose that you be able to open a cell set in one of two modes, random-access-mode and cursor-mode. Cursor-mode is for very large result sets.
In cursor-mode, the driver would not attempt to find the length of the columns axis, and therefore you could not call &lt;span style="font-family:courier new;"&gt;Cell.getOrdinal()&lt;/span&gt; or &lt;span style="font-family:courier new;"&gt;CellSet.getCell(int ordinal)&lt;/span&gt; or &lt;span style="font-family:courier new;"&gt;Cell.getProperty(StandardCellProperty.CELL_ORDINAL)&lt;/span&gt;: you would get a runtime error if you called any of these methods. You could still call &lt;span style="font-family:courier new;"&gt;CellSet.getCell(List&amp;lt;Integer&amp;gt; coords)&lt;/span&gt;, &lt;span style="font-family:courier new;"&gt;CellSet.getCell(Position... positions)&lt;/span&gt;, &lt;span style="font-family:courier new;"&gt;List&amp;lt;Integer&amp;gt; Cell.getCoordinates()&lt;/span&gt;, and &lt;span style="font-family:courier new;"&gt;Cell.getProperty(Property)&lt;/span&gt; for any Property other than &lt;span style="font-family:courier new;"&gt;CELL_ORDINAL&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;Random-access mode is for regular cell sets. If you called one of the ordinal-based methods above, the driver would find the length of the columns axis, scanning to the end if necessary.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;Problem #2: Backwards iteration&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;If you're writing an OLAP client application, it's really nice that the &lt;span style="font-family:courier new;"&gt;ListIterator&lt;/span&gt; can go backwards. You can even go back to the start and iterate over the list, as many times as you like. But if you're writing an olap4j driver which is intended to be network and memory efficient, backwards iterators are sheer hell. You have to buffer values just in case the client wants to go back.&lt;br /&gt;&lt;br /&gt;So, we need a way for the client to declare (or the driver to infer) that the client will not go backwards.
My preferred option would be to change the &lt;span style="font-family:courier new;"&gt;CellSetAxis&lt;/span&gt; method &lt;span style="font-family:courier new;"&gt;ListIterator&amp;lt;Position&amp;gt; iterator()&lt;/span&gt; to &lt;span style="font-family:courier new;"&gt;Iterator&amp;lt;Position&amp;gt; iterator()&lt;/span&gt;, and throw a runtime error if this method is called more than once.&lt;br /&gt;&lt;br /&gt;Another option would be to open the &lt;span style="font-family:courier new;"&gt;CellSet&lt;/span&gt; with an option where the client promises not to drive iterators backwards; but this would not support case #2, where you might want to restart the iterator over columns but not the iterator over rows.&lt;br /&gt;&lt;br /&gt;Let me know your thoughts at the &lt;a href="http://sourceforge.net/forum/message.php?msg_id=4630374"&gt;olap4j Open Discussion forum&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-4290443096554852098?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/4290443096554852098/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=4290443096554852098' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/4290443096554852098'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/4290443096554852098'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2007/11/olap4j-support-for-scrolling-result.html' title='olap4j support for scrolling result sets'/><author><name>Julian 
Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-6212382616500950647</id><published>2007-09-30T15:56:00.000-07:00</published><updated>2007-09-30T16:04:55.276-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sdforum mondrian'/><title type='text'>SDForum BI SIG</title><content type='html'>I shall be giving a talk entitled "Building Scalable OLAP applications with Mondrian and Pentaho" to the Business Intelligence SIG of SDForum. The talk will be at 7pm on October 16th 2007 in the Cubberley Community Center in Palo Alto, CA. All are welcome; details &lt;a href="http://www.sdforum.com/index.cfm?fuseaction=Calendar.eventDetail&amp;amp;eventID=12963"&gt;at the SDForum site&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-6212382616500950647?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/6212382616500950647/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=6212382616500950647' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/6212382616500950647'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/6212382616500950647'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2007/09/sdforum-bi-sig.html' title='SDForum BI 
SIG'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5672165237896126100.post-5669272067079590790</id><published>2007-09-27T10:20:00.000-07:00</published><updated>2007-09-30T16:04:28.227-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='openmrs mondrian kettle pentaho'/><title type='text'>OpenMRS</title><content type='html'>&lt;p&gt;OpenMRS stands for 'Open Medical Records System', and is a major open-source project aiming to build medical information systems for developing countries. Their general goal is to provide a low-cost (free) solution for tracking patient conditions. Their specific goal is to fight AIDS in Africa.&lt;/p&gt;&lt;p&gt;Their website is &lt;a href="http://openmrs.org/wiki/OpenMRS"&gt;http://openmrs.org/wiki/OpenMRS&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;The project is gaining a lot of momentum. They have strong links with organizations such as the World Health Organization (WHO), corporate sponsorship from companies such as Google, and are already deployed in 12 countries.&lt;/p&gt;The OpenMRS architects are in the process of adding analytics using mondrian. I have committed to helping them out, and &lt;a href="http://www.ibridge.be/"&gt;Matt Casters&lt;/a&gt; (architect of &lt;a href="http://kettle.pentaho.org/"&gt;Pentaho Data Integration aka Kettle&lt;/a&gt;) is also involved. They are looking for committers to build the mondrian schema and analytics, and that's why I'm reaching out to the mondrian community.&lt;br /&gt;&lt;p&gt;Maybe you have used mondrian in a commercial project, and are looking to use those skills in a project which makes the world a better place. 
Maybe you have some experience with databases and data modeling, but want to hone those skills on a real project and improve your CV. If you are a student or have limited income, they have some stipends available.&lt;/p&gt;I urge you to get involved with this project. Go to the website and sign up! If you know anyone who would be interested in this project, please forward this to them.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5672165237896126100-5669272067079590790?l=julianhyde.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://julianhyde.blogspot.com/feeds/5669272067079590790/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5672165237896126100&amp;postID=5669272067079590790' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/5669272067079590790'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5672165237896126100/posts/default/5669272067079590790'/><link rel='alternate' type='text/html' href='http://julianhyde.blogspot.com/2007/09/openmrs.html' title='OpenMRS'/><author><name>Julian Hyde</name><uri>http://www.blogger.com/profile/17816795169191026372</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://2.bp.blogspot.com/_BVv0WTpeWTs/SKjJnU5S9zI/AAAAAAAAAAw/f1Sx1wkg_sQ/S220/jhyde-headshot6.jpg'/></author><thr:total>0</thr:total></entry></feed>
