Friday, September 21, 2007

A nice way to populate those pesky closure tables

If you use mondrian's parent-child hierarchies, you will know that performance sucks unless you create closure tables. Closure tables expand the hierarchy, and allow mondrian to the operations required to roll-up a parent-child hierarchy using raw SQL: really fast.

The problem is populating the things. Closure tables contain what computer science profs. call a transitive closure of the parent-child relation (hence their name), and transitive closures aren't something which relational databases are very good at computing (which is why the database performs so much better when they're around). You can't just define a view, or write a simple SQL statement to populate them. Up til now, you'd have to use a stored procedure (if your database supports them) or write some gnarly JDBC code.

Now Matt Casters has added a Closure Generator step to Pentaho Data Integration (aka Kettle), which should make everyone's life easier. Yet another reason to use Kettle to load your mondrian schema. The new step debuts in Pentaho 3.0.0-RC1, but it should work with any version of mondrian.

Which, I suppose, means we all get to that Friday evening beer a little earlier. I'll drink to that.

4 comments:

Anonymous said...

Hi,

If we have direct parent-child relation table with us then we can build our closure tables with simple query in oracle.check link: http://www.oracle.com/technology/oramag/oracle/05-may/o35asktom.html

-AJ

Abhishek said...

I create a closure table in one of the cubes that uses hierarchical data. I published the cube on BI server and the hierarchies look great. However, when I try to filter the analyzer report on a particular value, the report just shows the row with that particular value on which I created a filter. I was expecting analyzer to show the results for the entire hierarchy.

Do we have a workaround for this issue?

Julian Hyde said...

Abhishek,

I don't know what's causing the problem. It's possible that you populated the closure table wrong. It's possible that Mondrian isn't using the closure table. It's possible (though unlikely) that Analyzer is generating the wrong MDX. (See whether the MDX includes "DataMember".)

Julian

Abhishek said...

Thanks Julian for a quick response.

I verfied:

1. the closure table and the transformation that populates the closure table. It looks correct and The hierarchies are populated correctly in pentaho analyzer reports.

2. I can see the hierarchical data in pentaho analyzer. Also, I checked the pentaho log and I see closure table being queried by mondrian.

3. I do not see "DataMember" being included in the MDX queries fired by the analyser. Here are two queries:

1) Before applying filter:
With
Set [*NATIVE_CJ_SET] as 'Filter([*BASE_MEMBERS_TestSegment], Not IsEmpty ([Measures].[total_users]))'
Set [*SORTED_ROW_AXIS] as 'Order([*CJ_ROW_AXIS],[TestSegment].CurrentMember.OrderKey,ASC)'
Set [*BASE_MEMBERS_TestSegment] as '[TestSegment].[segment].Members'
Set [*BASE_MEMBERS_Measures] as '{[Measures].[total_users]}'
Set [*CJ_ROW_AXIS] as 'Generate([*NATIVE_CJ_SET], {([TestSegment].currentMember)})'
Set [*CJ_COL_AXIS] as '[*NATIVE_CJ_SET]'
Select
[*BASE_MEMBERS_Measures] on columns,
[*SORTED_ROW_AXIS] on rows
From [TestSegment]

2) after applying filter:
With
Set [*NATIVE_CJ_SET] as 'Filter([*BASE_MEMBERS_TestSegment], Not IsEmpty ([Measures].[total_users]))'
Set [*SORTED_ROW_AXIS] as 'Order([*CJ_ROW_AXIS],[TestSegment].CurrentMember.OrderKey,ASC)'
Set [*BASE_MEMBERS_TestSegment] as '{[TestSegment].[In-Market].[Autos].[Makes & Models].[BMW].[5-Series]}'
Set [*BASE_MEMBERS_Measures] as '{[Measures].[total_users]}'
Set [*CJ_ROW_AXIS] as 'Generate([*NATIVE_CJ_SET], {([TestSegment].currentMember)})'
Set [*CJ_COL_AXIS] as '[*NATIVE_CJ_SET]'
Select
[*BASE_MEMBERS_Measures] on columns,
[*SORTED_ROW_AXIS] on rows
From [TestSegment]

After applying filter, it just shows one row for which the filter is applied rather than entire hierarchy.

-Abhishek