Thursday, June 11, 2009

Cell writeback in Mondrian

Writeback is a feature that allows you to modify OLAP cell values and see the effects ripple through the data set, automatically modifying child and parent cells, and also cells derived using calculations. This allows you to perform 'what if' analysis and applications such as budgeting.

I have added experimental support for writeback to Mondrian.

In Mondrian's case, the term 'writeback' is a bit misleading. In a ROLAP system such as Mondrian, writing back to the database would be difficult, since values are stored in a fact table but we allow cells of any granularity to be modified. One modified cell might contain thousands of fact table rows. So, we don't write cells back to the database, but just retain the modified cells in memory, and propagate the modifications to related cells.

Here's how to use the experimental writeback support. Some of the details may change later as we make the feature more usable.

First, enable writeback for your cube. Create a dimension called 'Scenario', and a measure called 'Atomic Cell Count':
<Cube name='Sales'>
<Dimension name='Scenario' foreignKey='time_id'>
<Hierarchy primaryKey='time_id' hasall='true'>
<InlineTable alias='_dummy'>
<ColumnDefs>
<ColumnDef name='foo' type='Numeric'/>
</ColumnDefs>
<Rows/>
</InlineTable>
<Level name='Scenario' column='foo'/>
</Hierarchy>
</Dimension>
<!-- Other dimensions... -->
<Measure name='Atomic' aggregator='count'/>
<!-- Other measures... -->
</Cube>
(Yes, this is a lot of crud to add to your cube definition, and it's temporary. In future, we will let you flag a cube as 'writeback enabled', and a [Scenario] dimension and [Atomic Cell Count] measure will be created automatically. Also, we will make it easier for you to create dimensions that have only calculated members, without resorting to inline tables.)

Next, create a Scenario:
Connection connection;
Scenario scenario = connection.createScenario();
int scenarioId = scenario.getId();
(The Scenario API will soon move to olap4j: before mondrian-4.0, I hope. This includes the class mondrian.olap.Scenario, the method mondrian.olap.Cell.setValue(), and the method mondrian.olap.Connection.createScenario(). It will be optional for an olap4j driver to support writeback, but Mondrian's olap4j driver will, of course.)

Write a query that uses the scenario. Assuming that scenarioId above was 1, the query
SELECT [Measures].[Unit Sales] ON COLUMNS,
{[Product],
[Product].Children,
[Product].[Drink].Children} ON ROWS
FROM [Sales]
WHERE [Scenario].[1]
returns
[Product] [Unit Sales]
========================= ============
(All) 266,773
+ Drink 24,597
+--+ Alcoholic Beverages 6,838
+--+ Beverages 13,573
+--+ Dairy 4,186
+ Food 191,940
+ Non-Consumable 50,236
Choose one of the cells returned from the query and modify its value. For example, let's reduce the sales of Drink by 1,000 from 24,597 to 23,597:
Result result = connection.executeQuery(...);
Cell cell = result.getCell(new int[] {0, 1});
cell.setValue(23597, AllocationPolicy.EQUAL_ALLOCATION);
Execute the query again, and it returns
[Product] [Unit Sales]
========================= ============
(All) 265,773
+ Drink 23,597
+--+ Alcoholic Beverages 6,563
+--+ Beverages 12,990
+--+ Dairy 4,043
+ Food 191,940
+ Non-Consumable 50,236
The value for Drink is 23,597, as expected, and the values of its children have been correspondingly reduced.

How the value is allocated to the children (and in fact all descendants) is decided by the allocation policy. In this case, we specified EQUAL_ALLOCATION, which means that all atomic cells have the same value.

An atomic cell is the finest grained value that can be viewed multidimensionally; for this cube, it is an instance of a particular customer buying a particular product, on a particular promotion, on a particular day, in a particular store. That makes for an awful lot of of atomic cells, but there may be fewer atomic cells than fact table rows. If the fact table does not have a primary key on (customer, product, time, promotion, store) some cells may have more than one fact table row.

If instead we had written
cell.setValue(23597, AllocationPolicy.EQUAL_INCREMENT);
the query would have returned
[Product] [Unit Sales]
========================= ============
(All) 265,773
+ Drink 23,597
+--+ Alcoholic Beverages 6,560
+--+ Beverages 13,022
+--+ Dairy 4,015
+ Food 191,940
+ Non-Consumable 50,236
We notice that Beverages has not been reduced as much under EQUAL_INCREMENT policy than EQUAL_ALLOCATION policy; the average value for atomic cells of Beverages must be greater than for Drink as a whole.

Allocation policies are defined consistent with Analysis Services' UPDATE CUBE statement. Mondrian does not currently implement WEIGHTED_ALLOCATION or WEIGHTED_INCREMENT policies.

Treating scenarios as a dimension is an elegant and powerful idea. Using the Scenario dimension, you can easily switch from one scenario to another, or you can compare scenarios side-by-side.

Note that you can also set a connection's current scenario. This effectively becomes the default value for the Scenario dimension in that connection, so you do not need to specify Scenario in the slicer. However, there still needs to be an explicit scenario in the context when you call Cell.setValue(). I'm not sure whether the benefit of having a scenario for a connection outweighs the benefit/confusion, and we may discontinue this feature.

Remember, this is still an experimental feature. There is some cleanup to be done, some performance tuning, and the API needs to be moved into olap4j. But most importantly, it's not useful until a user interface, such as PAT or JPivot, supports scenarios and modifying cell values.

15 comments:

Tom Barber said...

Looking good, the scrolltable model we're currently migrating to has the facility to inline editing, so I'm looking forward to playing around with this functionality.

Tom

Shishir said...
This comment has been removed by the author.
Julian Hyde said...

Further progress on writeback support.

Today I moved the public API from Mondrian into olap4j. It is checked in as the milestone olap4j-0.9.7.277, and you can browse the API of the new Scenario interface.

The Mondrian functionality is about the same as before. The corresponding Mondrian changes are on Mondrian's main line, which means they will hit the streets as part of mondrian-4.0. I can't say yet whether writeback will be a fully supported Mondrian feature or whether it will remain in the current experimental state. But it's ready for Tom and Paul to add writeback support to the PAT interface, whenever they have a moment.

By the way, this is the first feature that is available through Mondrian's olap4j API but not through the old native API. The olap4j interface is so elegant, and it didn't make sense to duplicate the effort. If you want examples for how to use the API, see Mondrian's ScenarioTest.

The changes to the olap4j API came out simple and powerful. They feel about right to me. I would love to hear whether people think that the olap4j would be suitable for other OLAP engines.

Anonymous said...

Hi, I'm very newbie in the mondrian and I have problem which I have no idea how to solve it. when I try to compile script in get error :


java.lang.NoClassDefFoundError : net/sourceforge/retoweaver/runtime/java/lang/Enum

I have included retroweaver-rt.jar in CLASSPATH but no luck. Is there anyone what the problem is ?

amel said...

plz, can you help me?
i want to do connection between eclipse and sqlserver 2005!
but i have many pb and th rt in the console of eclipse, No suitable driver!
plz can you help me!
thx :)

amel said...

Sorry
i forget to say that i want to do the connection with the API olap4j!
thx again

Julian Hyde said...

@amel Please post olap4j questions to the olap4j "Open Discussion" forum.

Rebai Hamida said...

I have a problem to update the content of the cellset,I am using mondrian-3.2.0.13661 and I want to update the content of the cellset (update the value of the cellset with another value).
for example : if I dot that :
CellSetAxis cols = cellSet.getAxes().get(0);
CellSetAxis rows = cellSet.getAxes().get(1);
for (int row = 0; row < rows.getPositions().size(); row++) {
System.out.println("Row #" + (row + 1) + ":");
for (int col = 0; col < cols.getPositions().size(); col++) {
List positions = new ArrayList(2);
positions.add(col);
positions.add(row);
Cell cell = cellSet.getCell(positions);
System.out.println(cell.getFormattedValue());
System.out.println(cell.getValue());
}
}
}
I want to update the value of the cell with specific a value, I used this like the sample:


cell.setValue(23597, AllocationPolicy.EQUAL_INCREMENT); but the value is not updated.
I didn't use the scenario.

Rebai Hamida said...

Hi, this is example we use an explicit Scenario dimension, what about implicit Scenario dimension? it's not supported on mondrian. So,how can I update the CellSet?

Rebaï

Julian Hyde said...

Rebai,

Did you call OlapConnection.setScenario? Writeback won't work unless you do that (or include a scenario in your WHERE clause).

If this doesn't work please log a bug against mondrian.

Julian

Altaf Baradia said...

Hi, This is very interesting and powerful feature. Great stuff.

Is the cell write back feature available as part of Mondrian as yet? Is Mondrian 4 out? I could only see Mondrian 3.2.1 available for download on pentaho site.

Also, I understand that mondrian uses olap4j. But still if you could please point out some link which differentiates the two. Basically where olap4j ends and where Mondrian begins in terms of overall functionality.

Julian Hyde said...

Is the cell write back feature available as part of Mondrian as yet?

Yes. Officially it is still beta and unsupported, but it works nevertheless.

Is Mondrian 4 out? I could only see Mondrian 3.2.1 available for download on pentaho site.

Mondrian-4 is not out. I am hoping to get it to alpha in the next month or two.

But my comment in the original post, "The Scenario API will soon move to olap4j: before mondrian-4.0, I hope", has come true, regardless. Scenarios are part of olap4j-1.0, released in April. See org.olap4j.Scenario.

Also, I understand that mondrian uses olap4j. But still if you could please point out some link which differentiates the two. Basically where olap4j ends and where Mondrian begins in terms of overall functionality.

olap4j is just the interface to the OLAP engine. It doesn't do any of the processing, just allows your application to get at it. Same as the relationship between JDBC and a database engine such as Oracle or MySQL.

Saber said...

thanks for this explication, plz Mr Julian i need your help.
In fact i'm newbie on mondrian & i had to develop an BI similator, so i would like to know the actual state of mondrian , does it support what-if simulation , for example using mondrian, can i know the impact on departmental resource budgets by moving employees between departments or moving product from one factory to another ??
thank you in advance.
Cordially.

Julian Hyde said...

Saber,

Mondrian scenarios do not allow you to model moving a member from one parent to another. However you can simulate the effect by subtracting X from the total of one parent member and adding X to the total of the other parent member.

Hope that helps.

Julian

Saber said...

Thank you very much for your response,
Actually i'm in the step of technical specification and i'm looking for the different technologies that would help me to implement my application.
So your remark is interessting and if you can help by another advice i will be very greatful.

Cordially.
Saber