Monday, January 01, 2007

Mondrian URLs and Jakarta Commons VFS

If you've used Mondrian, you're probably familiar with how Mondrian loads its schema from a URL embedded in the connect string.

A Mondrian connection is a URL which contains a reference to an XML file containing a Mondrian schema definition, information to connect to the JDBC database which holds the data, and various other parameters. For example,
Provider=Mondrian; Jdbc='jdbc:mysql://localhost/foodmart'; JdbcUser=foodmart; JdbcPassword=foodmart; JdbcDrivers=com.mysql.jdbc.Driver; Catalog=file:demo/FoodMart.xml
Embedded within the connect string URL is another URL, here file:demo/FoodMart.xml, from where Mondrian should load its schema.
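
To make the structure of the connect string concrete, here is a minimal sketch that splits one into its keyword/value pairs and pulls out the Catalog URL. The class and method names (ConnectStringSketch, parse) are made up for illustration. This is not Mondrian's own parser and it handles only the simple quoting used in the example above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ConnectStringSketch {

    /**
     * Splits "Key=value; Key='quoted value'" pairs into a map.
     * Simplified illustration only: it supports single-quoted values
     * but not escaped quotes or other edge cases.
     */
    static Map<String, String> parse(String connectString) {
        Map<String, String> props = new LinkedHashMap<>();
        final int n = connectString.length();
        int i = 0;
        while (i < n) {
            int eq = connectString.indexOf('=', i);
            if (eq < 0) {
                break; // no further key=value pairs
            }
            String key = connectString.substring(i, eq).trim();
            int pos = eq + 1;
            while (pos < n && connectString.charAt(pos) == ' ') {
                pos++; // skip blanks before the value
            }
            String value;
            int semi;
            if (pos < n && connectString.charAt(pos) == '\'') {
                // Quoted value: everything up to the closing quote.
                int close = connectString.indexOf('\'', pos + 1);
                if (close < 0) {
                    throw new IllegalArgumentException("Unterminated quote in connect string");
                }
                value = connectString.substring(pos + 1, close);
                semi = connectString.indexOf(';', close);
            } else {
                // Unquoted value: everything up to the next semicolon.
                semi = connectString.indexOf(';', pos);
                value = connectString.substring(pos, semi < 0 ? n : semi).trim();
            }
            props.put(key, value);
            i = semi < 0 ? n : semi + 1;
        }
        return props;
    }

    public static void main(String[] args) {
        String connectString =
            "Provider=Mondrian; Jdbc='jdbc:mysql://localhost/foodmart'; "
            + "JdbcUser=foodmart; JdbcPassword=foodmart; "
            + "JdbcDrivers=com.mysql.jdbc.Driver; Catalog=file:demo/FoodMart.xml";
        Map<String, String> props = parse(connectString);
        System.out.println("Catalog URL: " + props.get("Catalog"));
    }
}
```

The value stored under Catalog is the embedded URL that the rest of this post is about.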

Until now, the URL following the Catalog keyword could only use one of the small number of protocols supported by java.net.URL, such as 'http' or 'file'. I've just changed Mondrian to use Jakarta Commons VFS to resolve URLs, which is a more powerful and extensible scheme.

With VFS, you can use the same built-in protocols, some new built-in protocols, and even define your own protocol. For example, when used within Pentaho BI Platform, Mondrian could use the URL
solution:/sales/schemas/my_mondrian_model.xml
to reference a Mondrian schema file stored within Pentaho's solution repository. This is possible because the Pentaho folks have exposed their solution repository as a custom filesystem.
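
The limitation of java.net.URL is easy to see with the standard library alone. The sketch below (the class name UrlProtocolSketch and the helper method are made up for illustration) checks whether java.net.URL accepts a given string. It assumes a stock JVM with no custom URL stream handler registered for 'solution'.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class UrlProtocolSketch {

    /**
     * Returns true if java.net.URL can parse the string using the
     * protocol handlers installed in this JVM.
     */
    static boolean isAcceptedByJavaNetUrl(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException e) {
            // Thrown, for example, when the protocol is unknown.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {
            "file:demo/FoodMart.xml",
            "http://localhost/FoodMart.xml",
            "solution:/sales/schemas/my_mondrian_model.xml",
        };
        for (String candidate : candidates) {
            System.out.println(candidate + " accepted: " + isAcceptedByJavaNetUrl(candidate));
        }
    }
}
```

On a stock JVM the 'file' and 'http' URLs are accepted, while the 'solution' URL is rejected because no handler exists for that protocol. That is why the catalog URL has to be handed to VFS as a plain string.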

You can even create a URL which references a file within a JAR within a zip that exists on an FTP site.
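
As a rough sketch of what such a nested URL might look like, the snippet below assembles one using the layered notation from the Commons VFS documentation, where each layer's scheme is prefixed and the path inside each container follows a '!'. The host name, file names, and the helper nest() are all hypothetical. The code only builds the string and does not contact any server.

```java
public class LayeredUrlSketch {

    /**
     * Wraps a container URL in another layer: the scheme is prefixed
     * and the path inside the container is appended after '!'.
     */
    static String nest(String scheme, String containerUrl, String pathInside) {
        return scheme + ":" + containerUrl + "!" + pathInside;
    }

    public static void main(String[] args) {
        // Hypothetical zip file on a hypothetical FTP site.
        String zipOnFtp = "ftp://ftp.example.com/pub/schemas.zip";
        // A JAR stored inside that zip.
        String jarInZip = nest("zip", zipOnFtp, "/lib/schemas.jar");
        // A schema file stored inside that JAR.
        String schemaInJar = nest("jar", jarInZip, "/FoodMart.xml");
        System.out.println(schemaInJar);
    }
}
```

A string of this shape would then go after the Catalog keyword in the connect string, provided the VFS providers for each layer are available at run time.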

This change will be released as part of mondrian-2.3.

6 comments:

Anonymous said...

Hello Julian,

That sounds great, I have actually had some problems resolving the catalog url using mondrian with pentaho. I would like to ask you when do you think that Mondrian 2.3 will be available or if there is somewhere a roadmap of what new features it will include.

Regards,
Javier

Julian Hyde said...

Mondrian-2.3 will be released sometime this month.

The roadmap is here, but it's a little out of date. The main new feature in mondrian-2.3 will be cache control, and as usual, there are a host of bug fixes and minor enhancements.

To get an idea of the bug fixes and enhancements, see the list of recent source code changes.

Anonymous said...

I'm afraid current implementation doesn't work with dynamic schema processor.

if ( ! Util.isEmpty(dynProcName)) {
    assert catalogStr == null;
    try {
        final URL url = new URL(catalogUrl);
        ...


The last line will fail with solution:/ or something else not supported by URL class.

I think the place of "VFS support" implementation is wrong (at load, but not at schema retrieval in Pool.get method).
When I was making a patch for change #8277, I thought that the process of schema loading and parsing is unclear and unreasonably complicated (I know, it's because of 3 different sources of schema: directly in connect string, catalog URL and dynamic processor that could also generate the schema).
I believe this process could be something like this:
1) Get schema content.
2) Process it (if needed), for example with dynamic schema processor (I think there could be more than one of them, for example: first generate, then localize).
3) Load it.
4) Cache it.
But these are just thoughts, and they would require some refactoring.
In real life (simply to fix this issue) it seems we should change the DynamicSchemaProcessor interface by adding a method which takes catalogUrl as a String instead of a URL.
What do you think about it? If you agree with these changes I can submit a patch.
Victor Glushenkov, Pharmanet

Julian Hyde said...

Victor,

Excellent points. I didn't think hard enough about DynamicSchemaProcessor when I introduced Apache VFS.

The right solution would be to change the signature of the method from

String processSchema(
    URL schemaUrl,
    Util.PropertyList connectInfo)

to

String processSchema(
    String schemaUrl,
    Util.PropertyList connectInfo)

Of course, that will mean that anyone who has written a DynamicSchemaProcessor will have to rewrite it to use Apache-VFS.

I don't want to break backwards compatibility, so I'll introduce a new interface, and deprecate the old one. The old one will continue to work if people use non-VFS URLs.

Regarding the structure of the schema-loading code, I agree that it is complicated. If we followed your proposal to load the schema first, then process it second, we would lose one of the benefits of DynamicSchemaProcessor, which is the ability to redirect to another URL.

The most complicated part is the interaction with the schema cache, and the checksumming algorithm to prevent schemas being read unnecessarily. It seems to me that the schema should be checksummed AFTER it passes through the schema processor. (Though I suppose we could add a method in the schema processor to ask it to checksum its input.)

Any thoughts on how we can reconcile all of those requirements would be welcome.

Julian

Anonymous said...

I agree that introducing a new dynamic schema processor interface is better than breaking backward compatibility. I think it would also be better to introduce an abstract implementation of it which deals with VFS. I suppose that generating a schema (or redirecting to another location) is not the main application of schema processors. Primarily they are processors of the schema (that is, its content). So it would not be good to implement content retrieval via VFS in every dynamic schema processor implementation. For example, this class could be:

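import java.io.File;
import java.io.InputStream;

import org.apache.commons.vfs.FileContent;
import org.apache.commons.vfs.FileObject;
import org.apache.commons.vfs.FileSystemManager;
import org.apache.commons.vfs.VFS;

import mondrian.olap.Util;

public abstract class AbstractSchemaProcessor implements TheNewDynamicSchemaProcessor {

    public String processSchema(String schemaUrl, Util.PropertyList connectInfo) throws Exception {
        // Resolve the URL through VFS, relative to the current working directory.
        FileSystemManager fsManager = VFS.getManager();
        File userDir = new File("").getAbsoluteFile();
        FileObject file = fsManager.resolveFile(userDir, schemaUrl);
        FileContent fileContent = file.getContent();
        // Hand the raw schema content to the subclass.
        return processSchema(fileContent.getInputStream());
    }

    public abstract String processSchema(InputStream schemaContent) throws Exception;

}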

If someone's schema processor just processes the schema (neither generating it nor redirecting elsewhere), they only need to implement the processing of schema content, without dealing with VFS. Otherwise they can implement the interface in their own way.

Victor

Julian Hyde said...

I took your suggestions and changed the API of DynamicSchemaProcessor in changes 9000/9001.

I decided that retaining the processSchema(URL,PropertyList) method was more trouble than it was worth, so now there is just processSchema(String,PropertyList). I also moved DynamicSchemaProcessor to the mondrian.spi package (a more logical place for it than mondrian.rolap) and added a base class for implementations, mondrian.spi.impl.FilterDynamicSchemaProcessor.

These changes will be in mondrian-2.4.

Many thanks for your helpful suggestions, Victor.

Julian