Tuesday, June 30, 2009

SQLstream powers Firefox 3.5 realtime downloads monitor

Mozilla launched Firefox 3.5 today, and with it, a neat applet, powered by SQLstream, to monitor downloads in real time.

You can see the results at Mozilla's download stats page.

A few weeks ago, Apple's Hyperwall was awe-inspiring as a piece of visual art, but it was less impressive as a piece of real-time data integration, because its data from the App Store was delayed by five minutes.

SQLstream gathers data from Mozilla's download centers around the world, assigns each record a latitude and longitude, and summarizes the information in a continuously executing SQL query. Data is read with sub-second latencies, and then aggregated (using SQLstream's streaming GROUP BY operator) into summary records each describing a second of activity.
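
As a rough illustration of what that per-second aggregation does, here is a plain Python sketch. It is not SQLstream's SQL or its API, and the record fields `ts`, `lat` and `lon` are assumed names. It groups download events into one summary per second and location, and emits each second's summary as soon as an event from a later second arrives:

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple


class Download(NamedTuple):
    ts: float   # event time in seconds since the epoch (assumed field)
    lat: float  # latitude assigned to the record (assumed field)
    lon: float  # longitude assigned to the record (assumed field)


def summarize_per_second(events: Iterable[Download]) -> Iterator[dict]:
    """Yield one summary per second, assuming events arrive in time order."""
    current_second = None
    counts: Counter = Counter()
    for event in events:
        second = int(event.ts)
        if current_second is not None and second != current_second:
            # A new second has started, so the previous one is complete.
            yield {"second": current_second, "counts": dict(counts)}
            counts = Counter()
        current_second = second
        counts[(event.lat, event.lon)] += 1
    if current_second is not None:
        # Flush the final, possibly partial, second when the input ends.
        yield {"second": current_second, "counts": dict(counts)}
```

Because the function is a generator, a consumer sees each summary about a second after the activity it describes, rather than waiting for the whole feed to end.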

A server-side Java program reads the data using JDBC, serializes it as JSON, and transmits it to all connected web clients. Clients render the charts using the Canvas tag, newly introduced in HTML 5. The results are very impressive visually, but to a back-end guy like myself, the plumbing is impressive too.
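
The real program is written in Java and reads its rows over JDBC. The serialization step alone can be sketched in a few lines of Python; the row shape `(second, lat, lon, downloads)` and the JSON key names are assumptions made for illustration, not the format the applet actually uses:

```python
import json
from typing import Iterable, Tuple

# Assumed row shape: (second, lat, lon, downloads)
Row = Tuple[int, float, float, int]


def rows_to_json(rows: Iterable[Row]) -> str:
    """Serialize summary rows into a compact JSON array of objects."""
    payload = [
        {"second": second, "lat": lat, "lon": lon, "downloads": downloads}
        for second, lat, lon, downloads in rows
    ]
    # Compact separators keep the message small when it goes to many clients.
    return json.dumps(payload, separators=(",", ":"))
```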

The amazing thing is that SQLstream makes this so easy. Our official company blurb talks about "shortening data integration projects from months to weeks", but this project took just a couple of days of work.

By the way, don't try to view the page in Microsoft's Internet Explorer. Ten years ago, Internet Explorer led the charge to enhance the capabilities of the web browser, introducing dynamic HTML (DHTML), XML handling in the browser, ActiveX controls and other capabilities, but those days are over. With HTML 5 there is a renaissance in web standards; Firefox is leading the pack, with other 'modern' browsers such as Safari, Opera and Chrome not far behind.

2 comments:

Alex said...

It is interesting to observe that although SQLstream computes its download stats in real time, the charting is actually built (under the hood, with JavaScript plumbing) on a one-minute poll of the server. The polled chunk of data is then displayed as a playback, so the user sees something that looks like real-time data, while what he sees is already one minute old.

Wondering if things could be improved (HTML5 WebSocket?) and if that was a deliberate choice to ensure scalability of the frontend servers given this is a public dashboard.

Julian Hyde said...

Your comments are right on the money. SQLstream is computing a variety of analytics on the Mozilla download feed (barely breaking a sweat at ~100 downloads per second), and can deliver them to users within Mozilla in real time, but today's web technologies make it difficult to propagate that data to thousands of simultaneous users over HTTP. It's that old last-mile problem in another guise.

This app is, after all, about the number of Firefox downloads; on the way, we also managed to showcase the capabilities of SQLstream in processing that data, and new HTML5 features such as the Canvas element. We didn't want to get bogged down solving a herculean server problem (see Twitter's boards for plenty of examples of those), so we took the advice of Mozilla's operations staff: process the data, generate the results into one-minute batch files, and deliver those files to Mozilla's load balancers. It was an appropriate architecture, given the huge numbers of web clients viewing the download data.
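
To make the batching step concrete, here is a minimal Python sketch of the idea, not the code we run in production. It assumes each per-second summary is a dict with a `second` key (epoch seconds), and the file naming scheme is hypothetical:

```python
import json
from pathlib import Path
from typing import Dict, Iterable, List


def write_minute_batches(summaries: Iterable[dict], out_dir: Path) -> List[Path]:
    """Group per-second summaries by minute and write one JSON file per minute."""
    by_minute: Dict[int, List[dict]] = {}
    for summary in summaries:
        # Round the second down to the start of its minute.
        minute_start = summary["second"] - summary["second"] % 60
        by_minute.setdefault(minute_start, []).append(summary)

    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for minute_start in sorted(by_minute):
        path = out_dir / f"downloads-{minute_start}.json"  # hypothetical file name
        path.write_text(json.dumps(by_minute[minute_start]))
        written.append(path)
    return written
```

Static files like these can be served and cached by ordinary web infrastructure, which is what makes the approach cheap for a large audience, at the cost of the one-minute delay.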

HTTP has serious problems handling 'push' data because it is an asymmetric protocol. The client calls the server, but the server can't call the client. Hacks such as long-polling tend to have other compromises, such as huge numbers of open sockets on the server. I am glad that HTML5 addresses that problem. If WebSockets can take a good swipe at solving this problem on the client and server side, that would be excellent news. We will be taking a good look at supporting WebSockets in future versions of SQLstream.