The project is interesting in several respects:
- The project involves signal processing. Unlike the "event-processing" applications we see most often at SQLstream, events here arrive at a regular rate (generally 40 readings per second, per sensor). In signal processing, events are more likely to be processed using complex mathematical formulas (such as Fourier transforms) than by Boolean logic (event A happened, then event B happened). Using SQLstream's user-defined function framework, we were easily able to accommodate this form of processing (a sketch of this kind of computation appears after this list).
- It illustrates how a stream-computing "fabric" can be created, connecting multiple SQLstream processing nodes using RabbitMQ (see the second sketch below).
- One of the reasons for building a distributed system was to allow an agile approach. Researchers can easily deploy new algorithms without affecting the performance or correctness of other algorithms running in the cloud.
- Another goal of the distributed system was performance and scalability. Nodes can easily be added to accommodate greater numbers of sensors. The system is not embarrassingly parallel, but we were still able to parallelize the solution effectively.
- Lastly, the system needs to be both continuous and real-time. "Continuous" means that data is processed as it arrives, a smoother, more predictable, and more efficient mode of operation than ETL. "Real-time" because some of the potential outputs of the system, such as tsunami alerts, need to be delivered as soon as possible in order to be useful.
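To make the first point concrete, here is a minimal sketch of the kind of per-sensor spectral computation involved. It is plain Python/NumPy for illustration only, not SQLstream's actual UDF API; the window length and the 2.5 Hz test signal are assumptions.

```python
# Sketch of per-sensor spectral processing on 40 Hz readings.
# Illustrative only: not the SQLstream UDF framework itself.
import numpy as np

SAMPLE_RATE_HZ = 40           # readings per second, per sensor (as stated above)
WINDOW_SECONDS = 10           # hypothetical analysis window
WINDOW_SAMPLES = SAMPLE_RATE_HZ * WINDOW_SECONDS

def dominant_frequency(window: np.ndarray) -> float:
    """Return the dominant frequency (Hz) in one window of sensor readings."""
    spectrum = np.abs(np.fft.rfft(window - window.mean()))   # remove DC, then FFT
    freqs = np.fft.rfftfreq(len(window), d=1.0 / SAMPLE_RATE_HZ)
    return float(freqs[spectrum.argmax()])

# Example: a synthetic 2.5 Hz signal plus noise, standing in for one sensor window.
t = np.arange(WINDOW_SAMPLES) / SAMPLE_RATE_HZ
window = np.sin(2 * np.pi * 2.5 * t) + 0.1 * np.random.randn(WINDOW_SAMPLES)
print(dominant_frequency(window))   # prints ~2.5
```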
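And to illustrate the "fabric" idea, the second sketch shows what one processing node might look like when wired to RabbitMQ, here using the Python pika client. The host, queue names, and JSON message format are hypothetical; in the actual system the SQLstream nodes are connected to RabbitMQ through their own adapters rather than hand-written Python like this.

```python
# Sketch of one node in a RabbitMQ-connected processing fabric.
# Host, queue names, and message layout are assumptions for illustration.
import json
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.queue_declare(queue="sensor.readings")    # upstream node publishes here
channel.queue_declare(queue="sensor.features")    # downstream nodes subscribe here

def on_reading(ch, method, properties, body):
    reading = json.loads(body)
    # ... apply this node's algorithm to the reading ...
    result = {"sensor_id": reading["sensor_id"], "value": reading["value"]}
    ch.basic_publish(exchange="", routing_key="sensor.features",
                     body=json.dumps(result))

channel.basic_consume(queue="sensor.readings",
                      on_message_callback=on_reading, auto_ack=True)
channel.start_consuming()
```

Because each node only talks to queues, a researcher can stand up a new algorithm as another consumer of the same readings without touching the nodes that are already running, which is exactly the agility and scalability described above.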