Storm is the real-time processing system developed by BackType. Storm provides a set of general primitives for doing distributed real-time computation. It can be used for “stream processing”, processing messages and updating databases in real-time. This is an alternative to managing your own cluster of queues and workers.
You will need to use an external database like Cassandra or Riak with your Storm Topologies if you need persistence.
- Simple programming model. lowers the complexity of doing parallel batch processing, lowers the complexity for doing real-time processing.
- Runs any programming language. Support for other languages can be added by implementing a simple Storm communication protocol.
- Fault-tolerant. Storm manages worker processes and node failures.
- Horizontally scalable. Computations are done in parallel using multiple threads, processes and servers.
- Guaranteed message processing. Storm guarantees that each message will be fully processed at least once.
- Fast. The system is designed so that messages are processed quickly and uses ØMQ as the underlying message queue.
- Local mode. Storm has a “local mode” (meant for development and unit testing) where it simulates a Storm cluster completely in-process
The Storm cluster is composed of a master node and worker nodes. The master node runs a daemon called “Nimbus” which is responsible for distributing code, assigning tasks, and checking for failures. Nimbus and Supervisor daemons are handled by Apache Zookeeper.
Use case of STORM
Storm has a wide range of use cases:
- Stream processing: processing a stream of new data and updating databases in real time.
- Continuous computation: streaming the results of a query to clients to visualize in real time.
- Distributed RPC: computing an intense query on the fly in parallel.