As some of you may already know, Storm is an open-source framework designed to perform distributed real-time computation. We say “some of you” because, like a lot of Big Data technologies, Storm is still finding its place in the Big Data world. It’s been around for over a year, with Twitter as its main user and contributor, and it has already caught the eye of a lot of developers. Why? Because a large community sees Storm as a perfect fit for today’s real-time processing needs.
Technologies like Hadoop have made it possible to process vast amounts of information in a matter of minutes. Storm does the same, but in a matter of seconds or even less. That’s why some consider Storm the Hadoop of real-time processing.
One of the scenarios where Storm is a perfect candidate is a system that needs to process a huge number of messages delivered to a Message Queue. And when we say a huge number, we mean thousands of messages per second!
Yes, you might be thinking: “But why not use a Message Driven Bean instead?” Well, an MDB is an excellent approach when you expect fewer messages per second or when the processing time is not critical. But here we’re talking about a massive volume of messages, continuously arriving at the queue and in need of immediate, fast processing.
Just imagine that you need to perform some processing on all these messages, and that the final result must be stored in Data Marts (let’s say Oracle, DB2 or even Hive), so that it is immediately available for real-time queries. With Storm, it would be as simple as defining a Spout to consume the messages from the queue and pass them to Bolts, which are in charge of processing the information and storing it in one or several of the mentioned Data Marts. We can even think of including decision Bolts that choose a particular storage depending on information (like metadata) contained in the message.
The following diagram illustrates what was just described:
As you can see, one of the good things about Storm is the possibility to divide the business logic into small pieces, that is, Bolts. This makes reuse and maintenance a lot simpler.
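To make the Spout and Bolt split a bit more concrete, here is a minimal sketch of the design in plain Java. Keep in mind that it does not use Storm's API at all: Message, Bolt, QueueSpout, ProcessingBolt, DecisionBolt and StorageBolt are names we made up for illustration, an in-memory queue stands in for the Message Queue, and in-memory lists stand in for the Data Marts. Everything also runs in a single thread here, whereas Storm would distribute the work.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

public class PipelineSketch {

    // Hypothetical message: a payload plus metadata naming the target store.
    static final class Message {
        final String target;
        final String payload;

        Message(String target, String payload) {
            this.target = target;
            this.payload = payload;
        }
    }

    // Stand-in for a Bolt: receives one message and does one small job.
    interface Bolt {
        void execute(Message message);
    }

    // Stand-in for a Spout: drains the queue and hands each message to a bolt.
    static final class QueueSpout {
        private final Queue<Message> queue;

        QueueSpout(Queue<Message> queue) {
            this.queue = queue;
        }

        void drainTo(Bolt next) {
            Message message;
            while ((message = queue.poll()) != null) {
                next.execute(message);
            }
        }
    }

    // Processing bolt: cleans up the payload, then passes the message on.
    static final class ProcessingBolt implements Bolt {
        private final Bolt next;

        ProcessingBolt(Bolt next) {
            this.next = next;
        }

        @Override
        public void execute(Message message) {
            next.execute(new Message(message.target, message.payload.trim()));
        }
    }

    // Decision bolt: picks a storage bolt based on the message metadata,
    // falling back to a default store when the target is not known.
    static final class DecisionBolt implements Bolt {
        private final Map<String, Bolt> routes;
        private final Bolt fallback;

        DecisionBolt(Map<String, Bolt> routes, Bolt fallback) {
            this.routes = routes;
            this.fallback = fallback;
        }

        @Override
        public void execute(Message message) {
            routes.getOrDefault(message.target, fallback).execute(message);
        }
    }

    // Storage bolt: an in-memory list stands in for a real Data Mart.
    static final class StorageBolt implements Bolt {
        final List<String> rows = new ArrayList<>();

        @Override
        public void execute(Message message) {
            rows.add(message.payload);
        }
    }

    public static void main(String[] args) {
        StorageBolt oracle = new StorageBolt();
        StorageBolt hive = new StorageBolt();
        Map<String, Bolt> routes = new HashMap<>();
        routes.put("oracle", oracle);
        routes.put("hive", hive);

        Queue<Message> queue = new ArrayDeque<>();
        queue.add(new Message("oracle", " order-1 "));
        queue.add(new Message("hive", "click-7"));
        queue.add(new Message("unknown", "misc"));

        // Spout -> processing bolt -> decision bolt -> storage bolts.
        new QueueSpout(queue)
                .drainTo(new ProcessingBolt(new DecisionBolt(routes, hive)));

        System.out.println("oracle: " + oracle.rows);
        System.out.println("hive: " + hive.rows);
    }
}
```

Notice that each piece does only one thing. Adding a new Data Mart means adding one more StorageBolt and one more entry in the routing map, without touching the processing logic.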
In the second part of this series we will take a look at some actual Storm code behind this design. In the meantime, if you want to learn more about Storm and what you can do with it, you can go to the wiki: https://github.com/nathanmarz/storm/wiki