Apache Storm is popular for three main reasons. First, it can connect to many data sources. Second, it is scalable. Third, it is fault tolerant, which means it guarantees that the data will be processed.
ZooKeeper handles the coordination between Nimbus and the Supervisors.
What are topologies
- A topology is similar to a job in Hadoop. A Hadoop job runs on its defined schedule and then finishes.
- In Storm, the topology runs forever.
- A topology consists of many worker processes spread across many machines.
- A topology is a predefined design for turning your data into an end product.
- A topology comprises two parts: spouts and bolts.
- The spout is the funnel through which data enters the topology; the bolts then process that data.
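
The spout-and-bolt idea above can be sketched in a few lines of plain Python. This is only a toy model and does not use the Storm API: the names `sentence_spout`, `split_bolt`, `count_bolt`, and `run_topology` are made up for illustration, and the input is a short finite list, whereas a real topology keeps running over an unbounded stream.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def sentence_spout(sentences: Iterable[str]) -> Iterator[str]:
    """Spout (hypothetical): the source that feeds raw data into the topology."""
    for sentence in sentences:
        yield sentence


def split_bolt(stream: Iterator[str]) -> Iterator[str]:
    """Bolt (hypothetical): consumes sentences and emits one word at a time."""
    for sentence in stream:
        for word in sentence.lower().split():
            yield word


def count_bolt(stream: Iterator[str]) -> Iterator[Tuple[str, int]]:
    """Bolt (hypothetical): keeps a running count and emits (word, count) per word."""
    counts = Counter()
    for word in stream:
        counts[word] += 1
        yield word, counts[word]


def run_topology(sentences: Iterable[str]) -> Dict[str, int]:
    """Wire spout -> split bolt -> count bolt and keep the latest count per word."""
    final = {}
    for word, count in count_bolt(split_bolt(sentence_spout(sentences))):
        final[word] = count
    return final
```

Calling `run_topology(["the cow jumped", "the moon"])` passes each sentence through both bolts in turn. In Storm itself, the wiring step is done when the topology is defined, and the spout and bolts run as tasks inside worker processes on different machines.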
Two types of nodes in Storm
- Master node: similar to the Hadoop JobTracker. It runs a daemon called Nimbus.
- Worker node: it runs a daemon called Supervisor. The Supervisor listens for the work assigned to its machine.
- Nimbus is responsible for distributing the code around the cluster.
- Nimbus monitors for failures.
- Nimbus assigns tasks to each machine.
- The Supervisor listens for the work assigned by Nimbus.
- The worker processes that the Supervisor manages each execute a subset of a topology.
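
The division of labour between the master and the workers can be illustrated with another toy Python sketch. It is an assumption-heavy simplification and not how Storm's scheduler actually works: the round-robin policy, the function names `assign_tasks` and `reassign_after_failure`, and the task and supervisor names are all hypothetical.

```python
from typing import Dict, List


def assign_tasks(tasks: List[str], supervisors: List[str]) -> Dict[str, List[str]]:
    """Stand-in for Nimbus: spread tasks over the supervisors round-robin."""
    if not supervisors:
        raise ValueError("no live supervisors to assign work to")
    assignment = {name: [] for name in supervisors}
    for i, task in enumerate(tasks):
        assignment[supervisors[i % len(supervisors)]].append(task)
    return assignment


def reassign_after_failure(
    assignment: Dict[str, List[str]], failed: str
) -> Dict[str, List[str]]:
    """Stand-in for failure handling: redistribute all tasks over the survivors."""
    survivors = [name for name in assignment if name != failed]
    tasks = [task for name in assignment for task in assignment[name]]
    # Sort so the new plan does not depend on the order of the old one.
    return assign_tasks(sorted(tasks), survivors)
```

For example, `assign_tasks(["spout-1", "split-1", "split-2", "count-1"], ["sup-a", "sup-b"])` gives each supervisor a subset of the topology's tasks, and `reassign_after_failure(plan, "sup-b")` moves everything to `sup-a`. In a real cluster this kind of shared state is coordinated through ZooKeeper rather than passed around as a dictionary.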