Real-time Analytics with Storm and Cassandra

Executing a sample Storm topology – local mode

This section assumes that you have gone through the prerequisites and installed the expected components.

WordCount topology from the Storm-starter project

To understand the components described in the previous section, let's download the Storm-starter project and execute a sample topology:

  1. The Storm-starter project can be downloaded using the following Git command:
    Linux-command-Prompt $ sudo git clone git://github.com/apache/incubator-storm.git && cd incubator-storm/examples/storm-starter
    
  2. Next, you need to import the project into your Eclipse workspace:
    1. Start Eclipse.
    2. Click on the File menu and select the Import wizard.
    3. From the Import wizard, select Existing Maven Projects.
    4. Select the pom.xml file of the Storm-starter project by specifying its location as <download-folder>/starter/incubator-storm/examples/storm-starter.
    5. Once the project has been successfully imported, the Eclipse folder structure will look like the following screenshot:
      [Screenshot: Eclipse folder structure of the imported Storm-starter project]
    6. Execute the topology using the run command and you should be able to see the output as shown in the following screenshot:
    [Screenshot: output of the WordCount topology run]

To understand how the topology works, let's take a look at the code and follow the flow through each of its components:

// instantiates the new builder object
TopologyBuilder builder = new TopologyBuilder();
// adds a new spout of type "RandomSentenceSpout" with a parallelism hint of 5
builder.setSpout("spout", new RandomSentenceSpout(), 5);

Starting with the main method of the WordCountTopology.java class, we find a TopologyBuilder object called builder. This class is important to understand because it provides the template for defining a topology: it exposes the API to configure spouts and bolts and to wire them together. The resulting topology is essentially a Thrift structure.

In the preceding code snippet, we created a TopologyBuilder object and used the template to perform the following:

  • setSpout: This adds RandomSentenceSpout, which generates random sentences. Note the property called parallelism hint, which is set to 5 here. It determines how many instances of the component are spawned when the topology is submitted; in our example, we will have five instances of the spout.
  • setBolt: We use this method to add two bolts to the topology: SplitSentenceBolt, which splits the sentence into words, and WordCountBolt, which counts the words.
  • shuffleGrouping and fieldsGrouping: These are the other noteworthy items in the WordCountTopology code. We shall discuss them in detail in the next chapter; for now, understand that they control the routing of tuples to the various bolts in the topology.
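
To make the routing idea more concrete before the next chapter, here is a minimal plain-Java sketch of how tuples might be spread across several instances of a component. It does not use the Storm API: the GroupingSketch class and its methods are hypothetical names introduced only for illustration. Shuffle grouping is simplified to round-robin, and fields grouping is modelled as a hash of the field value modulo the number of tasks; both are assumptions made for the sake of the example, not Storm's actual implementation. The two sample sentences are illustrative input only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of tuple routing; none of these names come from the Storm API.
class GroupingSketch {

    // Shuffle grouping, simplified to round-robin: any task may receive any tuple.
    static int shuffleTask(int tupleIndex, int taskCount) {
        return tupleIndex % taskCount;
    }

    // Fields grouping, modelled as hash-mod: equal field values always pick the same task.
    static int fieldsTask(String fieldValue, int taskCount) {
        return Math.floorMod(fieldValue.hashCode(), taskCount);
    }

    // Splits each sentence into words and routes every word to one counting task,
    // so each task keeps its own partial word counts.
    static List<Map<String, Integer>> countWords(List<String> sentences, int countTasks) {
        List<Map<String, Integer>> countsPerTask = new ArrayList<>();
        for (int i = 0; i < countTasks; i++) {
            countsPerTask.add(new HashMap<>());
        }
        for (String sentence : sentences) {
            for (String word : sentence.split(" ")) {
                int task = fieldsTask(word, countTasks);
                countsPerTask.get(task).merge(word, 1, Integer::sum);
            }
        }
        return countsPerTask;
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList(
                "the cow jumped over the moon",
                "the man went to the store");
        List<Map<String, Integer>> counts = countWords(sentences, 3);
        for (int task = 0; task < counts.size(); task++) {
            System.out.println("count task " + task + " -> " + counts.get(task));
        }
    }
}
```

In this model, every occurrence of a given word lands on the same counting task, which is why a per-task count can be complete for that word. With round-robin routing, the same word could be split across several tasks, which is acceptable for a stateless step such as sentence splitting but not for counting.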