Monday, March 14, 2016

How to setup a 3 Node Apache Storm 0.10 cluster in CentOS 7?

Apache Storm is built for real-time computing: it processes each tuple as it arrives. Others, such as Apache Spark Streaming, also claim to be real-time, but they adapt an engine that was designed for batch processing. For real-time big data computing, pick Apache Storm or a system that follows the same design pattern. Storm's daemons are meant to run under a process supervisor (a tool that restarts them if they die, not to be confused with Storm's own Supervisor nodes). In this post we run them as plain background processes instead.


The following needs to be done before beginning the Storm cluster setup.

1. Create 3 CentOS 7 Servers STNODE1, STNODE2, and STNODE3 as discussed in How to install CentOS 7 on Virtual Machine using VMWare vSphere 6 client?

2. Make sure Java 7 is installed and configured as default as discussed in How to install Java 7 and Java 8 in CentOS 7?. You can also install Apache Storm with Java 8.

3. Create the bigdatauser, bigdataadmin and the bigdatagroup as discussed in How to create a user, group and enable him to do what a super user can in CentOS7?

4. Make sure the firewall is disabled and stopped as discussed in How to turn off firewall on CentOS 7? 

5. Edit the /etc/hosts file so that the names of all the servers resolve to their IPs, as discussed in How to setup DNS entries for big data servers in the cloud or not on a domain in /etc/hosts file?

6. As bigdatauser, set up passwordless ssh across the 3 servers STNODE1, STNODE2 and STNODE3, as discussed in How to setup password less ssh between CentOS 7 cluster servers?

7. Install an Apache ZooKeeper cluster as discussed in How to setup a 3 Node Apache Zookeeper 3.4.6 cluster in CentOS 7?. Make sure the ZooKeeper servers are also added to /etc/hosts on every Storm node, as in step 5.
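
Before going further it can help to confirm that every name from steps 5 and 7 actually resolves on each node. The following is a minimal sketch, not part of the original setup: the function name unresolved_hosts is made up for illustration, and it assumes getent is available (it is part of glibc on CentOS 7).

```shell
# Hypothetical helper: print the host names (space-separated) that do NOT
# resolve. getent consults /etc/hosts as well as DNS, so it reflects step 5.
unresolved_hosts() {
    local host missing=""
    for host in "$@"; do
        if ! getent hosts "$host" >/dev/null 2>&1; then
            missing="$missing $host"
        fi
    done
    # Strip the leading space; an empty line means every name resolved.
    echo "${missing# }"
}

# Example, run on each node:
#   unresolved_hosts STNODE1 STNODE2 STNODE3 ZKNODE1 ZKNODE2 ZKNODE3
```

If the output is not empty, fix /etc/hosts on that node before installing Storm.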

Storm has a concept of Master and Supervisor (Worker) Nodes. STNODE1 will run the Master (Nimbus) and the Storm UI. STNODE2 and STNODE3 will run the Supervisors. The DRPC server will run on all three nodes.

For each of the Servers STNODE1, STNODE2 and STNODE3 do the following.

Log in as bigdataadmin

//create a folder for storm under the /usr/local directory
cd /usr/local
sudo mkdir storm

//change ownership to bigdatauser
sudo chown -R bigdatauser:bigdatagroup storm

//create the data directory for storm under /var/lib
cd /var/lib
sudo mkdir storm

//change ownership to bigdatauser
sudo chown -R bigdatauser:bigdatagroup storm
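
If you repeat these steps on several servers, or rerun them after a partial failure, a version that can be run more than once may be handy. This is only a sketch: make_owned_dir and the SUDO variable are names invented here, and mkdir -p is used so that an existing directory is not an error.

```shell
# Hypothetical helper: create DIR if needed and hand it to OWNER:GROUP.
# SUDO defaults to "sudo"; set SUDO="" when already running as root.
make_owned_dir() {
    local dir="$1" owner="$2" group="$3"
    ${SUDO-sudo} mkdir -p "$dir" &&
    ${SUDO-sudo} chown -R "$owner:$group" "$dir"
}

# Example, equivalent to the commands above:
#   make_owned_dir /usr/local/storm bigdatauser bigdatagroup
#   make_owned_dir /var/lib/storm   bigdatauser bigdatagroup
```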


Switch to bigdatauser

//download storm
wget http://apache.claz.org/storm/apache-storm-0.10.0/apache-storm-0.10.0.tar.gz

//unpack the file
tar xzf apache-storm-0.10.0.tar.gz


//move the storm installation from the download directory to /usr/local/storm
mv apache-storm-0.10.0 /usr/local/storm/

//switch to the storm directory
cd /usr/local/storm/apache-storm-0.10.0



//switch to the conf directory
cd conf

edit the config file and change the following

vi storm.yaml

#include the zookeeper servers

storm.zookeeper.servers:
  - ZKNODE1
  - ZKNODE2
  - ZKNODE3

//change data directory to
storm.local.dir: "/var/lib/storm"


//set the nimbus host so that every server knows which master it belongs to
nimbus.host: "STNODE1"

//we can run DRPC on all the servers
drpc.servers:
  - STNODE1
  - STNODE2
  - STNODE3
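
Since the same settings go into storm.yaml on all three servers, they can also be generated instead of typed by hand. The sketch below only prints the four settings discussed above. The function render_storm_yaml and the variable names are made up for this example, and the host names are quoted, which YAML allows but does not require.

```shell
# Host names and paths as used in this post; adjust to your environment.
ZK_SERVERS="ZKNODE1 ZKNODE2 ZKNODE3"
NIMBUS_HOST="STNODE1"
DRPC_SERVERS="STNODE1 STNODE2 STNODE3"
STORM_LOCAL_DIR="/var/lib/storm"

# Hypothetical helper: print the storm.yaml settings on stdout.
render_storm_yaml() {
    local host
    printf 'storm.zookeeper.servers:\n'
    for host in $ZK_SERVERS; do printf '  - "%s"\n' "$host"; done
    printf 'storm.local.dir: "%s"\n' "$STORM_LOCAL_DIR"
    printf 'nimbus.host: "%s"\n' "$NIMBUS_HOST"
    printf 'drpc.servers:\n'
    for host in $DRPC_SERVERS; do printf '  - "%s"\n' "$host"; done
}

# Example: write to a scratch file first and review it before copying
# it over conf/storm.yaml.
#   render_storm_yaml > /tmp/storm.yaml
```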


//on each of the 3 servers, move to the root of the storm installation, then start the daemons that belong on that server
cd /usr/local/storm/apache-storm-0.10.0

//start nimbus (master) only on STNODE1
bin/storm nimbus >/dev/null &


//start storm UI only on STNODE1
bin/storm ui >/dev/null &

//start supervisors on STNODE2 and STNODE3
bin/storm supervisor >/dev/null &

//start DRPC on all Servers
bin/storm drpc >/dev/null &
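
The start commands above can be summarized as a small role map. This is a sketch under a few assumptions: the function names are invented here, the host name must be passed exactly as written in this post, and nohup plus 2>&1 are additions that are not in the commands above (they keep a daemon alive after you log out and discard stderr along with stdout).

```shell
# Hypothetical helper: map a host name to the daemons this post runs on it.
daemons_for_host() {
    case "$1" in
        STNODE1)         echo "nimbus ui drpc" ;;
        STNODE2|STNODE3) echo "supervisor drpc" ;;
        *)               return 1 ;;
    esac
}

# Hypothetical helper: start every daemon for the given host in the background.
# Must be run from /usr/local/storm/apache-storm-0.10.0.
start_storm_daemons() {
    local host="$1" daemon daemons
    daemons="$(daemons_for_host "$host")" || {
        echo "unknown host: $host" >&2
        return 1
    }
    for daemon in $daemons; do
        nohup bin/storm "$daemon" >/dev/null 2>&1 &
    done
}

# Example, on STNODE1:
#   cd /usr/local/storm/apache-storm-0.10.0 && start_storm_daemons STNODE1
```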

//if you need to stop a daemon, kill its process id (find it with the ps command below)
kill <processid>

//check if the process is running
ps aux | grep java

//check for backtype.storm.daemon.nimbus for Nimbus
//check for backtype.storm.ui.core for UI
//check for backtype.storm.daemon.drpc for DRPC
//check for backtype.storm.daemon.supervisor for supervisor
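
Scanning the ps output by eye for four class names is error prone, so the check can be scripted. The sketch below reads ps output on stdin and prints a short name for each daemon whose main class it finds. The function name running_storm_daemons is made up for this example; the class names are the ones listed above.

```shell
# Hypothetical helper: list the Storm daemons found in `ps` output on stdin.
running_storm_daemons() {
    local ps_output pair name class
    ps_output="$(cat)"
    for pair in nimbus:backtype.storm.daemon.nimbus \
                ui:backtype.storm.ui.core \
                drpc:backtype.storm.daemon.drpc \
                supervisor:backtype.storm.daemon.supervisor; do
        name="${pair%%:*}"    # text before the first colon
        class="${pair#*:}"    # text after the first colon
        # -F matches the class name as a fixed string, so dots are literal.
        if printf '%s\n' "$ps_output" | grep -qF -e "$class"; then
            echo "$name"
        fi
    done
}

# Example:
#   ps aux | running_storm_daemons
```

On STNODE1 you would expect nimbus, ui and drpc in the output; on STNODE2 and STNODE3, drpc and supervisor.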

//check the status of the cluster from the UI
http://stnode1:8080

//you should see 1 nimbus and 2 supervisors if everything is configured correctly
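
The same check can be done from the command line. This sketch assumes the Storm UI serves its REST API at /api/v1/cluster/summary and that the JSON it returns contains a numeric "supervisors" field; verify both against your installation. It also assumes curl is installed. The function name supervisor_count is invented here.

```shell
# Hypothetical helper: extract the integer value of "supervisors" from
# cluster-summary JSON read on stdin. Prints nothing if the field is absent.
supervisor_count() {
    sed -n 's/.*"supervisors"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' |
        head -n 1
}

# Example (the endpoint is an assumption, see above):
#   curl -s http://stnode1:8080/api/v1/cluster/summary | supervisor_count
```

With the cluster described in this post, the count should be 2.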

Troubleshooting
If the UI does not come up, make sure that all the services, including the ZooKeeper instances, are running and that the bigdatauser can ssh into all the servers, including the ZooKeeper servers.
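
One quick way to test the ZooKeeper instances is the "ruok" four-letter command, which a healthy ZooKeeper 3.4.x server answers with "imok". The sketch below assumes nc is installed (on CentOS 7 it comes from the nmap-ncat package) and that ZooKeeper listens on the default client port 2181. The function name zk_is_ok is made up for this example.

```shell
# Hypothetical helper: succeed only if the ZooKeeper server answers "imok".
# Usage: zk_is_ok HOST [PORT]
zk_is_ok() {
    local host="$1" port="${2:-2181}" reply
    reply="$(echo ruok | nc "$host" "$port" 2>/dev/null)" || reply=""
    [ "$reply" = "imok" ]
}

# Example:
#   for h in ZKNODE1 ZKNODE2 ZKNODE3; do
#       zk_is_ok "$h" && echo "$h ok" || echo "$h NOT ok"
#   done
```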