
Friday, February 19, 2016

How to setup a 3 Node Apache Zookeeper 3.4.6 cluster in CentOS 7?

Zookeeper, in short, is a distributed state manager that other clustered systems can use to maintain state across their own nodes. For example, HBase can use Zookeeper to maintain state across its cluster without having to manage the cluster state itself.

The following needs to be done before beginning the Zookeeper cluster setup.

1. Create 3 CentOS 7 Servers ZKNODE1, ZKNODE2, and ZKNODE3 as discussed in How to install CentOS 7 on Virtual Machine using VMWare vSphere 6 client?

2. Make sure Java 7 is installed and configured as default as discussed in How to install Java 7 and Java 8 in CentOS 7?

3. Create the bigdatauser, bigdataadmin and the bigdatagroup as discussed in How to create a user, group and enable him to do what a super user can in CentOS7?

4. Make sure the firewall is disabled and stopped as discussed in How to turn off firewall on CentOS 7? 

5. Change the /etc/hosts file so that all the IPs and the names of the servers are resolved as discussed in
How to setup DNS entries for big data servers in the cloud or not on a domain in /etc/hosts file?

6. Using the bigdatauser, set up password less ssh across the 3 servers, namely ZKNODE1, ZKNODE2 and ZKNODE3, as discussed in How to setup password less ssh between CentOS 7 cluster servers?
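
Prerequisite 5 can be sanity-checked with a small script before moving on. This is only a minimal sketch and not part of the original setup: the function name missing_hosts is made up for illustration, and it assumes the usual hosts file layout of an IP address followed by one or more names.

```shell
#!/usr/bin/env bash
# missing_hosts HOSTS_FILE NAME...   (hypothetical helper)
# Prints every NAME that has no entry in HOSTS_FILE, one per line.
missing_hosts() {
  local hosts_file="$1" name
  shift
  for name in "$@"; do
    awk -v n="$name" '
      /^[[:space:]]*#/ { next }                          # skip comment lines
      { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
      END { exit (found ? 0 : 1) }
    ' "$hosts_file" || echo "$name"
  done
}
```

Running missing_hosts /etc/hosts ZKNODE1 ZKNODE2 ZKNODE3 on each server should print nothing when all three names are present.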


On each of the servers ZKNODE1, ZKNODE2 and ZKNODE3, do the following.

Log in as bigdataadmin.

//create a folder for zookeeper under the /usr/local directory
cd /usr/local
sudo mkdir zookeeper

//change ownership to bigdatauser
sudo chown -R bigdatauser:bigdatagroup zookeeper

//create the data directory for Zookeeper under /var/lib
cd /var/lib
sudo mkdir zookeeper

//change ownership to bigdatauser
sudo chown -R bigdatauser:bigdatagroup  zookeeper


Switch to bigdatauser

//create a file named myid under the data directory
cd /var/lib/zookeeper
vi myid

Put only the number for the corresponding server. DO NOT put all 3 numbers on each server.
on ZKNODE1
1
on ZKNODE2
2
on ZKNODE3
3

If you run cat myid, it should display just 1 on ZKNODE1, and so on.
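
If you prefer not to edit the file by hand, the id can be derived from the server name. The following is a minimal sketch under the assumption that the node names end in their id, as ZKNODE1 to ZKNODE3 do; write_myid is a made-up helper name, not a Zookeeper command.

```shell
#!/usr/bin/env bash
# write_myid DATA_DIR NODE_NAME   (hypothetical helper)
# Writes the trailing number of NODE_NAME (ZKNODE2 -> 2) to DATA_DIR/myid.
write_myid() {
  local data_dir="$1" node="$2" id
  id="${node##*[!0-9]}"   # strip everything up to the last non-digit
  if [ -z "$id" ]; then
    echo "no trailing number in '$node'" >&2
    return 1
  fi
  printf '%s\n' "$id" > "$data_dir/myid"
}
```

On ZKNODE2, for example, write_myid /var/lib/zookeeper ZKNODE2 leaves a myid file that contains only 2.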


//download zookeeper
wget http://apache.arvixe.com/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz

//unpack the file
tar xzf zookeeper-3.4.6.tar.gz


//move the zookeeper installation from the download directory to /usr/local/zookeeper
mv zookeeper-3.4.6 /usr/local/zookeeper/

//switch to the /usr/local/zookeeper directory
cd /usr/local/zookeeper/zookeeper-3.4.6

//move to the conf folder of this zookeeper version
cd conf

//copy the sample config to zoo.cfg
cp zoo_sample.cfg zoo.cfg

//make sure you are in the conf directory
cd /usr/local/zookeeper/zookeeper-3.4.6/conf

//edit the zoo.cfg file
vi zoo.cfg

//change the data directory to
dataDir=/var/lib/zookeeper

#add the cluster servers (2888 is the port followers use to connect to the leader, 3888 is the leader election port)
server.1=ZKNODE1:2888:3888
server.2=ZKNODE2:2888:3888
server.3=ZKNODE3:2888:3888
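
The same edit can be scripted so that all 3 servers end up with an identical zoo.cfg. This is a sketch only: configure_zoo_cfg is a hypothetical helper, and it assumes the ports 2888 and 3888 used above and that server ids follow the order in which the nodes are listed.

```shell
#!/usr/bin/env bash
# configure_zoo_cfg CFG DATA_DIR NODE...   (hypothetical helper)
# Rewrites CFG so that it has exactly one dataDir line and one
# server.N line per NODE, numbered in the order the nodes are given.
configure_zoo_cfg() {
  local cfg="$1" data_dir="$2" i=1 node tmp
  shift 2
  [ -f "$cfg" ] || { echo "missing $cfg" >&2; return 1; }
  tmp="$(mktemp)" || return 1
  # Keep every line except existing dataDir= and server.N= entries.
  grep -Ev '^(dataDir=|server\.[0-9]+=)' "$cfg" > "$tmp" || true
  echo "dataDir=$data_dir" >> "$tmp"
  for node in "$@"; do
    echo "server.$i=$node:2888:3888" >> "$tmp"
    i=$((i + 1))
  done
  cat "$tmp" > "$cfg"   # overwrite in place so the file keeps its owner and mode
  rm -f "$tmp"
}
```

For this cluster the call would be configure_zoo_cfg zoo.cfg /var/lib/zookeeper ZKNODE1 ZKNODE2 ZKNODE3, run from the conf directory. Running it a second time leaves the file unchanged.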


//move to the root of the zookeeper installation. Do this and start zookeeper on all 3 servers.
cd /usr/local/zookeeper/zookeeper-3.4.6

//start zookeeper
bin/zkServer.sh start

//if you need to stop
bin/zkServer.sh stop

//check if the process is running
ps -aux | grep java

//check for QuorumPeerMain

//check the status of each server to see if they have formed a cluster. Only one of the 3 should be the leader and the others should be followers
bin/zkServer.sh status
The output shows whether the server is running as a follower or as the leader (similar to master and slave).
Mode: follower
Mode: leader
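
The status check can be collected from all 3 servers in one go. The sketch below assumes the "Mode:" line shown above and the password less ssh from the prerequisites; parse_mode and has_single_leader are made-up helper names.

```shell
#!/usr/bin/env bash
# parse_mode: reads `zkServer.sh status` output on stdin and prints the mode.
parse_mode() {
  sed -n 's/^Mode: *//p'
}

# has_single_leader: reads one mode per line on stdin and succeeds
# only if exactly one of them is "leader".
has_single_leader() {
  [ "$(grep -c '^leader$')" -eq 1 ]
}

# Example (commented out because it needs the running cluster):
# for node in ZKNODE1 ZKNODE2 ZKNODE3; do
#   ssh "$node" /usr/local/zookeeper/zookeeper-3.4.6/bin/zkServer.sh status 2>/dev/null | parse_mode
# done | has_single_leader && echo "cluster has one leader"
```

If the commented loop prints nothing, check each server individually with bin/zkServer.sh status as described above.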


Troubleshooting Errors
Error:
Using config: /usr/local/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
mkdir: cannot create directory ‘/var/bin’: Permission denied
Starting zookeeper ... bin/zkServer.sh: line 113: /var/bin/zookeeper/zookeeper_server.pid: No such file or directory
Solution:
Make sure that dataDir in zoo.cfg points to the correct data directory (/var/lib/zookeeper, not /var/bin/zookeeper as in the error above) and that you are running as bigdatauser and not bigdataadmin
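
Both conditions can be checked before starting the server. This is a minimal sketch; check_data_dir is a hypothetical helper and it only assumes that zoo.cfg contains a dataDir= line.

```shell
#!/usr/bin/env bash
# check_data_dir CFG   (hypothetical helper)
# Prints the dataDir configured in CFG if it exists and is writable by
# the current user; otherwise prints a reason and returns non-zero.
check_data_dir() {
  local cfg="$1" dir
  dir="$(sed -n 's/^dataDir=//p' "$cfg" | tail -n 1)"
  if [ -z "$dir" ]; then
    echo "no dataDir set in $cfg" >&2
    return 1
  fi
  if [ ! -d "$dir" ]; then
    echo "$dir does not exist" >&2
    return 1
  fi
  if [ ! -w "$dir" ]; then
    echo "$dir is not writable by $(id -un)" >&2
    return 1
  fi
  echo "$dir"
}
```

Run as bigdatauser, check_data_dir /usr/local/zookeeper/zookeeper-3.4.6/conf/zoo.cfg should print /var/lib/zookeeper.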





Thursday, February 18, 2016

How to switch Java version from Java 7 to Java 8 on CentOS 7?

Most of the Big Data stack technologies work with Java 1.7 (Java 7), which is installed by default in CentOS 7 Server UI Edition. If it is not installed, follow the instructions in How to install Java 7 and Java 8 in CentOS 7?

Some technologies require Java 1.8 (Java 8). For example, the Gremlin Server from Titan Graph requires Java 8. Once we have installed Java 8, we can switch the default Java version to 8 for those technologies that require Java 8 as the default.

We can attempt to run technologies that require Java 7 and technologies that require Java 8 on the same box by using different user logins, each with a different Java home path, but this is not recommended. Try to stick to technologies that run with the same Java version per server.

Run the following command. You need to be an administrator.

sudo update-alternatives --config java


This would display

There are 2 programs which provide 'java'.

  Selection    Command
-----------------------------------------------
*  1           /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.75-2.5.4.2.el7_0.x86_64/jre/bin/java
 + 2           /usr/java/jdk1.8.0_60/jre/bin/java


Enter to keep the current selection[+], or type selection number: 


The current selection is marked with +. To change it:
Type 1 and press Enter to switch to Java 7.
Type 2 and press Enter to switch to Java 8.
To exit without changing, press Enter.
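
After switching, it is worth confirming which version is now the default. The sketch below parses the output of java -version; java_version_from is a made-up helper name, and it assumes the usual first line of the form java version "1.8.0_60". For scripted setups, update-alternatives also accepts --set java followed by one of the paths listed above, which avoids the interactive prompt.

```shell
#!/usr/bin/env bash
# java_version_from: reads `java -version` output on stdin and prints
# the quoted version string from the first matching line.
java_version_from() {
  sed -n 's/.* version "\([^"]*\)".*/\1/p' | head -n 1
}

# Example (commented out because it needs Java on the PATH;
# java -version writes to stderr, hence the 2>&1):
# java -version 2>&1 | java_version_from
```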


Wednesday, February 17, 2016

How to setup password less ssh between CentOS 7 cluster servers?

Now that we are working with clusters, we need a way for machines to communicate with each other. In the Windows world with Active Directory (AD), we could have created a domain account and added this user on all the machines. The issue with this approach is that each service would still need to log in to these machines using the domain account every time, which means an authentication request needs to go to AD for each login.

The Linux world solves this differently with password less ssh (Secure Shell). The user generates a key pair: the public key is stored on each server, and the private key stays with the user. The next time the user connects, ssh authenticates with the key instead of a password, so the user who set up password less ssh is logged in automatically without being prompted for a password.

Do the following to set up password less ssh for a specific user, in our case the bigdatauser.

The setup is a 3 node Apache HBase cluster:
HBASENODE1
HBASENODE2
HBASENODE3

On each of these machines we have created a user called the bigdatauser as described in How to create a user, group and enable him to do what a super user can in CentOS7?.
We also need to create the DNS records in /etc/hosts as described in How to setup DNS entries for big data servers in the cloud or not on a domain in /etc/hosts file?

Log in to the CentOS 7 Server HBASENODE1 using ssh as "bigdatauser" and issue the following commands.

Create the key pair for the user in the user's home directory

cd ~

//create the ssh keys
ssh-keygen -t rsa -P ""


Press Enter without typing anything to accept the default directory.

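//copy the public key to the authorized keys of bigdatauser
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
//sshd ignores authorized_keys if it is writable by group or others, so restrict it
chmod 600 $HOME/.ssh/authorized_keys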


Test that password less ssh is working by typing

ssh localhost


Accept the warning; you should now be logged in. To exit localhost, type

exit


Do this a second time; this time no warning should be shown.

Now that we have set up password less ssh on one node, HBASENODE1, we need to do the same on HBASENODE2 and HBASENODE3.

Once we have done the same on all 3 servers, we need to enable password less ssh between the nodes.
From HBASENODE1, run the following command for HBASENODE2

//copy the keys to other nodes
ssh-copy-id -i $HOME/.ssh/id_rsa.pub bigdatauser@HBASENODE2


The same command needs to be run from
HBASENODE1 to HBASENODE3
and from
HBASENODE2 to HBASENODE1, HBASENODE3
HBASENODE3 to HBASENODE1, HBASENODE2

Once this is done, verify that you can log in from any node to any other node by typing the following

ssh HBASENODE2

on HBASENODE1, and likewise for the other combinations.
Accept the warning the first time; the next time it should log you directly into the server.
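
The copy and verify steps can be wrapped in a loop that is run once on each node. This is a minimal sketch, not a definitive script: other_nodes is a hypothetical helper, and the commented loop assumes that the short hostname of each server matches the names used here exactly.

```shell
#!/usr/bin/env bash
# other_nodes SELF NODE...   (hypothetical helper)
# Prints every NODE except SELF, one per line.
other_nodes() {
  local self="$1" node
  shift
  for node in "$@"; do
    if [ "$node" != "$self" ]; then
      echo "$node"
    fi
  done
}

# Example (commented out because it needs the three servers):
# for node in $(other_nodes "$(hostname -s)" HBASENODE1 HBASENODE2 HBASENODE3); do
#   ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "bigdatauser@$node"
#   # BatchMode makes ssh fail instead of prompting if the key is not accepted
#   ssh -o BatchMode=yes "bigdatauser@$node" true && echo "$node ok"
# done
```

ssh-copy-id still asks for the password of bigdatauser once per target node; after that, the BatchMode check should print "ok" for every other node.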


Tuesday, February 16, 2016

How to install Java 7 and Java 8 on Mac OS X?

Now that we have installed the package managers in Mac OS X, we can install the most widely used Java versions on Mac OS X.

Most big data technologies work with Java 1.7, also called Java 7. A few of these use Java 1.8, also called Java 8.

Unfortunately, for Mac OS X the distributions available are from Oracle and not from OpenJDK. Install Java 7 before installing Java 8. This installs the JDK version of Java.

Issue the following command to install Java 7

sudo brew cask install java7

Issue the following command to install Java 8

sudo brew cask install java

By default, the installation is located under

/opt/homebrew-cask/Caskroom/

You can also find more information about the software by passing its installation keyword (java, java7, etc.) to the info command

brew cask info java7


Feb 19, 2019
An updated article is available to install Java 8 as Java 7 is quite old.

Monday, February 15, 2016

Install Homebrew and Cask for package management in Mac OS X

Just like we have yum in CentOS 7, we need a package manager on the development machine, namely Mac OS X.

Homebrew and Cask are the preferred package managers.

To install Homebrew, run the following command

/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Refer the following URL for more information

Once you have installed brew, you can install Cask by running the following command

brew tap caskroom/cask

Refer the following URL for more information

Note: As with all installations, make sure that you have admin rights or use the sudo keyword.