Big Data School

Wednesday, February 17, 2016

How to set up passwordless SSH between CentOS 7 cluster servers?

Now that we are working with clusters, we need a way for the machines to communicate with each other. In the Windows world with Active Directory (AD), we could have created a domain account and added this user to all the machines. The issue with this approach is that each service would still need to log in to these machines with the domain account every time, which means an authentication request has to go to AD for each login.

The Linux world solves this differently with passwordless SSH (Secure Shell). The user generates a key pair: the private key stays in the user's home directory on the machine they connect from, and the public key is added to the list of authorized keys on each server they connect to. The next time the user connects, SSH authenticates with the key pair instead of a password, so the user who set up passwordless SSH is logged in automatically without being prompted.

Do the following to set up passwordless SSH for a specific user, in our case bigdatauser.

The setup is a 3-node Apache HBase cluster:
HBASENODE1
HBASENODE2
HBASENODE3

On each of these machines we have created a user called bigdatauser as described in How to create a user, group and enable him to do what a super user can in CentOS7?.
We also need to create the DNS records in /etc/hosts as described in How to setup DNS entries for big data servers in the cloud or not on a domain in /etc/hosts file?

Log in to the CentOS 7 server HBASENODE1 using ssh as "bigdatauser" and issue the following commands.

Create the key pair for the user in the user's home directory


cd ~



# create the ssh keys

ssh-keygen -t rsa -P ""

Press Enter without typing anything to accept the default file location.


# add the public key to bigdatauser's authorized keys

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

Test that passwordless SSH is working by typing


ssh localhost

Accept the host key warning and you should be logged in without a password prompt. To leave the localhost session, type


exit

Repeat this once more; this time no warning should be shown.
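If ssh localhost still asks for a password, one common cause is file permissions: with its default strict checks, sshd ignores authorized_keys when the .ssh directory or the file itself is writable by other users. The following is a minimal sketch of a fix under that assumption; the function name fix_ssh_perms is made up for this example and is not part of any tool.

```shell
# Hypothetical helper: tighten permissions on an SSH directory.
# Defaults to the current user's ~/.ssh when no directory is given.
fix_ssh_perms() {
  ssh_dir="${1:-$HOME/.ssh}"
  if [ ! -d "$ssh_dir" ]; then
    echo "no such directory: $ssh_dir" >&2
    return 1
  fi
  chmod 700 "$ssh_dir"                     # only the owner may enter
  if [ -f "$ssh_dir/authorized_keys" ]; then
    chmod 600 "$ssh_dir/authorized_keys"   # only the owner may read or write
  fi
  if [ -f "$ssh_dir/id_rsa" ]; then
    chmod 600 "$ssh_dir/id_rsa"            # the private key must stay private
  fi
}
```

Calling fix_ssh_perms without arguments as bigdatauser applies this to that user's own .ssh directory.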

Now that passwordless SSH is set up on one node, HBASENODE1, do the same on HBASENODE2 and HBASENODE3.

Once this is done on all 3 servers, we need to enable passwordless SSH between the nodes.
From HBASENODE1, run the following command against HBASENODE2


# copy the public key to the other nodes

ssh-copy-id -i $HOME/.ssh/id_rsa.pub bigdatauser@HBASENODE2

The same command needs to be run from
HBASENODE1 to HBASENODE3
and from
HBASENODE2 to HBASENODE1, HBASENODE3
HBASENODE3 to HBASENODE1, HBASENODE2
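The combinations above can be generated rather than typed by hand. This is only a sketch: the helper names peers_of and copy_id_commands are invented for illustration, and the commands are printed instead of executed so they can be reviewed first.

```shell
# Cluster members, taken from the setup described above.
NODES="HBASENODE1 HBASENODE2 HBASENODE3"

# Hypothetical helper: print every node except the one passed in.
peers_of() {
  for node in $NODES; do
    if [ "$node" != "$1" ]; then
      echo "$node"
    fi
  done
}

# Print the ssh-copy-id command for each peer of the given node.
# Nothing is executed here; the output is meant to be reviewed first.
copy_id_commands() {
  for peer in $(peers_of "$1"); do
    echo "ssh-copy-id -i \$HOME/.ssh/id_rsa.pub bigdatauser@$peer"
  done
}
```

On HBASENODE1, copy_id_commands HBASENODE1 prints the two commands for HBASENODE2 and HBASENODE3; each one still prompts once for the bigdatauser password when it is run.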

Once this is done, verify that you can log in from any node to any other node by typing the following

ssh HBASENODE2

on HBASENODE1, and likewise for the other combinations.
Accept the warning the first time; the next time it should log you directly into the server.
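Once the host key warnings have been accepted, the check can be repeated without any typing. The sketch below relies on the ssh option BatchMode=yes, which makes ssh fail instead of prompting for a password. The function names and the SSH_CMD override are assumptions made for this example.

```shell
# SSH_CMD can be overridden, for example with a stub when testing.
SSH_CMD="${SSH_CMD:-ssh}"

# Hypothetical helper: succeed only if key-based login to the node works.
# The remote command "true" does nothing and exits immediately.
can_login() {
  $SSH_CMD -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Report the result for each node name given as an argument.
# Returns non-zero if at least one node could not be reached.
check_nodes() {
  failed=0
  for node in "$@"; do
    if can_login "$node"; then
      echo "$node: ok"
    else
      echo "$node: FAILED"
      failed=1
    fi
  done
  return $failed
}
```

For example, on HBASENODE1 run check_nodes HBASENODE2 HBASENODE3. A node that has never been contacted before is also reported as FAILED, because in batch mode ssh cannot ask about the unknown host key.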

Tuesday, February 16, 2016

How to install Java 7 and Java 8 on Mac OS X?

Now that we have installed the package managers on Mac OS X, we can install the most widely used Java versions.

Most big data technologies work with Java 1.7, also called Java 7. A few of them use Java 1.8, also called Java 8.

Unfortunately, for Mac OS X the distributions available are from Oracle and not from OpenJDK. Install Java 7 before installing Java 8. These commands install the JDK.

Issue the following command to install Java 7

sudo brew cask install java7

Issue the following command to install Java 8


sudo brew cask install java

The default installation location is


/opt/homebrew-cask/Caskroom/

You can also find more information about the software by passing its installation keyword (java, java7, etc.) to the info command

brew cask info java7

Feb 19, 2019
An updated article is available to install Java 8 as Java 7 is quite old.

Monday, February 15, 2016

Install Homebrew and Cask for package management in Mac OS X

Just as we have yum on CentOS 7, we need a package manager on the development machine, namely Mac OS X.

Homebrew and Cask are the preferred package managers.

To install Homebrew, run the following command


/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Refer to the following URL for more information

Once you have installed brew, you can install Cask with the following command

brew tap caskroom/cask

Refer to the following URL for more information

Note: As with all installations, make sure that you have admin rights or use the sudo keyword.

Friday, January 29, 2016

How to setup DNS entries for big data servers in the cloud or not on a domain in /etc/hosts file?

IMPORTANT:
a. Do not do this if the servers are already in a domain and DNS is already handled by a dedicated DNS server.
b. It is always better to assign static IPs to the servers before making these changes, so that we don't have to change the DNS entries every time a machine's IP changes after a restart.
c. You need to be an administrator to perform these changes.

Now that we have created these servers, which are on the virtual network and have only an IPv4 address, we need to make sure that the machines in the network are reachable by host name.

Most big data clusters need not be managed by a domain controller and can exist standalone in the cloud or on your own virtualization host.

On each node of a cluster, the DNS entries for the other nodes in that cluster need to be set in the /etc/hosts file.
For clusters that interact with other clusters, the DNS entries also need to be set for all nodes of the clusters the current cluster interacts with.

e.g.
If we have a 3-node Elasticsearch cluster, then on each of the 3 nodes we need to set up the DNS entries in the /etc/hosts file

sudo vi /etc/hosts

Insert the following records and save, on every node of the cluster. Keep the localhost entry as it is and append below it. (You can add a comment using #.)

# Elastic Nodes

192.168.0.6 SEARCHNODE1

192.168.0.7 SEARCHNODE2

192.168.0.8 SEARCHNODE3
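When the same records have to be added on several nodes, it helps if the step can be repeated without creating duplicates. The following is a rough sketch of that idea; has_host_entry and add_host_entry are hypothetical helper names, and the file to edit is passed in as the first argument.

```shell
# Hypothetical helper: succeed if NAME already appears as a host name
# on a non-comment line of the given hosts file.
# Usage: has_host_entry FILE NAME
has_host_entry() {
  awk -v name="$2" '
    $1 ~ /^#/ { next }                 # skip comment lines
    { for (i = 2; i <= NF; i++) {
        if ($i ~ /^#/) break           # ignore trailing comments
        if ($i == name) found = 1
      } }
    END { exit !found }
  ' "$1"
}

# Append "IP NAME" to the file only when NAME is not listed yet.
# Usage: add_host_entry FILE IP NAME
add_host_entry() {
  if has_host_entry "$1" "$3"; then
    echo "$3 already present in $1"
  else
    echo "$2 $3" >> "$1"
  fi
}
```

For example, add_host_entry /etc/hosts 192.168.0.6 SEARCHNODE1. Because /etc/hosts is owned by root, the functions have to be run from a root shell, or against a copy of the file that is then moved into place.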

If we have a 3-node Apache Storm cluster that inserts data into the Elasticsearch cluster, then each Storm node needs entries for the Storm nodes as well as the Elasticsearch nodes.

sudo vi /etc/hosts

Insert the following records and save, on every node of the cluster

# Storm Nodes

192.168.0.10 STORMNODE1

192.168.0.11 STORMNODE2

192.168.0.12 STORMNODE3

# Elastic Nodes

192.168.0.6 SEARCHNODE1

192.168.0.7 SEARCHNODE2

192.168.0.8 SEARCHNODE3

Verify that it is working with the following command

ping STORMNODE1

Cancel with Ctrl + C.

You should not see any dropped packets.
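To check all the host names in one go, ping can be given a packet count with -c so that it stops by itself. This is a small sketch only; the function name ping_all and the PING_CMD override are assumptions made for this example.

```shell
# PING_CMD can be overridden, for example with a stub when testing.
PING_CMD="${PING_CMD:-ping}"

# Hypothetical helper: send 3 packets to each host and report the result.
# Returns non-zero if at least one host did not answer.
ping_all() {
  status=0
  for host in "$@"; do
    if $PING_CMD -c 3 "$host" >/dev/null 2>&1; then
      echo "$host reachable"
    else
      echo "$host NOT reachable"
      status=1
    fi
  done
  return $status
}
```

On a Storm node this could be called as ping_all STORMNODE1 STORMNODE2 STORMNODE3 SEARCHNODE1 SEARCHNODE2 SEARCHNODE3.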

How to turn off firewall on CentOS 7?

Important:
Do not turn off the firewall on production and pre-production servers. Figure out which ports need to be open for the technologies that run on the server and add rules for those ports to the firewall.

In Developer Box, Developer Instance and QA Instance environments it is safe to turn it off for quick deployment.

You need to be an administrator to perform these changes.




# check the status of the firewall

sudo systemctl status firewalld



# stop the firewall

sudo systemctl stop firewalld



# disable the firewall so that it doesn't start again after a reboot

sudo systemctl disable firewalld
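For production and pre-production servers, where the firewall stays on, the required ports can be opened with firewall-cmd instead. The sketch below prints the commands by default so they can be reviewed. The helper names run and open_tcp_ports and the DRY_RUN switch are assumptions made for this example, and the ports to open depend on the technologies installed on the server.

```shell
# Print commands by default; set DRY_RUN=0 to really execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Hypothetical helper: open each TCP port permanently, then reload
# firewalld so that the permanent rules take effect.
open_tcp_ports() {
  for port in "$@"; do
    run sudo firewall-cmd --permanent --add-port="$port/tcp"
  done
  run sudo firewall-cmd --reload
}
```

For example, open_tcp_ports 9200 9300 prints the commands for the default Elasticsearch HTTP and transport ports, which are used here only as an illustration.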

Thursday, January 28, 2016

How to install Java 7 and Java 8 in CentOS 7?

With Java there are three things to distinguish:

1. Versions such as 1.7 and 1.8, which are also called Java 7 and Java 8
2. Packages such as the JRE and the JDK
3. Vendor distributions, i.e. the Oracle distribution or the OpenJDK distribution.

On a development machine install the JDK, which also includes the JRE.
On server machines install only the JRE.
If there is an OpenJDK version available for your OS use it, otherwise take the Oracle one.
Since there is an OpenJDK version for CentOS, we shall install that distribution.

CentOS 7 Server with UI should come with the Java 7 JRE pre-installed.

Check if Java is already installed by issuing the following command.


java -version


It should print out




java version "1.7.0_79"
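In a script it can be handy to turn this banner into a plain major version number. The following is a minimal sketch; the function names are invented for this example, and it assumes the banner has the form shown above, with the version in double quotes.

```shell
# Read the output of "java -version" on stdin and print the quoted version.
version_from_banner() {
  sed -n 's/.* version "\([^"]*\)".*/\1/p' | head -n 1
}

# Map a version string to its major number: 1.7.0_79 -> 7, 1.8.0_65 -> 8.
java_major() {
  case "$1" in
    1.*) echo "$1" | cut -d. -f2 ;;
    *)   echo "$1" | cut -d. -f1 ;;
  esac
}
```

Since java -version writes to standard error, the two can be combined as java_major "$(java -version 2>&1 | version_from_banner)".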

Yum is the package manager in CentOS (comparable to Windows Installer on Microsoft Windows). It detects whether a package is already installed and installs it only if it is not.

Install the Java 8 (1.8) JDK using the following command. You need to have admin privileges.


sudo yum install java-1.8.0-openjdk-devel.x86_64

If it is already installed you will get


Loaded plugins: fastestmirror, langpacks

Loading mirror speeds from cached hostfile

 * base: mirror.5ninesolutions.com

 * extras: mirror.millry.co

 * updates: mirror.cogentco.com

Package 1:java-1.8.0-openjdk-devel-1.8.0.65-2.b17.el7_1.x86_64 already installed and latest version

Check the installation with the following command, which lists any running Java processes. It is available only with the JDK and not with the JRE.

jps

Install the Java 8 (1.8) JRE using the following command. You need to have admin privileges.


sudo yum install java-1.8.0-openjdk.x86_64

Don't be confused by the package name: OpenJDK is the name of the open source Java project, not of the JDK package. This package installs only the JRE; the JDK is in the -devel package.

If Java 7 is not installed because you have a server core version of CentOS 7, issue the following commands before you install Java 8


# install the Java 7 JRE

sudo yum install java-1.7.0-openjdk.x86_64



# install the Java 7 JDK

sudo yum install java-1.7.0-openjdk-devel.x86_64

Wednesday, January 27, 2016

How to build an Apache Storm 0.10 topology job in Eclipse using Maven?

The following is the process for building an Apache Storm 0.10 topology job.

Technologies used:

a. Apache Storm 0.10

b. Eclipse Mars

c. Apache Maven 3.3.9
d. Java 7 or Java 8

1. You need two environments to build a Storm topology.
Environment 1: Developer Box
Environment 2: Development / Build Box

IMPORTANT: If the changes below are made on the Developer Box, topologies will no longer run on the local Storm cluster.

2. Remove the jars that Storm already provides.
Remove the Storm jar and logging jars such as Log4j2.
Go to the project pom.xml that has the Storm jar reference and add the following so that these jars are not included when we package a single jar with dependencies.


    <dependency>
      <groupId>org.apache.storm</groupId>
      <artifactId>storm-core</artifactId>
      <version>0.10.0</version>
      <!--  This needs to be enabled for storm submit job -->
      <scope>provided</scope>
    </dependency>

    <dependency>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-api</artifactId>
      <version>2.5</version>
      <!--  This needs to be enabled for storm submit job -->
      <scope>provided</scope>
    </dependency>

    <dependency>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-core</artifactId>
      <version>2.5</version>
      <!--  This needs to be enabled for storm submit job -->
      <scope>provided</scope>
    </dependency>

3. Pack the project into a single jar with dependencies.
Go to the Maven project that has the main method. Edit its pom.xml to add the following
below the <dependencies> node.

  </dependencies>

  <build>
    <plugins>
      <plugin>
        <artifactId>maven-assembly-plugin</artifactId>
        <configuration>
          <archive>
            <manifest>
              <mainClass>com.yourcompany.yourproduct.yourproject.App</mainClass>
            </manifest>
          </archive>
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
        </configuration>
        <executions>
          <execution>
            <id>make-assembly</id>
            <phase>package</phase>
            <goals>
              <goal>single</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>

4. Issue the following Maven command, then copy the jar with dependencies from the target folder.


    mvn package

Note: the target folder will contain two jars; copy the one whose name ends with jar-with-dependencies, like


yourproject-1.0.0-jar-with-dependencies.jar
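Picking the right jar can also be scripted. This is just a sketch; the function name find_fat_jar is made up for illustration, and it assumes the default Maven output folder named target.

```shell
# Hypothetical helper: print the path of the jar-with-dependencies
# inside the given target directory (default: ./target).
find_fat_jar() {
  target_dir="${1:-target}"
  jar="$(find "$target_dir" -maxdepth 1 -name '*-jar-with-dependencies.jar' | head -n 1)"
  if [ -z "$jar" ]; then
    echo "no jar-with-dependencies found in $target_dir" >&2
    return 1
  fi
  echo "$jar"
}
```

Run from the project folder after mvn package, find_fat_jar prints the path of the jar to copy.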

5. Follow the instructions for the next step, submitting the Storm topology job.
