Saturday, October 22, 2016

How to enable SSH on your developer Mac OSX?

Most big data technologies require that the machines involved can reach each other over passwordless ssh.
To make these technologies work on a development machine, you need to enable ssh on your Mac (El Capitan).

1. Click on System Preferences
2. Click on Sharing
3. On the left-hand side, under "Service", enable "Remote Login"
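
The same setting can also be changed from Terminal with the systemsetup tool that ships with OS X. This is a minimal sketch, assuming an administrator account; the function name enable_remote_login is hypothetical, and newer macOS releases may impose extra permission requirements on this command.

```shell
#!/bin/sh
# Hypothetical helper: turn on Remote Login (ssh) and print the resulting state.
# It refuses to run anywhere except macOS, because systemsetup only exists there.
enable_remote_login() {
  if [ "$(uname -s)" != "Darwin" ]; then
    echo "not macOS; skipping" >&2
    return 1
  fi
  sudo systemsetup -setremotelogin on
  sudo systemsetup -getremotelogin
}

# Example: enable_remote_login
```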

How to Set Up a 3 Node Apache HBase 1.2.3 cluster in CentOS 7?

The following needs to be done before beginning the Apache HBase cluster setup.

1. Create 3 CentOS 7 Servers HBNODE1, HBNODE2 and HBNODE3 as discussed in How to install CentOS 7 on Virtual Machine using VMWare vSphere 6 client?

2. Make sure Java 7 is installed and configured as default as discussed in How to install Java 7 and Java 8 in CentOS 7?.

3. Create the bigdatauser, bigdataadmin and the bigdatagroup as discussed in How to create a user, group and enable him to do what a super user can in CentOS7?

4. Make sure the firewall is disabled and stopped as discussed in How to turn off firewall on CentOS 7? 

5. Change the /etc/hosts file so that all the IPs and the names of the servers are resolved as discussed in

6. Using the bigdatauser, set up passwordless ssh across the 3 servers, namely HBNODE1, HBNODE2 and HBNODE3, as discussed in How to setup password less ssh between CentOS 7 cluster servers?


7. Install Apache Zookeeper clusters as discussed in How to setup a 3 Node Apache Zookeeper 3.4.6 cluster in CentOS 7? Make sure you do the same as in step 5 for these servers too.
8. Install Apache Hadoop clusters as discussed in How to Setup a 3 Node Apache Hadoop 2.7.3 cluster in CentOS 7? Make sure you do the same as in step 5 for these servers too.
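
Before moving on, the name resolution from step 5 can be checked with a small script. This is a minimal sketch and not part of the original setup: the function name missing_hosts is hypothetical, and it only inspects a hosts-format file, it does not test network reachability.

```shell
#!/bin/sh
# Hypothetical helper: print every name that has no entry in a hosts-format file.
# Usage: missing_hosts /etc/hosts hbnode1 hbnode2 hbnode3
missing_hosts() {
  hosts_file=$1
  shift
  for name in "$@"; do
    # Skip comment lines, then compare the name (case-insensitively) with
    # every column after the IP, stopping at an inline "#" comment.
    if ! awk -v n="$name" '
        /^[[:space:]]*#/ { next }
        {
          for (i = 2; i <= NF; i++) {
            if ($i ~ /^#/) break
            if (tolower($i) == tolower(n)) found = 1
          }
        }
        END { exit (found ? 0 : 1) }' "$hosts_file"; then
      echo "$name"
    fi
  done
}
```

An empty result means every listed name has an entry. The ZooKeeper and Hadoop node names used later (zknode1, hdnode1 and so on) can be passed in the same call.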

For each of the Servers HBNODE1, HBNODE2 and HBNODE3 do the following.

 
Log in as bigdataadmin
 
#create a folder for hbase under the /usr/local directory
cd /usr/local
sudo mkdir hbase
 
#change ownership to bigdatauser
sudo chown -R bigdatauser:bigdatagroup hbase

#switch to bigdatauser
su bigdatauser

#move to a download folder and download hbase
wget http://www-eu.apache.org/dist/hbase/1.2.3/hbase-1.2.3-bin.tar.gz

#unzip the files
tar xzf hbase-1.2.3-bin.tar.gz

#move this to the common directory
mv hbase-1.2.3 /usr/local/hbase

#go to the hbase directory
cd /usr/local/hbase/hbase-1.2.3

#move to config directory
cd conf

#edit hbase-env.sh
vi hbase-env.sh

#change Java Home Path
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.75-2.5.4.2.el7_0.x86_64/jre

#disable internal zookeeper
export HBASE_MANAGES_ZK=false

#save and quit vi
:wq
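
The JAVA_HOME path above is specific to one OpenJDK build and will differ between machines. A hedged sketch for deriving it from the java binary instead, assuming GNU readlink (as on CentOS 7); the function name java_home_of is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: resolve a java binary through its symlinks and
# strip the trailing /bin/java to obtain a JAVA_HOME candidate.
java_home_of() {
  real=$(readlink -f "$1") || return 1
  dirname "$(dirname "$real")"
}

# Example use on a node where java is on the PATH:
#   java_home_of "$(command -v java)"
```

The printed directory is only a candidate; compare it with the Java 7 installation you configured as default before putting it into hbase-env.sh.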

#edit the hbase-site.xml

vi hbase-site.xml

<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
    <description>The mode the cluster will be in. Possible values are
false: standalone and pseudo-distributed setups with managed Zookeeper
true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
    </description>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hdnode1:9000/user/hadoop/hbase</value>
    <description>The directory shared by RegionServers.</description>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zknode1,zknode2,zknode3</value>
    <description>The Zookeeper ensemble</description>
  </property>
</configuration>

#save and quit vi
:wq
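
The host and port in hbase.rootdir need to agree with the NameNode address of the Hadoop cluster, otherwise HBase will not find HDFS. A small sketch for comparing the two values; the function name rootdir_matches_fs is hypothetical, and hdfs://hdnode1:9000 is simply the value used above.

```shell
#!/bin/sh
# Hypothetical helper: succeed when ROOTDIR lives under the file system URI DEFAULT_FS.
# Usage: rootdir_matches_fs ROOTDIR DEFAULT_FS
rootdir_matches_fs() {
  fs=${2%/}                       # drop one trailing slash, if any
  case $1 in
    "$fs"|"$fs"/*) return 0 ;;
    *) return 1 ;;
  esac
}

# On a node with the Hadoop client installed, DEFAULT_FS can be read with:
#   hdfs getconf -confKey fs.defaultFS
# Example:
#   rootdir_matches_fs hdfs://hdnode1:9000/user/hadoop/hbase "$(hdfs getconf -confKey fs.defaultFS)"
```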

#edit the regionservers file only on the master node hbnode1

vi regionservers
hbnode2
hbnode3

#save and quit vi
:wq
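
The regionservers file is plain text with one host name per line, so it can also be written without an editor. A sketch, with a hypothetical function name:

```shell
#!/bin/sh
# Hypothetical helper: overwrite FILE with one region server host per line.
# Usage: write_regionservers conf/regionservers hbnode2 hbnode3
write_regionservers() {
  file=$1
  shift
  printf '%s\n' "$@" > "$file"
}
```

Note that this replaces the file's previous content, including the default localhost entry.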

#move to the root folder and start the HBase cluster from the master node hbnode1
cd /usr/local/hbase/hbase-1.2.3

bin/start-hbase.sh

#This also starts the region servers on the other nodes
#check for the following processes

ps aux | grep hbase

#HMaster on master hbnode1 and HRegionServer on other nodes.
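
If a JDK is installed, jps gives a shorter listing than ps: one line per JVM with its process id and main class. A sketch that checks such a listing for a daemon name; has_daemon is a hypothetical name, and jps ships with the JDK rather than the JRE, so it may be absent on these nodes.

```shell
#!/bin/sh
# Hypothetical helper: read "PID ClassName" lines on stdin and succeed
# when one of them names the given daemon exactly.
has_daemon() {
  awk -v d="$1" '$2 == d { found = 1 } END { exit (found ? 0 : 1) }'
}

# On hbnode1:            jps | has_daemon HMaster && echo "master is up"
# On hbnode2 / hbnode3:  jps | has_daemon HRegionServer && echo "region server is up"
```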

#view the status of the cluster in the following URL
http://hbnode1:16010/master-status

This should display the nodes as well as other details like Zookeeper etc.


Wednesday, October 12, 2016

Linux systems folder structure - File System Hierarchy Standard (FHS)

The following link describes the Linux File System Hierarchy Standard structure that all developers should be aware of when using Linux systems. It should also give an idea of where to place the software we develop for deployment. https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/3/html/Reference_Guide/s1-filesystem-fhs.html
Pay attention to the following folders:

/usr/
/usr/libexec
/usr/local
/var/lib



3.2. Overview of File System Hierarchy Standard (FHS)

Red Hat Enterprise Linux uses the Filesystem Hierarchy Standard (FHS) file system structure, which defines the names, locations, and permissions for many file types and directories.
The FHS document is the authoritative reference to any FHS-compliant file system, but the standard leaves many areas undefined or extensible. This section is an overview of the standard and a description of the parts of the file system not covered by the standard.
Compliance with the standard means many things, but the two most important are compatibility with other compliant systems and the ability to mount a /usr/ partition as read-only. This second point is important because the directory contains common executables and should not be changed by users. Also, since the /usr/ directory is mounted as read-only, it can be mounted from the CD-ROM or from another machine via a read-only NFS mount.
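
Whether /usr/ is actually mounted read-only on a given machine can be read from the mount table. The following sketch parses /proc/mounts-style lines (device, mount point, type, options); the function name is_mounted_ro is hypothetical. On many systems /usr/ is not a separate mount at all, in which case nothing matches and the function reports failure.

```shell
#!/bin/sh
# Hypothetical helper: read mount table lines on stdin and succeed when
# the given mount point carries the "ro" option (the last matching line wins).
is_mounted_ro() {
  awk -v mp="$1" '
    $2 == mp {
      n = split($4, opts, ",")
      ro = 0
      for (i = 1; i <= n; i++) if (opts[i] == "ro") ro = 1
    }
    END { exit (ro ? 0 : 1) }'
}

# Example: is_mounted_ro /usr < /proc/mounts && echo "/usr is read-only"
```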

3.2.1. FHS Organization

The directories and files noted here are a small subset of those specified by the FHS document. Refer to the latest FHS document for the most complete information.
The complete standard is available online at http://www.pathname.com/fhs/.

3.2.1.1. The /boot/ Directory

The /boot/ directory contains static files required to boot the system, such as the Linux kernel. These files are essential for the system to boot properly.
Warning
Do not remove the /boot/ directory. Doing so will render the system unbootable.

3.2.1.2. The /dev/ Directory

The /dev/ directory contains file system entries which represent devices that are attached to the system. These files are essential for the system to function properly.

3.2.1.3. The /etc/ Directory

The /etc/ directory is reserved for configuration files that are local to the machine. No binaries are to be put in /etc/. Any binaries that were once located in /etc/ should be placed into /sbin/ or /bin/.
The X11/ and skel/ directories are subdirectories of the /etc/ directory:
/etc
  |- X11/
  |- skel/
The /etc/X11/ directory is for X Window System configuration files such as XF86Config. The /etc/skel/ directory is for "skeleton" user files, which are used to populate a home directory when a user is first created.

3.2.1.4. The /lib/ Directory

The /lib/ directory should contain only those libraries needed to execute the binaries in /bin/ and /sbin/. These shared library images are particularly important for booting the system and executing commands within the root file system.

3.2.1.5. The /mnt/ Directory

The /mnt/ directory is for temporarily mounted file systems, such as CD-ROMs and 3.5-inch diskettes.

3.2.1.6. The /opt/ Directory

The /opt/ directory provides storage for large, static application software packages.
A package placing files in the /opt/ directory creates a directory bearing the same name as the package. This directory, in turn, holds files that otherwise would be scattered throughout the file system, giving the system administrator an easy way to determine the role of each file within a particular package.
For example, if sample is the name of a particular software package located within the /opt/ directory, then all of its files are placed in directories inside the /opt/sample/ directory, such as /opt/sample/bin/ for binaries and /opt/sample/man/ for manual pages.
Large packages that encompass many different sub-packages, each of which accomplishes a particular task, are also located in the /opt/ directory, giving that large package a way to organize itself. In this way, our sample package may have different tools that each go in their own sub-directories, such as /opt/sample/tool1/ and /opt/sample/tool2/, each of which can have its own bin/, man/, and other similar directories.
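
The layout described for the sample package can be sketched as a few mkdir calls. The prefix argument is an addition for illustration, so the layout can be tried outside the real /opt/; sample, tool1 and tool2 are the placeholder names from the text, and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: create the /opt-style skeleton for a package under PREFIX.
# Usage: make_opt_skeleton PREFIX PACKAGE [TOOL...]
make_opt_skeleton() {
  prefix=$1
  pkg=$2
  shift 2
  # Top-level bin/ and man/ for the package itself.
  mkdir -p "$prefix/$pkg/bin" "$prefix/$pkg/man" || return 1
  # One sub-directory per tool, each with its own bin/ and man/.
  for tool in "$@"; do
    mkdir -p "$prefix/$pkg/$tool/bin" "$prefix/$pkg/$tool/man" || return 1
  done
}

# Example: make_opt_skeleton /tmp/opt-demo sample tool1 tool2
```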

3.2.1.7. The /proc/ Directory

The /proc/ directory contains special files that either extract information from or send information to the kernel.
Due to the great variety of data available within /proc/ and the many ways this directory can be used to communicate with the kernel, an entire chapter has been devoted to the subject. For more information, please refer to Chapter 5 The proc File System.

3.2.1.8. The /sbin/ Directory

The /sbin/ directory stores executables used by the root user. The executables in /sbin/ are only used at boot time and perform system recovery operations. Of this directory, the FHS says:
/sbin contains binaries essential for booting, restoring, recovering, and/or repairing the system in addition to the binaries in /bin. Programs executed after /usr/ is known to be mounted (when there are no problems) are generally placed into /usr/sbin. Locally-installed system administration programs should be placed into /usr/local/sbin.
At a minimum, the following programs should be in /sbin/:
arp, clock,
halt, init, 
fsck.*, grub,
ifconfig, lilo, 
mingetty, mkfs.*, 
mkswap, reboot, 
route, shutdown, 
swapoff, swapon

3.2.1.9. The /usr/ Directory

The /usr/ directory is for files that can be shared across multiple machines. The /usr/ directory is often on its own partition and is mounted read-only. At minimum, the following directories should be subdirectories of /usr/:
/usr
  |- bin/
  |- dict/
  |- doc/
  |- etc/
  |- games/
  |- include/
  |- kerberos/
  |- lib/
  |- libexec/     
  |- local/
  |- sbin/
  |- share/
  |- src/
  |- tmp -> ../var/tmp/
  |- X11R6/
Under the /usr/ directory, the bin/ directory contains executables, dict/ contains non-FHS compliant documentation pages, etc/ contains system-wide configuration files, games/ is for games, include/ contains C header files, kerberos/ contains binaries and other Kerberos-related files, and lib/ contains object files and libraries that are not designed to be directly utilized by users or shell scripts. The libexec/ directory contains small helper programs called by other programs, sbin/ is for system administration binaries (those that do not belong in the /sbin/ directory), share/ contains files that are not architecture-specific, src/ is for source code, and X11R6/ is for the X Window System (XFree86 on Red Hat Enterprise Linux).

3.2.1.10. The /usr/local/ Directory

The FHS says:
The /usr/local hierarchy is for use by the system administrator when installing software locally. It needs to be safe from being overwritten when the system software is updated. It may be used for programs and data that are shareable among a group of hosts, but not found in /usr.
The /usr/local/ directory is similar in structure to the /usr/ directory. It has the following subdirectories, which are similar in purpose to those in the /usr/ directory:
/usr/local
       |- bin/
       |- doc/
       |- etc/
       |- games/
       |- include/
       |- lib/
       |- libexec/
       |- sbin/
       |- share/
       |- src/
In Red Hat Enterprise Linux, the intended use for the /usr/local/ directory is slightly different from that specified by the FHS. The FHS says that /usr/local/ should be where software that is to remain safe from system software upgrades is stored. Since software upgrades can be performed safely with Red Hat Package Manager (RPM), it is not necessary to protect files by putting them in /usr/local/. Instead, the /usr/local/ directory is used for software that is local to the machine.
For instance, if the /usr/ directory is mounted as a read-only NFS share from a remote host, it is still possible to install a package or program under the /usr/local/ directory.

3.2.1.11. The /var/ Directory

Since the FHS requires Linux to mount /usr/ as read-only, any programs that write log files or need spool/ or lock/ directories should write them to the /var/ directory. The FHS states /var/ is for:
...variable data files. This includes spool directories and files, administrative and logging data, and transient and temporary files.
Below are some of the directories found within the /var/ directory:
/var
  |- account/
  |- arpwatch/
  |- cache/
  |- crash/
  |- db/
  |- empty/
  |- ftp/
  |- gdm/
  |- kerberos/
  |- lib/
  |- local/
  |- lock/
  |- log/
  |- mail -> spool/mail/
  |- mailman/
  |- named/
  |- nis/
  |- opt/
  |- preserve/
  |- run/
  +- spool/
       |- at/
       |- clientmqueue/
       |- cron/
       |- cups/
       |- lpd/
       |- mail/
       |- mqueue/
       |- news/
       |- postfix/ 
       |- repackage/
       |- rwho/
       |- samba/ 
       |- squid/
       |- squirrelmail/
       |- up2date/ 
       |- uucppublic/
       |- vbox/
  |- tmp/
  |- tux/
  |- www/
  |- yp/
System log files such as messages and lastlog go in the /var/log/ directory. The /var/lib/rpm/ directory contains RPM system databases. Lock files go in the /var/lock/ directory, usually in directories for the program using the file. The /var/spool/ directory has subdirectories for programs in which data files are stored.