Content

Wednesday, October 12, 2016

Linux systems folder structure - File System Hierarchy Standard (FHS)

The following link describes the Linux File System Hierarchy Standard (FHS) structure that all developers should be aware of when using Linux systems. It should also give an idea of where to place the software we develop for deployment. https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/3/html/Reference_Guide/s1-filesystem-fhs.html
Pay attention to the following folders:

/usr/
/usr/libexec
/usr/local
/var/lib



3.2. Overview of File System Hierarchy Standard (FHS)

Red Hat Enterprise Linux uses the Filesystem Hierarchy Standard (FHS) file system structure, which defines the names, locations, and permissions for many file types and directories.
The FHS document is the authoritative reference to any FHS-compliant file system, but the standard leaves many areas undefined or extensible. This section is an overview of the standard and a description of the parts of the file system not covered by the standard.
Compliance with the standard means many things, but the two most important are compatibility with other compliant systems and the ability to mount a /usr/ partition as read-only. This second point is important because the directory contains common executables and should not be changed by users. Also, since the /usr/ directory is mounted as read-only, it can be mounted from the CD-ROM or from another machine via a read-only NFS mount.
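For illustration, a hypothetical /etc/fstab entry that mounts a /usr/ partition read-only might look like the line below; the device name and file system type here are assumptions for the example:

# /etc/fstab entry: mount /usr read-only at boot
/dev/sda3    /usr    ext3    ro    1 2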

3.2.1. FHS Organization

The directories and files noted here are a small subset of those specified by the FHS document. Refer to the latest FHS document for the most complete information.
The complete standard is available online at http://www.pathname.com/fhs/.

3.2.1.1. The /boot/ Directory

The /boot/ directory contains static files required to boot the system, such as the Linux kernel. These files are essential for the system to boot properly.
Warning
Do not remove the /boot/ directory. Doing so will render the system unbootable.

3.2.1.2. The /dev/ Directory

The /dev/ directory contains file system entries which represent devices that are attached to the system. These files are essential for the system to function properly.

3.2.1.3. The /etc/ Directory

The /etc/ directory is reserved for configuration files that are local to the machine. No binaries are to be put in /etc/. Any binaries that were once located in /etc/ should be placed into /sbin/ or /bin/.
The X11/ and skel/ directories are subdirectories of the /etc/ directory:
/etc
  |- X11/
  |- skel/
The /etc/X11/ directory is for X Window System configuration files such as XF86Config. The /etc/skel/ directory is for "skeleton" user files, which are used to populate a home directory when a user is first created.
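A quick way to see the /etc/skel/ mechanism in action is to create a user and look at the new home directory; the user name below is hypothetical:

# create a user with a home directory; the files in /etc/skel are copied in
sudo useradd -m testuser
ls -a /home/testuser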

3.2.1.4. The /lib/ Directory

The /lib/ directory should contain only those libraries needed to execute the binaries in /bin/ and /sbin/. These shared library images are particularly important for booting the system and executing commands within the root file system.

3.2.1.5. The /mnt/ Directory

The /mnt/ directory is for temporarily mounted file systems, such as CD-ROMs and 3.5-inch diskettes.

3.2.1.6. The /opt/ Directory

The /opt/ directory provides storage for large, static application software packages.
A package placing files in the /opt/ directory creates a directory bearing the same name as the package. This directory, in turn, holds files that otherwise would be scattered throughout the file system, giving the system administrator an easy way to determine the role of each file within a particular package.
For example, if sample is the name of a particular software package located within the /opt/ directory, then all of its files are placed in directories inside the /opt/sample/ directory, such as /opt/sample/bin/ for binaries and /opt/sample/man/ for manual pages.
Large packages that encompass many different sub-packages, each of which accomplish a particular task, are also located in the /opt/ directory, giving that large package a way to organize itself. In this way, our sample package may have different tools that each go in their own sub-directories, such as /opt/sample/tool1/ and /opt/sample/tool2/, each of which can have their own bin/man/, and other similar directories.
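Putting this together, the hypothetical sample package with two tools might be laid out as follows:

/opt/sample
  |- bin/
  |- man/
  |- tool1/
  |    |- bin/
  |    +- man/
  +- tool2/
       |- bin/
       +- man/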

3.2.1.7. The /proc/ Directory

The /proc/ directory contains special files that either extract information from or send information to the kernel.
Due to the great variety of data available within /proc/ and the many ways this directory can be used to communicate with the kernel, an entire chapter has been devoted to the subject. For more information, please refer to Chapter 5 The proc File System.
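For example, both directions of this communication can be tried from the shell; the ip_forward tunable is just one common example:

# extract information from the kernel
cat /proc/version

# send information to the kernel by writing to a tunable (as root)
echo 1 > /proc/sys/net/ipv4/ip_forward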

3.2.1.8. The /sbin/ Directory

The /sbin/ directory stores executables used by the root user. The executables in /sbin/ are only used at boot time and perform system recovery operations. Of this directory, the FHS says:
/sbin contains binaries essential for booting, restoring, recovering, and/or repairing the system in addition to the binaries in /bin. Programs executed after /usr/ is known to be mounted (when there are no problems) are generally placed into /usr/sbin. Locally-installed system administration programs should be placed into /usr/local/sbin.
At a minimum, the following programs should be in /sbin/:
arp, clock, halt, init, fsck.*, grub, ifconfig, lilo, mingetty, mkfs.*, mkswap, reboot, route, shutdown, swapoff, swapon

3.2.1.9. The /usr/ Directory

The /usr/ directory is for files that can be shared across multiple machines. The /usr/ directory is often on its own partition and is mounted read-only. At minimum, the following directories should be subdirectories of /usr/:
/usr
  |- bin/
  |- dict/
  |- doc/
  |- etc/
  |- games/
  |- include/
  |- kerberos/
  |- lib/
  |- libexec/     
  |- local/
  |- sbin/
  |- share/
  |- src/
  |- tmp -> ../var/tmp/
  |- X11R6/
Under the /usr/ directory, the bin/ directory contains executables, dict/ contains non-FHS compliant documentation pages, etc/ contains system-wide configuration files, games/ is for games, include/ contains C header files, kerberos/ contains binaries and other Kerberos-related files, and lib/ contains object files and libraries that are not designed to be directly utilized by users or shell scripts. The libexec/ directory contains small helper programs called by other programs, sbin/ is for system administration binaries (those that do not belong in the /sbin/ directory), share/ contains files that are not architecture-specific, src/ is for source code, and X11R6/ is for the X Window System (XFree86 on Red Hat Enterprise Linux).

3.2.1.10. The /usr/local/ Directory

The FHS says:
The /usr/local hierarchy is for use by the system administrator when installing software locally. It needs to be safe from being overwritten when the system software is updated. It may be used for programs and data that are shareable among a group of hosts, but not found in /usr.
The /usr/local/ directory is similar in structure to the /usr/ directory. It has the following subdirectories, which are similar in purpose to those in the /usr/ directory:
/usr/local
       |- bin/
       |- doc/
       |- etc/
       |- games/
       |- include/
       |- lib/
       |- libexec/
       |- sbin/
       |- share/
       |- src/
In Red Hat Enterprise Linux, the intended use for the /usr/local/ directory is slightly different from that specified by the FHS. The FHS says that /usr/local/ should be where software that is to remain safe from system software upgrades is stored. Since software upgrades can be performed safely with Red Hat Package Manager (RPM), it is not necessary to protect files by putting them in /usr/local/. Instead, the /usr/local/ directory is used for software that is local to the machine.
For instance, if the /usr/ directory is mounted as a read-only NFS share from a remote host, it is still possible to install a package or program under the /usr/local/ directory.
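As a sketch of how locally built software typically lands under /usr/local/, here is the classic source install for a hypothetical autoconf-based package called sample-1.0:

# build and install a local package under /usr/local
tar xzf sample-1.0.tar.gz
cd sample-1.0
./configure --prefix=/usr/local
make
sudo make install   # binaries go to /usr/local/bin, libraries to /usr/local/lib, and so on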

3.2.1.11. The /var/ Directory

Since the FHS requires Linux to mount /usr/ as read-only, any programs that write log files or need spool/ or lock/ directories should write them to the /var/ directory. The FHS states /var/ is for:
...variable data files. This includes spool directories and files, administrative and logging data, and transient and temporary files.
Below are some of the directories found within the /var/ directory:
/var
  |- account/
  |- arpwatch/
  |- cache/
  |- crash/
  |- db/
  |- empty/
  |- ftp/
  |- gdm/
  |- kerberos/
  |- lib/
  |- local/
  |- lock/
  |- log/
  |- mail -> spool/mail/
  |- mailman/
  |- named/
  |- nis/
  |- opt/
  |- preserve/
  |- run/
  +- spool/
       |- at/
       |- clientmqueue/
       |- cron/
       |- cups/
       |- lpd/
       |- mail/
       |- mqueue/
       |- news/
       |- postfix/ 
       |- repackage/
       |- rwho/
       |- samba/ 
       |- squid/
       |- squirrelmail/
       |- up2date/ 
       |- uucppublic/
       |- vbox/
  |- tmp/
  |- tux/
  |- www/
  |- yp/
System log files such as messages and lastlog go in the /var/log/ directory. The /var/lib/rpm/ directory contains RPM system databases. Lock files go in the /var/lock/ directory, usually in directories for the program using the file. The /var/spool/ directory has subdirectories for programs in which data files are stored.
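These locations are easy to inspect from the shell:

# recent system log entries
tail -n 5 /var/log/messages

# the RPM system databases
ls /var/lib/rpm

# per-program lock file directories
ls /var/lock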



Sunday, July 3, 2016

How to delete a topic in Apache Kafka Message Broker 0.9.x?

Deleting a topic is relevant only in development or testing environments. DO NOT enable this setting in production.

To delete a topic (analogous to a message queue in other systems), you need the following:
1. The Zookeeper ensemble that the Kafka cluster uses.
2. Topic deletion enabled in server.properties, namely:
delete.topic.enable=true
Refer to How to setup standalone instance of Apache Kafka 0.9.0.1 on localhost for Mac OS X? for enabling this setting.

For a Kafka server cluster installation with a Zookeeper ensemble, refer to How to setup a 2 Node Apache Kafka 0.9.0.1 cluster in CentOS 7?

Navigate to the Kafka installation directory on any node:


cd /usr/local/kafka/kafka_2.11-0.9.0.1

bin/kafka-topics.sh --zookeeper ZKNODE1:2181,ZKNODE2:2181,ZKNODE3:2181  --delete --topic topicName



For a standalone Mac OS X installation, navigate to the installation directory:
cd /usr/local/kafka/kafka_2.11-0.9.0.1

bin/kafka-topics.sh --zookeeper yourmac.local:2181 --delete --topic topicName


Note: Make sure that you have killed all consumers before you delete the topic. Kafka can take anywhere between 2 seconds and a minute to delete a topic; when the delete command is issued, it only marks the topic for deletion.
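To confirm the topic is gone (or still pending), list the topics; a topic awaiting deletion should show a marked-for-deletion note next to its name:

bin/kafka-topics.sh --zookeeper ZKNODE1:2181,ZKNODE2:2181,ZKNODE3:2181 --list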




How to read messages in a topic from Apache Kafka Message Broker 0.9.x?



Sometimes we need to quickly check what messages are present in an Apache Kafka topic. Apache Kafka provides a default consumer shell for reading messages off a topic. Apache Kafka does not allow you to read a message by message ID or partition key. You can only read from the beginning or from the last position that was read, which is automatically maintained in Zookeeper.

The reader is an application that waits for messages and reads continuously as long as you don't kill the session in the console. For example, if a Kafka producer produces 100 messages at time t1, you would see all 100 messages printed in the console; if another 10 messages arrive at time t2 while the consumer is still running, you would then see only those next 10 messages.

Kafka consumers have a concept of an offset, i.e. the position of the last message read. This offset is maintained in Apache Zookeeper. Since Kafka supports multiple partitions, an offset is maintained for each partition.

Apache Kafka is not like most other message queue systems, where a message can be read by only one consumer and is removed after reading. Kafka allows multiple consumers to read from the same topic. It is the responsibility of each consumer to keep track of what it has read. The default Kafka installation keeps messages for 7 days, after which they are removed from the topic.

Every time the Kafka consumer shell is invoked, it maintains its offset in Zookeeper. Kafka messages can be String or binary data. The console consumer prints only the content of each message, without the partition key or the partition it was read from, so it is most useful for String messages.
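If you need test messages to read, the bundled console producer can put some on the topic first; each line you type becomes one message. The broker is assumed here to be on the default port 9092:

cd /usr/local/kafka/kafka_2.11-0.9.0.1

bin/kafka-console-producer.sh --broker-list yourmac.local:9092 --topic topicName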


To read messages from a topic (analogous to a message queue in other systems), you need the following:
1. The Zookeeper ensemble that the Kafka cluster uses.

For a Kafka server cluster installation with a Zookeeper ensemble, refer to How to setup a 2 Node Apache Kafka 0.9.0.1 cluster in CentOS 7?

Navigate to the Kafka installation directory on any node:


cd /usr/local/kafka/kafka_2.11-0.9.0.1

bin/kafka-console-consumer.sh --zookeeper ZKNODE1:2181,ZKNODE2:2181,ZKNODE3:2181 --topic topicName --from-beginning


To exit reading, you need to kill the consumer process.

For CentOS 7
To stop reading, press Ctrl + C to exit to the shell.
Kafka then prints the number of messages it has read. It is always one more than the number of messages that your producer has put in the topic.

//view the consumer processes and note the process id
ps aux | grep kafka

//kill the process id, or force kill using kill -9 processid
kill processid




For a standalone Mac OS X installation, navigate to the installation directory:
cd /usr/local/kafka/kafka_2.11-0.9.0.1

bin/kafka-console-consumer.sh --zookeeper yourmac.local:2181 --topic yourTopic --from-beginning

To exit
For Mac
To stop reading, press Ctrl + C to exit to the shell.
Kafka then prints the number of messages it has read. It is always one more than the number of messages that your producer has put in the topic.

//view the consumer processes and note the process id
ps -a | grep kafka

//kill the process id, or force kill using kill -9 processid
kill processid

Note: If there are no messages, the consumer just waits. Do not assume that no messages are coming; check your producer to verify that it is correctly sending to the topic.

How to create a topic in Apache Kafka Message Broker 0.9.x?

To create a topic (analogous to a message queue in other systems), you need the following:
1. The Zookeeper ensemble that the Kafka cluster uses.

For a Kafka server cluster installation with a Zookeeper ensemble, refer to How to setup a 2 Node Apache Kafka 0.9.0.1 cluster in CentOS 7?

Navigate to the Kafka installation directory on any node:


cd /usr/local/kafka/kafka_2.11-0.9.0.1

bin/kafka-topics.sh --create --zookeeper ZKNODE1:2181,ZKNODE2:2181,ZKNODE3:2181 --replication-factor 2 --partitions 8 --topic topicname

This assumes that you have at least a 2-node cluster. If you set up more than 2 nodes, you can increase the replication factor correspondingly. The partition count is the number of concurrent reads that you would like to perform from your application. To better utilize partitions, you need to understand partition keys, which we will cover in later lessons.


For a standalone Mac OS X installation, navigate to the installation directory:
cd /usr/local/kafka/kafka_2.11-0.9.0.1

bin/kafka-topics.sh --create --zookeeper yourmac.local:2181 --replication-factor 1 --partitions 4 --topic topicname
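To verify the topic and see its partition and replica assignments, describe it using the same Zookeeper connect string:

bin/kafka-topics.sh --describe --zookeeper yourmac.local:2181 --topic topicname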





Saturday, May 21, 2016

How to setup standalone instance of Apache Kafka 0.9.0.1 on localhost for Mac OS X?


Apache Kafka is a distributed message broker that allows reading messages in a sequential manner, maintaining the order in which they arrived.
It also allows messages to be read in parallel by way of partitions: a topic with 4 partitions allows 4 threads to read messages in parallel. However, it is the developer's job to make sure that messages for the same entity go sequentially to the same thread instead of a different thread. Kafka does this by having a message key: if we send an entity E1 with key Key1 at time T1, and another message for E1 comes at time T2, it is the developer's responsibility to give it the same Key1 so that the messages are read in order by one thread, namely E1 T1 first and then E1 T2, and so on.

The following needs to be done before beginning the setup.
1. You have admin privileges for your development box
2. Make sure Java 7 or 8 is installed and configured as default as discussed in How to install Java 7 and Java 8 in Mac OS X
3. Make sure Apache Zookeeper standalone is installed as specified in How to setup standalone instance of Apache Zookeeper 3.4.6 on localhost for Mac OS X?
//create a folder for kafka under the /usr/local directory
cd /usr/local
sudo mkdir kafka

//create the data cum log directory for kafka under /var/lib
cd /var/lib
sudo mkdir kafka

//download kafka
wget http://apache.claz.com/kafka/0.9.0.1/kafka_2.11-0.9.0.1.tgz

//unpack the file
tar xzf kafka_2.11-0.9.0.1.tgz

//move the kafka installation to /usr/local/kafka from the download directory
mv kafka_2.11-0.9.0.1 /usr/local/kafka/

//switch to the kafka directory
cd /usr/local/kafka/kafka_2.11-0.9.0.1/

//switch to the config directory
cd config

edit the config file and change the following

vi server.properties

#The broker Id should be unique
broker.id=1

#change data cum log directory to
log.dirs=/var/lib/kafka

#include the zookeeper servers
zookeeper.connect=YOURMACHOSTNAME.local:2181

#Since this is a dev machine allow a topic to be deleted
delete.topic.enable=true

//save the file
:wq
//move to the kafka root
cd /usr/local/kafka/kafka_2.11-0.9.0.1

//start kafka broker
bin/kafka-server-start.sh config/server.properties >/dev/null &

//if you need to stop
kill processid

//check if the process is running
ps -a | grep kafka

//or use jps
jps
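To see the message-key behavior described at the top of this post, a quick smoke test can be run once the broker is up. The topic name keytest is just an example, the broker is assumed to be on the default port 9092, and the parse.key/key.separator properties make everything before the colon the message key:

# create a 4-partition test topic
bin/kafka-topics.sh --create --zookeeper YOURMACHOSTNAME.local:2181 --replication-factor 1 --partitions 4 --topic keytest

# send keyed messages, for example E1:T1 and then E1:T2
bin/kafka-console-producer.sh --broker-list YOURMACHOSTNAME.local:9092 --topic keytest --property parse.key=true --property key.separator=: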


How to setup standalone instance of Apache Zookeeper 3.4.6 on localhost for Mac OS X?

Apache Zookeeper is a distributed state manager that other systems use for state management. You can also set up a standalone Zookeeper instead of a built-in one, and share this Zookeeper instance across multiple technologies like Kafka, Storm, HBase, etc., so that each technology does not start its own instance. These instructions let you set up Zookeeper as a standalone instance.

1. You have admin privileges for your development box
2. Make sure Java 7 or 8 is installed and configured as default as discussed in How to install Java 7 and Java 8 in Mac OS X



//create a folder for zookeeper under the /usr/local directory
cd /usr/local
sudo mkdir zookeeper

//create the data directory for Zookeeper under /var/lib
cd /var/lib
sudo mkdir zookeeper

//create a file named myid under the data directory
cd /var/lib/zookeeper
vi myid

//Put only the number 1.
1

//save the file
:wq

If you do a cat myid, it should display just 1.


//download zookeeper on any local directory
wget http://apache.arvixe.com/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz

//unpack the file
tar xzf zookeeper-3.4.6.tar.gz


//move the zookeeper installation to /usr/local/zookeeper
//from the download directory
mv zookeeper-3.4.6 /usr/local/zookeeper/

//switch to the /usr/local/zookeeper directory
cd /usr/local/zookeeper/zookeeper-3.4.6

//move to the conf folder for this version of zookeeper
cd conf

//copy the sample config to zoo.cfg
cp zoo_sample.cfg zoo.cfg

//switch to the conf directory
cd /usr/local/zookeeper/zookeeper-3.4.6/conf

edit the zoo.cfg file and change the data directory to

vi zoo.cfg

//change data directory to
dataDir=/var/lib/zookeeper

#include the cluster servers

server.1=YOURMACHOSTNAME.local:2888:3888


//move to the root of zookeeper
cd /usr/local/zookeeper/zookeeper-3.4.6

//start zookeeper
bin/zkServer.sh start

//if you need to stop
bin/zkServer.sh stop

//check if the process is running
jps

//check for QuorumPeerMain

//check the status of zookeeper
bin/zkServer.sh status
//This should display
Mode: standalone
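As a quick connectivity check, the bundled CLI can connect to the standalone instance; the ls / and quit commands below are typed at the zkCli prompt once connected:

# connect to the standalone instance
bin/zkCli.sh -server YOURMACHOSTNAME.local:2181

# at the zkCli prompt: list the root znodes, then exit
ls /
quit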


Monday, March 14, 2016

How to setup a 2 Node Apache Kafka 0.9.0.1 cluster in CentOS 7?

Apache Kafka is one of the real-time message brokers used for stream processing in the big data world.


The following needs to be done before beginning the Apache Kafka cluster setup.

1. Create 2 CentOS 7 Servers KFNODE1 and KFNODE2 as discussed in How to install CentOS 7 on Virtual Machine using VMWare vSphere 6 client?

2. Make sure Java 7 is installed and configured as default as discussed in How to install Java 7 and Java 8 in CentOS 7?.

3. Create the bigdatauser, bigdataadmin and the bigdatagroup as discussed in How to create a user, group and enable him to do what a super user can in CentOS7?

4. Make sure the firewall is disabled and stopped as discussed in How to turn off firewall on CentOS 7? 

5. Change the /etc/hosts file so that all the IPs and the names of the servers are resolved as discussed in How to setup DNS entries for big data servers in the cloud or not on a domain in /etc/hosts file?

6. Using the bigdatauser, set up password-less ssh between the 2 cluster nodes, namely KFNODE1 and KFNODE2, as discussed in How to setup password less ssh between CentOS 7 cluster servers?

7. Install Apache Zookeeper clusters as discussed in How to setup a 3 Node Apache Zookeeper 3.4.6 cluster in CentOS 7?. Make sure you do the same as in step 5 for these servers too.


For each of the Servers KFNODE1 and KFNODE2 do the following.

Log in using the bigdataadmin user.

//create a folder for kafka under the /usr/local directory
cd /usr/local
sudo mkdir kafka

//change ownership to bigdatauser
sudo chown -R bigdatauser:bigdatagroup kafka

//create the data cum log directory for kafka under /var/lib
cd /var/lib
sudo mkdir kafka

//change ownership to bigdatauser
sudo chown -R bigdatauser:bigdatagroup kafka

Switch to the bigdatauser

//download kafka
wget http://apache.claz.com/kafka/0.9.0.1/kafka_2.11-0.9.0.1.tgz

//unpack the file
tar xzf kafka_2.11-0.9.0.1.tgz


//move the kafka installation to /usr/local/kafka from the download directory
mv kafka_2.11-0.9.0.1 /usr/local/kafka/

//switch to the kafka directory
cd /usr/local/kafka/kafka_2.11-0.9.0.1/



//switch to the config directory
cd config

edit the config file and change the following

vi server.properties


#The broker Id should be unique for KFNODE1 and KFNODE2
#KFNODE1
broker.id=1

#KFNODE2
broker.id=2

#change data cum log directory to
log.dirs=/var/lib/kafka


#include the zookeeper servers

zookeeper.connect=ZKNODE1:2181,ZKNODE2:2181,ZKNODE3:2181

//move to the root of the kafka installation on each server and start kafka on both servers
cd /usr/local/kafka/kafka_2.11-0.9.0.1

//start kafka broker
bin/kafka-server-start.sh config/server.properties >/dev/null &


//if you need to stop
kill processid

//check if the process is running
ps aux | grep kafka

//check for the kafka data/log folders 


//There is no built-in UI for kafka, nor any command to query the broker list.
//We can use the create-topic script to see if we have a cluster: here we are
//attempting to create a topic with a replication factor of 3, and the error
//will say how many brokers we have, that is 2.
bin/kafka-topics.sh --create --zookeeper ZKNODE1:2181,ZKNODE2:2181,ZKNODE3:2181 --replication-factor 3 --partitions 4 --topic testkfbrokers

//Error while executing topic command : replication factor: 3 larger than available brokers: 2
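Re-running the create with a replication factor that matches the two available brokers should succeed; since the first attempt failed, the same topic name can be reused:

# replication factor 2 matches the 2 brokers, so this create succeeds
bin/kafka-topics.sh --create --zookeeper ZKNODE1:2181,ZKNODE2:2181,ZKNODE3:2181 --replication-factor 2 --partitions 4 --topic testkfbrokers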