
Wednesday, October 28, 2015

How to create a user and group and enable the user to do what a super user can in CentOS 7?

As with all operating systems, we first need to make sure that we can create users. The users are of two types:

  1. Super Users (Administrators in the Windows world) - all configuration should be done using this account.
  2. Normal Users - all services, development and browsing should be done using this account.

It's a best practice not to do any development as the root user. Instead, have a separate account that can act as a Super User and another account for the normal services or development work.

When a task requires super user privileges, we switch from the normal user to the super user, perform the task, and then switch back to the normal user. Always work as the normal user. Note that this concept is new for Windows users: it is similar to Run As, but within the same window you can keep switching users to perform tasks with the corresponding privileges.
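
As a minimal sketch of this switching (using a hypothetical admin account named adminuser; the actual accounts are created in the steps below):

    # switch from the normal user to the admin account (prompts for that account's password)
    su - adminuser
    # ...perform the task that needs the extra privileges...
    # return to the normal user's shell
    exit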

When you installed CentOS 7, it would have asked you to set the root user's password. You need to know this password before you can create these types of users.

Creating a Super User using SSH Terminal
  1. Using PuTTY, SSH into the CentOS 7 server with the root credentials.
  2. Since you are root, you can run the following command to create the admin user and add it to the root group:

    useradd -g root bigdataadmin

  3. Now set the password (give a strong password):

    passwd bigdataadmin

  4. In order for this user to switch to a super user and perform tasks that require super user privileges, we need to give it permission. This is done by adding it to the wheel group:

    gpasswd -a bigdataadmin wheel

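To sanity-check the setup, you can log in as the new user and confirm that sudo works. This is a quick, optional verification, assuming the default CentOS 7 sudoers configuration, in which members of the wheel group may use sudo:

    # log in as the new admin user (a fresh login picks up the wheel membership)
    su - bigdataadmin
    # should print "root" after you enter bigdataadmin's password
    sudo whoami
    # should list root and wheel
    groups
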
Creating a Normal User using SSH Terminal
  1. Using PuTTY, SSH into the CentOS 7 server with the previously created "bigdataadmin" user.
  2. Since this time you are not root, you need to prefix the commands with sudo, which stands for "super user do". sudo will prompt for a password; enter the bigdataadmin password created in the previous section.
  3. First create the group:

    sudo groupadd bigdatagroup

  4. Create the user:

    sudo useradd -g bigdatagroup bigdatauser

  5. Set the password:

    sudo passwd bigdatauser

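As an optional check, assuming the commands above succeeded, you can confirm the new user's group membership:

    # should show bigdatagroup as the user's primary group
    id bigdatauser
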

Note: As with all articles here, open source technologies are sometimes not backward compatible; please check the version numbers before following this lesson or any other lesson on this site.




Thursday, October 22, 2015

Big Data Lab Setup

Now that we are ready to start, we first need machines. Since we can't have many machines at home or in the office, we need to rely on virtual machines. There is another concept, Docker, which we would explore later.

So figure out if you have some PC, either new or old, or be nice to your manager and try to obtain an old server in the office that you can use to create the virtual machines.

The lab setup would be done using vSphere Hypervisor. This is a free version and does not come with any UI. Sign up and install it on the box. The version as of this date is VMware vSphere 6 Hypervisor (VMware ESXi 6.0.0, build 3029758).

The VMs are created using the vSphere Client. Download this and connect to your server from any other laptop or desktop with a UI. The version as of this date is vSphere Client 6.0.0, build 3016447.

If you already have the machines, skip this class.






Tuesday, October 13, 2015

Open Source Software Inter-Dependency Management

In open source, each piece of software is developed independently by the same or different vendors/enthusiasts. Some software builds on other software as a base, each project may or may not be aware that others are using it, and no project guarantees release assurance. This gives us a unique problem that does not exist in the Microsoft world.

Let's say we have 3 software manufacturers: M1, M2 and M3.
They produce the following software with the corresponding versions:

M1 - Software 1 (S1), current version V4
M2 - Software 2 (S2), current version V5, uses M1 S1 V3
M3 - Software 3 (S3), current version V3, uses M1 S1 V2 and M2 S2 V3

We now have manufacturer M2 using M1's software S1 V3 and building on top of it. So M2 S2 V5 is currently using M1 S1 V3.

M1 may or may not be aware that M2 is using its software. Now M1 decides that it needs to be on the bleeding edge of software development and comes up with Version 4 of S1. It puts out a notice saying which things would break, or says that previous versions won't be supported further.

M2 now has two problems: whenever M1 changes S1, M2 needs to stop whatever it is doing and make sure that its software S2 still works with S1, and it also has to take care of its own feature set that is built on top.

The same happens when other manufacturers use different software and their corresponding versions. In this case M3 S3 depends on both M1 S1 and M2 S2.

So how do you pick a technology if the versions are not compatible with each other?

Choice 1
Run each ecosystem separately for its use cases when collaborating with different software, in order to get the best of each piece of software, and keep replicating the data between the different ecosystems.

eco system 1: M1 S1 V4
eco system 2: M2 S2 V5 with another instance of M1 S1 V3
eco system 3: M3 S3 V3 with another instance of M1 S1 V2 and M2 S2 V3

Choice 2
Run the LCM (lowest common) version of all the dependent software, wait for the software manufacturers to certify compatibility, and learn to live with the delay behind the bleeding edge.

eco system 1: downgrade to M1 S1 V2, as this is the lowest version depended on by M3
eco system 2: downgrade to M2 S2 V3 with the same instance of M1 S1 V2
eco system 3: M3 S3 V3 with the same instance of M1 S1 V2 and M2 S2 V3

In summary, there is only one instance each of M1 S1 V2, M2 S2 V3 and M3 S3 V3.

Choice 3
A combination of Choice 1 and Choice 2


Friday, October 9, 2015

Moving from Microsoft Technologies to Big Data

Well, the choice of Big Data technologies is only present in the open source world!!!

Unless Microsoft does something drastic, all Microsoft technology professionals should learn the open source technologies.

So gear up and refresh those memories from your college days, when all the open source technologies were taught and which we forgot once we came into the Microsoft world.

Remember the dumb terminal, the green screen and most importantly no mouse and the legendary vi text editor.

So how much to learn, and how fast?
Since I mostly develop products or applications on top of the Microsoft stack, here are some questions that might arise when comparing Microsoft and open source.

Which one to learn?
Unlike the Microsoft stack, where there is only one technology to do the job, in open source you have a plethora of choices. There is no architect or company saying what to build next or how the pieces of software need to work together to form a bigger whole. So the choice boils down to the ones that you pick based on reading, recommendations, etc.

What happens if I pick the wrong one?
Well, unlike Microsoft technologies, which are forced to work for all the generic use cases the technology needs to solve, open source takes it easy; they would just ask you to switch to whatever works for you. The reason is that you are not paying for it, or the feature is not on the roadmap, or you can do it yourself and add the feature. So in a way you have to keep changing technologies to suit your need.

What happens to release assurance?
Well, most of the development is free flowing, so there is no curbing of what gets released. So be prepared to throw things away when changing from version to version. There is nobody to police anyone, as it's all free without warranty. At the very least you might get patches for the existing version for some time, but the point is: be prepared to constantly evolve as the version of the technology that you are using changes.

What happens to Interoperability?
Again, since there are so many technologies out there for the same or similar problems, it's the responsibility of the architect (the tailor) to stitch together the design and the different technologies that conform to the design. This role, which is played by Microsoft right now, needs to be played by the technology head making the decisions on the application. Just like Microsoft makes mistakes, sunsets different technologies and moves on, the tailor too needs to do this without much impact to the product or the business.

What happens to Documentation and Samples?
This one, for some reason, resonates with developers. We don't want to write documentation, we don't want to write samples, we don't want to make it simple for others; why? Because "Hey, it's so simple, figure it out". Well, welcome to reality: you now get it back with interest. Especially if you come from the Microsoft world, be prepared to look at logs, Usenet, Google, Stack Overflow, etc. for hours before you can get anything done. There are no folks out there whose dedicated job is documenting. They would rather wait for you to post the exception or the question and answer it than apply any preventive techniques that could save time in the first place. So be prepared to dive in, drown and then learn to swim. Just like a cartoon character, you always live!!!


What happens to Development Tools?
It's a similar story: since every technology is suited to a different need, we can't have one development tool that supports everything. So unlearn what you learnt from Visual Studio and be open to all the possible editors and languages that you need to master to get the job done.

To summarize: come out of the protected, homogeneous world of Microsoft and enter the actual technology world with its diversity, also described as a polyglot environment.

Let's take the first step...


Thursday, October 8, 2015

Welcome to Big Data School!

After nearly 16 years in the IT industry doing product development, the time has finally come when data has become so huge that the traditional RDBMS can't solve the problem.

The current word out there is Big Data, which means that we now need to learn the technologies that deal with Big Data.

This blog is created to learn the different technologies and to correlate the need with the existing technologies that solve for normal data.

I shall structure this as classrooms, and each classroom would teach you the different technologies that deal with the data.

Other bloggers are welcome to contribute to the classrooms with simple lessons that would enable others, as novices, to work with these technologies.

Since I come from the Microsoft world, there would be things related to how it is done in the Microsoft world. Open source users, please bear with this, as you have no clue how even simple things in your world are very complicated in the Microsoft world, and vice versa.