Big Data School: 2015

Tuesday, December 29, 2015

Opening multiple Eclipse Mars workspace instances on OS X

By default, only one instance of Eclipse can be opened on a Mac.

To open multiple instances, do the following.

1. In Eclipse Mars, click Help -> Eclipse Marketplace.
2. In the Find text box, enter "OS X eclipse launcher" and click Go.
3. Install the plugin.
More info is available at
http://marketplace.eclipse.org/content/os-x-eclipse-launcher?mpc=true&mpc_state=
As of today, the version is OS X Eclipse Launcher 3.0.

After installing, restart Eclipse.
1. Click on the File menu.
2. You should now have an "Open Workspace" menu item, which allows you to launch multiple Eclipse instances.
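If you would rather not install a plugin, a commonly used alternative is the OS X "open" command, whose -n flag starts a new instance of an application even when one is already running. The sketch below only builds and prints that command. The application path and the second workspace directory are assumptions, not something from this post, so adjust both before pasting the printed line into Terminal.

```shell
# Assumed install location and workspace; change both to match your Mac.
ECLIPSE_APP="/Applications/Eclipse.app"
WORKSPACE="$HOME/workspace2"

# "open -n" starts a new instance even if Eclipse is already running.
# Everything after --args is handed to Eclipse; -data selects the workspace.
LAUNCH_CMD="open -n -a \"$ECLIPSE_APP\" --args -data \"$WORKSPACE\""

# Print the command so it can be reviewed, then pasted into Terminal.
echo "$LAUNCH_CMD"
```

Each instance should point at its own workspace, since Eclipse locks a workspace while it is in use.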

Saturday, November 7, 2015

Linux Class Overview

Unlike the Microsoft world, where there is only one OS until a new version comes and replaces it, in the open source world there are many OSes. Though this topic can span multiple pages, I would like to keep it short. The mother of all these OSes is Unix, which according to many is only for geeks or scientists; Linux was created later as a free, Unix-like OS. As with all open source there is no one owner of this OS. Each vendor or community creates its own version of Linux, and these are called flavors (or distributions). A few of the flavors are Ubuntu, CentOS, SUSE, Red Hat and Debian. OS X is also Unix-based, although it is not a Linux flavor. Each of these in turn has its own versions.

Some of the Linux commands are common across all these OSes while others are not. In the Big Data School, anything that is common is put in the OS-Linux Class, while the OS-specific lessons are put under the OS Version Class.

Using PuTTY to connect to a remote Linux box (CentOS 7 or Ubuntu 14.04.3 LTS)

As explained in the previous post, interaction with open source servers revolves around the console.

The popular console for Windows users is PuTTY. It's currently available only as a 32-bit build, but this is good enough.

1. Download PuTTY.
2. Double-click to open it.
3. In the PuTTY Configuration window, enter the IP address as the host name and leave the default SSH port, 22.
You should also give a name under Saved Sessions and click the "Save" button so that you don't have to type this information again and again. Next time you can select it from the saved list and click Load.

4. Click Open to enter the console.

5. Enter the username and password to connect to the server.
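For readers connecting from a Mac or Linux desktop instead of Windows, the OpenSSH client does the same job as steps 3 to 5 above. This is only a sketch: the IP address and login below are hypothetical placeholders, and the script just prints the command so you can check it before running it.

```shell
# Hypothetical values: replace with your server's IP address and login.
SERVER_IP="192.168.1.50"
LOGIN_USER="root"
SSH_PORT=22   # the default SSH port, as in step 3

# Same connection PuTTY makes in steps 3 to 5; ssh prompts for the password.
SSH_CMD="ssh -p $SSH_PORT $LOGIN_USER@$SERVER_IP"
echo "$SSH_CMD"
```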

Where is my remote desktop in the open source world?

Well, in the Microsoft world we are spoiled. Here the server is not the place to have a UI, so say goodbye to your UI and mouse skills, learn to live in the command line world and master this skill as fast as you can. No more UI or mouse when interacting with servers; everything should be done using the console. The most popular console in the Windows world is PuTTY and the most popular text editor is the vi editor. These are simple but powerful, so learn them as fast as you can.

How to set up a CentOS 7 virtual machine using the VMware vSphere 6 client?

1. Download the vSphere client as discussed in the Big Data Lab Setup.
2. Log in to the vSphere Hypervisor from the client, using the root username and password that you used to install the vSphere Hypervisor.

3. Right-click on the server and select Create New Virtual Machine.

4. Select the Typical configuration on the Create New Virtual Machine Popup

5. Give a name for the virtual machine

6. Select storage and size.

7. Select the Guest Operating System. Under Guest Operating System select Linux, and from the versions select CentOS 4/5/6/7 (64-bit).

8. Select Network
Leave this at the default. We will look at how to connect this network card to the external network when we set up the network settings for the ESXi server.

9. Create a Virtual Disk. Select a disk size depending on the amount of storage available. We can always come back and add more disk space if we run out of it.

10. Select Ready to Complete. Make sure that you check the "edit the virtual machine before completion" box so that we can change some default settings.

11. Change the Memory to 4 GB

12. Change the CPU Cores

13. Attach a CentOS 7 ISO for installation to the CD-ROM drive: enable "connect on power on", select the "Datastore ISO File" radio button and pick the CentOS 7 ISO.

14. Start the VM. Go back to the vSphere client and start the VM.

15. Launch the Virtual Machine Console to install CentOS, which will be covered in the next blog post. Select the VM that we created and then click on the computer icon with the arrow.

Wednesday, October 28, 2015

How to create a user and a group, and enable the user to do what a super user can, in CentOS 7?

As with any OS, we first need to make sure that we can create users. The users are of two types.

Super Users (Administrators in Windows World)
Normal Users

It's a best practice not to do any development as the root user, and instead to have a separate account that can act as a super user and another account that does the normal services or development work.

When a task requires super user privileges, we switch from the normal user to the super user, do the task, and then switch back to the normal user. Always work as the normal user. Note that this concept is new for Windows users. It is similar to "Run as", but within the same window you can keep switching users to perform tasks with the corresponding privileges.

When you installed CentOS 7, you would have set up the root user and its password. You need to know these before you can create these types of users.

Creating a Super User using SSH Terminal

Using PuTTY, SSH into the CentOS 7 server with the root credentials.
Since you are root, you can give the following command to create the admin user and add it to the root group.


useradd -g root bigdataadmin

Now we need to change the password


passwd bigdataadmin

For this user to be able to switch to a super user and perform tasks that require super user privileges, we need to give it permission. This is done by adding it to the wheel group:


gpasswd -a bigdataadmin wheel
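To confirm that the change took effect, you can list the user's groups with "id -nG bigdataadmin" and look for wheel. The helper below is a small sketch of that check; the function name has_group is made up for illustration, and the sample list stands in for the real output of id on your server.

```shell
# Succeed if the space-separated group list in $1 contains the group in $2.
has_group() {
  for g in $1; do
    [ "$g" = "$2" ] && return 0
  done
  return 1
}

# On the server you would feed it the real list, for example:
#   has_group "$(id -nG bigdataadmin)" wheel && echo "can use sudo"
# Sample list, standing in for the output of "id -nG bigdataadmin":
if has_group "root wheel" wheel; then
  echo "bigdataadmin is in wheel"
fi
```

A user who is already logged in needs to log out and back in before the new group membership applies to the session.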

Creating a Normal User using SSH Terminal

Using PuTTY, SSH into the CentOS 7 server as the previously created "bigdataadmin" user.
Since this time you are not root, you need to use the following commands to create the group and the user. Note the use of sudo in front of each command. This stands for "super user do".
First create the group


sudo groupadd bigdatagroup

Create the user


sudo useradd -g bigdatagroup bigdatauser

Change the password


sudo passwd bigdatauser
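The three commands above can also be kept together in a small script. The sketch below uses the same group and user names as this lesson; the DRY_RUN switch and the run and create_normal_user helpers are my own additions for illustration. By default it only prints the commands, so you can review them before letting it execute anything.

```shell
# Names from the lesson; DRY_RUN=1 (the default here) only prints the commands.
GROUP_NAME="bigdatagroup"
USER_NAME="bigdatauser"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Same order as the steps above: group first, then user, then password.
create_normal_user() {
  run sudo groupadd "$GROUP_NAME"
  run sudo useradd -g "$GROUP_NAME" "$USER_NAME"
  run sudo passwd "$USER_NAME"
}

create_normal_user
```

Run it with DRY_RUN=0 on the server, logged in as bigdataadmin, to actually create the group and the user.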

Note: As with all articles on this site, keep in mind that open source technologies are sometimes not backward compatible. Please check the version number before following this lesson or any other lesson on this site.

Thursday, October 22, 2015

Big Data Lab Setup

Now that we are ready to start, we first need machines. Since we can't have many machines at home or in the office, we need to rely on virtual machines. There is another concept, Docker containers, which we will explore later.

So figure out if you have a PC, either new or old, or be nice to your manager and try to obtain an old server in the office that you can use to create the virtual machines.

The lab setup will be done using the vSphere Hypervisor. This is a free version and does not come with any UI. Sign up and install it on the box. The version as of this date is VMware vSphere 6 Hypervisor (VMware ESXi, 6.0.0, 3029758).

The VMs are created using the vSphere Client. Download it and connect to your server from any other laptop or desktop with a UI. The version as of this date is vSphere Client version 6.0.0 Build 3016447.

If you already have the machines skip this class.

Tuesday, October 13, 2015

Open Source Software Inter-Dependency Management

In the open source world, each piece of software is developed independently, by the same or by different vendors or enthusiasts. Some software uses other software as its base. The authors of the base software may or may not be aware that others are using it, and nobody guarantees release assurance, so we now have a unique problem which does not exist in the Microsoft world.

Let's say we have 3 software manufacturers: M1, M2 and M3.
They produce the following software and corresponding versions:

M1 - Software 1 (S1) current Version V4
M2 - Software 2 (S2) current Version V5 uses M1 S1 V3
M3 - Software 3 (S3) current Version V3 uses M1 S1 V2 and M2 S2 V3

Manufacturer M2 takes M1's software S1 V3 and builds on top of it. So M2 S2 V5 is currently using M1 S1 V3.

M1 may or may not be aware that M2 is using the software. Now M1 decides that he needs to be bleeding edge in software development and comes up with Version 4 of S1. He puts out a notice listing the things that will break, or says that previous versions won't be supported any further.

M2 now has two problems. First, whenever M1 changes S1, he needs to stop whatever he is doing and make sure that his software S2 still works with S1. Second, he also has to take care of his own feature set that is built on top.

The same is the case when other manufacturers use different software and its corresponding versions. In this case M3 S3 depends on both M1 S1 and M2 S2.

So how do you now pick a technology if the versions are not compatible with each other?

Choice 1
Run a separate ecosystem for each piece of software and its use cases, each with the exact versions it depends on. This gets the best out of each piece of software, at the cost of having to keep replicating the data between the different ecosystems.

ecosystem 1 M1 S1 V4
ecosystem 2 M2 S2 V5 with another instance of M1 S1 V3
ecosystem 3 M3 S3 V3 with another instance of M1 S1 V2 and M2 S2 V3

Choice 2
Run the lowest common version of all the dependent software, wait for the software manufacturers to certify compatibility, and learn to live with the delay in getting to the bleeding edge.

ecosystem 1 downgrade to M1 S1 V2 as this is the lowest version depended on (by M3)
ecosystem 2 downgrade to M2 S2 V3 with the same instance of M1 S1 V2
ecosystem 3 M3 S3 V3 with the same instance of M1 S1 V2 and M2 S2 V3

In summary, there is only one instance each of M1 S1 V2, M2 S2 V3 and M3 S3 V3.

Choice 3
A combination of Choice 1 and Choice 2
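The version picking in Choice 2 can be sketched in a few lines of shell. This assumes versions are plain integers, as in the M1/M2/M3 example, which real version numbers usually are not; the function name lowest_version is made up for illustration.

```shell
# Return the smallest of the integer version numbers passed as arguments.
lowest_version() {
  min="$1"
  for v in "$@"; do
    if [ "$v" -lt "$min" ]; then min="$v"; fi
  done
  echo "$min"
}

# S1: M1 ships V4, S2 V5 was built on V3, S3 V3 was built on V2.
echo "S1 -> V$(lowest_version 4 3 2)"
# S2: M2 ships V5, S3 V3 was built on V3.
echo "S2 -> V$(lowest_version 5 3)"
# S3: nothing else depends on it, so its current version stays.
echo "S3 -> V$(lowest_version 3)"
```

The three lines it prints match the Choice 2 summary: S1 V2, S2 V3 and S3 V3.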

Friday, October 9, 2015

Moving from Microsoft Technologies to Big Data

Well, the choices for Big Data technologies are present only in the Open Source World!!!

Unless Microsoft does something drastic, all Microsoft technology professionals should learn the open source technologies.

So gear up and refresh those memories from college, where all the open source technologies were taught, and which we forgot once we came into the Microsoft world.

Remember the dumb terminal, the green screen and most importantly no mouse and the legendary vi text editor.

So how much to learn and how fast?
Since I mostly develop products or applications on top of the Microsoft stack, here are some questions that might arise when moving between Microsoft and open source.

Which one to learn?
Unlike the Microsoft stack, where there is only one technology to do the job, in open source you have a plethora of choices. There is no architect or company saying what to build next or how the pieces of software need to work together to form a bigger part. So the choices boil down to the ones that you pick based on reading, recommendations etc.

What happens if I pick the wrong one?
Well, unlike Microsoft, which is forced to make its technology work for all the generic use cases it needs to solve, the open source world takes it easy: they would just ask you to switch to whatever works for you. The reason is that you are not paying for it, or the feature is not on the roadmap, or you can do it yourself and add the feature. So in a way you have to keep changing technologies to suit your need.

What happens to release assurance?
Well, most of the development is free flowing, so there is no curbing of what gets released. So be prepared to throw things away when changing from version to version. There is nobody to police anyone as it's all free and without warranty. At the very least you might get patches for the existing version for some time, but the point is to be prepared to constantly evolve as the version of the technology that you are using changes.

What happens to Interoperability?
Again, since there are so many technologies out there for the same or similar problems, it's the responsibility of the Architect (Tailor) to stitch together the design and the different technologies that conform to the design. This role, which is played by Microsoft right now, needs to be played by the technology head who is making the decisions on the application. Just like Microsoft makes mistakes, sunsets different technologies and moves on, the tailor too needs to do this without much impact to the product or the business.

What happens to Documentation and Samples?
This for some reason resonates with developers. We don't want to write documentation, we don't want to write samples or make things simple for others. Why? Because, "Hey, it's so simple, figure it out". Well, welcome to reality; you now get it back with interest. Especially if you come from the Microsoft world, be prepared to look at logs, Usenet, Google, Stack Overflow etc. for hours before you can get anything done. There are no folks out there with a dedicated job of documenting. They would rather wait for you to post the exception or the question and answer it than do anything preventive that could save time in the first place. So be prepared to dive in, drown and then learn to swim. Just like a cartoon character, you always live!!!

What happens to Development Tools?
A similar story. Since every technology is suited to a different need, we can't have one development tool that supports everything. So unlearn what you learnt from Visual Studio and be open to all the possible editors and languages that you need to master to get the job done.

To summarize, come out of the protected homogeneous world of Microsoft and enter the actual technology world with its diversity, also described as a polyglot environment.

Let's take the first step.....

Thursday, October 8, 2015

Welcome to Big Data School!

After nearly 16 years in the IT industry doing product development, the time has finally come when the data has become so huge that the traditional RDBMS can't handle it.

The current buzzword out there is Big Data, which means that we now need to learn the technologies that deal with Big Data.

This blog is created to learn the different technologies and to correlate them with the existing technologies that solve for normal data.

I shall structure this as class rooms and each class room would teach you the different technologies that deal with the data.

Other bloggers are welcome to contribute to the class rooms as simple lessons that would enable others to work with these as novices.

Since I come from the Microsoft world, there will be things that relate to how it is done in the Microsoft world. Open source users, please bear with this, as you have no clue how even simple things in your world are very complicated in the Microsoft world and vice versa.
