
Thursday, February 7, 2013

Installing Hadoop 1.0.4 on Ubuntu 12.04 (LTS) as a Single-Node Cluster



I tried to install Hadoop on Windows using Cygwin, but it failed with permission-denied errors for the sshd user. So I moved on to Ubuntu 12.04 (32-bit) running in Oracle VM VirtualBox.

This whole post is based on the installation guide by Michael Noll. Keep both this post and Noll's guide open, because I only add the steps that were missing from his guide.

Step 1: Sun Java 6

Open a terminal and run the following commands to install sun-java6-jdk. Noll's commands didn't work for me.

$ sudo add-apt-repository "deb http://archive.ubuntu.com/ubuntu hardy main multiverse"
$ sudo add-apt-repository "deb http://archive.ubuntu.com/ubuntu hardy-updates main multiverse"
$ sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"
$ sudo add-apt-repository "deb http://ppa.launchpad.net/webupd8team/java/ubuntu precise main"
$ sudo apt-get update
$ sudo apt-get install sun-java5-jdk sun-java6-jdk oracle-java7-installer

Then run the following command to confirm the JDK 6 installation:

$ java -version
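
Because the install line above pulls in JDK 5, 6 and 7 together, the default `java` may not be Java 6. Here is a minimal sketch of a check, assuming Java 6 reports itself as version "1.6.0_xx"; the helper name `is_java6` is made up for illustration.

```shell
# Hypothetical helper: succeeds if the given `java -version` text reports Java 6.
is_java6() {
    # Java 6 identifies itself as version "1.6.0_xx"
    printf '%s\n' "$1" | grep -q 'version "1\.6\.'
}

# `java -version` writes to stderr, so redirect it to capture the text.
# `|| true` keeps the script going if java is not installed at all.
version_output=$(java -version 2>&1 || true)

if is_java6 "$version_output"; then
    echo "Java 6 is the default JVM"
else
    echo "Default JVM is not Java 6; try: sudo update-alternatives --config java"
fi
```

If the check reports a different version, `sudo update-alternatives --config java` lets you pick which installed JVM the `java` command points to.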

Step 2: Add a Hadoop system user

Add a dedicated Hadoop system user (hduser), as described in Noll's guide.
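
For reference, Noll's guide creates a `hadoop` group and an `hduser` account in it. The two commands in the comments below are written from memory of his post, so treat them as an assumption and check them against the guide. The helper `user_in_group` is a made-up name; it only verifies the result and changes nothing.

```shell
# Commands from Noll's guide, as best recalled (an assumption; verify there):
#   sudo addgroup hadoop
#   sudo adduser --ingroup hadoop hduser

# Hypothetical helper: succeeds if user $1 belongs to group $2.
user_in_group() {
    # `id -nG` prints the user's group names separated by spaces
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF "$2"
}

if user_in_group hduser hadoop; then
    echo "hduser exists and is in the hadoop group"
else
    echo "hduser is missing or not in the hadoop group"
fi
```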

Step 3: Configure SSH

3.1)

Follow Noll's SSH configuration until you reach the following command:

hduser@ubuntu:~$ ssh localhost

This command didn't work on my machine. It failed with an error saying that port 22 was closed, which means no SSH server was running.
So first we need to add hduser to the sudoers list and then install openssh-server from a terminal, as described in the steps below.

3.2)

Log in as root or as any user who can run sudo commands.

3.3)

$ sudo adduser hduser sudo

$ sudo /usr/sbin/visudo

The sudoers file will open in an editor. Find the line:
root ALL=(ALL:ALL) ALL

Copy this line, paste it directly below, and replace 'root' with 'hduser', so that the file now contains these two lines:
root ALL=(ALL:ALL) ALL
hduser ALL=(ALL:ALL) ALL
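
A typo in the sudoers file can lock you out of sudo, so it is worth checking the result. The sketch below runs against a scratch file that mirrors the two lines above rather than the real /etc/sudoers; the helper name `has_sudoers_entry` is made up for illustration.

```shell
# Hypothetical helper: succeeds if the sudoers-style file $2 has a user entry for $1.
has_sudoers_entry() {
    grep -Eq "^$1[[:space:]]+ALL[[:space:]]*=" "$2"
}

# Scratch file that mirrors the two sudoers lines shown above.
sample=$(mktemp)
printf '%s\n' 'root    ALL=(ALL:ALL) ALL' 'hduser  ALL=(ALL:ALL) ALL' > "$sample"

if has_sudoers_entry hduser "$sample"; then
    echo "hduser entry found"
else
    echo "hduser entry missing"
fi
rm -f "$sample"
```

On the real system, `sudo visudo -c` checks the syntax of the sudoers file, and `sudo -l -U hduser` lists what hduser is allowed to run.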

3.4)

Now install openssh-server, preferably while logged in as root.
# sudo apt-get install openssh-server
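
Before moving on, you can confirm that something is now listening on port 22, since that was the cause of the earlier error. This is a minimal sketch that assumes `netstat` (from net-tools) is available; the helper name `listens_on_port` is made up for illustration.

```shell
# Hypothetical helper: reads `netstat -tln` output on stdin and succeeds
# if any listening TCP socket uses the port given as $1.
listens_on_port() {
    awk -v port="$1" '
        $6 == "LISTEN" {
            # Local address is column 4, e.g. 0.0.0.0:22 or :::22
            n = split($4, parts, ":")
            if (parts[n] == port) found = 1
        }
        END { exit !found }
    '
}

if netstat -tln 2>/dev/null | listens_on_port 22; then
    echo "something is listening on port 22"
else
    echo "nothing is listening on port 22; is openssh-server installed and running?"
fi
```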

3.5) 

Log in with the hduser account and run the following command:

$ ssh localhost

Enter 'yes' when asked whether to continue connecting. This completes the SSH configuration.

Step 4: Disable IPv6

Use the following command to open the .conf file. I used vim because gedit was not working on my machine.
$ sudo vim /etc/sysctl.conf
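
The lines to add are in Noll's guide. As best recalled, they are the three `disable_ipv6` keys printed by the sketch below, so treat them as an assumption and compare them with his post. The sketch writes to a scratch file instead of /etc/sysctl.conf, and the helper name `disable_ipv6_in` is made up for illustration.

```shell
# Hypothetical helper: append the IPv6-disabling keys to the file given as $1,
# skipping any line that is already present, so running it twice is harmless.
disable_ipv6_in() {
    for key in all default lo; do
        line="net.ipv6.conf.$key.disable_ipv6 = 1"
        grep -qxF "$line" "$1" 2>/dev/null || printf '%s\n' "$line" >> "$1"
    done
}

# Demonstrate on a scratch file rather than touching /etc/sysctl.conf.
scratch=$(mktemp)
disable_ipv6_in "$scratch"
cat "$scratch"
rm -f "$scratch"
```

After adding the printed lines to /etc/sysctl.conf and rebooting, `cat /proc/sys/net/ipv6/conf/all/disable_ipv6` should print 1 if IPv6 is disabled and 0 if it is still enabled.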

Step 5:

For the rest of the installation, follow Noll's tutorial. From here on, whenever you are asked to edit a file, use 'sudo vim'.

Everything else went smoothly for me, so I think Noll's tutorial is enough to complete the installation and get Hadoop working as a single-node cluster.

Happy hadooping =)

