Download and configure Hadoop
You can download Hadoop in two different ways:
1. Using your web browser.
2. Using the Terminal (command prompt).
I will choose the second option, i.e. the Terminal.
Click on the 'Download' link.
Click on the 'Download a release now!' link.
Click on 'Stable'.
Download hadoop-1.1.2-bin.tar.gz.
OR
You can copy the location of the file to install it through the terminal
(right click - copy link location; the path will look like
http://download.nextag.com/apache/hadoop/common/stable/hadoop-1.1.2-bin.tar.gz).
2. Open the Master machine and open a Terminal using CTRL+ALT+T.
Type the command below:
$ wget http://download.nextag.com/apache/hadoop/common/stable/hadoop-1.1.2-bin.tar.gz
This command downloads the Hadoop files; it takes some time.
The archive is saved in the directory where you run wget (the "Downloads" folder in my case).
3. After downloading the Hadoop files you can extract them in two ways:
one using the TERMINAL, and the other by unzipping with some archive software.
I will extract using the TERMINAL:
$ tar -xzf hadoop-1.1.2-bin.tar.gz    (or add -v for verbose output: tar -xzvf)
It extracts the files into a hadoop-1.1.2 folder in the current directory (HOME in this walkthrough).
4. Now go to hadoop-1.1.2/conf/
There are three files you generally need to change.
a. hadoop-env.sh
First you need to set the Java path (the JAVA_HOME / JVM path setting), e.g.:
export JAVA_HOME=/usr/lib/jvm/java-6-sun
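The JVM path differs from machine to machine. As a sketch (the path below is the tutorial's example, not necessarily yours), you can derive the JAVA_HOME line for hadoop-env.sh from the java binary's resolved path:

```shell
# On a real machine, get the resolved java binary path with:
#   JAVA_BIN="$(readlink -f "$(command -v java)")"
# Here the tutorial's example path is used to show the derivation.
JAVA_BIN=/usr/lib/jvm/java-6-sun/bin/java   # hypothetical path; yours may differ
JAVA_HOME="${JAVA_BIN%/bin/java}"           # strip the trailing /bin/java
echo "export JAVA_HOME=$JAVA_HOME"          # prints: export JAVA_HOME=/usr/lib/jvm/java-6-sun
```

The printed line is what goes into conf/hadoop-env.sh.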
b. core-site.xml
There are three modes in which you can run Hadoop:
1. Standalone or Local mode: you need not change anything - you just start working.
2. Pseudo-Distributed mode: NN, SNN, JT, TT and DN all run on the same machine.
3. Fully Distributed or Cluster mode: the NameNode runs on the Master machine, the Secondary NameNode
runs on some other machine, and the DN and TT run on other machines.
c. mapred-site.xml
(The ** notes in the sections below show how to configure Hadoop on a single system.)
5. Change the owner of the hadoop folder:
$ chown -R yash hadoop-1.1.2 -> the owner of this folder is changed to yash
$ chmod -R 755 hadoop-1.1.2 -> owner gets full access; group and others get read and execute
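To check that the mode actually changed, you can print the octal permissions. A small sketch using a throwaway directory (the hadoop-1.1.2 folder would behave the same way):

```shell
mkdir -p /tmp/perm-demo            # throwaway directory standing in for hadoop-1.1.2
chmod -R 755 /tmp/perm-demo
# stat -c %a prints the octal mode; 755 = rwxr-xr-x
# (owner: read/write/execute; group and others: read/execute)
stat -c %a /tmp/perm-demo          # prints: 755
```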
6. Now open core-site.xml and copy the code below inside the configuration tag:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/yash/tempdir</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://yeshwanth1:9000</value>
  <description>The name of the default file system. A URI whose scheme
  and authority determine the FileSystem implementation. The uri's
  scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
**To run on a local machine, fs.default.name should point to localhost, e.g.:
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
7. Now open the mapred-site.xml file:
<property>
  <name>mapred.job.tracker</name>
  <value>yeshwanth1:9001</value>
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>
**To run on a local machine, mapred.job.tracker should point to localhost:
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
</property>
8. Open hdfs-site.xml and add the configuration tags below (the replication factor
"dfs.replication" should not be more than the number of DataNodes):
<!-- Set the replication factor -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified at create time.
  </description>
</property>
<!-- Here the NameNode data will be stored -->
<property>
  <name>dfs.name.dir</name>
  <value>/home/yash/namenodeanddatanode</value>
</property>
<!-- DataNode data will be stored here. If you do not specify this, by default a data folder is created inside the /tmp directory to store the data -->
<property>
  <name>dfs.data.dir</name>
  <value>/home/yash/namenodeanddatanode</value>
</property>
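The paths referenced in core-site.xml and hdfs-site.xml must exist and be writable by the hadoop user before you format the NameNode. A sketch (HDUSER_HOME is an assumed variable standing in for /home/yash):

```shell
HDUSER_HOME=/home/yash    # assumed; set to the home directory of your hadoop user
# hadoop.tmp.dir from core-site.xml, and dfs.name.dir / dfs.data.dir from hdfs-site.xml
mkdir -p "$HDUSER_HOME/tempdir" "$HDUSER_HOME/namenodeanddatanode"
chmod 755 "$HDUSER_HOME/tempdir" "$HDUSER_HOME/namenodeanddatanode"
```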
9. Open the masters file.
Add the text "yeshwanth1" -> because my master is running on yeshwanth1.
10. Open the slaves file -> add the names below:
yeshwanth1
yeshwanth2
I will keep the Master machine as a slave too, so that
a DataNode also runs on that machine.
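The masters and slaves files can also be written from the terminal; a sketch assuming the current directory is Hadoop's configuration directory:

```shell
# Run from inside hadoop-1.1.2's configuration directory.
printf '%s\n' yeshwanth1 > masters                # master host only
printf '%s\n' yeshwanth1 yeshwanth2 > slaves      # master doubles as a slave
cat masters slaves                                # sanity-check the contents
```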
11. Now open a TERMINAL.
You can copy the hadoop folder to the other machine (i.e. from master to slave):
$ scp -r hadoop-1.1.2 yash@yeshwanth2:/home/yash
Or you can copy this folder by any other method (copy-paste).
After running the above command you will be able to see hadoop-1.1.2 on the slave
machine called yeshwanth2.
In the masters file - location hadoop-1.1.2/conf -
you will see the text below:
yeshwanth1
In the slaves file you will see the text below:
yeshwanth1
yeshwanth2
12. Now format your Hadoop NameNode using the commands below:
$ cd hadoop-1.1.2/
$ bin/hadoop namenode -format
13. Now start all the daemons using the command below:
$ bin/start-all.sh
$ jps (lists Java processes)
It should show the jobs below running:
a. JobTracker
b. NameNode
c. SecondaryNameNode
d. TaskTracker
The DataNode may not be running at this point; check its log to find the cause.
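When a daemon such as the DataNode is missing from the jps output, its log file usually explains why. A sketch (log file names include your user name and hostname, so a glob is used; the path assumes the hadoop-1.1.2 install directory):

```shell
# Show the tail of every DataNode log under the install directory, if any exist.
for f in hadoop-1.1.2/logs/hadoop-*-datanode-*.log; do
  [ -e "$f" ] && tail -n 20 "$f"
done
```

One common cause in Hadoop 1.x is an "Incompatible namespaceIDs" error after re-formatting the NameNode; clearing the old dfs.data.dir contents on the affected node and restarting resolves it.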
14. Now go to the slave machine, i.e. yeshwanth2.
Open a TERMINAL and run the command below:
$ jps
You can see the jobs below running on the slave:
a. DataNode
b. TaskTracker
15. Now go to yeshwanth1:
$ jps
$ bin/start-all.sh (it starts the processes on the Master as well as on the Slave
automatically)
$ jps
Now you can see all the jobs running:
NN, DN, TT, JT and SNN.
** If you have three nodes, make the second or third node the SNN: the
SecondaryNameNode keeps checkpoint copies of the NameNode metadata on another
machine, which helps you recover if the master goes down (note that it is not an
automatic failover).
16. Now go to yeshwanth2:
$ jps
You can see the following running on this machine:
a. DataNode
b. TaskTracker