Download and configure Hadoop
You can download Hadoop in two different ways:
1. Using your web browser.
2. Using the Terminal (command prompt).
I will choose the second option, i.e. the Terminal.
Click on the 'Download' link.
Click on the 'Download a release now!' link.
Click on 'Stable'.
Download hadoop-1.1.2-bin.tar.gz.
OR
You can copy the location of the file to install it through the terminal
(right click - copy link location; the path will look like
http://download.nextag.com/apache/hadoop/common/stable/hadoop-1.1.2-bin.tar.gz).
2. Open the Master machine and open a Terminal using CTRL+ALT+T.
Type the command below:
$ wget http://download.nextag.com/apache/hadoop/common/stable/hadoop-1.1.2-bin.tar.gz
This command downloads the Hadoop files; it takes some time.
The archive is saved in the directory where you run wget (the "Downloads" folder in my case).
3. After downloading the Hadoop files you can extract them in two ways:
one using the TERMINAL, and the other by unzipping with some archive software.
I will extract using the TERMINAL:
$ tar -xzf hadoop-1.1.2-bin.tar.gz    (or add -v for verbose output: tar -xzvf)
It extracts the files into a hadoop-1.1.2 folder in the current directory (HOME in this walkthrough).
4. Now go to hadoop-1.1.2/conf/
There are three files you generally need to change.
a. hadoop-env.sh
First you need to set the Java path (the JAVA_HOME / JVM path setting), e.g.:
export JAVA_HOME=/usr/lib/jvm/java-6-sun
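The JVM path differs from machine to machine. As a sketch (the path below is the tutorial's example, not necessarily yours), you can derive the JAVA_HOME line for hadoop-env.sh from the java binary's resolved path:

```shell
# On a real machine, get the resolved java binary path with:
#   JAVA_BIN="$(readlink -f "$(command -v java)")"
# Here the tutorial's example path is used to show the derivation.
JAVA_BIN=/usr/lib/jvm/java-6-sun/bin/java   # hypothetical path; yours may differ
JAVA_HOME="${JAVA_BIN%/bin/java}"           # strip the trailing /bin/java
echo "export JAVA_HOME=$JAVA_HOME"          # prints: export JAVA_HOME=/usr/lib/jvm/java-6-sun
```

The printed line is what goes into conf/hadoop-env.sh.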
b. core-site.xml
There are three modes in which you can run Hadoop:
1. Standalone or Local mode: you need not change anything - you just start working.
2. Pseudo-Distributed mode: NN, SNN, JT, TT and DN all run on the same machine.
3. Fully Distributed or Cluster mode: the NameNode runs on the Master machine, the Secondary NameNode
runs on some other machine, and the DN and TT run on other machines.
c. mapred-site.xml
(The ** notes in the sections below show how to configure Hadoop on a single system.)
5. Change the owner of the hadoop folder:
$ chown -R yash hadoop-1.1.2 -> the owner of this folder is changed to yash
$ chmod -R 755 hadoop-1.1.2 -> owner gets full access; group and others get read and execute
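To check that the mode actually changed, you can print the octal permissions. A small sketch using a throwaway directory (the hadoop-1.1.2 folder would behave the same way):

```shell
mkdir -p /tmp/perm-demo            # throwaway directory standing in for hadoop-1.1.2
chmod -R 755 /tmp/perm-demo
# stat -c %a prints the octal mode; 755 = rwxr-xr-x
# (owner: read/write/execute; group and others: read/execute)
stat -c %a /tmp/perm-demo          # prints: 755
```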
6. Now open core-site.xml and copy the code below inside the configuration tag:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/yash/tempdir</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://yeshwanth1:9000</value>
  <description>The name of the default file system. A URI whose scheme
  and authority determine the FileSystem implementation. The uri's
  scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
**To run on a local machine, fs.default.name should point to localhost, e.g.:
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
7. Now open the mapred-site.xml file:
<property>
  <name>mapred.job.tracker</name>
  <value>yeshwanth1:9001</value>
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>
**To run on a local machine, mapred.job.tracker should point to localhost:
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
</property>
8. Open hdfs-site.xml and add the configuration tags below (the replication factor
"dfs.replication" should not be more than the number of DataNodes):
<!-- Set the replication factor -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified at create time.
  </description>
</property>
<!-- Here the NameNode data will be stored -->
<property>
  <name>dfs.name.dir</name>
  <value>/home/yash/namenodeanddatanode</value>
</property>
<!-- DataNode data will be stored here. If you do not specify this, by default a data folder is created inside the /tmp directory to store the data -->
<property>
  <name>dfs.data.dir</name>
  <value>/home/yash/namenodeanddatanode</value>
</property>
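The paths referenced in core-site.xml and hdfs-site.xml must exist and be writable by the hadoop user before you format the NameNode. A sketch (HDUSER_HOME is an assumed variable standing in for /home/yash):

```shell
HDUSER_HOME=/home/yash    # assumed; set to the home directory of your hadoop user
# hadoop.tmp.dir from core-site.xml, and dfs.name.dir / dfs.data.dir from hdfs-site.xml
mkdir -p "$HDUSER_HOME/tempdir" "$HDUSER_HOME/namenodeanddatanode"
chmod 755 "$HDUSER_HOME/tempdir" "$HDUSER_HOME/namenodeanddatanode"
```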
9. Open the masters file.
Add the text "yeshwanth1" -> because my master is running on yeshwanth1.
10. Open the slaves file -> add the names below:
yeshwanth1
yeshwanth2
I will keep the Master machine as a slave too, so that
a DataNode also runs on that machine.
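The masters and slaves files can also be written from the terminal; a sketch assuming the current directory is Hadoop's configuration directory:

```shell
# Run from inside hadoop-1.1.2's configuration directory.
printf '%s\n' yeshwanth1 > masters                # master host only
printf '%s\n' yeshwanth1 yeshwanth2 > slaves      # master doubles as a slave
cat masters slaves                                # sanity-check the contents
```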
11. Now open a TERMINAL.
You can copy the hadoop folder to the other machine (i.e. from master to slave):
$ scp -r hadoop-1.1.2 yash@yeshwanth2:/home/yash
Or you can copy this folder by any other method (copy-paste).
After running the above command you will be able to see hadoop-1.1.2 on the slave
machine called yeshwanth2.
In the masters file - location hadoop-1.1.2/conf -
you will see the text below:
yeshwanth1
In the slaves file you will see the text below:
yeshwanth1
yeshwanth2
12. Now format your Hadoop NameNode using the commands below:
$ cd hadoop-1.1.2/
$ bin/hadoop namenode -format
13. Now start all the daemons using the command below:
$ bin/start-all.sh
$ jps (lists Java processes)
It should show the jobs below running:
a. JobTracker
b. NameNode
c. SecondaryNameNode
d. TaskTracker
The DataNode may not be running at this point; check its log to find the cause.
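When a daemon such as the DataNode is missing from the jps output, its log file usually explains why. A sketch (log file names include your user name and hostname, so a glob is used; the path assumes the hadoop-1.1.2 install directory):

```shell
# Show the tail of every DataNode log under the install directory, if any exist.
for f in hadoop-1.1.2/logs/hadoop-*-datanode-*.log; do
  [ -e "$f" ] && tail -n 20 "$f"
done
```

One common cause in Hadoop 1.x is an "Incompatible namespaceIDs" error after re-formatting the NameNode; clearing the old dfs.data.dir contents on the affected node and restarting resolves it.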
14. Now go to the slave machine, i.e. yeshwanth2.
Open a TERMINAL and run the command below:
$ jps
You can see the jobs below running on the slave:
a. DataNode
b. TaskTracker
15. Now go to yeshwanth1:
$ jps
$ bin/start-all.sh (it starts the processes on the Master as well as on the Slave
automatically)
$ jps
Now you can see all the jobs running:
NN, DN, TT, JT and SNN.
** If you have three nodes, make the second or third node the SNN: the
SecondaryNameNode keeps checkpoint copies of the NameNode metadata on another
machine, which helps you recover if the master goes down (note that it is not an
automatic failover).
16. Now go to yeshwanth2:
$ jps
You can see the following running on this machine:
a. DataNode
b. TaskTracker