Hadoop Common Errors with Possible Solutions
Here I'm writing up some of the Hadoop issues I have faced, along with
their solutions. I hope you all benefit from it.
1. After formatting the namenode (bin/hadoop namenode -format) and
restarting the cluster, the datanodes come up with a namespace error:
ERROR: Incompatible namespaceIDs in ...: namenode namespaceID = ...,
datanode namespaceID = ...
The error occurs because formatting the namenode re-creates a new
namespaceID, which no longer matches the one stored on the datanodes.
Solution:
1. Delete the data files under the datanode's dfs.data.dir directory
(default is tmp/dfs/data).
2. Or edit the namespaceID in the dfs.data.dir/current/VERSION file so
it is identical to the namenode's (the error message in the log shows
both IDs).
3. Or assign a new dfs.data.dir directory.
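Option 2 above can be scripted. Here is a minimal sketch, demonstrated on throwaway sample files; the paths and the sample namespaceID values are assumptions, so on a real cluster point NAME_DIR and DATA_DIR at your actual dfs.name.dir and dfs.data.dir instead:

```shell
# Demo setup with throwaway directories (assumption: on a real cluster,
# set NAME_DIR/DATA_DIR to your dfs.name.dir and dfs.data.dir values).
NAME_DIR=$(mktemp -d)
DATA_DIR=$(mktemp -d)
mkdir -p "$NAME_DIR/current" "$DATA_DIR/current"
echo 'namespaceID=123456789' > "$NAME_DIR/current/VERSION"   # namenode's ID
echo 'namespaceID=987654321' > "$DATA_DIR/current/VERSION"   # stale datanode ID

# Read the authoritative ID from the namenode and copy it to the datanode.
NS_ID=$(grep '^namespaceID=' "$NAME_DIR/current/VERSION" | cut -d= -f2)
sed -i "s/^namespaceID=.*/namespaceID=$NS_ID/" "$DATA_DIR/current/VERSION"

grep '^namespaceID=' "$DATA_DIR/current/VERSION"   # prints namespaceID=123456789
```

A real VERSION file contains other fields too (storageID, cTime, layoutVersion); only the namespaceID line needs to change.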
2. The Hadoop cluster is started with start-all.sh, but a slave
repeatedly fails to start its datanode, with the error:
ERROR: Could only be replicated to 0 nodes, instead of 1
One possible cause is a duplicated node identification (my personal
guess). There may also be other reasons, so try the solutions below
one by one.
Solution:
1. If it is a port-access problem, make sure the relevant ports are
open, e.g. 9000 in hdfs://machine1:9000/, plus 50030 and 50070.
Execute:
# iptables -I INPUT -p tcp --dport 9000 -j ACCEPT
If you then see the error:
hdfs.DFSClient: Exception in createBlockOutputStream
java.net.ConnectException: Connection refused
the datanode port cannot be reached either; adjust iptables on the
datanode:
# iptables -I INPUT -s machine1 -p tcp -j ACCEPT
2. Firewall restrictions may be preventing the cluster nodes from
communicating with each other. Try turning the firewall off:
/etc/init.d/iptables stop
3. Finally, there may not be enough disk space; check with df -al.
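For point 3, a small sketch of the disk check; DATA_DIR and the 95% threshold are assumptions, so substitute the volume that backs your dfs.data.dir:

```shell
# Warn when the datanode volume is nearly full -- HDFS cannot place
# replicas on a full disk. DATA_DIR and the threshold are assumptions.
DATA_DIR=/tmp
USED=$(df -P "$DATA_DIR" | awk 'NR==2 { sub(/%/, "", $5); print $5 }')
if [ "$USED" -ge 95 ]; then
  echo "WARN: $DATA_DIR is ${USED}% full"
else
  echo "OK: $DATA_DIR is ${USED}% full"
fi
```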
3. The program execution fails with:
Error: java.lang.NullPointerException
A null pointer exception: make sure the Java program is correct.
Instantiate variables before using them, and watch for things like
array-out-of-bounds access. Inspect the program.
When running a program produces (various) errors, make sure of the
following:
1. Your program compiles correctly.
2. In cluster mode, the data to be processed has been written to HDFS,
and the path is correct.
3. Specify the entry class name when executing the jar package (I do
not know why it sometimes runs even without specifying it).
The correct invocation looks like this:
$ hadoop jar myCount.jar myCount input output
4. Hadoop fails to start the datanode with:
ERROR: Unrecognized option: -jvm Could not create the Java virtual
machine.
The bin/hadoop script under the Hadoop installation directory contains
this piece of shell:
CLASS='org.apache.hadoop.hdfs.server.datanode.DataNode'
if [[ $EUID -eq 0 ]]; then
  HADOOP_OPTS="$HADOOP_OPTS -jvm server $HADOOP_DATANODE_OPTS"
else
  HADOOP_OPTS="$HADOOP_OPTS -server $HADOOP_DATANODE_OPTS"
fi
$EUID is the effective user ID; for root it is 0, so try not to
operate Hadoop as the root user.
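The corrected logic above can be exercised with any uid if wrapped in a function (the function name is mine, not part of Hadoop):

```shell
# Mirror of the bin/hadoop branch above: root (EUID 0) gets "-jvm server",
# which newer JVMs reject with "Unrecognized option: -jvm".
pick_datanode_opts() {
  uid=$1
  if [ "$uid" -eq 0 ]; then
    echo "-jvm server"    # the failing, run-as-root case
  else
    echo "-server"        # normal user: plain HotSpot server mode
  fi
}

pick_datanode_opts 0      # prints -jvm server
pick_datanode_opts 1000   # prints -server
```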
5. Terminal error message:
ERROR hdfs.DFSClient: Exception closing file /user/hadoop/musicdata.txt:
java.io.IOException: All datanodes 10.210.70.82:50010 are bad.
Aborting...
The jobtracker logs contain this error:
Error register getProtocolVersion
java.lang.IllegalArgumentException: Duplicate metricsName:
getProtocolVersion
And possibly these warnings:
WARN hdfs.DFSClient: DataStreamer Exception: java.io.IOException:
Broken pipe
WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for
block blk_3136320110992216802_1063 java.io.IOException: Connection
reset by peer
WARN hdfs.DFSClient: Error Recovery for block
blk_3136320110992216802_1063 bad datanode[0] 10.210.70.82:50010
put: All datanodes 10.210.70.82:50010 are bad. Aborting...
The solution:
1. Check whether the disk under the dfs.data.dir path is full; if it
is, free some space and try hadoop fs -put again.
2. If the disk is not full, check whether it has bad sectors; it needs
to be tested.
6. A Hadoop jar program fails with the error message:
java.io.IOException: Type mismatch in key from map: expected
org.apache.hadoop.io.NullWritable, recieved
org.apache.hadoop.io.LongWritable
Or something like this:
Status: FAILED java.lang.ClassCastException:
org.apache.hadoop.io.LongWritable cannot be cast to
org.apache.hadoop.io.Text
Then you need to learn the basics of the Hadoop MapReduce model. See
"Hadoop: The Definitive Guide", the chapter on Hadoop I/O and the
chapter on MapReduce types and formats. If you are eager to solve this
problem, I can also tell you a quick fix, but it is bound to hurt your
later development:
Ensure the types are consistent:
... extends Mapper ...
public void map(K1 k, V1 v, OutputCollector<K2, V2> output, Reporter reporter) ...
...
... extends Reducer ...
public void reduce(K2 k, Iterator<V2> values, OutputCollector<K3, V3> output, Reporter reporter) ...
...
job.setMapOutputKeyClass(K2.class);
job.setMapOutputValueClass(V2.class);
job.setOutputKeyClass(K3.class);
job.setOutputValueClass(V3.class);
...
Note the correspondence between the K* and V* types. I still recommend
the two chapters I just mentioned, to understand the principles in
detail.
7. If you hit a datanode error like this:
ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
java.io.IOException: Cannot lock storage /data1/hadoop_data. The
directory is already locked.
According to the error message, the directory is locked and cannot be
read. This usually means a related process is still running, or a
slave machine's Hadoop process is still alive. Check with the Linux
commands:
netstat -nap
ps aux | grep <related PID>
If a Hadoop-related process is still running, kill it with the kill
command, and then re-run start-all.sh.
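The process check can be sketched like this; the pipeline is shown against a canned ps line so it is easy to follow, and on a real slave you would feed it `ps aux` instead (the sample PID is made up):

```shell
# Extract PIDs of leftover Hadoop daemons from ps-style output.
# The [h] trick keeps the grep process itself out of real `ps aux` results.
find_hadoop_pids() {
  grep '[h]adoop' | awk '{ print $2 }'
}

# Canned sample standing in for `ps aux` (PID 4242 is made up).
find_hadoop_pids <<'EOF'
hadoop   4242  0.5  2.1 java org.apache.hadoop.hdfs.server.datanode.DataNode
root      777  0.0  0.1 /sbin/init
EOF
# prints 4242 -- then: kill 4242, and restart with start-all.sh
```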
8. If you encounter this jobtracker error:
ERROR: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing out.
Solution: modify the /etc/hosts file on the datanode nodes.
The hosts file format, briefly: each line is divided into three parts:
the network IP address, the host name or domain name, and the host
alias. Detailed steps are as follows:
1. First check the host name:
$ echo -e "`hostname -i` \t `hostname -n` \t $stn"
where stn is the short name or alias of the hostname.
It should print something like:
10.200.187.77 hadoop-datanode DN
If the IP address is shown, the modification succeeded; if the host
name still has a problem, continue to modify the hosts file.
If the shuffle error still appears after that, try modifying the
hdfs-site.xml configuration file (as another user suggested) and add
the following:
dfs.http.address
*.*.*:50070
Do not change the port; replace the asterisks with the IP. Hadoop
transfers information over HTTP, and the port must stay the same.
9. If you encounter this jobtracker error:
ERROR: java.lang.RuntimeException: PipeMapRed.waitOutputThreads():
subprocess failed with code *
This is an error code returned by the system and thrown by Java; the
meaning of the error code indicates the details.
Please excuse my typos, and please leave a comment if you feel I left
anything out.
10. If you encounter the following error:
FAILED java.lang.IllegalArgumentException:
java.net.URISyntaxException:
Relative path in absolute URI: ***
The URI contains characters that are not allowed, such as the colon:
characters the operating system does not allow in file names. Look at
the part flagged in the message (the asterisked part) and eliminate
the illegal characters to solve this issue.
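When you control the file names, you can strip the offending characters before building the path. A tiny sketch (the function name is mine):

```shell
# Replace characters that break HDFS URIs -- the colon is the usual
# offender -- with underscores. Spaces are handled too, for convenience.
sanitize_name() {
  printf '%s\n' "$1" | tr ': ' '__'
}

sanitize_name "logs:2012-06-04 10:31.txt"   # prints logs_2012-06-04_10_31.txt
```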
11. If the tasktracker does not start and its log shows this error:
ERROR org.apache.hadoop.mapred.TaskTracker: Cannot start task tracker
because java.net.BindException: Address already in use ***
The port is occupied by a process that is already running. First stop
the cluster, then use ps aux | grep hadoop to look at the related
Hadoop processes, and kill the leftover Hadoop daemons.
12. If the datanode does not start and its log shows this error:
ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
java.io.IOException: No locks available
"No locks available" can mean that you are trying to run Hadoop on a
filesystem that does not support file-level locking.
13. Are you trying to keep your namenode storage on NFS?
Continuing from the file-level locking issue above, use
$ /etc/init.d/nfs status
to check whether the network file system service is running. The
commands df -Th or mount also show the filesystem type, so you can
confirm that it really is an NFS mount. A mounted network file system
cannot be used here: it may be read-only, and even when it is not
read-only, it does not support file-level locking, as said above.
As a final solution, you can try enabling file-level locking on NFS.
In my case, I modified dfs.data.dir so as not to use the NFS mount.
You can also try to reformat your Hadoop cluster (if it is a new one)
and start all over again.
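You can also probe a directory directly for file-level locking before pointing dfs.data.dir at it. A sketch using flock from util-linux (the function name and the probe file are mine):

```shell
# Return success if the filesystem holding $1 supports file locking.
# NFS mounts without a working lock daemon typically fail this probe.
supports_locking() {
  probe="$1/.lock_probe.$$"
  if flock -n "$probe" true 2>/dev/null; then
    rm -f "$probe"
    return 0
  fi
  rm -f "$probe"
  return 1
}

if supports_locking /tmp; then
  echo "locking OK"
else
  echo "no locks available -- do not put dfs.data.dir here"
fi
```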
14. The datanode died and the process cannot be started; the log
reports the following errors:
2012-06-04 10:31:34,915 INFO
org.apache.hadoop.hdfs.server.common.Storage: Cannot access storage
directory /data5/hadoop_data
2012-06-04 10:31:34,915 INFO
org.apache.hadoop.hdfs.server.common.Storage: Storage directory
/data5/hadoop_data does not exist.
2012-06-04 10:31:35,033 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Invalid value
for volsFailed: 2, Volumes tolerated: 0
I found that the reason was that the node's disk had switched to
read-only mode. Searching online, I found quite a number of such
cases: a Linux machine's hard disk may be set to read-write mode but
occasionally becomes read-only. This can happen for various reasons,
possibly:
· File system errors
· Kernel hardware-driver bug
· Firmware (FW) problems
· Disk bad sectors
· Hard disk backplane fault
· Hard drive cable fault
· HBA card failure
· RAID card failure
· inode resource exhaustion
The solution:
· Restart the server (reboot command)
· Re-mount the hard disk
· Try to repair with fsck
· Replace the hard disk
15. Running a MapReduce task gives the following errors:
2012-06-21 10:50:43,290 WARN org.mortbay.log: /mapOutput:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
taskTracker/hadoop/jobcache/job_201206191809_0004/attempt_201206191809_0004_m_000006_0/output/file.out.index
in any of the configured local directories
2012-06-21 10:50:45,592 WARN org.apache.hadoop.mapred.TaskTracker:
getMapOutput(attempt_201206191809_0004_m_000006_0, 0) failed:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
taskTracker/hadoop/jobcache/job_201206191809_0004/attempt_201206191809_0004_m_000006_0/output/file.out.index
in any of the configured local directories
Although these are only warnings, they affect operating efficiency, so
still try to resolve them. The cause is that a job's intermediate
output file cannot be found. Make the following checks:
1. The configuration of the mapred.local.dir property.
2. df -h to see whether there is enough space in the cache path.
3. free to see whether there is enough memory.
4. Ensure the cache path has write permission.
5. Check for disk corruption.
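Checks 2 to 4 above can be run in one go per mapred.local.dir entry; LOCAL_DIR is an assumption, so substitute each path from your configuration:

```shell
# Per-directory triage for missing intermediate output files.
LOCAL_DIR=/tmp   # assumption: use each entry of mapred.local.dir

df -hP "$LOCAL_DIR" | tail -1          # 2) space in the cache path
free -m 2>/dev/null || true            # 3) memory headroom (Linux)
if [ -w "$LOCAL_DIR" ]; then           # 4) writable by the tasktracker user
  echo "$LOCAL_DIR is writable"
else
  echo "$LOCAL_DIR is NOT writable"
fi
```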
The namenode cycles through the following errors:
2012-08-21 09:20:24,486 WARN
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Cannot roll edit
log, edits.new files already exists in all healthy directories:
/data/work/hdfs/name/current/edits.new
/backup/current/edits.new
2012-08-21 09:20:25,357 ERROR
org.apache.hadoop.security.UserGroupInformation:
PriviledgedActionException as: hadoop cause: java.net.ConnectException:
Connection refused
2012-08-21 09:20:25,359 WARN org.mortbay.log: /getimage:
java.io.IOException: GetImage failed. java.net.ConnectException:
Connection refused
There are related errors on the secondarynamenode.
Searching around, I found this explanation: with 1.0.2, only one
checkpoint process is executed at a time. When the namenode gets an
overlapping checkpointing request, it checks for edits.new in its
storage directories. If the namenode has this file, it concludes the
previous checkpoint process is not done yet and prints the warning
message you've seen. So check whether a leftover edits.new file
remains from before the error; if this is the problem, the residual,
useless file can be deleted.
Also make sure of the following configuration in the namenode's
hdfs-site.xml:
<property>
<name>dfs.secondary.http.address</name>
<value>0.0.0.0:50090</value>
</property>
Change 0.0.0.0 above to the host name of the machine where you
deployed the secondarynamenode.
And in the secondarynamenode's hdfs-site.xml:
<property>
<name>dfs.http.address</name>
<value>0.0.0.0:50070</value>
</property>
Change 0.0.0.0 above to the host name of your deployed namenode.
1. hadoop-root-datanode-master.log shows the following error:
ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
java.io.IOException:
Incompatible namespaceIDs in
which prevents the datanode from starting.
Cause: each namenode format re-creates a namenodeId, and the directory
configured by the dfs.data.dir parameter still contains the id created
by the previous format, which is inconsistent with the id in the
dfs.name.dir directory. The format cleared the namenode's data but not
the datanode's, which causes the startup failure. What you have to do
is empty the directory configured by the dfs.data.dir parameter every
time before a format.
Command to format HDFS:
Shell code
1. hadoop namenode -format
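The "empty dfs.data.dir before each format" step can be sketched like this, demonstrated on a throwaway directory (point DATA_DIR at your real dfs.data.dir):

```shell
# Throwaway stand-in for dfs.data.dir with some stale contents.
DATA_DIR=$(mktemp -d)
mkdir -p "$DATA_DIR/current"
echo 'namespaceID=987654321' > "$DATA_DIR/current/VERSION"

# Empty the directory (but keep the directory itself) before formatting.
rm -rf "$DATA_DIR"/*
ls -A "$DATA_DIR"    # prints nothing: the stale namespaceID is gone
# then, on the namenode: bin/hadoop namenode -format
```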
2. If the datanode cannot connect to the namenode, the datanode cannot
start:
ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
java.io.IOException:
Call to ... failed on local exception:
java.net.NoRouteToHostException: No route to host
Turn off the firewall:
Shell code
1. service iptables stop
If the machine is rebooted, the firewall will come back on.
3. Uploading files from local to the HDFS file system gives the
following errors:
INFO hdfs.DFSClient:
Exception in createBlockOutputStream java.io.IOException:
Bad connect ack with firstBadLink
INFO hdfs.DFSClient:
Abandoning block blk_-1300529705803292651_37023
WARN hdfs.DFSClient: DataStreamer Exception: java.io.IOException:
Unable to create new block.
Solution:
Turn off the firewall:
Shell code
1. service iptables stop
Disable SELinux:
Edit the /etc/selinux/config file and set SELINUX=disabled
4. Errors caused by safe mode:
org.apache.hadoop.dfs.SafeModeException: Cannot delete ..., Name node
is in safe mode
When the distributed file system starts, it begins in safe mode. While
the file system is in safe mode, its contents may not be modified or
deleted until safe mode ends. Safe mode exists so that, at startup,
the system can check the validity of the data blocks on each datanode
and, according to policy, copy or delete blocks as necessary. Safe
mode can also be entered at runtime via a command. In practice, if you
modify or delete files while the system is starting, safe mode will
refuse with this error message; just wait a while and it will pass.
To turn off safe mode:
Shell code
1. hadoop dfsadmin -safemode leave