Hadoop Common Errors with Possible Solutions
Here I'm writing up some of the Hadoop issues I have faced, along with
their solutions. I hope you all benefit from it.
1. After formatting the namenode (bin/hadoop namenode -format) and
restarting the cluster, the datanodes come up with a namespace error:
ERROR: Incompatible namespaceIDs in ...: namenode namespaceID = ...,
datanode namespaceID = ...
The error occurs because formatting the namenode re-creates a new
namespaceID, which no longer matches the one stored on the datanodes.
Solution:
1. Delete the data files under the datanode's dfs.data.dir directory
(default is tmp/dfs/data).
2. Or edit the namespaceID in the dfs.data.dir/current/VERSION file so
it is identical to the namenode's (the error message in the log shows
both IDs).
3. Or assign a new dfs.data.dir directory.
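Option 2 above can be scripted. Here is a minimal sketch, demonstrated on throwaway sample files; the paths and the sample namespaceID values are assumptions, so on a real cluster point NAME_DIR and DATA_DIR at your actual dfs.name.dir and dfs.data.dir instead:

```shell
# Demo setup with throwaway directories (assumption: on a real cluster,
# set NAME_DIR/DATA_DIR to your dfs.name.dir and dfs.data.dir values).
NAME_DIR=$(mktemp -d)
DATA_DIR=$(mktemp -d)
mkdir -p "$NAME_DIR/current" "$DATA_DIR/current"
echo 'namespaceID=123456789' > "$NAME_DIR/current/VERSION"   # namenode's ID
echo 'namespaceID=987654321' > "$DATA_DIR/current/VERSION"   # stale datanode ID

# Read the authoritative ID from the namenode and copy it to the datanode.
NS_ID=$(grep '^namespaceID=' "$NAME_DIR/current/VERSION" | cut -d= -f2)
sed -i "s/^namespaceID=.*/namespaceID=$NS_ID/" "$DATA_DIR/current/VERSION"

grep '^namespaceID=' "$DATA_DIR/current/VERSION"   # prints namespaceID=123456789
```

A real VERSION file contains other fields too (storageID, cTime, layoutVersion); only the namespaceID line needs to change.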
2. The Hadoop cluster is started with start-all.sh, but a slave
repeatedly fails to start its datanode, with the error:
ERROR: Could only be replicated to 0 nodes, instead of 1
One possible cause is a duplicated node identification (my personal
guess). There may also be other reasons, so try the solutions below
one by one.
Solution:
1. If it is a port-access problem, make sure the relevant ports are
open, e.g. 9000 in hdfs://machine1:9000/, plus 50030 and 50070.
Execute:
# iptables -I INPUT -p tcp --dport 9000 -j ACCEPT
If you then see the error:
hdfs.DFSClient: Exception in createBlockOutputStream
java.net.ConnectException: Connection refused
the datanode port cannot be reached either; adjust iptables on the
datanode:
# iptables -I INPUT -s machine1 -p tcp -j ACCEPT
2. Firewall restrictions may be preventing the cluster nodes from
communicating with each other. Try turning the firewall off:
/etc/init.d/iptables stop
3. Finally, there may not be enough disk space; check with df -al.
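For point 3, a small sketch of the disk check; DATA_DIR and the 95% threshold are assumptions, so substitute the volume that backs your dfs.data.dir:

```shell
# Warn when the datanode volume is nearly full -- HDFS cannot place
# replicas on a full disk. DATA_DIR and the threshold are assumptions.
DATA_DIR=/tmp
USED=$(df -P "$DATA_DIR" | awk 'NR==2 { sub(/%/, "", $5); print $5 }')
if [ "$USED" -ge 95 ]; then
  echo "WARN: $DATA_DIR is ${USED}% full"
else
  echo "OK: $DATA_DIR is ${USED}% full"
fi
```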
3. The program execution fails with:
Error: java.lang.NullPointerException
A null pointer exception: make sure the Java program is correct.
Instantiate variables before using them, and watch for things like
array-out-of-bounds access. Inspect the program.
When running a program produces (various) errors, make sure of the
following:
1. Your program compiles correctly.
2. In cluster mode, the data to be processed has been written to HDFS,
and the path is correct.
3. Specify the entry class name when executing the jar package (I do
not know why it sometimes runs even without specifying it).
The correct invocation looks like this:
$ hadoop jar myCount.jar myCount input output
4. Hadoop fails to start the datanode with:
ERROR: Unrecognized option: -jvm Could not create the Java virtual
machine.
The bin/hadoop script under the Hadoop installation directory contains
this piece of shell:
CLASS='org.apache.hadoop.hdfs.server.datanode.DataNode'
if [[ $EUID -eq 0 ]]; then
  HADOOP_OPTS="$HADOOP_OPTS -jvm server $HADOOP_DATANODE_OPTS"
else
  HADOOP_OPTS="$HADOOP_OPTS -server $HADOOP_DATANODE_OPTS"
fi
$EUID is the effective user ID; for root it is 0, so try not to
operate Hadoop as the root user.
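The corrected logic above can be exercised with any uid if wrapped in a function (the function name is mine, not part of Hadoop):

```shell
# Mirror of the bin/hadoop branch above: root (EUID 0) gets "-jvm server",
# which newer JVMs reject with "Unrecognized option: -jvm".
pick_datanode_opts() {
  uid=$1
  if [ "$uid" -eq 0 ]; then
    echo "-jvm server"    # the failing, run-as-root case
  else
    echo "-server"        # normal user: plain HotSpot server mode
  fi
}

pick_datanode_opts 0      # prints -jvm server
pick_datanode_opts 1000   # prints -server
```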
5. Terminal error message:
ERROR hdfs.DFSClient: Exception closing file /user/hadoop/musicdata.txt:
java.io.IOException: All datanodes 10.210.70.82:50010 are bad.
Aborting...
The jobtracker logs contain this error:
Error register getProtocolVersion
java.lang.IllegalArgumentException: Duplicate metricsName:
getProtocolVersion
And possibly these warnings:
WARN hdfs.DFSClient: DataStreamer Exception: java.io.IOException:
Broken pipe
WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for
block blk_3136320110992216802_1063 java.io.IOException: Connection
reset by peer
WARN hdfs.DFSClient: Error Recovery for block
blk_3136320110992216802_1063 bad datanode[0] 10.210.70.82:50010
put: All datanodes 10.210.70.82:50010 are bad. Aborting...
The solution:
1. Check whether the disk under the dfs.data.dir path is full; if it
is, free some space and try hadoop fs -put again.
2. If the disk is not full, check whether it has bad sectors; it needs
to be tested.
6. A Hadoop jar program fails with the error message:
java.io.IOException: Type mismatch in key from map: expected
org.apache.hadoop.io.NullWritable, recieved
org.apache.hadoop.io.LongWritable
Or something like this:
Status: FAILED java.lang.ClassCastException:
org.apache.hadoop.io.LongWritable cannot be cast to
org.apache.hadoop.io.Text
Then you need to learn the basics of the Hadoop MapReduce model. See
"Hadoop: The Definitive Guide", the chapter on Hadoop I/O and the
chapter on MapReduce types and formats. If you are eager to solve this
problem, I can also tell you a quick fix, but it is bound to hurt your
later development:
Ensure the types are consistent:
... extends Mapper ...
public void map(K1 k, V1 v, OutputCollector<K2, V2> output, Reporter reporter) ...
...
... extends Reducer ...
public void reduce(K2 k, Iterator<V2> values, OutputCollector<K3, V3> output, Reporter reporter) ...
...
job.setMapOutputKeyClass(K2.class);
job.setMapOutputValueClass(V2.class);
job.setOutputKeyClass(K3.class);
job.setOutputValueClass(V3.class);
...
Note the correspondence between the K* and V* types. I still recommend
the two chapters I just mentioned, to understand the principles in
detail.
7. If you hit a datanode error like this:
ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
java.io.IOException: Cannot lock storage /data1/hadoop_data. The
directory is already locked.
According to the error message, the directory is locked and cannot be
read. This usually means a related process is still running, or a
slave machine's Hadoop process is still alive. Check with the Linux
commands:
netstat -nap
ps aux | grep <related PID>
If a Hadoop-related process is still running, kill it with the kill
command, and then re-run start-all.sh.
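The process check can be sketched like this; the pipeline is shown against a canned ps line so it is easy to follow, and on a real slave you would feed it `ps aux` instead (the sample PID is made up):

```shell
# Extract PIDs of leftover Hadoop daemons from ps-style output.
# The [h] trick keeps the grep process itself out of real `ps aux` results.
find_hadoop_pids() {
  grep '[h]adoop' | awk '{ print $2 }'
}

# Canned sample standing in for `ps aux` (PID 4242 is made up).
find_hadoop_pids <<'EOF'
hadoop   4242  0.5  2.1 java org.apache.hadoop.hdfs.server.datanode.DataNode
root      777  0.0  0.1 /sbin/init
EOF
# prints 4242 -- then: kill 4242, and restart with start-all.sh
```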
8. If you encounter this jobtracker error:
ERROR: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing out.
Solution: modify the /etc/hosts file on the datanode nodes.
The hosts file format, briefly: each line is divided into three parts:
the network IP address, the host name or domain name, and the host
alias. Detailed steps are as follows:
1. First check the host name:
$ echo -e "`hostname -i` \t `hostname -n` \t $stn"
where stn is the short name or alias of the hostname.
It should print something like:
10.200.187.77 hadoop-datanode DN
If the IP address is shown, the modification succeeded; if the host
name still has a problem, continue to modify the hosts file.
If the shuffle error still appears after that, try modifying the
hdfs-site.xml configuration file (as another user suggested) and add
the following:
dfs.http.address
*.*.*:50070
Do not change the port; replace the asterisks with the IP. Hadoop
transfers information over HTTP, and the port must stay the same.
9. If you encounter this jobtracker error:
ERROR: java.lang.RuntimeException: PipeMapRed.waitOutputThreads():
subprocess failed with code *
This is an error code returned by the system and thrown by Java; the
meaning of the error code indicates the details.
Please excuse my typos, and please leave a comment if you feel I left
anything out.
10. If you encounter the following error:
FAILED java.lang.IllegalArgumentException:
java.net.URISyntaxException:
Relative path in absolute URI: ***
The URI contains characters that are not allowed, such as the colon:
characters the operating system does not allow in file names. Look at
the part flagged in the message (the asterisked part) and eliminate
the illegal characters to solve this issue.
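When you control the file names, you can strip the offending characters before building the path. A tiny sketch (the function name is mine):

```shell
# Replace characters that break HDFS URIs -- the colon is the usual
# offender -- with underscores. Spaces are handled too, for convenience.
sanitize_name() {
  printf '%s\n' "$1" | tr ': ' '__'
}

sanitize_name "logs:2012-06-04 10:31.txt"   # prints logs_2012-06-04_10_31.txt
```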
11. If the tasktracker does not start and its log shows this error:
ERROR org.apache.hadoop.mapred.TaskTracker: Cannot start task tracker
because java.net.BindException: Address already in use ***
The port is occupied by a process that is already running. First stop
the cluster, then use ps aux | grep hadoop to look at the related
Hadoop processes, and kill the leftover Hadoop daemons.
12. If the datanode does not start and its log shows this error:
ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
java.io.IOException: No locks available
"No locks available" can mean that you are trying to run Hadoop on a
filesystem that does not support file-level locking.
13. Are you trying to keep your namenode storage on NFS?
Continuing from the file-level locking issue above, use
$ /etc/init.d/nfs status
to check whether the network file system service is running. The
commands df -Th or mount also show the filesystem type, so you can
confirm that it really is an NFS mount. A mounted network file system
cannot be used here: it may be read-only, and even when it is not
read-only, it does not support file-level locking, as said above.
As a final solution, you can try enabling file-level locking on NFS.
In my case, I modified dfs.data.dir so as not to use the NFS mount.
You can also try to reformat your Hadoop cluster (if it is a new one)
and start all over again.
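You can also probe a directory directly for file-level locking before pointing dfs.data.dir at it. A sketch using flock from util-linux (the function name and the probe file are mine):

```shell
# Return success if the filesystem holding $1 supports file locking.
# NFS mounts without a working lock daemon typically fail this probe.
supports_locking() {
  probe="$1/.lock_probe.$$"
  if flock -n "$probe" true 2>/dev/null; then
    rm -f "$probe"
    return 0
  fi
  rm -f "$probe"
  return 1
}

if supports_locking /tmp; then
  echo "locking OK"
else
  echo "no locks available -- do not put dfs.data.dir here"
fi
```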
14. The datanode died and the process cannot be started; the log
reports the following errors:
2012-06-04 10:31:34,915 INFO
org.apache.hadoop.hdfs.server.common.Storage: Cannot access storage
directory /data5/hadoop_data
2012-06-04 10:31:34,915 INFO
org.apache.hadoop.hdfs.server.common.Storage: Storage directory
/data5/hadoop_data does not exist.
2012-06-04 10:31:35,033 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Invalid value
for volsFailed: 2, Volumes tolerated: 0
I found that the reason was that the node's disk had switched to
read-only mode. Searching online, I found quite a number of such
cases: a Linux machine's hard disk may be set to read-write mode but
occasionally becomes read-only. This can happen for various reasons,
possibly:
· File system errors
· Kernel hardware-driver bug
· Firmware (FW) problems
· Disk bad sectors
· Hard disk backplane fault
· Hard drive cable fault
· HBA card failure
· RAID card failure
· inode resource exhaustion
The solution:
· Restart the server (reboot command)
· Re-mount the hard disk
· Try to repair with fsck
· Replace the hard disk
15. Running a MapReduce task gives the following errors:
2012-06-21 10:50:43,290 WARN org.mortbay.log: /mapOutput:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
taskTracker/hadoop/jobcache/job_201206191809_0004/attempt_201206191809_0004_m_000006_0/output/file.out.index
in any of the configured local directories
2012-06-21 10:50:45,592 WARN org.apache.hadoop.mapred.TaskTracker:
getMapOutput(attempt_201206191809_0004_m_000006_0, 0) failed:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
taskTracker/hadoop/jobcache/job_201206191809_0004/attempt_201206191809_0004_m_000006_0/output/file.out.index
in any of the configured local directories
Although these are only warnings, they affect operating efficiency, so
still try to resolve them. The cause is that a job's intermediate
output file cannot be found. Make the following checks:
1. The configuration of the mapred.local.dir property.
2. df -h to see whether there is enough space in the cache path.
3. free to see whether there is enough memory.
4. Ensure the cache path has write permission.
5. Check for disk corruption.
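Checks 2 to 4 above can be run in one go per mapred.local.dir entry; LOCAL_DIR is an assumption, so substitute each path from your configuration:

```shell
# Per-directory triage for missing intermediate output files.
LOCAL_DIR=/tmp   # assumption: use each entry of mapred.local.dir

df -hP "$LOCAL_DIR" | tail -1          # 2) space in the cache path
free -m 2>/dev/null || true            # 3) memory headroom (Linux)
if [ -w "$LOCAL_DIR" ]; then           # 4) writable by the tasktracker user
  echo "$LOCAL_DIR is writable"
else
  echo "$LOCAL_DIR is NOT writable"
fi
```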
The namenode cycles through the following errors:
2012-08-21 09:20:24,486 WARN
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Cannot roll edit
log, edits.new files already exists in all healthy directories:
/data/work/hdfs/name/current/edits.new
/backup/current/edits.new
2012-08-21 09:20:25,357 ERROR
org.apache.hadoop.security.UserGroupInformation:
PriviledgedActionException as: hadoop cause: java.net.ConnectException:
Connection refused
2012-08-21 09:20:25,359 WARN org.mortbay.log: /getimage:
java.io.IOException: GetImage failed. java.net.ConnectException:
Connection refused
There are related errors on the secondarynamenode.
Searching around, I found this explanation: with 1.0.2, only one
checkpoint process is executed at a time. When the namenode gets an
overlapping checkpointing request, it checks for edits.new in its
storage directories. If the namenode has this file, it concludes the
previous checkpoint process is not done yet and prints the warning
message you've seen. So check whether a leftover edits.new file
remains from before the error; if this is the problem, the residual,
useless file can be deleted.
Also make sure of the following configuration in the namenode's
hdfs-site.xml:
<property>
<name>dfs.secondary.http.address</name>
<value>0.0.0.0:50090</value>
</property>
Change 0.0.0.0 above to the host name of the machine where you
deployed the secondarynamenode.
And in the secondarynamenode's hdfs-site.xml:
<property>
<name>dfs.http.address</name>
<value>0.0.0.0:50070</value>
</property>
Change 0.0.0.0 above to the host name of your deployed namenode.
1. hadoop-root-datanode-master.log shows the following error:
ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
java.io.IOException:
Incompatible namespaceIDs in
which prevents the datanode from starting.
Cause: each namenode format re-creates a namenodeId, and the directory
configured by the dfs.data.dir parameter still contains the id created
by the previous format, which is inconsistent with the id in the
dfs.name.dir directory. The format cleared the namenode's data but not
the datanode's, which causes the startup failure. What you have to do
is empty the directory configured by the dfs.data.dir parameter every
time before a format.
Command to format HDFS:
Shell code
1. hadoop namenode -format
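The "empty dfs.data.dir before each format" step can be sketched like this, demonstrated on a throwaway directory (point DATA_DIR at your real dfs.data.dir):

```shell
# Throwaway stand-in for dfs.data.dir with some stale contents.
DATA_DIR=$(mktemp -d)
mkdir -p "$DATA_DIR/current"
echo 'namespaceID=987654321' > "$DATA_DIR/current/VERSION"

# Empty the directory (but keep the directory itself) before formatting.
rm -rf "$DATA_DIR"/*
ls -A "$DATA_DIR"    # prints nothing: the stale namespaceID is gone
# then, on the namenode: bin/hadoop namenode -format
```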
2. If the datanode cannot connect to the namenode, the datanode cannot
start:
ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
java.io.IOException:
Call to ... failed on local exception:
java.net.NoRouteToHostException: No route to host
Turn off the firewall:
Shell code
1. service iptables stop
If the machine is rebooted, the firewall will come back on.
3. Uploading files from local to the HDFS file system gives the
following errors:
INFO hdfs.DFSClient:
Exception in createBlockOutputStream java.io.IOException:
Bad connect ack with firstBadLink
INFO hdfs.DFSClient:
Abandoning block blk_-1300529705803292651_37023
WARN hdfs.DFSClient: DataStreamer Exception: java.io.IOException:
Unable to create new block.
Solution:
Turn off the firewall:
Shell code
1. service iptables stop
Disable SELinux:
Edit the /etc/selinux/config file and set SELINUX=disabled
4. Errors caused by safe mode:
org.apache.hadoop.dfs.SafeModeException: Cannot delete ..., Name node
is in safe mode
When the distributed file system starts, it begins in safe mode. While
the file system is in safe mode, its contents may not be modified or
deleted until safe mode ends. Safe mode exists so that, at startup,
the system can check the validity of the data blocks on each datanode
and, according to policy, copy or delete blocks as necessary. Safe
mode can also be entered at runtime via a command. In practice, if you
modify or delete files while the system is starting, safe mode will
refuse with this error message; just wait a while and it will pass.
To turn off safe mode:
Shell code
1. hadoop dfsadmin -safemode leave