Q1A Tech Tips
Information for developers tagged by technical specialty

Yes, you can do it using output committers. Output Committers: Hadoop makes sure a job either succeeds or fails gracefully. This is done via OutputCommitter, which is accessible from OutputFormat through OutputFormat.getOutputCommitter(): public abstract OutputCommitter [...]

RishiYadav
05/28/2013 at 19:38

Remove the node from mapred.exclude.
Remove the node from hdfs.exclude.
$ hadoop mradmin -refreshNodes
$ hadoop dfsadmin -refreshNodes
$ hadoop-daemon.sh start tasktracker
$ hadoop-daemon.sh start datanode

RishiYadav
05/27/2013 at 13:48

If your cluster does not have an excludes file, add one as a property in hdfs-site.xml:
    name: dfs.hosts.exclude
    value: /usr/local/hadoop/conf/excludes
    description: Names a file that contains a list of hosts excluded from the cluster.
Add the hostname of the node you want to remove to mapred.exclude [...]

RishiYadav
05/27/2013 at 13:47

Intermediate data is written to local disk, not to HDFS.

RishiYadav
05/21/2013 at 18:53

If all replicas of one or more blocks of a file become unavailable, the file is considered corrupt, and any attempt to access it raises an exception. To check the health of the filesystem, Hadoop has an "fsck" command, much like Linux. fsck generates a summary report that lists the [...]

RishiYadav
05/06/2013 at 19:12

Hadoop stores data in the form of blocks. Each block is replicated across the cluster according to the replication factor, which is 3 by default. The default block size is 64 MB. A file is divided into blocks when it is moved into the cluster. A file can be divided into multiple blocks but one block can [...]
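
The block arithmetic above can be sketched in a few lines of Java. This is only an illustration using the defaults quoted in the tip (64 MB blocks, replication factor 3); the class name BlockMath, its method names, and the 200 MB example file are made up for this sketch and are not part of Hadoop.

```java
// Illustrative sketch only: BlockMath is a hypothetical helper, not a Hadoop class.
final class BlockMath {
    static final long MB = 1024L * 1024L;
    static final long DEFAULT_BLOCK_SIZE = 64 * MB; // default block size quoted in the tip
    static final int DEFAULT_REPLICATION = 3;       // default replication factor quoted in the tip

    // Number of blocks a file is split into: ceiling of fileSize / blockSize.
    static long blockCount(long fileSizeBytes, long blockSizeBytes) {
        if (fileSizeBytes < 0 || blockSizeBytes <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative and block size positive");
        }
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    // Total number of block replicas stored across the cluster for one file.
    static long replicaCount(long fileSizeBytes, long blockSizeBytes, int replication) {
        return blockCount(fileSizeBytes, blockSizeBytes) * replication;
    }

    public static void main(String[] args) {
        long fileSize = 200 * MB; // hypothetical example file
        System.out.println("blocks   = " + blockCount(fileSize, DEFAULT_BLOCK_SIZE));
        System.out.println("replicas = "
                + replicaCount(fileSize, DEFAULT_BLOCK_SIZE, DEFAULT_REPLICATION));
    }
}
```

For the 200 MB example, three full 64 MB blocks cover 192 MB and a fourth block holds the remaining 8 MB, so the file occupies 4 blocks and 12 block replicas in total.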

RishiYadav
04/30/2013 at 13:26

One easy way to tell the old Hadoop API from the new one is by package. Old API packages are identifiable by "mapred" or, to put it precisely, they are subpackages of the org.apache.hadoop.mapred package. New API packages are identifiable by "mapreduce" or, to put it precisely, they are subpackages of [...]
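
As a rough illustration of the package rule, the sketch below classifies a fully qualified class name by its package prefix. The ApiGeneration class and its classify method are hypothetical names invented for this example; only the two package names and the sample class names come from Hadoop itself.

```java
// Illustrative sketch only: ApiGeneration is a hypothetical helper, not a Hadoop class.
final class ApiGeneration {
    static final String OLD_API = "org.apache.hadoop.mapred";
    static final String NEW_API = "org.apache.hadoop.mapreduce";

    // Returns "old", "new", or "neither" for a fully qualified class name.
    static String classify(String className) {
        if (inPackage(className, NEW_API)) return "new";
        if (inPackage(className, OLD_API)) return "old";
        return "neither";
    }

    // True when className lives in pkg or one of its subpackages.
    // The trailing dot matters: "mapreduce" itself starts with "mapred".
    private static boolean inPackage(String className, String pkg) {
        return className.startsWith(pkg + ".");
    }

    public static void main(String[] args) {
        String[] samples = {
            "org.apache.hadoop.mapred.JobConf",
            "org.apache.hadoop.mapreduce.Job",
            "org.apache.hadoop.mapred.lib.MultipleOutputs",
            "org.apache.hadoop.mapreduce.lib.output.MultipleOutputs",
            "org.apache.hadoop.fs.Path"
        };
        for (String name : samples) {
            System.out.println(classify(name) + "  " + name);
        }
    }
}
```

Matching on the package name plus a trailing dot avoids the easy mistake of treating every mapreduce class as old API, since a plain prefix check on "org.apache.hadoop.mapred" would match both.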

RishiYadav
04/25/2013 at 19:41

With so many Hadoop versions floating around, choosing one becomes a challenge. The choice is simple, though: always go for the most stable version, which at the time of this writing is 1.0.4. YARN, often referred to as Hadoop 2, is making progress towards its first stable release. Yarn [...]

RishiYadav
04/20/2013 at 10:22

The Hadoop write path is slightly more complicated than the read path. First, the client library sends a request to the NameNode with a named file. After checking permissions, the NameNode creates filesystem metadata for the file. No blocks are created yet. Response to client tells that request to [...]

RishiYadav
04/10/2013 at 18:13

The default size of the Distributed Cache for Hadoop is 10 GB, but it can be configured using "local.cache.size" in core-site.xml or mapred-site.xml. Its default location is /tmp/hadoop-root/mapred/local/archive. How often this cache cleanup (if needed) is triggered. By default [...]

RishiYadav
04/18/2013 at 19:50