
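Hadoop stores data in HDFS as blocks. Each block is replicated across the cluster according to the replication factor (the dfs.replication property), which is 3 by default. The default block size is 64 MB in Hadoop 1.x; Hadoop 2.x and later default to 128 MB.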

A file is divided into blocks when it is written to the cluster. A file can span multiple blocks, but a block holds data from only one file, so the last block of a file may be smaller than the block size.
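To make the arithmetic concrete, here is a minimal sketch in plain Python (it does not use Hadoop at all). The function names block_layout and replica_count are made up for illustration, and the defaults assume the 64 MB block size and replication factor of 3 mentioned above.

```python
MB = 1024 * 1024

def block_layout(file_size, block_size=64 * MB):
    """Return the size in bytes of each block a file of file_size would occupy.

    Hypothetical helper for illustration; it only mirrors the arithmetic.
    """
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        # The last block holds only the leftover bytes, not a full block.
        sizes.append(remainder)
    return sizes

def replica_count(file_size, block_size=64 * MB, replication=3):
    """Total number of block replicas stored across the cluster."""
    return len(block_layout(file_size, block_size)) * replication

if __name__ == "__main__":
    sizes = block_layout(200 * MB)
    print([s // MB for s in sizes])      # block sizes in MB
    print(replica_count(200 * MB))       # block replicas in the cluster
    print(sum(sizes) * 3 // MB)          # raw storage used, in MB
```

With these assumptions a 200 MB file becomes three 64 MB blocks plus one 8 MB block, stored as 12 block replicas that take about 600 MB of raw disk in total.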

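An InputSplit is the logical chunk of input handed to a single map task; each mapper processes one input split. The RecordReader turns that split into the key/value records the mapper actually sees. How the input is split and read depends on the InputFormat. The default is TextInputFormat, a subclass of FileInputFormat: it creates roughly one split per block and treats each line as a separate record (not a separate split), with the line's byte offset as the key and the line text as the value.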
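The difference between a split and a record is easier to see with a toy model. The sketch below is plain Python, not Hadoop code. make_splits and read_split are hypothetical names, and the logic is a simplified imitation of how a line-oriented record reader behaves: a reader whose split does not start at byte 0 skips its first (possibly partial) line, and every reader keeps going past the end of its split to finish the last line it started.

```python
def make_splits(file_size, split_size):
    """Return (start, length) pairs that cover the file back to back."""
    return [(start, min(split_size, file_size - start))
            for start in range(0, file_size, split_size)]

def read_split(data, start, length):
    """Return (byte_offset, line) records for one split of data (bytes)."""
    end = start + length
    pos = start
    if start != 0:
        # The first line here is read by the reader of the previous split,
        # so skip ahead to just past the next newline.
        newline = data.find(b"\n", pos)
        pos = len(data) if newline == -1 else newline + 1
    records = []
    # A line that starts at or before `end` belongs to this split,
    # even if its bytes run into the next split.
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        line_end = len(data) if newline == -1 else newline
        records.append((pos, data[pos:line_end].decode("utf-8")))
        pos = line_end + 1
    return records

if __name__ == "__main__":
    data = b"alpha\nbravo\ncharlie\ndelta\n"
    for start, length in make_splits(len(data), 10):
        print((start, length), read_split(data, start, length))
```

In this run the 26-byte input is cut into three splits of 10, 10 and 6 bytes, yet every line is delivered exactly once as a whole record, even though "bravo" and "charlie" each straddle a split boundary. The last split produces no records because its only line was already read by the previous reader.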

Rishi Yadav
04/30/2013 at 13:26