Feb/13

11

Who is this SecondaryNameNode anyway?

SecondaryNameNode is one of the most confusing things in Hadoop. 

There are not many books on Hadoop. I happened to be reading one, and I quit the moment I saw this passage:

"The NameNode is a single point of failure, and on failure it will stop all the operations of the HDFS cluster. To avoid this, Hadoop supports a secondary NameNode that will hold a copy of all data in NameNode. If the NameNode fails, the secondary NameNode takes its place."

This statement may be true in the very specific case of NameNode HA, but then that should be mentioned.

Anyway, I thought it would be a good idea to write a short blog post about why SecondaryNameNode exists. A more appropriate name for SecondaryNameNode would be NameNodeHelper.

Let's first understand how NameNode works. NameNode keeps metadata about the files and blocks stored in the cluster. To make this information readily accessible, it is kept in RAM; roughly 1 GB of RAM is required per 1 million blocks.
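To make that rule of thumb concrete, here is a small back-of-the-envelope sketch. The function names are made up for illustration, and the 128 MB block size in the usage note is an assumption (your cluster may use a different one). It also ignores small files, which occupy a whole block entry each and make things worse in practice.

```python
BLOCKS_PER_GB = 1_000_000  # rule of thumb from the text: ~1 GB of RAM per 1 million blocks


def blocks_for_data(total_bytes: int, block_size_bytes: int) -> int:
    """Best-case block count: assumes data is packed into full blocks."""
    if total_bytes < 0 or block_size_bytes <= 0:
        raise ValueError("total_bytes must be >= 0 and block_size_bytes > 0")
    return -(-total_bytes // block_size_bytes)  # ceiling division


def estimate_namenode_ram_gb(num_blocks: int) -> float:
    """Rough NameNode RAM estimate in GB, using the 1 GB per million blocks rule."""
    if num_blocks < 0:
        raise ValueError("num_blocks must be non-negative")
    return num_blocks / BLOCKS_PER_GB
```

For example, `blocks_for_data(2**50, 128 * 2**20)` gives 8,388,608 blocks for 1 PiB of data at an assumed 128 MiB block size, so `estimate_namenode_ram_gb` suggests a bit over 8 GB of RAM for the NameNode.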

Now this information has to be persisted somehow on the hard disk. NameNode does that in the fsimage file. Any changes to the cluster after that are recorded in the edits file, which acts as a write-ahead log (WAL). Whenever NameNode restarts, the metadata is first loaded from fsimage, and then the changes from edits are applied in sequence.
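The restart sequence can be sketched with a toy model. This is not how HDFS stores things (the real fsimage and edits files are binary, and the namespace is much richer than a set of paths); the JSON format, the `create`/`delete` operations and all function names below are assumptions made purely for illustration.

```python
import json
import os
import tempfile


def log_edit(edits_path, op, path):
    """Append one change to the toy edits file and force it to disk."""
    with open(edits_path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"op": op, "path": path}) + "\n")
        f.flush()
        os.fsync(f.fileno())


def apply_edit(namespace, edit):
    """Apply a single logged change to the in-memory namespace."""
    if edit["op"] == "create":
        namespace.add(edit["path"])
    elif edit["op"] == "delete":
        namespace.discard(edit["path"])
    else:
        raise ValueError(f"unknown op: {edit['op']}")


def load_namespace(fsimage_path, edits_path):
    """Restart: load the snapshot first, then replay the edits in order."""
    with open(fsimage_path, encoding="utf-8") as f:
        namespace = set(json.load(f))
    if os.path.exists(edits_path):
        with open(edits_path, encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    apply_edit(namespace, json.loads(line))
    return namespace


def demo():
    """Write a tiny fsimage, log three edits, then simulate a restart."""
    with tempfile.TemporaryDirectory() as d:
        fsimage = os.path.join(d, "fsimage")
        edits = os.path.join(d, "edits")
        with open(fsimage, "w", encoding="utf-8") as f:
            json.dump(["/a"], f)
        log_edit(edits, "create", "/b")
        log_edit(edits, "delete", "/a")
        log_edit(edits, "create", "/c")
        return load_namespace(fsimage, edits)
```

Calling `demo()` returns the set containing `/b` and `/c`: the snapshot only knew about `/a`, and the replayed edits bring the namespace up to date. The longer the edits file grows, the longer this replay takes.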

This is good, but it cannot continue forever. At some point the changes in edits have to be merged into fsimage. This is where the SecondaryNameNode (SNN) comes into the picture. How does the SNN do it?

1. It asks the NN to start writing changes to a new file, edits.new.

2. It copies both fsimage and edits from the NN.

3. It merges the two to create a new fsimage file.

4. It puts the new fsimage back on the NN.
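The four steps above can be sketched as an in-memory toy. The class and function names are hypothetical and nothing here is the real Hadoop API. One assumption goes beyond the steps as listed: once the new fsimage is installed, the toy NN treats edits.new as its current edits, so the old edits that were already merged are dropped.

```python
class ToyNameNode:
    """In-memory stand-in for the NN's fsimage, edits and edits.new."""

    def __init__(self):
        self.fsimage = set()    # last persisted snapshot of the namespace
        self.edits = []         # changes logged since that snapshot
        self.edits_new = None   # only exists while a checkpoint is running

    def log(self, op, path):
        """Record a change in edits.new if a checkpoint is running, else in edits."""
        target = self.edits_new if self.edits_new is not None else self.edits
        target.append((op, path))

    def roll_edits(self):
        """Step 1: start writing to a fresh edits.new."""
        self.edits_new = []

    def install_checkpoint(self, new_fsimage):
        """Step 4: accept the merged image (assumption: edits.new becomes edits)."""
        self.fsimage = set(new_fsimage)
        self.edits = self.edits_new if self.edits_new is not None else []
        self.edits_new = None


def merge(fsimage, edits):
    """Step 3: replay the edits on top of a copy of the snapshot."""
    namespace = set(fsimage)
    for op, path in edits:
        if op == "create":
            namespace.add(path)
        elif op == "delete":
            namespace.discard(path)
        else:
            raise ValueError(f"unknown op: {op}")
    return namespace


def checkpoint(nn):
    """What the toy SNN does, following the four steps in order."""
    nn.roll_edits()                                   # step 1
    image_copy, edits_copy = set(nn.fsimage), list(nn.edits)  # step 2
    new_image = merge(image_copy, edits_copy)         # step 3
    nn.install_checkpoint(new_image)                  # step 4
```

After `checkpoint(nn)`, the toy NN holds an up-to-date fsimage and an empty edits list, so the next restart has nothing to replay. Note that the SNN only works on copies and hands the result back; at no point does it take over from the NN, which is why NameNodeHelper is the better name.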

 
