Sunil S. Ranka's Weblog

Superior Data Analytics is the antidote to Business Failure

Hadoop Data Replication Strategy

Posted by sranka on October 17, 2013

Hi All,

Replication and fault tolerance are built-in features of Hadoop, and I was always curious to know how blocks are replicated. I found this information in Chapter 3, “The Hadoop Distributed Filesystem”, of “Hadoop: The Definitive Guide”, 3rd Edition, and thought it would be interesting to share.

  • How does the namenode choose which datanodes to store replicas on?

Hadoop’s default strategy is to place the first replica on the same node as the client (for clients running outside the cluster, a node is chosen at random, although the system tries not to pick nodes that are too full or too busy). The second replica is placed on a different rack from the first (off-rack), chosen at random. The third replica is placed on the same rack as the second, but on a different node chosen at random. Further replicas are placed on random nodes on the cluster, although the system tries to avoid placing too many replicas on the same rack.

The text above is taken in its entirety from Chapter 3 of “Hadoop: The Definitive Guide”, 3rd Edition.
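
To make the placement rules easier to picture, here is a small Python sketch that simulates them. It is not Hadoop's actual code: the function name `place_replicas`, the `topology` dictionary and the node names are all made up for illustration, and the sketch ignores the "too full or too busy" checks and the rack-balancing of extra replicas that the real namenode performs.

```python
import random


def place_replicas(topology, replication=3, client_node=None, rng=None):
    """Pick nodes for one block's replicas (illustrative sketch only).

    topology: dict mapping a rack name to the list of node names on it.
    client_node: node the writer runs on, or None / unknown if outside the cluster.
    """
    rng = rng or random.Random()
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    if not 1 <= replication <= len(rack_of):
        raise ValueError("replication must be between 1 and the number of nodes")

    chosen = []

    def pick(candidates):
        # Choose a random unused node from candidates; if none is left,
        # fall back to any unused node in the cluster.
        free = [n for n in candidates if n not in chosen]
        if not free:
            free = [n for n in rack_of if n not in chosen]
        node = rng.choice(free)
        chosen.append(node)
        return node

    # First replica: the client's own node, or a random node for outside clients.
    if client_node in rack_of:
        chosen.append(client_node)
        first = client_node
    else:
        first = pick(rack_of)

    # Second replica: a random node on a different rack from the first.
    if replication >= 2:
        second = pick([n for n in rack_of if rack_of[n] != rack_of[first]])

    # Third replica: same rack as the second, but a different node.
    if replication >= 3:
        pick(topology[rack_of[second]])

    # Further replicas: random unused nodes (rack balancing is not modelled).
    while len(chosen) < replication:
        pick(rack_of)

    return chosen


# Hypothetical two-rack cluster used only for this example.
topology = {
    "rack1": ["n1", "n2", "n3"],
    "rack2": ["n4", "n5", "n6"],
}
replicas = place_replicas(topology, replication=3, client_node="n1")
print(replicas)
```

With the client on `n1`, the first replica stays on `n1`, and the other two always land on two different nodes of `rack2`. So losing a single node or even a whole rack still leaves at least one copy of the block, while only one of the writes has to cross racks.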

Hope this helps.

Sunil S Ranka

“Superior BI is the antidote to Business Failure”
