In this post we’ll talk about the HDFS Federation feature introduced in Hadoop 2.x versions. With HDFS Federation we can have more than one NameNode in the Hadoop cluster, each managing a part of the namespace.
HDFS architecture limitations
HDFS follows a master/slave architecture in which the NameNode acts as the master and several DataNodes act as slaves. The NameNode performs the following tasks.
- Managing the Namespace - NameNode manages the file system namespace. NameNode stores metadata about the files and directories in the file system. NameNode also supports all the namespace related file system operations such as create, delete, modify and list files and directories.
- Block Storage Service - NameNode also maintains block mapping, i.e. in which DataNode any given block is stored. NameNode supports block related operations such as create, delete, modify and get block location.
In the prior HDFS architecture (before Hadoop 2.x) only a single namespace was allowed for the entire cluster, and a single NameNode managed that namespace. In a large cluster with many files, that single NameNode can become a limiting factor: it keeps the metadata for every file and block in memory, so its memory needs grow with the cluster and restrict how far the cluster can scale.
HDFS Federation tries to address this problem.
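To get a feel for why the memory of a single NameNode becomes the bottleneck, here is a back-of-the-envelope sketch. It assumes the commonly quoted rule of thumb of roughly 150 bytes of NameNode heap per namespace object (file, directory or block). That figure and the file counts are illustrative assumptions, not measurements from a real cluster.

```python
# Rough NameNode heap estimate. BYTES_PER_OBJECT is a rule-of-thumb
# assumption (about 150 bytes per file, directory or block), not a measured value.
BYTES_PER_OBJECT = 150


def estimate_heap_bytes(num_files, blocks_per_file, num_dirs=0):
    """Estimate the heap needed to hold metadata for a namespace of this size."""
    num_blocks = num_files * blocks_per_file
    return (num_files + num_dirs + num_blocks) * BYTES_PER_OBJECT


# Hypothetical cluster: 100 million files with 2 blocks each.
single_namenode = estimate_heap_bytes(100_000_000, 2)

# The same files split evenly across two federated NameNodes.
per_federated_namenode = estimate_heap_bytes(50_000_000, 2)

print(single_namenode / 1e9)         # heap in GB for the single NameNode
print(per_federated_namenode / 1e9)  # heap in GB for each of the two NameNodes
```

Under these assumptions the single NameNode needs about 45 GB of heap for metadata alone, while each of two federated NameNodes needs about half of that.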
HDFS Federation support for multiple NameNodes/NameSpaces
HDFS Federation addresses this limitation of having a single NameNode by adding support for multiple NameNodes/namespaces to HDFS. In the HDFS Federation architecture each NameNode manages a part of the namespace. Thus HDFS Federation allows the namespace to scale horizontally.
For example, if there are two namespaces /sales and /finance, you can have two NameNodes, NameNode1 and NameNode2, where NameNode1 manages all files under /sales and NameNode2 manages all files under /finance.
In HDFS Federation the NameNodes are federated; they are independent and do not require coordination with each other. Thus, a NameNode failure does not prevent the DataNodes from serving other NameNodes in the cluster.
Note that DataNodes are still used as common storage for blocks by all the NameNodes. Each DataNode registers with all the NameNodes in the cluster and sends periodic heartbeats and block reports to them.
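The independence of the NameNodes can be illustrated with a small toy model. This is only a sketch of the idea and not how Hadoop implements it; the class names, method names and the in-process "heartbeat" call are all hypothetical.

```python
class NameNode:
    """Toy NameNode that only counts heartbeats from DataNodes."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.heartbeats = {}  # DataNode name -> number of heartbeats received

    def receive_heartbeat(self, datanode_name):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.heartbeats[datanode_name] = self.heartbeats.get(datanode_name, 0) + 1


class DataNode:
    """Toy DataNode that is registered with every NameNode in the cluster."""

    def __init__(self, name, namenodes):
        self.name = name
        self.namenodes = namenodes

    def send_heartbeats(self):
        """Send one heartbeat to each NameNode; return the names that were reached."""
        reached = []
        for namenode in self.namenodes:
            try:
                namenode.receive_heartbeat(self.name)
                reached.append(namenode.name)
            except ConnectionError:
                # A failed NameNode does not stop us from serving the others.
                continue
        return reached


nn1, nn2 = NameNode("NameNode1"), NameNode("NameNode2")
dn = DataNode("DataNode1", [nn1, nn2])

first_round = dn.send_heartbeats()   # both NameNodes are up
nn1.alive = False                    # simulate a NameNode1 failure
second_round = dn.send_heartbeats()  # NameNode2 is still served
```

After NameNode1 goes down, the DataNode keeps reporting to NameNode2, so the /finance namespace from our example would stay available.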
HDFS Federation Namespace Volume
In HDFS Federation, the set of blocks that belong to a single namespace is known as a Block Pool.
A namespace and its block pool together are called a Namespace Volume. It is a self-contained unit of management.
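The relationship between a namespace, its block pool and the shared DataNode storage can be sketched as a small data structure. The class, the block pool IDs and the block names below are made up for illustration; real block pool IDs are generated by Hadoop.

```python
from dataclasses import dataclass, field


@dataclass
class NamespaceVolume:
    """A namespace plus its block pool, managed as one unit."""
    namespace_root: str
    block_pool_id: str  # hypothetical ID, real ones are generated by Hadoop
    files: dict = field(default_factory=dict)  # file path -> list of block IDs

    def add_file(self, path, block_ids):
        if not path.startswith(self.namespace_root + "/"):
            raise ValueError(f"{path} is outside {self.namespace_root}")
        self.files[path] = list(block_ids)

    def block_pool(self):
        """All blocks that belong to this namespace."""
        return {block for blocks in self.files.values() for block in blocks}


sales = NamespaceVolume("/sales", "BP-sales")
finance = NamespaceVolume("/finance", "BP-finance")
sales.add_file("/sales/q1.csv", ["blk_1", "blk_2"])
finance.add_file("/finance/ledger.csv", ["blk_3"])

# A DataNode stores blocks from every pool, kept apart by block pool ID.
datanode_storage = {v.block_pool_id: v.block_pool() for v in (sales, finance)}
```

Because each volume carries its own block pool, one namespace can be managed without touching the blocks of the other.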
HDFS Federation configuration
If we take the same example of two namespaces, /sales and /finance, and two NameNodes, NameNode1 and NameNode2, then the required configuration changes are as follows.
You need to add the dfs.nameservices parameter to your configuration file (hdfs-site.xml) and configure it with a comma-separated list of NameServiceIDs.
<property>
  <name>dfs.nameservices</name>
  <value>sales,finance</value>
</property>
You also need to add the following configuration parameters, each suffixed with the corresponding NameServiceID.
<property>
  <name>dfs.namenode.rpc-address.sales</name>
  <value>namenode1:8020</value>
</property>
<property>
  <name>dfs.namenode.http-address.sales</name>
  <value>namenode1:50070</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.finance</name>
  <value>namenode2:8020</value>
</property>
<property>
  <name>dfs.namenode.http-address.finance</name>
  <value>namenode2:50070</value>
</property>
You can use ViewFs to create personalized namespace views. That change is required in core-site.xml.
<property>
  <name>fs.defaultFS</name>
  <value>viewfs:///</value>
</property>
Then add the mount table config variables for sales and finance.
<property>
  <name>fs.viewfs.mounttable.default.link./sales</name>
  <value>hdfs://namenode1:8020/sales</value>
</property>
<property>
  <name>fs.viewfs.mounttable.default.link./finance</name>
  <value>hdfs://namenode2:8020/finance</value>
</property>
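To see what the mount table does for a client, here is a minimal sketch of how a path under viewfs:/// could be mapped to the NameNode that owns it. It mirrors the two links configured above, but the resolve function is a hypothetical helper written for this post, not the ViewFs implementation.

```python
# Mount points and targets copied from the mount table configuration above.
MOUNT_TABLE = {
    "/sales": "hdfs://namenode1:8020/sales",
    "/finance": "hdfs://namenode2:8020/finance",
}


def resolve(path, mount_table=MOUNT_TABLE):
    """Map a client-side path to the hdfs:// URI behind its mount point."""
    for mount_point, target in mount_table.items():
        # Match the mount point itself or anything below it, so that
        # "/salesforce" is not mistaken for a path under "/sales".
        if path == mount_point or path.startswith(mount_point + "/"):
            return target + path[len(mount_point):]
    raise FileNotFoundError(f"no mount point for {path}")


print(resolve("/sales/2018/q1.csv"))
print(resolve("/finance"))
```

A path under /sales ends up at namenode1 and a path under /finance at namenode2, while the client only ever works with the single viewfs namespace.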
Reference: https://hadoop.apache.org/docs/r2.9.0/api/org/apache/hadoop/fs/viewfs/ViewFs.html
That's all for this topic HDFS Federation in Hadoop Framework. If you have any doubt or any suggestions to make please drop a comment. Thanks!