Last updated on May 27, 2012 by Dan Nanni
Cloudera's Distribution including Apache Hadoop (CDH) provides streamlined installation of Apache Hadoop via Cloudera Manager. Besides Apache Hadoop, CDH also allows you to install other components such as Hive, Pig, HBase, ZooKeeper, etc. in a modular fashion. The free edition of Cloudera Manager allows you to build and monitor a Hadoop cluster of up to 50 nodes.
If you would like to install and configure HDFS/Hadoop on a small scale, I strongly recommend CDH.
You can install Cloudera Manager on Red Hat-compatible systems as well as Ubuntu/Debian systems. However, Cloudera Manager only supports cluster nodes that run CentOS/RHEL. Therefore, you need to install CentOS or RHEL on every cluster node in order for them to be managed by Cloudera Manager.
In this example, I will show you how to install and configure HDFS/Hadoop using CDH3 (CDH version 3). I assume there is one Cloudera Manager node, five cluster nodes, and (optionally) one client node (which will access the Hadoop cluster).
First, disable SELinux and the iptables firewall on all cluster nodes, and reboot them:
$ sudo vi /etc/sysconfig/selinux
SELINUX=disabled
$ sudo chkconfig iptables off
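After the reboot, you can verify that SELinux and the firewall are indeed off (getenforce and chkconfig come with stock CentOS/RHEL):
$ getenforce
Disabled
$ sudo chkconfig --list iptables
iptables        0:off   1:off   2:off   3:off   4:off   5:off   6:off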
Make sure that every cluster node, as well as the Cloudera Manager node, has a fully qualified domain name (FQDN) set in /etc/sysconfig/network and /etc/hosts. I recommend that the /etc/hosts file of every cluster node, as well as the Cloudera Manager node, include the FQDNs of all nodes as follows. Otherwise, you may not be able to add cluster nodes to Cloudera Manager.
$ sudo vi /etc/hosts
192.168.212.10 manager.mydomain.com
192.168.212.11 node0.mydomain.com
192.168.212.12 node1.mydomain.com
192.168.212.13 node2.mydomain.com
192.168.212.14 node3.mydomain.com
192.168.212.15 node4.mydomain.com
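For reference, each node's FQDN goes in /etc/sysconfig/network as well (shown here for node0; adjust per node):
$ sudo vi /etc/sysconfig/network
HOSTNAME=node0.mydomain.com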
Make sure to mount the partition used for data storage on each cluster node with the "noatime" option. With noatime, read access to a file no longer updates the atime metadata associated with the file. For example, /etc/fstab on each cluster node can have an entry like the following (assuming the data partition /dev/sdb1 is mounted at /data):
/dev/sdb1 /data ext4 noatime 1 1
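To apply the option without a reboot, you can remount the partition on the fly (assuming the example /data mount point above):
$ sudo mount -o remount,noatime /data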
Make sure that every cluster node is accessible via SSH as root, with the identical root password on all nodes. Cloudera Manager will use this login to install its agent on each node.
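You can quickly verify root SSH access from the Cloudera Manager node, for example:
$ ssh root@node0.mydomain.com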
Next, install Cloudera Manager on the Cloudera Manager node:
$ wget http://archive.cloudera.com/cloudera-manager/installer/latest/cloudera-manager-installer.bin
$ chmod +x cloudera-manager-installer.bin
$ sudo ./cloudera-manager-installer.bin
Now go to http://manager.mydomain.com:7180/ in your browser to access the Cloudera Manager interface. The default login/password for CDH3 is admin/admin.
Add all cluster nodes, and then install/start HDFS/Hadoop on them through the Cloudera Manager interface. Once HDFS/Hadoop is started by Cloudera Manager, the HDFS storage cluster will have a /tmp directory created by default.
Generate client configurations through the Cloudera Manager interface, and download the generated global-clientconfig.zip.
On the client node (which will read/write files hosted in HDFS and initiate Hadoop jobs), do the following.
Put the FQDNs of all cluster nodes in /etc/hosts.
Upload global-clientconfig.zip to the client node and unzip it. It will create a hadoop-conf directory with the HDFS/Hadoop configuration files inside.
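For example (assuming global-clientconfig.zip was copied to your home directory):
$ cd ~
$ unzip global-clientconfig.zip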
Set up an environment variable pointing to the Hadoop configuration directory.
$ export HADOOP_CONF_DIR=[location of hadoop-conf directory]
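To make the setting persistent across logins, you can append it to ~/.bashrc (assuming hadoop-conf was unzipped into your home directory):
$ echo 'export HADOOP_CONF_DIR=$HOME/hadoop-conf' >> ~/.bashrc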
Install Hadoop on the client node.
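While the exact procedure depends on your distribution, on a CentOS/RHEL client one way is to install the Hadoop package from Cloudera's CDH3 yum repository (a minimal sketch; the repo URL and the hadoop-0.20 package name are what CDH3 shipped with, to the best of my knowledge):
$ sudo wget http://archive.cloudera.com/redhat/cdh/cloudera-cdh3.repo -O /etc/yum.repos.d/cloudera-cdh3.repo
$ sudo yum install hadoop-0.20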
Finally, test if you can access HDFS from the client node as follows.
$ hadoop dfs -ls /tmp
If the above command shows the contents of the local /tmp directory of the client node, instead of the /tmp directory created inside the storage cluster, something is wrong. Double check that HADOOP_CONF_DIR is set correctly, and that the configuration files are sane. If the command successfully shows the /tmp directory created inside the storage cluster, you are ready to start a Hadoop job from the client node.
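As a final smoke test, you can write a small file into HDFS and read it back (assuming the HDFS /tmp directory is world-writable, which is typically the case):
$ hadoop dfs -put /etc/hosts /tmp/hosts-test
$ hadoop dfs -cat /tmp/hosts-test
$ hadoop dfs -rm /tmp/hosts-test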