PRACTICAL 4 - Single and Multi Node Hadoop Install
Aim: Hadoop installation as single node cluster and multi node cluster.
Pre-requisite:
OS: UBUNTU 14.04 LTS
FRAMEWORK: Hadoop 2.7.3
JAVA VERSION: 1.7.0_131
Steps:
3. Download Hadoop from the hadoop.apache.org site, then extract the archive to install it,
as under:
Gcet@gfl1-5:~$ tar -xvf hadoop-2.7.3.tar.gz
hadoop-2.7.3/share/hadoop/tools/lib/hadoop-extras-2.7.3.jar
hadoop-2.7.3/share/hadoop/tools/lib/asm-3.2.jar
hadoop-2.7.3/include/
hadoop-2.7.3/include/hdfs.h
hadoop-2.7.3/include/Pipes.hh
hadoop-2.7.3/include/TemplateFactory.hh
hadoop-2.7.3/include/StringUtils.hh
hadoop-2.7.3/include/SerialUtils.hh
hadoop-2.7.3/LICENSE.txt
hadoop-2.7.3/NOTICE.txt
hadoop-2.7.3/README.txt
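The download and extraction in step 3 can be sketched as the shell snippet below. The archive URL on archive.apache.org and the helper name fetch_and_unpack are assumptions added for illustration; the practical itself only says to download Hadoop from the Apache site.

```shell
#!/usr/bin/env bash
# Sketch only: the archive URL and the helper name are assumptions, not part of the practical.
HADOOP_VERSION="2.7.3"
TARBALL="hadoop-${HADOOP_VERSION}.tar.gz"
URL="https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/${TARBALL}"

fetch_and_unpack() {
  # Download the archive only if it is not already in the current directory.
  [ -f "$TARBALL" ] || wget "$URL" || return 1
  # Extract without listing every file; this creates the hadoop-2.7.3/ directory.
  tar -xzf "$TARBALL" || return 1
  # Succeed only if the hadoop launcher script was extracted.
  [ -f "hadoop-${HADOOP_VERSION}/bin/hadoop" ]
}
```

Run fetch_and_unpack from the home directory. It returns a non-zero status if the download, the extraction, or the final check fails.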
Spilled Records=0
Shuffled Maps =1
GC time elapsed (ms)=10
Total committed heap usage (bytes)=854065152
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=98
File Output Format Counters
Bytes Written=8
Multi node cluster:
Steps:
Command: ip addr show (you can use the ifconfig command as well)
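To read off just the addresses needed for the hosts file, the output of the ip command can be trimmed as in the sketch below. The helper name list_ipv4 is hypothetical, and the sketch assumes the one-line output format produced by the -o option.

```shell
list_ipv4() {
  # Reads `ip -4 -o addr show` output on stdin and prints "<interface> <address>" per line.
  # In the one-line (-o) format, field 2 is the interface and field 4 is address/prefix.
  awk '{ split($4, a, "/"); print $2, a[1] }'
}
# Usage on each node: ip -4 -o addr show | list_ipv4
```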
STEP 3: Open the hosts file and add the master and data nodes with their respective IP addresses.
The same entries must appear in the hosts files of both the master and the slave.
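A minimal sketch of the hosts entries follows. It assumes the two machines are named master and slave, as in the rest of this practical; the helper name hosts_entries and the IP addresses in the usage line are placeholders, so substitute the addresses reported on your own nodes.

```shell
hosts_entries() {
  # Usage: hosts_entries <master-ip> <slave-ip>
  # Prints one "<ip> <hostname>" line per node, in /etc/hosts format.
  printf '%s master\n%s slave\n' "$1" "$2"
}
# Usage on every node (the addresses are placeholders):
#   hosts_entries 192.168.56.101 192.168.56.102 | sudo tee -a /etc/hosts
```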
STEP 4: Restart the sshd service.
STEP 5: Create the SSH Key in the master node. (Press enter button when it asks you to enter a filename to
STEP 6: Copy the generated SSH key to the master node’s authorized keys.
STEP 7: Copy the master node’s SSH key to the slave’s authorized keys.
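Steps 5 to 7 can be sketched with the standard OpenSSH tools as below. The function names are hypothetical, and the login edureka@slave is an assumption based on the user name that appears in the paths later in this practical; change it to the account used on your slave.

```shell
#!/usr/bin/env bash
# Sketch only: SLAVE_LOGIN and the function names are assumptions for illustration.
SLAVE_LOGIN="${SLAVE_LOGIN:-edureka@slave}"
KEY="${KEY:-$HOME/.ssh/id_rsa}"

make_key() {
  # STEP 5: create an RSA key pair with an empty passphrase, unless one already exists.
  mkdir -p "$(dirname "$KEY")"
  [ -f "$KEY" ] || ssh-keygen -t rsa -N "" -f "$KEY"
}

authorize_locally() {
  # STEP 6: append the public key to this node's own authorized_keys file.
  cat "$KEY.pub" >> "$(dirname "$KEY")/authorized_keys"
  chmod 600 "$(dirname "$KEY")/authorized_keys"
}

authorize_on_slave() {
  # STEP 7: install the master's public key in the slave's authorized_keys file.
  ssh-copy-id -i "$KEY.pub" "$SLAVE_LOGIN"
}
```

On the master, run make_key, authorize_locally and authorize_on_slave in that order. Afterwards, ssh master and ssh slave should log in without asking for a password.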
STEP 12: Add the Hadoop and Java paths to the bash file (.bashrc) on all nodes.
Open the .bashrc file and add the Hadoop and Java paths.
To apply these changes to the current Terminal, execute the source command.
To confirm that Java and Hadoop can be accessed through the Terminal, execute the java -version and hadoop version commands.
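The lines added to .bashrc in step 12 might look like the sketch below. Both locations are assumptions: HADOOP_HOME assumes the archive was extracted in the home directory, and JAVA_HOME assumes the OpenJDK 7 package location on 64-bit Ubuntu. Adjust them to match your machine.

```shell
# Lines to append to ~/.bashrc on every node.
# Both locations are assumptions: point them at the directory where Hadoop
# was extracted and at the directory where the JDK is installed.
export HADOOP_HOME="$HOME/hadoop-2.7.3"
export JAVA_HOME="/usr/lib/jvm/java-7-openjdk-amd64"
# Put the hadoop, start/stop, and java commands on the search path.
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin"
```

After saving the file, run source ~/.bashrc, then java -version and hadoop version to check that both commands are found.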
STEP 13: Create the masters file and edit it as follows on both the master and slave machines:
STEP 19: Copy mapred-site from the template in the configuration folder and then edit mapred-site.xml on both master and slave machines.
STEP 20: Edit yarn-site.xml on both master and slave machines as follows:
Command: sudo gedit /home/edureka/hadoop-2.7.3/etc/hadoop/yarn-site.xml
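Steps 19 and 20 can also be done from the Terminal instead of an editor, as in the sketch below. The two property values are the usual minimal settings for running MapReduce on YARN and are an assumption here, not taken from this practical; the function names are hypothetical, and the sed call relies on GNU sed as shipped with Ubuntu.

```shell
#!/usr/bin/env bash
# Sketch only: the property values are common minimal settings, assumed for illustration.
CONF_DIR="${CONF_DIR:-/home/edureka/hadoop-2.7.3/etc/hadoop}"

add_property() {
  # Usage: add_property <file> <name> <value>
  # Inserts a <property> element just before the closing </configuration> tag (GNU sed).
  sed -i "s#</configuration>#  <property>\n    <name>$2</name>\n    <value>$3</value>\n  </property>\n</configuration>#" "$1"
}

configure_mapreduce_on_yarn() {
  # STEP 19: create mapred-site.xml from the shipped template, then select YARN as the framework.
  cp "$CONF_DIR/mapred-site.xml.template" "$CONF_DIR/mapred-site.xml"
  add_property "$CONF_DIR/mapred-site.xml" mapreduce.framework.name yarn
  # STEP 20: enable the shuffle auxiliary service that MapReduce needs in each NodeManager.
  add_property "$CONF_DIR/yarn-site.xml" yarn.nodemanager.aux-services mapreduce_shuffle
}
```

Run configure_mapreduce_on_yarn once on each machine. Running it a second time would add duplicate property entries.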
Command: ./sbin/start-all.sh
STEP 23: Check all the daemons running on both master and slave machines.
Command: jps
On master
On slave
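The jps check in step 23 can be automated with a small helper such as the sketch below. The helper name missing_daemons is hypothetical, and the daemon lists in the usage lines are assumptions: they describe a layout where the master also runs a DataNode, which matches the two live nodes expected in the NameNode interface.

```shell
missing_daemons() {
  # Usage: jps | missing_daemons <DaemonName>...
  # Prints each expected daemon name that is absent from the jps output on stdin.
  local listing name
  listing="$(cat)"
  for name in "$@"; do
    # jps prints "<pid> <ClassName>"; compare the class name as a whole field
    # so that NameNode does not match SecondaryNameNode.
    printf '%s\n' "$listing" | awk -v n="$name" '$2 == n { found = 1 } END { exit !found }' \
      || printf '%s\n' "$name"
  done
}
# Usage on the master (assumed daemon list):
#   jps | missing_daemons NameNode SecondaryNameNode ResourceManager DataNode NodeManager
# Usage on the slave (assumed daemon list):
#   jps | missing_daemons DataNode NodeManager
```

No output means every listed daemon is running on that machine.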
Finally, open the browser on your master machine and go to master:50070/dfshealth.html; this will show
you the NameNode interface. Scroll down and check the number of live nodes. If it is 2, you have
successfully set up a multi node Hadoop cluster. If it is not 2, you might have missed one of
the steps which I have mentioned above. But no need to worry, you can go back and verify all the