
Practical – 4

Aim: Install Hadoop as a single-node cluster and as a multi-node cluster.

Prerequisites:
OS: UBUNTU 14.04 LTS
FRAMEWORK: Hadoop 2.7.3
JAVA VERSION: 1.7.0_131

Single-node cluster:

Steps:

1. Check that the Linux package repositories are reachable:


Gcet@gfl1-5:~$ sudo apt-get update

Ign http://extras.ubuntu.com trusty InRelease


Ign http://in.archive.ubuntu.com trusty InRelease
Get:1 http://extras.ubuntu.com trusty Release.gpg [72 B]

Hit http://in.archive.ubuntu.com trusty/universe Translation-en


Ign http://in.archive.ubuntu.com trusty/main Translation-en_IN
Ign http://in.archive.ubuntu.com trusty/multiverse Translation-en_IN
Ign http://in.archive.ubuntu.com trusty/restricted Translation-en_IN
Ign http://in.archive.ubuntu.com trusty/universe Translation-en_IN
Fetched 4,302 kB in 40s (107 kB/s)
Reading package lists... Done

2. Check the Java version:


Gcet@gfl1-5:~$ java -version

java version "1.7.0_131"


OpenJDK Runtime Environment (IcedTea 2.6.9) (7u131-2.6.9-0ubuntu0.14.04.2)
OpenJDK Server VM (build 24.131-b00, mixed mode)

3. Download Hadoop 2.7.3 from hadoop.apache.org and extract the archive:

Gcet@gfl1-5:~$ tar -xvf hadoop-2.7.3.tar.gz

hadoop-2.7.3/share/hadoop/tools/lib/hadoop-extras-2.7.3.jar
hadoop-2.7.3/share/hadoop/tools/lib/asm-3.2.jar
hadoop-2.7.3/include/
hadoop-2.7.3/include/hdfs.h
hadoop-2.7.3/include/Pipes.hh
hadoop-2.7.3/include/TemplateFactory.hh
hadoop-2.7.3/include/StringUtils.hh
hadoop-2.7.3/include/SerialUtils.hh
hadoop-2.7.3/LICENSE.txt
hadoop-2.7.3/NOTICE.txt
hadoop-2.7.3/README.txt

Gcet@gfl1-5:~$ sudo mkdir -p /usr/local/hadoop
Gcet@gfl1-5:~$ sudo mv /home/Gcet/Downloads/hadoop-2.7.3 /usr/local/hadoop/


4. Check that Hadoop works by invoking it without arguments:
Gcet@gfl1-5:~$ /usr/local/hadoop/hadoop-2.7.3/bin/hadoop

classpath           prints the class path needed to get the
                    Hadoop jar and the required libraries
credential          interact with credential providers
daemonlog           get/set the log level for each daemon
trace               view and modify Hadoop tracing settings

Most commands print help when invoked w/o parameters.

5. Install openssl, ssh and rsync:


Gcet@gfl1-5:~$ sudo apt-get install openssl

[sudo] password for Gcet:


Reading package lists... Done
Building dependency tree
Reading state information... Done
openssl is already the newest version.
0 upgraded, 0 newly installed, 0 to remove and 460 not upgraded.

Gcet@gfl1-5:~$ sudo apt-get install ssh

Reading package lists... Done


Building dependency tree
Reading state information... Done
ssh is already the newest version.
0 upgraded, 0 newly installed, 0 to remove and 460 not upgraded.

Gcet@gfl1-5:~$ sudo apt-get install rsync

Reading package lists... Done


Building dependency tree
Reading state information... Done
rsync is already the newest version.
0 upgraded, 0 newly installed, 0 to remove and 460 not upgraded.

6. Set the JAVA_HOME environment variable:


Gcet@gfl1-5:~$ export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-i386
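Note that export only sets the variable for the current shell session. To make it permanent, the same line can be appended to ~/.bashrc (the JDK path is the one found on this machine; adjust it if yours differs):

Gcet@gfl1-5:~$ echo 'export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-i386' >> ~/.bashrc
Gcet@gfl1-5:~$ source ~/.bashrc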

7. To run the bundled examples on a single system, first create input and output directories:


Gcet@gfl1-5:~$ mkdir input1
Gcet@gfl1-5:~$ mkdir output1

8. Now copy an XML file from the Hadoop configuration folder to the input folder:


Gcet@gfl1-5:~$ cp /usr/local/hadoop/hadoop-2.7.3/etc/hadoop/capacity-scheduler.xml
/home/Gcet/Desktop/input1
9. Now run Hadoop examples:
Gcet@gfl1-5:~$ /usr/local/hadoop/hadoop-2.7.3/bin/hadoop jar /usr/local/hadoop/hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep /home/Gcet/Desktop/input1/ /home/Gcet/output1/output1 'principal[.]*'
17/07/26 15:15:51 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your
platform... using builtin-java classes where applicable
17/07/26 15:15:52 INFO Configuration.deprecation: session.id is deprecated. Instead, use
dfs.metrics.session-id
17/07/26 15:15:52 INFO jvm.JvmMetrics: Initializing JVM Metrics with
processName=JobTracker, sessionId=
17/07/26 15:15:52 INFO input.FileInputFormat: Total input paths to process : 2
17/07/26 15:15:52 INFO mapreduce.JobSubmitter: number of splits:2
17/07/26 15:15:53 INFO mapreduce.JobSubmitter: Submitting tokens for job:
job_local1790612813_0001
17/07/26 15:15:53 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
17/07/26 15:15:53 INFO mapreduce.Job: Running job: job_local1790612813_0001
17/07/26 15:15:53 INFO mapred.LocalJobRunner: OutputCommitter set in config null
17/07/26 15:15:53 INFO output.FileOutputCommitter: File Output Committer Algorithm version
is 1
17/07/26 15:15:53 INFO mapred.LocalJobRunner: OutputCommitter is
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
17/07/26 15:15:53 INFO mapred.LocalJobRunner: Waiting for map tasks
17/07/26 15:15:53 INFO mapred.LocalJobRunner: Starting task:
attempt_local1790612813_0001_m_000000_0
17/07/26 15:15:53 INFO output.FileOutputCommitter: File Output Committer Algorithm version
is 1
17/07/26 15:15:53 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
.
.
.
.
17/07/26 15:15:55 INFO mapreduce.Job: Job job_local192240145_0002 running in uber mode :
false
17/07/26 15:15:55 INFO mapreduce.Job: map 100% reduce 100%
17/07/26 15:15:55 INFO mapreduce.Job: Job job_local192240145_0002 completed successfully
17/07/26 15:15:55 INFO mapreduce.Job: Counters: 30
File System Counters
        FILE: Number of bytes read=1195494
        FILE: Number of bytes written=2315812
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
Map-Reduce Framework
        Map input records=0
        Spilled Records=0
        Shuffled Maps =1
        GC time elapsed (ms)=10
        Total committed heap usage (bytes)=854065152
Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
File Input Format Counters
        Bytes Read=98
File Output Format Counters
        Bytes Written=8
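Once the job completes, the matched lines can be inspected in the output directory; assuming the standard MapReduce part-file naming, the result of the command above would be read with:

Gcet@gfl1-5:~$ cat /home/Gcet/output1/output1/part-r-00000
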
Multi-node cluster:
Steps:

We have two machines (master and slave) with the following IPs:

Master IP: 192.168.56.102

Slave IP: 192.168.56.103

STEP 1: Check the IP address of all machines.

Command: ip addr show (you can use the ifconfig command as well)

STEP 2: Disable the firewall restrictions.

Command: service iptables stop

Command: sudo chkconfig iptables off
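Note: the service iptables and chkconfig commands apply to RHEL/CentOS-style systems. On Ubuntu 14.04 (assumed here, per the prerequisites), the equivalent is to disable ufw:

Command: sudo ufw disable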

STEP 3: Open the hosts file and add the master and slave nodes with their respective IP addresses.

Command: sudo nano /etc/hosts

The same entries should appear in the hosts files of both the master and the slave.
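With the IP addresses listed above, both hosts files would contain entries like:

192.168.56.102 master
192.168.56.103 slave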
STEP 4: Restart the sshd service.

Command: service sshd restart

STEP 5: Create the SSH key on the master node. (Press Enter when it asks for a filename to save the key.)

Command: ssh-keygen -t rsa -P ""

STEP 6: Copy the generated SSH key to the master node's authorized keys.

Command: cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

STEP 7: Copy the master node's SSH key to the slave's authorized keys.

Command: ssh-copy-id -i $HOME/.ssh/id_rsa.pub edureka@slave
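To verify passwordless SSH, log in to the slave from the master; no password prompt should appear:

Command: ssh edureka@slave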


STEP 8: Download the Java 8 package and save the file in your home directory.

STEP 9: Extract the Java Tar File on all nodes.

Command: tar -xvf jdk-8u101-linux-i586.tar.gz

STEP 10: Download the Hadoop 2.7.3 Package on all nodes.

Command: wget https://archive.apache.org/dist/hadoop/core/hadoop-2.7.3/hadoop-2.7.3.tar.gz

STEP 11: Extract the Hadoop tar File on all nodes.

Command: tar -xvf hadoop-2.7.3.tar.gz

STEP 12: Add the Hadoop and Java paths in the bash file (.bashrc) on all nodes.

Open the .bashrc file and add the Hadoop and Java paths as shown below:

Command: sudo gedit .bashrc
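Assuming Java and Hadoop were extracted into the home directory as in the earlier steps (adjust the paths if yours differ), the lines to add look like this:

export JAVA_HOME=/home/edureka/jdk1.8.0_101
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/home/edureka/hadoop-2.7.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin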

Then, save the bash file and close it.

To apply these changes in the current terminal, execute the source command.

Command: source .bashrc


To make sure that Java and Hadoop have been properly installed on your system and can be accessed through the terminal, execute the java -version and hadoop version commands.

Command: java -version

Command: hadoop version

Now edit the configuration files in the hadoop-2.7.3/etc/hadoop directory.

STEP 13: Create the masters file in the Hadoop configuration directory on both master and slave machines and edit it as follows:

Command: sudo gedit masters
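Assuming the hostnames defined in /etc/hosts, the masters file simply lists the hostname of the master node:

master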

STEP 14: Edit the slaves file on the master machine as follows:

Command: sudo gedit /home/edureka/hadoop-2.7.3/etc/hadoop/slaves
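Since the master also runs a DataNode in this setup (its hdfs-site.xml below configures a datanode directory, and the final check expects two live nodes), the slaves file on the master should list both hostnames:

master
slave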

STEP 15: Edit the slaves file on the slave machine as follows:

Command: sudo gedit /home/edureka/hadoop-2.7.3/etc/hadoop/slaves
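On the slave machine, the slaves file lists only its own hostname:

slave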


STEP 16: Edit core-site.xml on both master and slave machines as follows:

Command: sudo gedit /home/edureka/hadoop-2.7.3/etc/hadoop/core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>

STEP 17: Edit hdfs-site.xml on the master as follows:

Command: sudo gedit /home/edureka/hadoop-2.7.3/etc/hadoop/hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/edureka/hadoop-2.7.3/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/edureka/hadoop-2.7.3/datanode</value>
  </property>
</configuration>

STEP 18: Edit hdfs-site.xml on the slave machine as follows:

Command: sudo gedit /home/edureka/hadoop-2.7.3/etc/hadoop/hdfs-site.xml


<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/edureka/hadoop-2.7.3/datanode</value>
  </property>
</configuration>

STEP 19: Create mapred-site.xml from its template in the configuration folder, then edit it on both master and slave machines as follows:

Command: cp mapred-site.xml.template mapred-site.xml

Command: sudo gedit /home/edureka/hadoop-2.7.3/etc/hadoop/mapred-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

STEP 20: Edit yarn-site.xml on both master and slave machines as follows:
Command: sudo gedit /home/edureka/hadoop-2.7.3/etc/hadoop/yarn-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>

STEP 21: Format the namenode (only on the master machine).

Command: hadoop namenode -format
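If the format succeeds, the output should end with a message similar to "Storage directory /home/edureka/hadoop-2.7.3/namenode has been successfully formatted." (the directory follows dfs.namenode.name.dir configured above).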

STEP 22: Start all daemons from the hadoop-2.7.3 directory (only on the master machine).

Command: ./sbin/start-all.sh

STEP 23: Check all the daemons running on both master and slave machines.

Command: jps

On the master, jps should list NameNode, SecondaryNameNode and ResourceManager, along with DataNode and NodeManager since the master is also listed in its slaves file.

On the slave, jps should list DataNode and NodeManager.
Finally, open a browser and go to master:50070/dfshealth.html on your master machine; this brings up the NameNode web interface. Scroll down and check the number of live nodes: if it is 2, you have successfully set up a multi-node Hadoop cluster. If it is not 2, you may have missed one of the steps above; go back, verify all the configurations, and correct any issues you find.
