Big Data - ASSIGNMENT 2

Assignment - 2

Q2. Perform the following tasks on a Big Data platform such as Hadoop:

1. Run Map and Reduce codes
2. Data storage and retrieval operations
3. Batch processing operations

1. Run Map and Reduce codes

First Hadoop MapReduce Program

Step 1)
Create a new directory named MapReduceTutorial:

sudo mkdir MapReduceTutorial

Grant permissions on it:

sudo chmod -R 777 MapReduceTutorial

SalesMapper.java

package SalesCountry;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class SalesMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);

    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        // Convert the input line to a String and split it into fields on commas
        String valueString = value.toString();
        String[] SingleCountryData = valueString.split(",");

        // The country name is at index 7 of the record; emit <Country, 1>
        output.collect(new Text(SingleCountryData[7]), one);
    }
}

SalesCountryReducer.java

package SalesCountry;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class SalesCountryReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text t_key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        Text key = t_key;
        int frequencyForCountry = 0;

        // Sum the 1s emitted by the mapper for this country
        while (values.hasNext()) {
            IntWritable value = (IntWritable) values.next();
            frequencyForCountry += value.get();
        }

        // Emit <Country, total sales count>
        output.collect(key, new IntWritable(frequencyForCountry));
    }
}
SalesCountryDriver.java

package SalesCountry;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class SalesCountryDriver {

    public static void main(String[] args) {
        JobClient my_client = new JobClient();

        // Create a configuration object for the job
        JobConf job_conf = new JobConf(SalesCountryDriver.class);

        // Set a name for the job
        job_conf.setJobName("SalePerCountry");

        // Specify the data types of the output key and value
        job_conf.setOutputKeyClass(Text.class);
        job_conf.setOutputValueClass(IntWritable.class);

        // Specify the Mapper and Reducer classes
        job_conf.setMapperClass(SalesCountry.SalesMapper.class);
        job_conf.setReducerClass(SalesCountry.SalesCountryReducer.class);

        // Specify the input and output data formats
        job_conf.setInputFormat(TextInputFormat.class);
        job_conf.setOutputFormat(TextOutputFormat.class);

        // Set input and output directories from the command-line arguments:
        // args[0] = name of input directory on HDFS, args[1] = name of output
        // directory to be created to store the output file
        FileInputFormat.setInputPaths(job_conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(job_conf, new Path(args[1]));

        my_client.setConf(job_conf);
        try {
            // Run the job
            JobClient.runJob(job_conf);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Check the file permissions of all these files, and if 'read' permissions are missing, grant them.

Step 2)
Export classpath

export CLASSPATH="$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.2.0.jar:$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.2.0.jar:$HADOOP_HOME/share/hadoop/common/hadoop-common-2.2.0.jar:~/MapReduceTutorial/SalesCountry/*:$HADOOP_HOME/lib/*"
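The jar versions above assume Hadoop 2.2.0; adjust the file names if your installation differs. As an alternative, most Hadoop installations can print their own classpath via the hadoop classpath subcommand, so a sketch of an equivalent export (assuming $HADOOP_HOME/bin/hadoop is present, and appending ~/MapReduceTutorial so the compiled SalesCountry classes are found) is:

export CLASSPATH="$($HADOOP_HOME/bin/hadoop classpath):~/MapReduceTutorial"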

Step 3)
Compile the Java files (these files are present in the directory Final-MapReduceHandsOn).
Their class files will be placed in the package directory:

javac -d . SalesMapper.java SalesCountryReducer.java SalesCountryDriver.java

The compiler may warn that the code uses or overrides a deprecated API (the older org.apache.hadoop.mapred API); this warning can be safely ignored.

This compilation will create a directory in the current directory, named after the package specified in the Java source files (i.e. SalesCountry in our case), and put all compiled class files in it.
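If the compilation succeeded, listing the new package directory should show the three class files:

ls SalesCountry/

SalesCountryDriver.class  SalesCountryReducer.class  SalesMapper.class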
Step 4)
Create a new file Manifest.txt

sudo gedit Manifest.txt


add the following line to it:

Main-Class: SalesCountry.SalesCountryDriver

SalesCountry.SalesCountryDriver is the name of the main class. Please note that you must press the Enter key at the end of this line, so that the manifest file ends with a newline.

Step 5)
Create a Jar file

jar cfm ProductSalePerCountry.jar Manifest.txt SalesCountry/*.class

Check that the jar file has been created.
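One quick check is to list the jar's contents with the jar tool's tf flags; you should see the manifest and the three class files:

jar tf ProductSalePerCountry.jar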

Step 6)
Start Hadoop

$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh
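To confirm the daemons are up before submitting the job, you can run the JDK's jps tool; on a single-node setup it should list processes such as NameNode, DataNode, ResourceManager, and NodeManager:

jps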
Step 7)
Copy the file SalesJan2009.csv into ~/inputMapReduce

Now use the command below to copy ~/inputMapReduce to HDFS.

$HADOOP_HOME/bin/hdfs dfs -copyFromLocal ~/inputMapReduce /

If a warning about being unable to load the native-hadoop library appears, it can be safely ignored.

Verify whether the file was actually copied:

$HADOOP_HOME/bin/hdfs dfs -ls /inputMapReduce

Step 8)
Run MapReduce job

$HADOOP_HOME/bin/hadoop jar ProductSalePerCountry.jar /inputMapReduce /mapreduce_output_sales

This will create an output directory named mapreduce_output_sales on HDFS. The contents of this directory will be a file containing product sales per country.
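Note that Hadoop will fail the job if the output directory already exists. If you need to re-run the job, first delete the old output directory:

$HADOOP_HOME/bin/hdfs dfs -rm -r /mapreduce_output_sales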

Step 9)
The result can be seen through the command-line interface as:

$HADOOP_HOME/bin/hdfs dfs -cat /mapreduce_output_sales/part-00000
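Each line of this file is a tab-separated key-value pair of the form <Country><TAB><count>, since TextOutputFormat separates keys from values with a tab character by default.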


Results can also be seen via a web interface:

Open the NameNode web UI in a browser (typically http://localhost:50070 on Hadoop 2.x; the exact address depends on your configuration).

Now select 'Browse the filesystem' and navigate to /mapreduce_output_sales.

Open part-00000.

Explanation of SalesMapper Class


In this section, we will understand the implementation of the SalesMapper class.

1. We begin by specifying the name of the package for our class. SalesCountry is the name of our package. Note that the output of compilation, SalesMapper.class, will go into a directory named after this package: SalesCountry.

Following this, we import the library packages.

The full SalesMapper class is listed in Step 1 above; the key pieces are explained below.

Sample Code Explanation:

1. SalesMapper Class Definition-

public class SalesMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {

Every mapper class must extend the MapReduceBase class and implement the Mapper interface.

2. Defining 'map' function-

public void map(LongWritable key,
                Text value,
                OutputCollector<Text, IntWritable> output,
                Reporter reporter) throws IOException

The main part of the Mapper class is the 'map()' method, which accepts four arguments. At every call to the 'map()' method, a key-value pair ('key' and 'value' in this code) is passed.

The 'map()' method begins by converting the input value to a String and splitting it into an array of fields:

String valueString = value.toString();
String[] SingleCountryData = valueString.split(",");

Here, ',' is used as the delimiter.

After this, a pair is formed using the record at the 7th index of the array 'SingleCountryData' and the value '1'.

        output.collect(new Text(SingleCountryData[7]), one);

We choose the record at the 7th index because we need the Country data, and it is located at the 7th index of the array 'SingleCountryData'.

Please note that our input data is in the format below (where Country is at the 7th index, with 0 as the starting index):

Transaction_date,Product,Price,Payment_Type,Name,City,State,Country,Account_Created,Last_Login,Latitude,Longitude
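For example, given a hypothetical input record such as the following (the field values are illustrative only, not taken from the actual dataset), splitting on ',' places 'United States' at index 7, so the mapper emits the pair <United States, 1>:

01/02/09 6:17,Product1,1200,Mastercard,Jane,Boston,MA,United States,01/02/09 6:00,01/02/09 6:08,42.35,-71.06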

The output of the mapper is again a key-value pair, which is emitted using the 'collect()' method of 'OutputCollector'.

Explanation of SalesCountryReducer Class


In this section, we will understand the implementation of the SalesCountryReducer class.

1. We begin by specifying the name of the package for our class. SalesCountry is the name of our package. Note that the output of compilation, SalesCountryReducer.class, will go into a directory named after this package: SalesCountry.

Following this, we import the library packages.

The full SalesCountryReducer class is listed in Step 1 above; the key pieces are explained below.


Code Explanation:

1. SalesCountryReducer Class Definition-

public class SalesCountryReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {

Here, the first two data types, 'Text' and 'IntWritable', are the data types of the input key-value pair to the reducer.

The output of the mapper is in the form <CountryName1, 1>, <CountryName2, 1>. This output of the mapper becomes the input to the reducer. So, to align with its data types, Text and IntWritable are used as the data types here.

The last two data types, 'Text' and 'IntWritable', are the data types of the output generated by the reducer in the form of a key-value pair.

Every reducer class must extend the MapReduceBase class and implement the Reducer interface.

2. Defining 'reduce' function-


public void reduce(Text t_key,
                   Iterator<IntWritable> values,
                   OutputCollector<Text, IntWritable> output,
                   Reporter reporter) throws IOException {
An input to the reduce() method is a key with a list of multiple values.

For example, in our case, it will be-

<United Arab Emirates, 1>, <United Arab Emirates, 1>, <United Arab Emirates,
1>,<United Arab Emirates, 1>, <United Arab Emirates, 1>, <United Arab Emirates,
1>.

This is given to reducer as <United Arab Emirates, {1,1,1,1,1,1}>
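After summing this list, the reducer will emit a single pair, <United Arab Emirates, 6>.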

So, to accept arguments of this form, the first two data types are used, viz., Text and Iterator<IntWritable>. Text is the data type of the key and Iterator<IntWritable> is the data type for the list of values for that key.

The next argument is of type OutputCollector<Text, IntWritable>, which collects the output of the reducer phase.

The reduce() method begins by copying the key value and initializing the frequency count to 0.

Text key = t_key;
int frequencyForCountry = 0;

Then, using a 'while' loop, we iterate through the list of values associated with the key and calculate the final frequency by summing up all the values.

while (values.hasNext()) {
    IntWritable value = (IntWritable) values.next();
    frequencyForCountry += value.get();
}
Now, we push the result to the output collector in the form of the key and the obtained frequency count.

The code below does this:

output.collect(key, new IntWritable(frequencyForCountry));

Explanation of SalesCountryDriver Class


In this section, we will understand the implementation of the SalesCountryDriver class.

1. We begin by specifying the name of the package for our class. SalesCountry is the name of our package. Note that the output of compilation, SalesCountryDriver.class, will go into a directory named after this package: SalesCountry.

Here is a line specifying the package name, followed by code to import library packages.

2. Define a driver class which will create a new client job and configuration object, and advertise the Mapper and Reducer classes.

The driver class is responsible for setting up our MapReduce job to run on Hadoop. In this class, we specify the job name, the data types of input/output, and the names of the mapper and reducer classes.

3. In the code snippet below, we set the input and output directories, which are used to consume the input dataset and produce the output, respectively.

args[0] and args[1] are the command-line arguments passed with the command given in the MapReduce hands-on, i.e.,

$HADOOP_HOME/bin/hadoop jar ProductSalePerCountry.jar /inputMapReduce /mapreduce_output_sales
4. Trigger our job

The code below starts the execution of the MapReduce job:

try {
    // Run the job
    JobClient.runJob(job_conf);
} catch (Exception e) {
    e.printStackTrace();
}
 
