


International Journal of Engineering Trends and Technology (IJETT) – Volume 50 Number 5 August 2017

Apache Pig - A Data Flow Framework Based on Hadoop Map Reduce

Swarna C#1, Zahid Ansari*2

#Department of Computer Science and Engineering, P.A. College of Engineering, Mangaluru, India
*Department of Computer Science and Engineering, P.A. College of Engineering, Mangaluru, India
Abstract — Big Data is a technology phenomenon that arose from the increasing rate of data growth, complex new data types, and parallel advances in the technology stack. Big data can be structured, unstructured or semi-structured, which renders conventional data management methods ineffective. Hadoop is a framework for the analysis and transformation of very large data sets using the Map Reduce paradigm. An important characteristic of Hadoop is the splitting of data and computation across thousands of hosts and the running of applications in parallel, close to their data. Hadoop accomplishes this through HDFS and Map Reduce. Pig is an Apache open source project. It runs on Hadoop, making use of both HDFS and Map Reduce. Pig has two main components. The first component, Pig Latin, is a parallel dataflow language designed to fit between SQL and Map Reduce; it enables the user to define the reading, processing and storing of data in parallel. A Pig Latin script describes a directed acyclic graph in which data flows are represented as edges and operators as nodes. The second component is the runtime environment in which Pig Latin programs are executed.

Keywords — Big Data, Hadoop, Map Reduce, Pig, Pig Latin.

I. INTRODUCTION

The term 'Big Data' describes inventive techniques and technologies to capture, store, distribute, manage and analyse petabyte or larger-sized datasets with high velocity and varied structures [1]. Hadoop is open-source software that enables reliable, scalable, distributed computing on clusters of inexpensive servers [2]. In 2004 Google invented a framework called Map Reduce, used mainly for parallel data processing in distributed computing environments. But Map Reduce is too low level and rigid, and it has many drawbacks: writing low-level Map Reduce code is slow, a lot of expertise is needed to optimize that code, prototyping is slow, a lot of custom code is required even for simple tasks, and more complex chains of map reduce jobs are hard to manage. So a new language called Pig Latin was developed, which combines a high-level declarative query style like SQL with low-level procedural programming like Map Reduce.

Pig Latin is implemented on Pig, open source software that runs on Hadoop. Pig Latin's main features include support for a flexible nested data model, extensive support for user-defined functions, and the ability to operate on input files without any schema information. Pig Latin also comes with a novel debugging environment that is particularly useful when dealing with massive data sets.

II. PIG COMPONENTS AND ARCHITECTURE

Pig is an Apache open source project: an engine for executing parallel data flows on Hadoop. It runs on Hadoop by making use of both HDFS and Map Reduce, the two components of Hadoop [3]. Pig was initially developed at Yahoo. The Pig programming language is designed to handle any reasonable type of data. Pig is made up of two components: the first is Pig Latin, the language itself, and the second is the runtime environment in which Pig Latin programs are executed [4]. Fig. 1 shows the components of Pig.

[Fig 1: Components of Pig]

Figure 1 also shows the various steps during execution: data is loaded from HDFS and converted into a series of map and reduce tasks, and the output is finally either stored into a file or dumped to the screen.
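As a minimal sketch of this load, process, store flow (the input path and schema here are invented for illustration), a Pig Latin fragment might read:

    -- Hypothetical input path and schema, for illustration only.
    raw  = LOAD 'data/records.txt' USING PigStorage(',')
           AS (id:int, name:chararray, score:float);
    good = FILTER raw BY score > 50.0;   -- becomes a map-side operation
    DUMP good;                           -- asks for output, triggering execution

In an interactive session, the DUMP at the end plays the same role as STORE in triggering compilation and execution.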


[Fig 2: Pig Architecture]

Fig. 2 describes the Pig architecture. Grunt is the interactive shell in which users enter Pig Latin. The parser converts Pig Latin into a logical plan, which is further optimized by the optimizer. The compiler then converts the optimized plan into a series of map reduce jobs, and these jobs are executed by the execution engine. Pig allows three modes of user interaction [7]:

• Interactive mode: the user enters Pig commands in an interactive shell known as Grunt. When the user asks for output through the STORE command, plan compilation and execution are triggered.
• Batch mode: the user submits a prewritten script containing a group of Pig commands, typically finishing with STORE. The semantics are identical to interactive mode; a sketch of such a script follows this list.
• Embedded mode: Pig Latin commands can be submitted through method invocations from a Java program, using a Java library provided by Pig. This option permits dynamic construction of Pig Latin programs as well as dynamic control flow, e.g. looping for a non-predetermined number of iterations, which Pig Latin does not currently support directly.
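As a sketch of batch mode (paths and field names are hypothetical), the following word-count script could be saved as wordcount.pig and submitted to Pig as a whole; nothing executes until the final STORE:

    -- wordcount.pig: hypothetical batch script.
    lines  = LOAD 'input/docs.txt' AS (line:chararray);
    words  = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
    grpd   = GROUP words BY word;
    counts = FOREACH grpd GENERATE group AS word, COUNT(words) AS cnt;
    STORE counts INTO 'output/wordcounts';  -- triggers plan compilation and execution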
III. PIG LATIN

This section describes the details of the Pig Latin language. We describe the Pig data model in Section A, and Pig Latin statements in the subsequent subsections. Pig Latin has the following key properties [15]:

• Ease of programming: complex tasks comprised of multiple interrelated data transformations are explicitly encoded as data flow sequences, making them easy to write, understand, and maintain.
• Optimization opportunities: the way tasks are encoded permits the system to optimize their execution automatically, allowing the user to focus on semantics rather than efficiency.
• Extensibility: users can create their own functions to do special-purpose processing.

A. Data Model

Data in Pig Latin is categorized into two kinds [16]: scalar and complex data types. Pig's scalar types are similar to the data types that appear in most programming languages. With the exception of bytearray, they are all represented in Pig interfaces by java.lang classes, making them easy to work with in UDFs. Table I describes the scalar data types.

TABLE I
SCALAR DATA TYPES

  Scalar data type   Description                        Example
  int                Four-byte signed integer           12
  long               Eight-byte signed integer          80000L
  float              Four-byte floating-point number    6.2f or 6.2e2f
  double             Eight-byte floating-point number   2.718 or 6.626e-34
  chararray          A string or character array        Hello
  bytearray          A blob or array of bytes           -

Pig's three complex data types are maps, tuples, and bags. All of these types can contain data of any type, including other complex types, so it is possible to have a map whose value field is a bag, which contains a tuple in which one of the fields is a map. Table II describes the complex data types.

TABLE II
COMPLEX DATA TYPES

  Complex data type   Description                                               Example
  tuple               An ordered set of fields                                  (1,'alice')
  bag                 A collection of tuples                                    {(1,'alice'),(2)}
  map                 A collection of data items, each with an associated key   ['a'#'pomegranate']
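As a sketch of how these types nest (the file and field names are invented, and the typed map value assumes a reasonably recent Pig release), a single LOAD can declare a schema that uses a tuple, a bag, and a map together:

    -- Hypothetical schema combining the complex types of Table II.
    students = LOAD 'students.dat' AS (
        info:tuple(id:int, name:chararray),                   -- tuple of scalars
        grades:bag{g:tuple(course:chararray, score:float)},   -- bag of tuples
        attrs:map[chararray]                                  -- map with chararray values
    );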


B. Pig Diagnostic Operators

Pig Latin provides four diagnostic operators: Describe, Dump, Explain and Illustrate. Describe, Explain and Illustrate allow the user to work with the logical plan, for debugging purposes. Dump is a sort of diagnostic operator too, because it should be used only for interactive debugging of small result sets, or in combination with Limit. Table III gives a brief description of these operators.

TABLE III
DIAGNOSTIC OPERATORS

  Operator     Description
  Describe     Returns the schema of the relation
  Dump         Dumps the results to the screen
  Explain      Displays execution plans
  Illustrate   Displays a step-by-step execution of a sequence of statements
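A sketch of these operators in a Grunt session (relation and file names are hypothetical):

    -- Hypothetical Grunt session, for illustration only.
    emps = LOAD 'emps.csv' USING PigStorage(',')
           AS (id:int, dept:chararray, pay:double);
    DESCRIBE emps;     -- prints the schema of the relation
    top = LIMIT emps 5;
    DUMP top;          -- writes a small result set to the screen
    EXPLAIN top;       -- displays the execution plans
    ILLUSTRATE top;    -- shows a step-by-step run over sample data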
C. Pig Commands

Apache Pig provides a number of built-in data processing operators. For input/output processing, the Load and Store commands are used; for filtering and transforming data, the Filter, Foreach/Generate and Stream commands are used; and there are further commands for grouping and joining data. The important data processing commands are described in Table IV.

TABLE IV
PIG COMMANDS

  Command             Description
  Load                Read data from the file system
  Store               Write data to the file system
  Dump                Write output to stdout
  Foreach, Generate   Apply an expression to each record and generate one or more records
  Filter              Apply a predicate to each record and remove records where it is false
  Group, Cogroup      Collect records with the same key from one or more inputs
  Join                Join two or more inputs based on a key
  Order               Sort records based on a key
  Distinct            Remove duplicate records
  Union               Merge two datasets
  Limit               Limit the number of records
  Split               Split data into two or more sets, based on filter conditions
  Cross               Create the cross product of two or more relations
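A sketch combining several of the commands in Table IV (all file and field names are hypothetical):

    -- Hypothetical pipeline using JOIN, GROUP, FOREACH, ORDER and LIMIT.
    users  = LOAD 'users.tsv'  AS (uid:int, country:chararray);
    clicks = LOAD 'clicks.tsv' AS (uid:int, url:chararray);
    joined = JOIN users BY uid, clicks BY uid;      -- key-based join of two inputs
    bycty  = GROUP joined BY users::country;        -- collect records by key
    stats  = FOREACH bycty GENERATE group AS country, COUNT(joined) AS n;
    ranked = ORDER stats BY n DESC;                 -- sort on a key
    top10  = LIMIT ranked 10;
    STORE top10 INTO 'output/top_countries';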
IV. IMPLEMENTATION

Pig Latin is fully implemented by the system Pig. Pig's architecture allows different systems to be plugged in as the execution platform for Pig Latin. The current implementation uses Hadoop, an open-source, scalable implementation of map-reduce [2], as the execution platform: Pig Latin programs are compiled into map-reduce jobs and executed using Hadoop. Pig, together with its Hadoop compiler, is an open-source project implemented by Apache and is available for general use [11].

A. Building a Logical Plan

As clients issue Pig Latin commands, the Pig interpreter first parses each command and verifies that the input files and bags it references are valid. For example, if the user enters p = COGROUP q BY ..., r BY ..., Pig verifies that the bags q and r have already been defined. Pig builds a logical plan for every bag the user defines: when a new bag is defined by a command, the logical plan for the new bag is constructed by combining the logical plans of the input bags with the current command. Thus, in the above example, the logical plan for p consists of a cogroup command having the logical plans for q and r as inputs. No processing is carried out while logical plans are constructed. Processing is activated only when the user invokes a STORE command on a bag; at that point, the logical plan for that bag is compiled into a physical plan and executed, as illustrated in Figure 3. This lazy style of execution is beneficial because it permits in-memory pipelining and other optimizations, such as filter reordering, across multiple Pig Latin commands.

[Figure 3: Pig Latin Workflow. A Pig Latin program is transformed into a logical plan, then into a physical plan, which is compiled into a map reduce plan whose execution produces the output.]

Pig is designed so that the parsing of Pig Latin and the construction of the logical plan are independent of the execution platform; only the compilation of the logical plan into a physical plan depends on which execution platform is chosen. Next, we describe the compilation into Hadoop map-reduce, the execution platform currently used by Pig.
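A sketch of this lazy evaluation, mirroring the p = COGROUP q, r example above (file names are hypothetical); the first three statements only extend the logical plan:

    -- Nothing is processed while the logical plan is built.
    q = LOAD 'q.dat' AS (k:int, v:chararray);
    r = LOAD 'r.dat' AS (k:int, w:chararray);
    p = COGROUP q BY k, r BY k;   -- logical plan for p combines the plans for q and r
    STORE p INTO 'output/p';      -- compilation to a physical plan and execution start here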


[Fig 4: Pig Latin Map Reduce Compilation. A chain of map-reduce jobs: LOAD and FILTER are pushed into map1 ahead of the first (CO)GROUP C1; each subsequent cogroup Ci, Ci+1 begins a new job with its own map and reduce stages (mapi, reducei, mapi+1, reducei+1).]

B. Map-Reduce Plan Compilation

Compilation of a Pig Latin logical plan into map-reduce jobs is straightforward. The map-reduce primitive essentially provides the capacity to do a large-scale group-by, where the map tasks assign keys for grouping and the reduce tasks process one group at a time. The Pig compiler begins by converting each (CO)GROUP command in the logical plan into a distinct map-reduce job with its own map and reduce functions. The map function for a (CO)GROUP command p first assigns keys to tuples based on the BY clause(s) of p; the reduce function initially performs no operation. The map-reduce boundary is thus the cogroup command. The sequence of FILTER and FOREACH commands from the LOAD to the first COGROUP operation C1 is pushed into the map function corresponding to C1 (see Figure 4). The commands that lie between subsequent COGROUP commands Ci and Ci+1 can be pushed into either
(a) the reduce function corresponding to Ci, or
(b) the map function corresponding to Ci+1.
Pig currently always follows option (a). Since grouping is often followed by aggregation, this approach reduces the amount of data that has to be materialized between map-reduce jobs. A sketch of this placement follows.
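As a sketch (the script and names are hypothetical), the comments below mark where each command would land under this compilation scheme:

    -- Hypothetical script annotated with its map-reduce placement.
    raw    = LOAD 'logs.txt' AS (user:chararray, bytes:long);  -- read in map of job 1
    big    = FILTER raw BY bytes > 0;                          -- pushed into map of job 1
    C1     = GROUP big BY user;                                -- map-reduce boundary (job 1)
    totals = FOREACH C1 GENERATE group, SUM(big.bytes);        -- reduce of job 1, per option (a)
    STORE totals INTO 'output/totals';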
If a COGROUP command has more than one input data set, the map function appends an extra field to each tuple to indicate the data set from which the tuple originated. The corresponding reduce function decodes this information and uses it to insert each tuple into the appropriate nested bag when the cogrouped tuples are generated.

Parallelism for LOAD is obtained because Pig operates over files residing in the Hadoop distributed file system. Parallelism for FILTER and FOREACH operations is also achieved, since for a given map-reduce job several map and reduce instances run in parallel. We also get parallelism for (CO)GROUP, since the output from the multiple map instances is repartitioned in parallel to the multiple reduce instances.

To implement the ORDER command, two map-reduce jobs are compiled. The first job samples the input to find quantiles of the sort key. The second job range-partitions the input according to the quantiles, followed by a local sort in the reduce phase, finally resulting in a globally sorted file.

The inflexibility of the map-reduce primitive causes some overheads when compiling Pig Latin into map-reduce jobs. For example, data must be materialized and replicated on the distributed file system between successive map-reduce jobs, and when dealing with multiple data sets an additional field must be added to every tuple to indicate the data set it came from. Since the Hadoop map-reduce implementation provides many desired properties such as parallelism, load balancing, and fault-tolerance, the associated overhead is often acceptable.


V. APPLICATIONS

Some of the important uses of Pig are described below:

• Pig is a powerful tool for querying data in a Hadoop cluster: Yahoo estimates that between 40% and 60% of its Hadoop workloads are generated from Pig Latin scripts [23].
• Pig is also used at Twitter (processing logs, mining tweet data); at AOL and MapQuest (for analytics and batch data processing); and at LinkedIn, where Pig is used to discover people you might know [23].
• With a continually increasing population, analyzing data related to crimes and crime rates is a major issue for governments in making strategic decisions to maintain law and order. The benefit of using Pig for such analysis is that fewer lines of code have to be written, which reduces overall development and testing time [20].
• Pig scripts have been used to build an efficient large-scale system for analyzing web log data through Map Reduce programming in the Hadoop framework [21].
• Pig has been used to evaluate the performance of a commercial RDBMS and Hadoop on astronomy simulation analysis tasks [22].

VI. CONCLUSIONS

This paper introduced Pig and its associated language Pig Latin, a data processing environment originally deployed at Yahoo. We have entered an era of Big Data, and Hadoop is a framework for the analysis and transformation of this Big Data using the Map Reduce paradigm. The Pig system compiles Pig Latin expressions into a sequence of map-reduce jobs and orchestrates the execution of these jobs on Hadoop. Pig's structure is amenable to substantial parallelization.
REFERENCES

[1] Bhosale, Harshawardhan S., and Devendra P. Gadekar. "A Review Paper on Big Data and Hadoop." International Journal of Scientific and Research Publications 4.10 (2014).
[2] Chavan, Vibhavari, and Rajesh N. Phursule. "Survey paper on big data." Int. J. Comput. Sci. Inf. Technol. 5.6 (2014): 7932-7939.
[3] Samak, Taghrid, Daniel Gunter, and Valerie Hendrix. "Scalable analysis of network measurements with Hadoop and Pig." Network Operations and Management Symposium (NOMS), 2012 IEEE. IEEE, 2012.
[4] Goyal, Vikas, and Deepak Soni. "Survey paper on big data analytics using Hadoop technologies."
[5] Wang, MingXue, Sidath B. Handurukande, and Mohamed Nassar. "RPig: A scalable framework for machine learning and advanced statistical functionalities." Cloud Computing Technology and Science (CloudCom), 2012 IEEE 4th International Conference on. IEEE, 2012.
[6] Ouaknine, Keren, Michael Carey, and Scott Kirkpatrick. "The PigMix Benchmark on Pig, MapReduce, and HPCC Systems." Big Data (BigData Congress), 2015 IEEE International Congress on. IEEE, 2015.
[7] Samak, Taghrid, Daniel Gunter, and Valerie Hendrix. "Scalable analysis of network measurements with Hadoop and Pig." Network Operations and Management Symposium (NOMS), 2012 IEEE. IEEE, 2012.
[8] Gates, Alan F., et al. "Building a high-level dataflow system on top of Map-Reduce: the Pig experience." Proceedings of the VLDB Endowment 2.2 (2009): 1414-1425.
[9] Adnan, Muhammad, et al. "Minimizing big data problems using cloud computing based on Hadoop architecture." High-capacity Optical Networks and Emerging/Enabling Technologies (HONET), 2014 11th Annual. IEEE, 2014.
[10] Shang, Weiyi, Bram Adams, and Ahmed E. Hassan. "Using Pig as a data preparation language for large-scale mining software repositories studies: An experience report." Journal of Systems and Software 85.10 (2012): 2195-2204.
[11] Shvachko, Konstantin, et al. "The Hadoop distributed file system." Mass Storage Systems and Technologies (MSST), 2010 IEEE 26th Symposium on. IEEE, 2010.
[12] Olston, Christopher, et al. "Pig Latin: a not-so-foreign language for data processing." Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. ACM, 2008.
[13] Shvachko, Konstantin, et al. "The Hadoop distributed file system." Mass Storage Systems and Technologies (MSST), 2010 IEEE 26th Symposium on. IEEE, 2010.
[14] Wang, Yaoguang, et al. "Improving MapReduce performance with partial speculative execution." Journal of Grid Computing 13.4 (2015): 587-604.
[15] Agarwal, Shafali, and Zeba Khanam. "Map Reduce: A Survey Paper on Recent Expansion." International Journal of Advanced Computer Science and Applications 6.8 (2015): 209-215.
[16] Olshannikova, Ekaterina, et al. "Conceptualizing Big Social Data." Journal of Big Data 4.1 (2017): 3.
[17] White, Tom, foreword by Doug Cutting. Hadoop: The Definitive Guide. ISBN 978-1-449-38973-4.
[18] Bhardwaj, Vibha, Rahul Johari, and Priti Bhardwaj. "Query execution evaluation in wireless network using MyHadoop." Reliability, Infocom Technologies and Optimization (ICRITO) (Trends and Future Directions), 2015 4th International Conference on. IEEE, 2015.
[19] Tanimura, Yusuke, et al. "Extensions to the Pig data processing platform for scalable RDF data processing using Hadoop." Data Engineering Workshops (ICDEW), 2010 IEEE 26th International Conference on. IEEE, 2010.
[20] Jain, Arushi, and Vishal Bhatnagar. "Crime Data Analysis Using Pig with Hadoop." International Conference on Information Security & Privacy (ICISP 2015), 11-12 December 2015.
[21] Prasad, P. S. Durga, T. Vivekanandan, and A. Srinivasan. "A Methodology for WebLog Data analysis using Hadoop MapReduce and PIG." i-manager's Journal on Cloud Computing 3.1 (2015): 13.
[22] Loebman, Sarah, et al. "Analyzing massive astrophysical datasets: Can Pig/Hadoop or a relational DBMS help?" Cluster Computing and Workshops, 2009. CLUSTER'09. IEEE International Conference on. IEEE, 2009.
[23] www.wikipedia.org, accessed 12/04/2017.
