Apache Pig - A Data Flow Framework Based On Hadoop Map Reduce
All content following this page was uploaded by Zahid Ansari on 17 May 2018.
I. INTRODUCTION
The term ‘Big Data’ describes innovative techniques and technologies to capture, store, distribute, manage and analyse petabyte-scale or larger datasets with high velocity and varied structures [1]. Hadoop is open-source software that enables reliable, scalable, distributed computing on clusters of inexpensive servers [2]. In 2004, Google introduced a framework called Map Reduce, which is mainly used for parallel data processing in a distributed computing environment. However, Map Reduce is too low level and rigid. It has many drawbacks: writing low-level Map Reduce code is slow, optimizing that code requires a lot of expertise, prototyping is slow, a lot of custom code is required even for simple tasks, and complex chains of Map Reduce jobs are hard to manage. A new language called Pig Latin was therefore developed, which combines aspects of a high-level declarative query language like SQL and of low-level procedural programming like Map Reduce.

Pig Latin is implemented on Pig, which is open-source software that runs on Hadoop. Pig Latin’s main features include support for a flexible nested data model, extensive support for user-defined functions, and the ability to operate on input

Fig 1: Components of Pig

Figure 1 also describes the various steps during execution. The data is loaded from HDFS, and the processing is then converted into many map and reduce tasks. Lastly, the output is either stored in a file or dumped to the screen.

Fig 2: Pig Architecture
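The execution steps described for Figure 1 (load the data, run map and reduce tasks, then store the output in a file or dump it to the screen) can be illustrated with a minimal sketch. This is plain Python, not Pig or Hadoop: the function names, the `user,bytes` record layout and the in-memory list standing in for HDFS are all assumptions made for illustration only.

```python
from collections import defaultdict
import io


def load(lines):
    """Stand-in for loading from HDFS: parse 'user,bytes' lines into tuples."""
    return [(user, int(nbytes)) for user, nbytes in (ln.split(",") for ln in lines)]


def map_phase(records):
    """Map step: emit one (key, value) pair per input record."""
    return [(user, nbytes) for user, nbytes in records]


def shuffle(pairs):
    """Group all values that share a key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: aggregate the values of each key (here, a sum)."""
    return {key: sum(values) for key, values in sorted(groups.items())}


def store(result, sink):
    """Write the result to a file-like object, one tab-separated row per key."""
    for key, total in result.items():
        sink.write(f"{key}\t{total}\n")


def dump(result):
    """Print the result to the screen, one tuple per line."""
    for row in result.items():
        print(row)


def run_pipeline(lines):
    """Chain the steps: load, map, shuffle, reduce."""
    return reduce_phase(shuffle(map_phase(load(lines))))


# Hypothetical sample input; a real job would read from HDFS instead.
sample_lines = ["alice,10", "bob,5", "alice,7"]
result = run_pipeline(sample_lines)

dump(result)               # output dumped to the screen
buffer = io.StringIO()
store(result, buffer)      # output stored in a (here in-memory) file
```

In this sketch the final step is chosen by calling either `store` or `dump` on the same result, mirroring the two output options the text mentions.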
Pig has also been used in evaluating the performance of a commercial RDBMS and Hadoop on astronomy simulation analysis tasks [22].

V. CONCLUSIONS
This paper introduced Pig and its associated language, Pig Latin, a new data processing environment deployed at Yahoo. We have entered an era of Big Data, and Hadoop is a framework for the analysis and transformation of this data using the Map Reduce paradigm. The Pig system compiles Pig Latin expressions into a sequence of map-reduce jobs and orchestrates the execution of these jobs on Hadoop. Pig's structure is amenable to substantial parallelization.

REFERENCES
[1] Bhosale, Harshawardhan S., and Devendra P. Gadekar. "A Review Paper on Big Data and Hadoop." International Journal of Scientific and Research Publications 4.10 (2014).
[2] Chavan, Ms Vibhavari, and Rajesh N. Phursule. "Survey paper on big data." Int. J. Comput. Sci. Inf. Technol 5.6 (2014): 7932-7939.
[3] Samak, Taghrid, Daniel Gunter, and Valerie Hendrix. "Scalable analysis of network measurements with Hadoop and Pig." Network Operations and Management Symposium (NOMS), 2012 IEEE. IEEE, 2012.
[4] Goyal, Vikas, and Deepak Soni. "Survey Paper on Big Data Analytics Using Hadoop Technologies."
[5] Wang, MingXue, Sidath B. Handurukande, and Mohamed Nassar. "RPig: A scalable framework for machine learning and advanced statistical functionalities." Cloud Computing Technology and Science (CloudCom), 2012 IEEE 4th International Conference on. IEEE, 2012.
[6] Ouaknine, Keren, Michael Carey, and Scott Kirkpatrick. "The PigMix Benchmark on Pig, MapReduce, and HPCC Systems." Big Data (BigData congress), 2015 IEEE International Congress on. IEEE, 2015.
[7] Samak, Taghrid, Daniel Gunter, and Valerie Hendrix. "Scalable analysis of network measurements with Hadoop and Pig." Network Operations and Management Symposium (NOMS), 2012 IEEE. IEEE, 2012.
[8] Gates, Alan F., et al. "Building a high-level dataflow system on top of Map-Reduce: the Pig experience." Proceedings of the VLDB Endowment 2.2 (2009): 1414-1425.
[9] Adnan, Muhammad, et al. "Minimizing big data problems using cloud computing based on Hadoop architecture." High-capacity Optical Networks and Emerging/Enabling Technologies (HONET), 2014 11th Annual. IEEE, 2014.
[10] Shang, Weiyi, Bram Adams, and Ahmed E. Hassan. "Using Pig as a data preparation language for large-scale mining software repositories studies: An experience report." Journal of Systems and Software 85.10 (2012): 2195-2204.
[11] Shvachko, Konstantin, et al. "The hadoop distributed file system." Mass storage systems and technologies (MSST), 2010 IEEE 26th symposium on. IEEE, 2010.
[12] Olston, Christopher, et al. "Pig latin: a not-so-foreign language for data processing." Proceedings of the 2008 ACM SIGMOD international conference on Management of data. ACM, 2008.
[13] Shvachko, Konstantin, et al. "The hadoop distributed file system." Mass storage systems and technologies (MSST), 2010 IEEE 26th symposium on. IEEE, 2010.
[14] Wang, Yaoguang, et al. "Improving MapReduce performance with partial speculative execution." Journal of Grid Computing 13.4 (2015): 587-604.
[15] Agarwal, Shafali, and Zeba Khanam. "Map Reduce: A Survey Paper on Recent Expansion." International Journal of Advanced Computer Science and Applications 6.8 (2015): 209-215.
[16] Olshannikova, Ekaterina, et al. "Conceptualizing Big Social Data." Journal of Big Data 4.1 (2017): 3.
[17] Tom White, foreword by Doug Cutting. "Hadoop: The Definitive Guide." ISBN: 978-1-449-38973-4 [SB] 1285179414.
[18] Bhardwaj, Vibha, Rahul Johari, and Priti Bhardwaj. "Query execution evaluation in wireless network using MyHadoop." Reliability, Infocom Technologies and Optimization (ICRITO) (Trends and Future Directions), 2015 4th International Conference on. IEEE, 2015.
[19] Tanimura, Yusuke, et al. "Extensions to the Pig data processing platform for scalable RDF data processing using Hadoop." Data Engineering Workshops (ICDEW), 2010 IEEE 26th International Conference on. IEEE, 2010.
[20] Arushi Jaina, Vishal Bhatnagara Ambedkar. "Crime Data Analysis Using Pig with Hadoop." International Conference on Information Security & Privacy (ICISP2015), 11-12 December 2015.
[21] Prasad, PS Durga, T. Vivekanandan, and A. Srinivasan. "A Methodology for WebLog Data analysis using HadoopMapReduce and PIG." i-manager's Journal on Cloud Computing 3.1 (2015): 13.
[22] Loebman, Sarah, et al. "Analyzing massive astrophysical datasets: Can Pig/Hadoop or a relational DBMS help?." Cluster Computing and Workshops, 2009. CLUSTER'09. IEEE International Conference on. IEEE, 2009.
[23] www.wikepedia.org 12/04/2017 at 8:30 pm