04 Bigdata Hive

Hive is a data warehousing system for Hadoop that allows users to query large datasets stored in Hadoop files using a SQL-like language called HiveQL. It addresses the issues of lack of structure and expressiveness in MapReduce programs for analyzing large datasets. Hive provides structure to data stored in files on HDFS and the ability to express queries using a simple SQL-like language. It uses a metastore to store metadata about tables, columns and partitions. The Hive driver compiles HiveQL queries into a directed acyclic graph of MapReduce jobs that are executed by the execution engine on Hadoop clusters.

Uploaded by

Rohit Uppal

HIVE

DATA WAREHOUSING USING HIVE QUERY

Additional Data Warehousing System

• The table shows the problems related to data inflow and expressiveness, and the solutions adopted to address the need for an additional data warehousing system.
• Analyses were difficult to express as MapReduce programs, so working with the data lacked expressiveness.
• Hence, Hive came into the big data ecosystem.
What is HIVE?

Hive is a data warehouse system for Hadoop that facilitates ad hoc queries and the analysis of large data sets stored in Hadoop.

HIVE FACTS
• Hive gives big data a SQL-like flavour through HiveQL (HQL), which makes it a popular choice for big data analytics on the Hadoop platform.
• It provides massive scale-out and fault-tolerance capabilities for data storage and processing on commodity hardware.
• Because it relies on MapReduce for execution, Hive is batch-oriented and has high latency for query execution.
HIVE | Characteristics

Hive is a system for managing and querying data, giving unstructured data a structured format.
It builds on two Hadoop components:
• It uses HDFS for storage and retrieval of data.
• Hive scripts use MapReduce for execution.

Principles of Hive
• SQL-like commands: Hive commands are similar to SQL, the query language familiar from data warehousing tools.
• Interoperability: an extensible framework to support different file and data formats.
• Performance: the Hive engine uses the best built-in script to reduce execution time while keeping throughput high.
• Extensibility: pluggable MapReduce scripts in the language of your choice, plus rich user-defined data types and user-defined functions.
System Architecture and Components of Hive

[Architecture diagram. Client interfaces: Command Line Interface, Web Interface, and Thrift Server (JDBC, ODBC). Hive components: MetaStore and Driver (Compiler, Optimizer, Executor). Hadoop layer: MapReduce + HDFS, with Job Tracker and Name Node.]

MetaStore
• The metastore is the component that stores the system catalog and metadata about tables, columns, partitions, and so on.
• Metadata is stored in a traditional RDBMS. By default Apache Hive uses a Derby database, but this is not compulsory: if you wish, you can use any other JDBC-compliant database, such as MySQL.
• Metadata client: metastore_db

Driver
The Hive driver is the component that:
• Manages the lifecycle of a Hive Query Language (HQL) statement as it moves through Hive.
• Maintains a session handle and any session statistics.
• Includes three basic components: Compiler, Optimizer and Executor.

Query Compiler
The query compiler is a component of the Hive driver. It checks the query for errors and, if none are found, converts the HiveQL into a directed acyclic graph (DAG) of MapReduce tasks.
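
One way to look at what the compiler produces is Hive's EXPLAIN statement, which prints the plan for a query instead of running it. The sketch below is only an illustration: the table name weather and the column wban are placeholders, not objects defined at this point in the slides.

```sql
-- Print the execution plan without running the query.
-- "weather" and "wban" are placeholder names for illustration.
EXPLAIN
SELECT wban, COUNT(*) AS readings
FROM weather
GROUP BY wban;
```

The output lists the stages of the plan and the dependencies between them, which is the DAG that the executor later runs in dependency order.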

Query Optimizer
• The query optimizer optimizes HiveQL scripts for faster execution.
• It consists of a chain of transformations: the operator DAG resulting from one transformation is passed as input to the next transformation.

Query Executor
The Hive execution engine:
• Executes the tasks produced by the compiler in proper dependency order.
• Interacts with the underlying Hadoop interface to stay synchronized with Hadoop services.

Hive Server

• Hive Server is the component that provides a Thrift interface. It also provides connectivity through a JDBC (Java Database Connectivity) / ODBC (Open Database Connectivity) server.
• It enables the integration of Hive with other applications.

HIVE
DATA MODEL & HIVE QUERY LANGUAGE
HIVE | DATA TYPES

Hive has three kinds of data types that are involved in table creation: primitive types, complex types and user-defined types.

PRIMITIVE TYPES
Integers: TINYINT, SMALLINT, INT and BIGINT
Boolean: BOOLEAN
Floating types: FLOAT, DOUBLE
String: STRING (VARCHAR, CHAR)

COMPLEX TYPES
Structs: {a INT; b INT}
Maps: M['group']
Arrays: ['a', 'b', 'c'], A[1] returns 'b'

USER-DEFINED TYPES
Structures with attributes
Attributes can be of any type
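
As a rough sketch of how these types appear in a table definition, the statement below declares a few primitive columns next to a struct, a map and an array. The table name type_demo and all column names are made up for illustration.

```sql
-- Hypothetical table mixing primitive and complex types.
CREATE TABLE IF NOT EXISTS type_demo (
  id     BIGINT,
  name   STRING,
  active BOOLEAN,
  score  DOUBLE,
  pair   STRUCT<a:INT, b:INT>,   -- struct with two INT fields
  attrs  MAP<STRING, STRING>,    -- map looked up by key
  tags   ARRAY<STRING>           -- array indexed from 0
);

-- Field access, map lookup and array indexing.
SELECT pair.a, attrs['group'], tags[1]
FROM type_demo;
```

Array indexes start at 0, so tags[1] selects the second element, matching the A[1] example above.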
HIVE | DATA MODELS

Tables in Hive are analogous to tables in relational databases. Tables can be filtered, projected, joined and unioned. All the data of a table is stored in a directory in HDFS. Hive also supports external tables, where a table is created over pre-existing files or directories in HDFS by providing the appropriate location in the table creation statement.

Two types of tables in Hive: managed tables and external tables.
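
The four relational operations mentioned above might look like this in HiveQL. This is only a sketch: the tables weather and stations and their columns are hypothetical.

```sql
-- Projection (selected columns), join, and filter (WHERE).
-- "weather" and "stations" are hypothetical tables sharing a wban column.
SELECT w.wban, s.station_name
FROM weather w
JOIN stations s ON w.wban = s.wban
WHERE w.wban > 1000;

-- Union of the station ids found in both tables.
SELECT wban FROM weather
UNION ALL
SELECT wban FROM stations;
```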

HIVE | DATA MODELS TABLE

• HQL command used to create tables:

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]tab_name
  [(col_name data_type [COMMENT col_comment], ...)]
  [COMMENT table_comment]
  [ROW FORMAT row_format_type]
  [STORED AS file_format_type]

For example, the optional clauses can be filled in as:

COMMENT 'db_details'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS file_format_type

Hive stores these tables in the Hive warehouse directory on HDFS.
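
Putting the syntax together, a complete managed-table statement could look like the sketch below. The table name weather_managed and its columns are assumptions chosen for illustration, and TEXTFILE is just one possible file format.

```sql
-- Hypothetical managed table stored as tab-delimited text.
CREATE TABLE IF NOT EXISTS weather_managed (
  wban     INT    COMMENT 'example integer column',
  obs_date STRING COMMENT 'example string column'
)
COMMENT 'db_details'
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

-- Shows the table type and the HDFS location Hive chose for it.
DESCRIBE FORMATTED weather_managed;
```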

HIVE | DATA MODELS EXTERNAL TABLE

• The EXTERNAL keyword lets you create a table and provide a LOCATION so that Hive does not use its default location for the table. This comes in handy if you already have data generated.
• When an EXTERNAL table is dropped, the data in the table is NOT deleted from the file system.
• An EXTERNAL table points to any HDFS location for its storage, rather than being stored in the folder specified by the configuration property hive.metastore.warehouse.dir.
• HQL command to create external tables:

CREATE EXTERNAL TABLE weatherext (wban INT, `date` STRING)
COMMENT 'this is external table view'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/hive/data/weatherext';

The column name date is quoted with backticks because DATE is a reserved keyword in newer Hive releases.
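
The drop behaviour described above can be checked with a short session such as the following sketch. It assumes the weatherext table has been created at /hive/data/weatherext and that the commands are run from a Hive shell, where the dfs command is available.

```sql
-- The "Table Type" field should report EXTERNAL_TABLE.
DESCRIBE FORMATTED weatherext;

-- Dropping an external table removes only its metadata.
DROP TABLE weatherext;

-- The data files should still be listed under the table's location.
dfs -ls /hive/data/weatherext;
```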
HIVE | PARTITIONING TABLES

• Hive can organize tables into partitions. Partitions divide a table into related parts based on the values of the partition columns, which makes querying more efficient. For example, the weather table above can be partitioned by year and month; a query on the weather table can then use these partition columns like any other column.
• HQL command to create a partitioned table:

CREATE EXTERNAL TABLE IF NOT EXISTS weatherext
(wban INT, `date` STRING)
PARTITIONED BY (year INT, month STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 'location of text_file';
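
To show how partitions are registered and then used in a query, here is a minimal sketch. The table name weather_part, the HDFS paths and the partition values are all hypothetical.

```sql
-- Hypothetical partitioned external table.
CREATE EXTERNAL TABLE IF NOT EXISTS weather_part (wban INT, `date` STRING)
PARTITIONED BY (year INT, month STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/hive/data/weather_part';

-- Register one partition that points at an existing directory.
ALTER TABLE weather_part ADD IF NOT EXISTS
  PARTITION (year = 2015, month = '01')
  LOCATION '/hive/data/weather_part/year=2015/month=01';

-- List the partitions known to the metastore.
SHOW PARTITIONS weather_part;

-- Filtering on partition columns lets Hive read only the matching directory.
SELECT wban, `date`
FROM weather_part
WHERE year = 2015 AND month = '01';
```

The partition columns year and month are not stored in the data files themselves; their values come from the partition each file belongs to.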
HIVE | FLOW OF CSV FILE INTO HIVE
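
The diagram for this slide is not preserved. One common way to bring a CSV file into Hive is sketched below as an assumption, not as the flow the slide showed; the table name weather_csv and the file path /tmp/weather.csv are hypothetical.

```sql
-- Table whose row format matches a comma-separated file.
CREATE TABLE IF NOT EXISTS weather_csv (wban INT, `date` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Copy a file from the local file system into the table's directory.
LOAD DATA LOCAL INPATH '/tmp/weather.csv' INTO TABLE weather_csv;

-- Quick check that the rows are readable.
SELECT * FROM weather_csv LIMIT 5;
```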
That's all for theory.
Let's practice now.
