Apache Hive
About the Tutorial
This is a brief tutorial that provides an introduction to using Apache Hive and HiveQL with the Hadoop Distributed File System. This tutorial can be your first step towards becoming a successful Hadoop developer with Hive.
Audience
This tutorial is prepared for professionals aspiring to make a career in Big Data Analytics using the Hadoop framework. ETL developers and professionals who work in analytics in general can also use this tutorial to good effect.
Prerequisites
Before proceeding with this tutorial, you need basic knowledge of Core Java, database concepts and SQL, the Hadoop file system, and any flavor of the Linux operating system.
All the content and graphics published in this e-book are the property of Tutorials Point (I) Pvt. Ltd. The user of this e-book is prohibited from reusing, retaining, copying, distributing, or republishing any contents or a part of the contents of this e-book in any manner without written consent of the publisher. We strive to update the contents of our website and tutorials as timely and as precisely as possible; however, the contents may contain inaccuracies or errors. Tutorials Point (I) Pvt. Ltd. provides no guarantee regarding the accuracy, timeliness, or completeness of our website or its contents, including this tutorial. If you discover any errors on our website or in this tutorial, please notify us at [email protected].
Table of Contents
About the Tutorial
Audience
Prerequisites
1. INTRODUCTION
   Hadoop
7. ALTER TABLE
9. PARTITIONING
1. INTRODUCTION
The term ‘Big Data’ is used for collections of large datasets characterized by huge volume, high velocity, and a wide variety of data, all of which grow day by day. It is difficult to process Big Data using traditional data management systems. Therefore, the Apache Software Foundation introduced a framework called Hadoop to solve Big Data management and processing challenges.
Hadoop
Hadoop is an open-source framework for storing and processing Big Data in a distributed environment. It contains two core modules: MapReduce and the Hadoop Distributed File System (HDFS).
The Hadoop ecosystem contains different sub-projects (tools) such as Sqoop, Pig, and Hive that are used to support the Hadoop modules.
Sqoop: It is used to import and export data between HDFS and RDBMS.
Pig: It is a procedural language platform used to develop scripts for MapReduce operations.
Hive: It is a platform used to develop SQL-type scripts for MapReduce operations.
There are various ways to execute MapReduce operations:
The traditional approach, using a Java MapReduce program for structured, semi-structured, and unstructured data.
The scripting approach, using Pig to process structured and semi-structured data.
The Hive Query Language (HiveQL or HQL), using Hive to process structured data.
What is Hive?
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides
on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
Hive was initially developed by Facebook; later, the Apache Software Foundation took it up and developed it further as open source under the name Apache Hive. It is used by many companies. For example, Amazon uses it in Amazon Elastic MapReduce.
Hive is not
A relational database
A design for OnLine Transaction Processing (OLTP)
A language for real-time queries and row-level updates
Features of Hive
Here are the features of Hive:
It stores schema in a database and processed data in HDFS.
It is designed for OLAP, not for OLTP.
It provides an SQL-type language for querying, called HiveQL or HQL.
It is familiar, fast, scalable, and extensible.
Architecture of Hive
The architecture of Hive contains the following units:
User Interface: Hive is data warehouse infrastructure software that can create interaction between the user and HDFS. The user interfaces that Hive supports are Hive Web UI, Hive command line, and Hive HD Insight (on Windows Server).
Meta Store: Hive chooses respective database servers to store the schema or metadata of tables, databases, columns in a table, their data types, and HDFS mapping.
HiveQL Process Engine: HiveQL is similar to SQL for querying schema information in the Metastore. It is one of the replacements of the traditional approach for MapReduce programs. Instead of writing a MapReduce program in Java, we can write a HiveQL query for the MapReduce job and process it.
Execution Engine: The conjunction part of the HiveQL Process Engine and MapReduce is the Hive Execution Engine. The execution engine processes the query and generates the same results as MapReduce. It uses the flavor of MapReduce.
HDFS or HBASE: The Hadoop Distributed File System or HBASE are the data storage techniques used to store data in the file system.
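For instance, a simple aggregate query like the one below is handled by these units together: the HiveQL Process Engine compiles it, and the Execution Engine runs it as a MapReduce job over data stored in HDFS. The employee table and its columns are hypothetical, used only for illustration.
-- Hypothetical table; this HiveQL query replaces a hand-written Java MapReduce program.
SELECT dept, COUNT(*) AS emp_count
FROM employee
GROUP BY dept;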
Working of Hive
The following steps describe how Hive interacts with the Hadoop framework.
1. Execute Query
The Hive interface, such as the command line or Web UI, sends the query to the driver (any database driver such as JDBC, ODBC, etc.) to execute.
2. Get Plan
The driver takes the help of the query compiler, which parses the query to check the syntax and the query plan, or the requirements of the query.
3. Get Metadata
The compiler sends a metadata request to the Metastore.
4. Send Metadata
The Metastore sends the metadata as a response to the compiler.
5. Send Plan
The compiler checks the requirements and resends the plan to the driver. Up to here, the parsing and compiling of the query is complete.
6. Execute Plan
The driver sends the execute plan to the execution engine.
7. Execute Job
Internally, the execution engine runs the plan as a MapReduce job on Hadoop.
8. Fetch Result
The execution engine fetches the results from HDFS.
9. Send Results
The execution engine sends the resulting values to the driver.
10. Send Results
The driver sends the results to the Hive interfaces.
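For example, assuming Hive is installed (see the next chapter) and a hypothetical table named employee exists, a query submitted from the Hive command line travels exactly this path: the interface hands it to the driver, the compiler plans it with metadata from the Metastore, and the execution engine runs it as a MapReduce job before the results flow back.
$ hive -e "SELECT COUNT(*) FROM employee;"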
2. HIVE INSTALLATION
All Hadoop sub-projects such as Hive, Pig, and HBase support the Linux operating system. Therefore, you need to install any Linux-flavored OS. The following simple steps are performed for Hive installation.
Verifying Java Installation
Java must be installed on your system before installing Hive. Verify the Java installation using the following command:
$ java -version
If Java is already installed on your system, you will see a response similar to the following:
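The exact response depends on the installed version; with JDK 7u71, the version used in the steps below, it begins with a line such as:
java version "1.7.0_71"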
If Java is not installed on your system, then follow the steps given below to install it.
Installing Java
Step I:
Download java (JDK <latest version> - X64.tar.gz) by visiting the following link
http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html.
Step II:
Generally, you will find the downloaded java file in the Downloads folder. Verify it and extract
the jdk-7u71-linux-x64.gz file using the following commands.
$ cd Downloads/
$ ls
jdk-7u71-linux-x64.gz
$ tar zxf jdk-7u71-linux-x64.gz
$ ls
jdk1.7.0_71 jdk-7u71-linux-x64.gz
Step III:
To make Java available to all users, you have to move it to the location “/usr/local/”. Switch to the root user and type the following commands.
$ su
password:
# mv jdk1.7.0_71 /usr/local/
# exit
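You can optionally confirm that the JDK directory is now in place (the directory name assumes JDK 7u71, as used above):
$ ls -d /usr/local/jdk1.7.0_71
/usr/local/jdk1.7.0_71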
Step IV:
To set up the PATH and JAVA_HOME variables, add the following commands to the ~/.bashrc file.
export JAVA_HOME=/usr/local/jdk1.7.0_71
export PATH=$PATH:$JAVA_HOME/bin
Now apply all the changes to the currently running shell.
$ source ~/.bashrc
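To confirm that the variables took effect in the current shell, you can check JAVA_HOME; the expected value assumes the path used in Step III.
$ echo $JAVA_HOME
/usr/local/jdk1.7.0_71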
Step V:
Use the following commands to configure java alternatives:
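The exact commands vary by distribution. On systems that provide the alternatives tool, and assuming the JDK was moved to /usr/local/jdk1.7.0_71 as in Step III, a minimal sketch looks like this (run as root):
# alternatives --install /usr/bin/java java /usr/local/jdk1.7.0_71/bin/java 2
# alternatives --install /usr/bin/javac javac /usr/local/jdk1.7.0_71/bin/javac 2
# alternatives --install /usr/bin/jar jar /usr/local/jdk1.7.0_71/bin/jar 2
# alternatives --set java /usr/local/jdk1.7.0_71/bin/java
# alternatives --set javac /usr/local/jdk1.7.0_71/bin/javac
# alternatives --set jar /usr/local/jdk1.7.0_71/bin/jar
Afterwards, verify the installation again with the java -version command as shown above.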