0% found this document useful (0 votes)
82 views3 pages

Creating Tables in Hive

To create tables in Hive using a dataset: 1. Download the dataset file and transfer it to an EC2 instance. 2. Use the Hive CLI to create an internal table with the schema of the dataset. 3. Load the data into the internal table.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
82 views3 pages

Creating Tables in Hive

To create tables in Hive using a dataset: 1. Download the dataset file and transfer it to an EC2 instance. 2. Use the Hive CLI to create an internal table with the schema of the dataset. 3. Load the data into the internal table.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 3

Creating Tables in Hive using Hive CLI

To create tables in Hive using the given dataset, follow the below instructions.
1. Download the file to your laptop
2. Open the terminal on your laptop and use the below command to send file to your EC2
instance
scp -i ​locationofpemfile​ ​locationofdownloadeddataset

ec2-user@​yourpublicip​:/home/ec2-user

3. Login to your EC2 instance and use the ​ls​ command to check if the data has been
transferred

4. To look at the headers of the file use the command ​head data_2008.csv
Note:​ You can use the head command on either your local machine or EC2. If using your local
machine you will have to ​change data_2008.csv to the exact location of your file.
5. Now enter the Hive Command Line Interface by entering the command ​hive​ on your EC2
instance
6. To create an internal table use the below command

create​ ​table​ flights_data( ​`Year`​ ​int​, ​`Month`​ ​int​, ​`DayofMonth`​ ​int​,


`DayOfWeek`​ ​int​, ​`DepTime`​ ​int​, ​`CRSDepTime`​ ​int​, ​`ArrTime`​ ​int​,
`CRSArrTime`​ ​int​, ​`UniqueCarrier`​ ​string​, ​`FlightNum`​ ​string​,
`TailNum`​ ​string​, ​`ActualElapsedTime`​ ​int​, ​`CRSElapsedTime`​ ​int​,
`AirTime`​ ​int​, ​`ArrDelay`​ ​int​, ​`DepDelay`​ ​int​, ​`Origin`​ ​string​,
`Dest`​ ​string​, ​`Distance`​ ​bigint​, ​`TaxiIn`​ ​int​, ​`TaxiOut`​ ​int​,
`Cancelled`​ ​int​, ​`CancellationCode`​ ​string​, ​`Diverted`​ ​int​,
`CarrierDelay`​ ​string​, ​`WeatherDelay`​ ​string​, ​`NASDelay`​ ​string​,
`SecurityDelay`​ ​string​, ​`LateAircraftDelay`​ ​string​)
row​ ​format​ ​delimited​ ​fields​ ​terminated​ ​by​ ​','​;

7. To view the current tables use the command ​show tables

8. To load data into the internal table use the command below
load​ ​data​ ​local​ inpath ​'location of your dataset'​ overwrite ​into​ ​table
flights_data;

9. To view the data in the table you can use the below command
select​ * ​from​ flights_data ​limit​ ​10​;
10. To create an external table you need to modify the create table statement as given
below
create​ ​external​ ​table​ flights_data_ext( ​`Year`​ ​int​, ​`Month`​ ​int​,
`DayofMonth`​ ​int​, ​`DayOfWeek`​ ​int​, ​`DepTime`​ ​int​, ​`CRSDepTime`​ ​int​,
`ArrTime`​ ​int​, ​`CRSArrTime`​ ​int​, ​`UniqueCarrier`​ ​string​, ​`FlightNum`
string​, ​`TailNum`​ ​string​, ​`ActualElapsedTime`​ ​int​, ​`CRSElapsedTime`
int​, ​`AirTime`​ ​int​, ​`ArrDelay`​ ​int​, ​`DepDelay`​ ​int​, ​`Origin`​ ​string​,
`Dest`​ ​string​, ​`Distance`​ ​bigint​, ​`TaxiIn`​ ​int​, ​`TaxiOut`​ ​int​,
`Cancelled`​ ​int​, ​`CancellationCode`​ ​string​, ​`Diverted`​ ​int​,
`CarrierDelay`​ ​string​, ​`WeatherDelay`​ ​string​, ​`NASDelay`​ ​string​,
`SecurityDelay`​ ​string​, ​`LateAircraftDelay`​ ​string​)
row​ ​format​ ​delimited​ ​fields​ ​terminated​ ​by​ ​','
location ​'s3a://yourS3bucketname/flights_data'​;

11. To load data into this table use the command


load​ ​data​ ​local​ inpath ​'location of your dataset'​ overwrite ​into​ ​table
flights_data_ext;

You might also like