Creating Tables in Hive
Creating Tables in Hive
To create tables in Hive using the given dataset, follow the below instructions.
1. Download the file to your laptop
2. Open the terminal on your laptop and use the below command to send file to your EC2
instance
scp -i locationofpemfile locationofdownloadeddataset
ec2-user@yourpublicip:/home/ec2-user
3. Login to your EC2 instance and use the ls command to check if the data has been
transferred
4. To look at the headers of the file use the command head data_2008.csv
Note: You can use the head command on either your local machine or EC2. If using your local
machine you will have to change data_2008.csv to the exact location of your file.
5. Now enter the Hive Command Line Interface by entering the command hive on your EC2
instance
6. To create an internal table use the below command
8. To load data into the internal table use the command below
load data local inpath 'location of your dataset' overwrite into table
flights_data;
9. To view the data in the table you can use the below command
select * from flights_data limit 10;
10. To create an external table you need to modify the create table statement as given
below
create external table flights_data_ext( `Year` int, `Month` int,
`DayofMonth` int, `DayOfWeek` int, `DepTime` int, `CRSDepTime` int,
`ArrTime` int, `CRSArrTime` int, `UniqueCarrier` string, `FlightNum`
string, `TailNum` string, `ActualElapsedTime` int, `CRSElapsedTime`
int, `AirTime` int, `ArrDelay` int, `DepDelay` int, `Origin` string,
`Dest` string, `Distance` bigint, `TaxiIn` int, `TaxiOut` int,
`Cancelled` int, `CancellationCode` string, `Diverted` int,
`CarrierDelay` string, `WeatherDelay` string, `NASDelay` string,
`SecurityDelay` string, `LateAircraftDelay` string)
row format delimited fields terminated by ','
location 's3a://yourS3bucketname/flights_data';