0% found this document useful (0 votes)
23 views12 pages

SQL Assignment Aggregation Solution

Uploaded by

ng.huy0708
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
23 views12 pages

SQL Assignment Aggregation Solution

Uploaded by

ng.huy0708
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 12

ASSIGNMENT SOLUTION:

AGGREGATION
SOLUTION: AGGREGATION QUESTIONS

Below is the solution for the SQL Aggregation Assignment. For visualization questions, you can
use anything you want, the preferred way is to use Mode’s Dashboard. After finishing the
assignment, you can create a complete Exploratory Analysis Dashboard (pdf attachment) to put
in your portfolio or share with your employers or your friends.

1. CRUNCHBASE_COMPANIES
Answer the following questions using tutorial.crunchbase_companies table

1. How many records in this table?


• Number of records: 27,327
SELECT COUNT(*) AS cnt
FROM tutorial.crunchbase_companies

2. Write a query that determines counts of every single column. Which column has the most
null values? Can you check the number of NULL records in that column?
• Column has the most NULL values: last_milestone_at
SELECT
COUNT(permalink) as permalink,
COUNT(name) AS name,
COUNT(homepage_url) AS homepage_url,
COUNT(category_code) AS category_code,
COUNT(funding_total_usd) AS funding_total_usd,
COUNT(status) AS status,
COUNT(country_code) AS country_code,
COUNT(state_code) AS state_code,
COUNT(region) AS region,
COUNT(city) AS city,
COUNT(funding_rounds) AS funding_rounds,
COUNT(founded_at) AS founded_at,
COUNT(founded_month) AS founded_month,
COUNT(founded_quarter) AS founded_quarter,
COUNT(founded_year) AS founded_year,
COUNT(first_funding_at) AS first_funding_at,
COUNT(last_funding_at) AS last_funding_at,

© 2021, Tuan Vu. All rights reserved.


COUNT(last_milestone_at) AS last_milestone_at
FROM tutorial.crunchbase_companies

• Check the number of NULL records in that column: 12,273 records


SELECT
COUNT(*) AS cnt
FROM tutorial.crunchbase_companies
WHERE last_milestone_at IS NULL

3. What is the minimum, maximum, total, and average funding total amount in USD? Find
out which company has the highest funding amount.
SELECT
MIN(funding_total_usd) AS min_funding,
MAX(funding_total_usd) AS max_funding,
SUM(funding_total_usd) AS total_funding,
AVG(funding_total_usd) AS avg_funding
FROM tutorial.crunchbase_companies

• Company has the highest funding amount: Clearwire


SELECT
*
FROM tutorial.crunchbase_companies
WHERE funding_total_usd = 5700000000

4. Each company is categorized with a category code. Can you find the number of
companies, total funding per category code? Which category has the highest total funding
amount?
• Query to find the number of companies, total funding per category code
SELECT
category_code,
COUNT(*) AS num_companies,
SUM(funding_total_usd) AS total_funding
FROM tutorial.crunchbase_companies
GROUP BY 1
ORDER BY 3 DESC

© 2021, Tuan Vu. All rights reserved. 2


• Biotech is the category that has the highest total funding amount.
• Visualization: Plot a chart to visualize the total funding amount from different
category codes.

5. What is the average funding amount for each category code? Is the top 10 highest
average funding category code the same as the top 10 category code in the previous
questions? Why do you think that is?
• This is a tricky question, we cannot use the AVG function to get the average of
the funding amount since this aggregate function ignores the NULL values. If you
want to treat the NULLs value as zero, you will need to use both SUM and

© 2021, Tuan Vu. All rights reserved. 3


COUNT to calculate the average - the sum of all of the values in the column
divided by the number of values in a column.
• The top category with the highest average funding per company is automotive.
The top 10 category based on average funding amount per company is different
from the top 10 category based on total funding amount. This makes sense since
these category like automotive has smaller number of company and the require
funding amount is larger per company in average.
SELECT
category_code,
COUNT(*) AS num_companies,
SUM(funding_total_usd) AS total_funding,
SUM(funding_total_usd)/COUNT(*) AS avg_funding
FROM tutorial.crunchbase_companies
GROUP BY 1
ORDER BY 4 DESC

• Visualization: Plot a chart to visualize the average funding amount from different
category codes.

© 2021, Tuan Vu. All rights reserved. 4


• If you dig deeper in the "automotive" companies, you will find an interesting
insight that the top 2 automotive companies: Fisker and Tesla, both produce
electric vehicles, have the majority of the funding in this category, which skew
the average funding amount in this category.
SELECT
*
FROM tutorial.crunchbase_companies
WHERE category_code = 'automotive'
AND funding_total_usd IS NOT NULL
ORDER BY funding_total_usd DESC

• Visualization: Plot a chart to show the percentage of funding amount by different


companies in the top category

© 2021, Tuan Vu. All rights reserved. 5


6. Show the different number of companies by status between 2 states "CA" and "NY”.
SELECT
state_code,
status,
COUNT(*) AS num_companies
FROM tutorial.crunchbase_companies
WHERE state_code IN ('CA', 'NY')
GROUP BY 1,2

• Visualization: Plot a chart to visualize the difference between the 2 states

© 2021, Tuan Vu. All rights reserved. 6


2. CRUNCHBASE_ACQUISITIONS
Answer the following questions using tutorial.crunchbase_acquisitions table

7. Write a query that determines counts of every single column. Which column has the most
null values? Can you check the number of NULL records in that column?
• Column has the most NULL values: price_amount
SELECT
COUNT(company_permalink) as company_permalink,
COUNT(company_name) AS company_name,
COUNT(company_category_code) AS company_category_code,
COUNT(company_country_code) AS company_country_code,
COUNT(company_state_code) AS company_state_code,
COUNT(company_region) AS company_region,
COUNT(company_city) AS company_city,
COUNT(acquirer_permalink) AS acquirer_permalink,
COUNT(acquirer_category_code) AS acquirer_category_code,
COUNT(acquirer_country_code) AS acquirer_country_code,
COUNT(acquirer_state_code) AS acquirer_state_code,
COUNT(acquirer_region) AS acquirer_region,
COUNT(acquirer_city) AS acquirer_city,
COUNT(acquired_month) AS acquired_month,
COUNT(acquired_quarter) AS acquired_quarter,
COUNT(acquired_year) AS acquired_year,
COUNT(price_amount) AS price_amount,
COUNT(price_currency_code) AS price_currency_code
FROM tutorial.crunchbase_acquisitions

• Number of NULL records in the price_amount column: 5372


SELECT
COUNT(*) AS cnt
FROM tutorial.crunchbase_acquisitions
WHERE price_amount IS NULL

8. Show all the different price_currency_code in this dataset.


SELECT
price_currency_code,
COUNT(*) AS cnt
FROM tutorial.crunchbase_acquisitions
GROUP BY 1

© 2021, Tuan Vu. All rights reserved. 7


9. What is the minimum, maximum, total, and average funding total amount in USD? Find
out which company has the lowest funding amount.
SELECT
MIN(price_amount) AS min_amount,
MAX(price_amount) AS max_amount,
SUM(price_amount) AS total_amount,
AVG(price_amount) AS avg_amount
FROM tutorial.crunchbase_acquisitions
WHERE price_currency_code = 'USD'

-- Find which company has the lowest funding amount


SELECT
*
FROM tutorial.crunchbase_acquisitions
WHERE price_currency_code = 'USD'
AND price_amount = 1

10. For each acquirer, find the number of companies they acquire and the total amount they
spend.
SELECT
acquirer_name,
COUNT(DISTINCT company_permalink) AS num_companies,

© 2021, Tuan Vu. All rights reserved. 8


SUM(price_amount) AS price_amount
FROM tutorial.crunchbase_acquisitions
WHERE price_currency_code = 'USD'
GROUP BY 1
ORDER BY 2 desc

• Visualization: Plot a chart to visualize the top 20 acquirers by number of acquired


companies.

11. What is the number of acquired companies and total acquiring amount each year?
SELECT
acquired_year,
COUNT(DISTINCT company_permalink) AS num_companies,
SUM(price_amount) AS price_amount
FROM tutorial.crunchbase_acquisitions

© 2021, Tuan Vu. All rights reserved. 9


WHERE price_currency_code = 'USD'
GROUP BY 1
ORDER BY 1

• Visualization: Plot a chart to visualize the number of acquired companies every


year. Does the trend look good? Which year have the highest amount of
acquisitions?
o There's a sharp drop in 2014, probably due to missing data issue.
o 2011 is the year that has the highest amount of acquisitions.

© 2021, Tuan Vu. All rights reserved. 10


12. What is the number of acquired companies by category code each year? (Hint: exclude
the NULL values)
SELECT
acquired_year,
company_category_code,
COUNT(DISTINCT company_permalink) AS num_companies,
SUM(price_amount) AS price_amount
FROM tutorial.crunchbase_acquisitions
WHERE price_currency_code = 'USD'
AND company_category_code IS NOT NULL
AND price_amount IS NOT NULL
GROUP BY 1,2

• Visualization: Plot a chart to visualize the number of acquired companies by


company category code every year. Can you find any insights from this graph?
(Hint: look at the peak)
• Insight: In the previous question, 2011 is the year with the highest total number
of acquisitions. But when we add another dimension, "company_category_code",
we see that in 2008, when the financial crisis happened, the total number of
acquired software company reached its peak as 82 companies. While in 2011, that
number drops to only 23 companies.

© 2021, Tuan Vu. All rights reserved. 11


© 2021, Tuan Vu. All rights reserved. 12

You might also like