
exercise_3

May 12, 2023

1 Data Science I
1.1 Exercise 3: Data Munging, Data Cleaning, Rankings & Scores
Submission Deadline: May 15 2023, 07:00 UTC
University of Oldenburg, Summer 2023
Instructors: Maria Fernanda “MaFe” Davila Restrepo, Wolfram “Wolle” Wingerath
Submitted by: Akalin, Alp | Bagdatli, Ilayda | Yalcin, Mehmet

2 Part 1: Data Munging & Data Cleaning


1.) Find a table of storage prices over time (HDD/SSD).
a) How would you assess the quality of the data set you found? Do you need to do
any preparation before doing the analysis? (If so, what exactly did you do?)
Solution:
Based on the information available in the table, the data set appears reliable and trustworthy: it contains clear, consistent price-per-GB figures for both HDDs and SSDs over a significant period of time.

How much preparation is needed depends on the specific requirements of the analysis and on the data format. In this case, the data set is already in a structured tabular format and is largely clean. The one issue we did handle is missing values: SSD prices are not recorded for the earliest years (they appear as NaN), so we drop those rows before fitting any model. If the data set had also contained duplicates or inconsistencies, those would have to be cleaned up before starting the analysis.

Overall, it is always important to examine a data set carefully before any analysis and to ensure that it is clean, complete, and relevant to the research question at hand.
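To make these checks concrete, here is a minimal sketch (our own addition; the file name is hypothetical) of the kind of quality inspection we would run on such a table before analysis:

import pandas as pd

# Hypothetical file name; substitute the actual price table.
data = pd.read_excel("storage_prices.xlsx")

data.info()                     # column dtypes and non-null counts
print(data.isna().sum())        # missing values per column (e.g. early SSD prices)
print(data.duplicated().sum())  # number of fully duplicated rows
print(data.describe())          # value ranges, to spot implausible entries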
b) Analyze this data and make a projection about the cost/volume of data storage five
years from now.

Solution:

[31]: import pandas as pd
      import numpy as np
      import matplotlib.pyplot as plt

      # Load the data
      file_path = "C:/Users/10126426/Desktop/storage prices over time (HDD.SDD).xlsx"
      data = pd.read_excel(file_path)

      # Set the 'Year' column as the index
      data.set_index('Year', inplace=True)

      # Display the first few rows of the data
      print(data.head())

      HDD Price (per GB)  SSD Price (per GB)
Year
2000               11.05                 NaN
2001                5.72                 NaN
2002                2.70                 NaN
2003                1.42                 NaN
2004                0.83                 NaN

[32]: import pandas as pd
      import matplotlib.pyplot as plt
      from sklearn.linear_model import LinearRegression

      # Load the data from an xlsx file
      df = pd.read_excel(file_path)

      # Drop any rows with NaN values
      df = df.dropna()

      # Plot the data
      plt.plot(df['Year'], df['HDD Price (per GB)'], label='HDD')
      plt.plot(df['Year'], df['SSD Price (per GB)'], label='SSD')
      plt.xlabel('Year')
      plt.ylabel('Price (USD per GB)')
      plt.title('Storage Prices Over Time')
      plt.legend()
      plt.show()

      # Fit a linear regression model to the data
      X = df['Year'].values.reshape(-1, 1)
      y = df['SSD Price (per GB)'].values.reshape(-1, 1)
      model = LinearRegression().fit(X, y)

      # Make a projection for five years from now
      future_year = [[2028]]
      predicted_price = model.predict(future_year)[0][0]
      print(f"The projected SSD price per GB in 2028 is: {predicted_price:.2f} USD")

The projected SSD price per GB in 2028 is: -2.04 USD


In this code, we load the data from an xlsx file using the pd.read_excel() function from the Pandas library and plot the trend of HDD and SSD prices over time using Matplotlib. Next, we fit a linear regression model to the SSD price data using Scikit-learn and use it to project the SSD price per GB five years from now (in 2028). Note that the projection comes out negative (−2.04 USD), which is impossible for a price: storage prices have fallen roughly exponentially toward zero, so a straight line fitted to them must eventually cross zero and keep falling. A linear model is therefore a poor choice for this extrapolation.
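A better-behaved alternative is to fit the regression in log space, so the projected price decays exponentially and stays positive. The following is a minimal sketch of this idea (our own addition, not part of the graded solution; it assumes df has been loaded and cleaned as above):

import numpy as np
from sklearn.linear_model import LinearRegression

# Fit log(price) ~ year instead of price ~ year (exponential-decay assumption).
X = df['Year'].values.reshape(-1, 1)
log_model = LinearRegression().fit(X, np.log(df['SSD Price (per GB)'].values))

# Transform back out of log space for the projection.
pred_2028 = np.exp(log_model.predict([[2028]])[0])
print(f"Log-linear projected SSD price per GB in 2028: {pred_2028:.4f} USD")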
c) What will disk prices be in 25 or 50 years?
Solution:

[33]: # Fit a linear regression model to the historical SSD data
      X_ssd = df['Year'].values.reshape(-1, 1)
      y_ssd = df['SSD Price (per GB)'].values.reshape(-1, 1)
      model_ssd = LinearRegression().fit(X_ssd, y_ssd)

      # Make predictions for the historical SSD data and future years
      y_ssd_pred = model_ssd.predict(X_ssd)
      future_years = np.arange(df['Year'].max(), df['Year'].max() + 51).reshape(-1, 1)
      future_ssd_pred = model_ssd.predict(future_years)

      # Plot the historical SSD data and the linear regression line
      plt.scatter(X_ssd, y_ssd)
      plt.plot(X_ssd, y_ssd_pred, color='red')
      plt.title('Historical SSD Prices')
      plt.xlabel('Year')
      plt.ylabel('Price (per GB)')
      plt.show()

      # Plot the future SSD price predictions
      plt.plot(future_years, future_ssd_pred, color='green')
      plt.title('Projected SSD Prices')
      plt.xlabel('Year')
      plt.ylabel('Price (per GB)')
      plt.show()

      # Fit a linear regression model to the historical HDD data
      X_hdd = df['Year'].values.reshape(-1, 1)
      y_hdd = df['HDD Price (per GB)'].values.reshape(-1, 1)
      model_hdd = LinearRegression().fit(X_hdd, y_hdd)

      # Make predictions for the historical HDD data and future years
      y_hdd_pred = model_hdd.predict(X_hdd)
      future_hdd_pred = model_hdd.predict(future_years)

      # Plot the historical HDD data and the linear regression line
      plt.scatter(X_hdd, y_hdd)
      plt.plot(X_hdd, y_hdd_pred, color='blue')
      plt.title('Historical HDD Prices')
      plt.xlabel('Year')
      plt.ylabel('Price (per GB)')
      plt.show()

      # Plot the future HDD price predictions
      plt.plot(future_years, future_hdd_pred, color='purple')
      plt.title('Projected HDD Prices')
      plt.xlabel('Year')
      plt.ylabel('Price (per GB)')
      plt.show()

[Figures: scatter of historical SSD prices with the fitted regression line, projected SSD prices, historical HDD prices with the fitted regression line, and projected HDD prices; all plots show price per GB against year.]
We first load the data from an xlsx file, drop any rows with NaN values, and fit separate linear regression models to the historical SSD and HDD prices, which we then extrapolate 50 years into the future.
Again, keep in mind that these projections are only estimates based on historical data and trends; many external factors can affect actual future prices. In particular, the fitted SSD line already predicts a negative price by 2028, and both declining lines eventually go negative, which is impossible for prices, so the 25- and 50-year figures should not be read literally. The purpose of these visualizations is to give a rough idea of how HDD and SSD prices might evolve based on historical data.
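Under the same log-space model sketched in Part 1 b) (again our own addition, assuming df is loaded as above), the long-range projections remain positive:

import numpy as np
from sklearn.linear_model import LinearRegression

# Refit the log-space model (exponential-decay assumption).
X = df['Year'].values.reshape(-1, 1)
log_model = LinearRegression().fit(X, np.log(df['SSD Price (per GB)'].values))

# Roughly 25 and 50 years from 2023.
for year in (2048, 2073):
    price = np.exp(log_model.predict([[year]])[0])
    print(f"Log-linear projected SSD price per GB in {year}: {price:.2e} USD")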
2.) What types of outliers might you expect to occur in the following data sets?
a) Student grades
Solution:
In a data set of student grades, we might expect to see outliers that are caused by factors such as:
1. Errors in data entry, where a student’s grade is recorded incorrectly due to a mistake in data
input.
2. Extreme performances by individual students, such as a student who performs exceptionally
well or poorly on a particular assessment.
3. Cheating or academic misconduct, where a student’s grade is artificially inflated or deflated
due to academic dishonesty.
4. Variations in teaching quality or assessment difficulty, where a particular teacher or assessment
is significantly different from others in the data set.

5. External factors that impact student performance, such as illness or personal circumstances,
which may cause a student to perform unusually well or poorly compared to their peers.
It’s important to note that outliers in student grade data can be particularly sensitive, as they can
have significant consequences for individual students and their academic careers. Therefore, it’s
important to investigate any potential outliers and determine whether they are genuine or due to
errors or other factors.
b) Salary data
Solution:
In a data set of salary data, we might expect to see outliers that are caused by factors such as:
1. Executive salaries, which may be significantly higher than other salaries in the same company
or industry.
2. Extreme performances by individual employees, such as a highly successful salesperson or a
poorly performing employee with a high salary.
3. Seasonal or temporary employment, where an employee’s salary is higher or lower than usual
due to the nature of their employment.
4. Unusual bonuses or incentives, which may cause a particular employee’s salary to be higher
or lower than expected.
5. Data entry errors, where an employee’s salary is recorded incorrectly due to a mistake in data
input.
It’s important to note that outliers in salary data can have significant impacts on an organization’s
budget and financial planning, as well as the morale and motivation of employees. Therefore, it’s
important to investigate any potential outliers and determine whether they are genuine or due to
errors or other factors. Additionally, organizations may want to consider whether extreme salaries
are appropriate or sustainable in the long term.
c) Lifespans in Wikipedia
Solution:
1. Errors in data entry, where a person’s lifespan is recorded incorrectly due to a mistake in
data input.
2. Longevity records, where individuals who have lived exceptionally long lives are included in
the data set.
3. Unusual circumstances, such as accidents or illnesses, which may cause an individual’s lifespan
to be much shorter than expected.
4. Variations in lifespans across different cultures and historical periods, where certain individ-
uals may have lived much longer or shorter lives than others due to factors such as diet,
healthcare, or living conditions.
5. Individuals who have achieved significant accomplishments or notoriety, where their lifespan
is of interest to researchers or the general public.
It’s important to note that outliers in lifespan data can be particularly sensitive, as they can
have significant impacts on research and our understanding of longevity and aging. Therefore, it’s
important to investigate any potential outliers and determine whether they are genuine or due to
errors or other factors. Additionally, researchers may want to consider how variations in lifespans
across different cultures and historical periods may impact their analysis and conclusions.

3 Part 2: Scores & Rankings
3.) Let X represent a random variable drawn from the normal distribution defined by µ = 2 and σ = 3. Suppose we observe X = 5.08.
Find the Z-score of x, and determine how many standard deviations away from the mean that x is.
Solution:
To find the Z-score of X, we can use the formula:

Z = (X − µ) / σ

where X is the observed value, µ is the mean, and σ is the standard deviation. Substituting the given values, we get:

Z = (5.08 − 2) / 3 = 1.0267

Therefore, the Z-score of X is 1.0267. Since the Z-score by definition measures the number of standard deviations away from the mean, X is 1.0267 standard deviations above the mean; rearranging, X − µ = Z · σ = 1.0267 · 3 ≈ 3.08, so X lies approximately 3.08 units above the mean.
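As a quick sketch (our own addition), the same numbers can be checked in Python:

from scipy.stats import norm

mu, sigma, x = 2, 3, 5.08

# The Z-score is by definition the number of standard deviations from the mean.
z = (x - mu) / sigma
print(f"Z-score: {z:.4f}")  # 1.0267

# Share of the N(mu, sigma) distribution at or below x:
print(f"P(X <= {x}): {norm.cdf(x, loc=mu, scale=sigma):.4f}")  # approx. 0.8477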
4.) What percentage of the standard normal distribution (µ = 0, � = 1) is found in
each region?
a) Z > 1.13
Solution:
To find the percentage of the standard normal distribution that is in the region Z > 1.13, we can
use a standard normal distribution table or a calculator with a normal distribution function.
Using a standard normal distribution table, we can find the area under the curve to the right of Z
= 1.13, which is 0.1292 or approximately 12.92%. This means that approximately 12.92% of the
standard normal distribution is in the region Z > 1.13.
Alternatively, we can use a calculator or spreadsheet with a normal distribution function to find the same result. For example, the formula =1-NORMDIST(1.13, 0, 1, TRUE) in Microsoft Excel returns 0.1292, or approximately 12.92%.
b) Z < 0.18
Solution:

10
To find the percentage of the standard normal distribution that is in the region Z < 0.18, we can
use a standard normal distribution table or a calculator with a normal distribution function.
Using a standard normal distribution table, we can find the area under the curve to the left of Z
= 0.18, which is 0.5714 or approximately 57.14%. This means that approximately 57.14% of the
standard normal distribution is in the region Z < 0.18.
Alternatively, we can use a calculator or spreadsheet with a normal distribution function to find the same result. For example, the formula =NORMDIST(0.18, 0, 1, TRUE) in Microsoft Excel returns 0.5714, or approximately 57.14%.
c) Z > 8
Solution:
To find the percentage of the standard normal distribution that is in the region Z > 8, we can use
a standard normal distribution table or a calculator with a normal distribution function.
Using a standard normal distribution table, we can see that the area under the curve to the right
of Z = 8 is extremely small and close to 0. Therefore, we can approximate that the percentage of
the standard normal distribution in the region Z > 8 is effectively 0%.
Alternatively, we can compute the tail probability directly: P(Z > 8) is approximately 6.2E-16, which is extremely close to 0. (Computing it as 1 − Φ(8) in a spreadsheet is numerically unreliable this far into the tail because of floating-point cancellation; a survival function such as SciPy's norm.sf is the safer route, as shown in the sketch after part d).) This confirms that the percentage of the standard normal distribution in the region Z > 8 is effectively 0%.
d) |Z| < 0.5
Solution:
To find the percentage of the standard normal distribution that is in the region |Z| < 0.5, we can use a standard normal distribution table or a calculator with a normal distribution function.

The region |Z| < 0.5 is the area between Z = −0.5 and Z = 0.5. Because the standard normal distribution is symmetric around the mean of 0, this central area is

P(|Z| < 0.5) = Φ(0.5) − Φ(−0.5) = 2Φ(0.5) − 1

Using a standard normal distribution table, the area under the curve to the left of Z = 0.5 is Φ(0.5) = 0.6915. Substituting, we get 2 × 0.6915 − 1 = 0.3829, so approximately 38.29% of the standard normal distribution lies in the region |Z| < 0.5.

Note that simply doubling Φ(0.5) would give 138.3%, which cannot be a probability; the doubling applies to the central area via 2Φ(z) − 1, not to the full left-tail area Φ(z).
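As a sketch (our own addition), all four regions can be computed with SciPy; norm.sf is the survival function 1 − Φ(z), which stays accurate in the far tail:

from scipy.stats import norm

print(f"a) P(Z > 1.13)  = {norm.sf(1.13):.4f}")          # 0.1292
print(f"b) P(Z < 0.18)  = {norm.cdf(0.18):.4f}")         # 0.5714
print(f"c) P(Z > 8)     = {norm.sf(8):.2e}")             # approx. 6.2e-16
print(f"d) P(|Z| < 0.5) = {2 * norm.cdf(0.5) - 1:.4f}")  # 0.3829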
5.) Amanda took the Graduate Record Examination (GRE), and scored 160 in verbal reasoning and 157 in quantitative reasoning. The mean score for verbal reasoning was 151 with a standard deviation of 7, compared with mean µ = 153 and σ = 7.67 for quantitative reasoning. Assume that both distributions are normal.
a) What were Amanda’s Z-scores on these exam sections? Mark these scores on a
standard normal distribution curve.
Solution:
To find Amanda’s Z-scores for each section, we can use the formula:

Z = (X − µ) / σ

where X is the observed score, µ is the mean, and σ is the standard deviation.
For the verbal reasoning section, Amanda’s Z-score is:
Z_verbal = (160 - 151) / 7 = 1.29
For the quantitative reasoning section, her Z-score is:
Z_quantitative = (157 - 153) / 7.67 = 0.52
To mark these scores on a standard normal distribution curve, we plot both Z-scores on the x-axis of the standard normal density: 0.52 (quantitative) sits about half a standard deviation above the mean of 0, while 1.29 (verbal) lies noticeably further into the right tail.
b) Which section did she do better on, relative to other students?
Solution:
To compare Amanda’s performance relative to other students, we need to look at her percentile
ranks for each section. Percentile rank indicates the percentage of scores that fall below a given
score.
To find Amanda’s percentile rank for the verbal reasoning section, we can use a standard normal
distribution table or a calculator with a normal distribution function. Using a table, we can find
the area under the curve to the left of Amanda’s Z-score of 1.29, which is 0.9015 or approximately
90.15%. This means that Amanda scored higher than approximately 90.15% of other students who
took the verbal reasoning section.
To find Amanda’s percentile rank for the quantitative reasoning section, we can use the same
method. Using a calculator or table, we can find the area under the curve to the left of Amanda’s
Z-score of 0.52, which is 0.6985 or approximately 69.85%. This means that Amanda scored higher
than approximately 69.85% of other students who took the quantitative reasoning section.
Therefore, Amanda did better relative to other students in the verbal reasoning section, as her
percentile rank was higher (90.15%) compared to the percentile rank for the quantitative reasoning
section (69.85%).
c) Find her percentile scores for the two exams.
Solution:
To find Amanda’s percentile scores for the two exams, we can use the same method as before to find
the area under the curve to the left of her Z-scores. Then we can convert that area to a percentile
score using a standard normal distribution table or calculator.

For the verbal reasoning section, Amanda’s Z-score was 1.29. Using a standard normal distribution
table or calculator, we find that the area under the curve to the left of 1.29 is approximately 0.9015.
To convert this to a percentile score, we multiply by 100 to get:
Percentile score for verbal reasoning section = 0.9015 x 100 = 90.15%
This means that Amanda scored higher than approximately 90.15% of other students who took the
verbal reasoning section.
For the quantitative reasoning section, Amanda’s Z-score was 0.52. Using a standard normal distri-
bution table or calculator, we find that the area under the curve to the left of 0.52 is approximately
0.6985. To convert this to a percentile score, we multiply by 100 to get:
Percentile score for quantitative reasoning section = 0.6985 x 100 = 69.85%
This means that Amanda scored higher than approximately 69.85% of other students who took the
quantitative reasoning section.
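As a sketch (our own addition), both Z-scores and percentile ranks can be computed directly; small differences from the table values come from rounding Z before the lookup:

from scipy.stats import norm

# (section, score, mean, standard deviation) as given in the exercise.
for section, x, mu, sigma in [("verbal", 160, 151, 7), ("quantitative", 157, 153, 7.67)]:
    z = (x - mu) / sigma
    print(f"{section}: Z = {z:.2f}, percentile = {norm.cdf(z) * 100:.2f}%")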
6.) Identify three successful and well-used scoring functions in areas of personal in-
terest to you. For each, explain what makes it a good scoring function and how it is
used to create rankings in that domain.
Solution:
1. Search Engine Ranking Function: One of the most successful scoring functions is the PageR-
ank algorithm used by Google Search. This algorithm measures the relevance and authority
of a webpage by analyzing the number and quality of links pointing to it. The more links a
page has from other relevant and authoritative pages, the higher its PageRank score, and the
higher it appears in search engine results. This makes it a good scoring function because it can
effectively identify high-quality and relevant content on the web, and rank them accordingly
in search results.
2. Credit Score Function: A credit score is a numerical value that indicates a person’s credit-
worthiness and ability to pay back debts. It is used by lenders to evaluate the risk of lending
money to a borrower. The FICO score is one of the most widely used scoring functions for
credit evaluation. It considers factors such as payment history, credit utilization, length of
credit history, types of credit, and recent credit inquiries to calculate a score between 300
and 850. A higher score indicates a lower credit risk, making it a good scoring function for
lenders to assess the creditworthiness of an individual.
3. Sports Ranking Function: A commonly used scoring function in sports is the Elo rating
system. This system was originally developed for chess, but has been adopted by various
sports, including tennis, football, and basketball. It measures the relative skill level of players
or teams by analyzing their performance in previous matches and adjusting their ratings
based on the outcome of each match. A player or team’s rating increases when they win
against a stronger opponent, and decreases when they lose against a weaker opponent. This
makes it a good scoring function for ranking players or teams in a given sport, as it takes into
account their performance over time and the strength of their opponents.
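To make the Elo update rule concrete, here is a minimal sketch (our own addition; K = 32 is a conventional chess value, not something fixed by the exercise):

def elo_update(rating_a, rating_b, score_a, k=32):
    """Return updated (rating_a, rating_b) after one match.

    score_a is 1.0 if A wins, 0.5 for a draw, 0.0 if A loses.
    """
    # Expected score of A under the Elo logistic model.
    expected_a = 1 / (1 + 10 ** ((rating_b - rating_a) / 400))
    # Ratings move in proportion to the surprise (actual minus expected).
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta

# An upset: a 1400-rated player beats a 1600-rated player and gains more
# points than they would for beating an equal or weaker opponent.
print(elo_update(1400, 1600, score_a=1.0))  # approx. (1424.3, 1575.7)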

4 Finally: Submission
Save your notebook and submit it (as both notebook and PDF file). And please don’t forget to

- … choose a file name according to convention (see Exercise Sheet 1) and to
- … include the execution output in your submission!

