AMCCATALAN DS Python Summative

This document provides instructions for analyzing a Pokemon dataset using Python. It includes importing necessary libraries, reading in the CSV file, exploring the data shape and head, creating cross tabulations to examine relationships between variables, plotting bar graphs and histograms, binning a variable, and obtaining the computer's hostname and timestamp. The overall goal is to analyze the Pokemon dataset using various pandas and matplotlib methods and print the notebook as a PDF.

Uploaded by

lastna put4


CS170 - Introduction to Data Science (Jupyter Notebook)
Instructions
Answer each line item by replacing the blanks with the necessary operator or value. Make sure
the kernel is set to Python 3. Once done, right-click the notebook page and print it as a PDF.
The last part of the notebook is the code that prints a timestamp from your computer. Run it!

# Import the necessary libraries: pandas and matplotlib

import pandas as pd
import matplotlib.pyplot as plt

# Read the dataset

pokemon = pd.read_csv('AMCCATALAN - pokemon.csv')

# Get the shape of the dataset (rows, columns)
pokemon.shape

(801, 8)

# Display the first 10 rows of the dataset
pokemon.head(10)

   pokedex_num  sp_attack  sp_defense  p_speed  p_generation  is_legendary  p_published  p_stamina
0            1         43         135      105             1             0          YES         10
1            2         58         196       24             1             0          YES          5
2            3          8          77      199             1             0           NO          5
3            4         73          20       69             1             0          YES          1
4            5         11         143      193             1             0           NO          3
5            6        124         174      112             1             0           NO          1
6            7        172          91       56             1             0           NO          5
7            8        109          62       75             1             0          YES          9
8            9         11           3       76             1             0          YES          5
9           10         25          15       16             1             0           NO          3

# Complete the syntax by creating a crosstab of the records based on stamina and generation

crosstab_01 = pd.crosstab(pokemon['p_stamina'], pokemon['p_generation'])

# Plot a bar graph (frequency); make sure it is stacked

crosstab_01.plot(kind='bar', stacked=True, title='Pokemon stamina by generation')

<Axes: title={'center': 'Pokemon stamina by generation'}, xlabel='p_stamina'>
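
To make the structure of the crosstab concrete, here is a minimal sketch on a toy table. The column names mirror the notebook's, but the six rows of data are made up for illustration and are not taken from the Pokemon CSV.

```python
import pandas as pd

# Hypothetical toy data: same column names as the notebook, invented values.
toy = pd.DataFrame({
    "p_stamina":    [1, 1, 5, 5, 5, 10],
    "p_generation": [1, 2, 1, 1, 2, 2],
})

# Rows are stamina values, columns are generations, cells are record counts.
counts = pd.crosstab(toy["p_stamina"], toy["p_generation"])
print(counts)
```

Each cell counts how many records share that stamina value and that generation, which is what the stacked bar graph draws as one coloured segment per generation.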
# Normalize the crosstab with div and sum: divide each row by its row total
crosstab_norm = crosstab_01.div(crosstab_01.sum(axis=1), axis=0)

# Plot the normalized crosstab as a stacked bar graph of proportions

crosstab_norm.plot(kind='bar', stacked=True, title='Pokemon stamina by generation')

<Axes: title={'center': 'Pokemon stamina by generation'}, xlabel='p_stamina'>
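
The div/sum normalization can be sanity-checked on a small example. The sketch below reuses a made-up toy table (not the real dataset) and compares the manual row normalization against the `normalize="index"` option of `pd.crosstab`, which should give the same proportions.

```python
import pandas as pd

# Hypothetical toy data: invented values, notebook-style column names.
toy = pd.DataFrame({
    "p_stamina":    [1, 1, 5, 5, 5, 10],
    "p_generation": [1, 2, 1, 1, 2, 2],
})
counts = pd.crosstab(toy["p_stamina"], toy["p_generation"])

# Manual row normalization: each row is divided by its own total.
row_props = counts.div(counts.sum(axis=1), axis=0)

# Same result using crosstab's built-in normalization over rows.
via_normalize = pd.crosstab(toy["p_stamina"], toy["p_generation"], normalize="index")
print(row_props)
```

After row normalization every row sums to 1, so every stacked bar in the plot has the same height and only the split between generations varies.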

# Create a contingency table of generation by legendary status

crosstab_02 = pd.crosstab(pokemon['p_generation'], pokemon['is_legendary'])
pokemon.head(7)

   pokedex_num  sp_attack  sp_defense  p_speed  p_generation  is_legendary  p_published  p_stamina
0            1         43         135      105             1             0          YES         10
1            2         58         196       24             1             0          YES          5
2            3          8          77      199             1             0           NO          5
3            4         73          20       69             1             0          YES          1
4            5         11         143      193             1             0           NO          3
5            6        124         174      112             1             0           NO          1
6            7        172          91       56             1             0           NO          5

# Create a contingency table of generation by legendary status, as column percentages

round(crosstab_02.div(crosstab_02.sum(axis=0), axis=1) * 100, 1)

is_legendary 0 1
p_generation
1 20.0 7.1
2 12.9 8.6
3 17.1 14.3
4 12.9 18.6
5 19.6 18.6
6 9.0 8.6
7 8.6 24.3
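
Because the table above divides by column totals, each column sums to 100 and a cell reads as "percentage of that legendary class found in this generation". A minimal sketch with invented counts (two generations only, not the real data) shows how to read it:

```python
import pandas as pd

# Hypothetical counts: rows are generations, columns are is_legendary (0 or 1).
counts = pd.DataFrame({0: [30, 10], 1: [1, 4]}, index=[1, 2])
counts.index.name = "p_generation"
counts.columns.name = "is_legendary"

# Column percentages: divide each column by its own total.
col_pct = round(counts.div(counts.sum(axis=0), axis=1) * 100, 1)
print(col_pct)
```

In this toy table 20.0% of the legendary records fall in generation 1, even though only 1 of the 31 generation-1 records is legendary. Row percentages would answer that second question instead.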

# Import the packages required for the second task

import numpy as np
import matplotlib.pyplot as plt

# Then create a subset for each element of the overlay
# (legendary status overlaid on generation).
# Note: pok_y holds is_legendary == 0 (not legendary); pok_n holds is_legendary == 1 (legendary).
pok_y = pokemon[pokemon.is_legendary == 0]['p_generation']
pok_n = pokemon[pokemon.is_legendary == 1]['p_generation']

# Now create a stacked histogram of the two subsets, using 7 bins

plt.hist([pok_y, pok_n], bins=7, stacked=True)

plt.legend(['Not Legendary = 0', 'Legendary = 1'])
plt.title('Histogram of Legendary Pokemon Overlay')
plt.xlabel('Generation'); plt.ylabel('Frequency'); plt.show()
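# Save the bin counts and bin edges into variables.
# stacked=False is used here because, with stacked=True, plt.hist returns
# cumulative (stacked) heights rather than separate counts for each group.
(n, bins, patches) = plt.hist([pok_y, pok_n], bins=7, stacked=False)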
# Create a table that combines the bin heights of the two groups into a single array
n_table = np.column_stack((n[0], n[1]))

# divide each row by the sum of that row

# no revisions on this
n_norm = n_table / n_table.sum(axis=1)[:, None]

# Determine the lower and upper bounds of each bin (use the number of bins)

ourbins = np.column_stack((bins[0:7], bins[1:8]))

# Construct the normalized plot with plt.bar (p1 and p2).
# n_norm[:, 0] comes from pok_y (is_legendary == 0), so p1 is the non-legendary share.
# align='edge' places each bar's left side at the bin's lower bound.

p1 = plt.bar(x=ourbins[:, 0], height=n_norm[:, 0], width=ourbins[:, 1] - ourbins[:, 0],
             align='edge', label='Legendary = No')
p2 = plt.bar(x=ourbins[:, 0], height=n_norm[:, 1], width=ourbins[:, 1] - ourbins[:, 0],
             bottom=n_norm[:, 0], align='edge', label='Legendary = Yes')

# Plot the chart

plt.legend(['Legendary = No', 'Legendary = Yes'])
plt.title('Normalized Histogram of Pokemon with Response Overlay')
plt.xlabel('Generation'); plt.ylabel('Proportion'); plt.show()
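
The row-wise normalization of bin heights can also be checked without drawing anything. The sketch below is an illustration only: it uses `np.histogram`, which returns plain counts for each group, on two small made-up arrays rather than the notebook's `pok_y` and `pok_n`.

```python
import numpy as np

# Hypothetical generation values for two groups (invented data).
not_legendary = np.array([1, 1, 2, 2, 2, 3])
legendary = np.array([1, 3, 3])

# Shared bin edges so both groups are counted over the same intervals.
edges = np.histogram_bin_edges(np.concatenate([not_legendary, legendary]), bins=3)
n0, _ = np.histogram(not_legendary, bins=edges)
n1, _ = np.histogram(legendary, bins=edges)

# One row per bin, one column per group; then divide each row by its total.
n_table = np.column_stack((n0, n1))
n_norm = n_table / n_table.sum(axis=1)[:, None]
print(n_norm)
```

Each row of `n_norm` sums to 1, which is why the two stacked bars for a bin always reach a height of 1 in the normalized plot. A bin with no records in either group would produce a division by zero, so real data may need a guard for empty bins.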
# Use the cut function in pandas to create bins based on Pokemon special attack
# Bins should be: Under 50, 50 to 75, 75 to 100, and Over 100

pokemon['VAR'] = pd.cut(x=pokemon['sp_attack'],
                        bins=[-float("inf"), 50, 75, 100, float("inf")],
                        labels=["Under 50", "50 to 75", "75 to 100", "Over 100"],
                        right=True)
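
How values that sit exactly on a bin edge are labelled depends on the `right` argument. The following sketch uses a handful of made-up attack values (not the dataset) with the same edges and labels as the notebook to compare `right=True` and `right=False`.

```python
import pandas as pd

# Hypothetical attack values chosen to land on and around the bin edges.
values = pd.Series([10, 50, 51, 75, 100, 101])
labels = ["Under 50", "50 to 75", "75 to 100", "Over 100"]
edges = [-float("inf"), 50, 75, 100, float("inf")]

# right=True: intervals are closed on the right, e.g. (50, 75].
right_closed = pd.cut(values, bins=edges, labels=labels, right=True)

# right=False: intervals are closed on the left, e.g. [50, 75).
left_closed = pd.cut(values, bins=edges, labels=labels, right=False)

print(pd.DataFrame({"value": values, "right=True": right_closed, "right=False": left_closed}))
```

With `right=True`, as in the notebook, a value of exactly 50 is labelled "Under 50" and a value of exactly 100 is labelled "75 to 100". If the intent is for 50 to fall in "50 to 75", `right=False` is the setting to use.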

# Create a contingency table of VAR by type (legendary and non-legendary);
# the table by published or not is built further below

crosstab_02 = pd.crosstab(pokemon['VAR'], pokemon['is_legendary'])
crosstab_02.head(4)

is_legendary 0 1
VAR
Under 50 200 17
50 to 75 89 5
75 to 100 95 5
Over 100 347 43

# Create a contingency table based on percentage (column percentages)

round(crosstab_02.div(crosstab_02.sum(axis=0), axis=1) * 100, 1)
is_legendary 0 1
VAR
Under 50 27.4 24.3
50 to 75 12.2 7.1
75 to 100 13.0 7.1
Over 100 47.5 61.4

# Then plot a binned bar graph of the crosstab data based on VAR (frequency)
crosstab_02.plot(kind='bar', stacked=True,
                 title='Bar Graph of VAR (Binned / Frequency) with Response Overlay')

<Axes: title={'center': 'Bar Graph of VAR (Binned / Frequency) with Response Overlay'}, xlabel='VAR'>

crosstab_02 = pd.crosstab(pokemon['VAR'], pokemon['p_published'])

crosstab_02_norm = crosstab_02.div(crosstab_02.sum(0), axis=1)

# Then plot a binned bar graph of the crosstab data based on VAR (normalized)
crosstab_02_norm.plot(kind='bar', stacked=True,
                      title='Bar Graph of VAR (Binned / Proportion) with Response Overlay')

<Axes: title={'center': 'Bar Graph of VAR (Binned / Proportion) with Response Overlay'}, xlabel='VAR'>

import datetime
import socket

def get_Host_name_IP():
    # Print this machine's hostname and the IP address it resolves to
    try:
        host_name = socket.gethostname()
        host_ip = socket.gethostbyname(host_name)
        print("Hostname-7:", host_name)
        print("IP Address:", host_ip)
    except OSError:
        # Name resolution failures (socket.gaierror, socket.herror) are OSError subclasses
        print("No visible IP Address")

get_Host_name_IP()
now = datetime.datetime.now()
print("Time Stamp:", now.strftime("%Y-%m-%d %H:%M:%S"))

Hostname-7: AMCCATALAN
IP Address: 192.168.0.115
Time Stamp: 2023-09-16 18:21:21
