
AMCCATALAN DS Python Summative

This document provides instructions for analyzing a Pokemon dataset using Python. It includes importing necessary libraries, reading in the CSV file, exploring the data shape and head, creating cross tabulations to examine relationships between variables, plotting bar graphs and histograms, binning a variable, and obtaining the computer's hostname and timestamp. The overall goal is to analyze the Pokemon dataset using various pandas and matplotlib methods and print the notebook as a PDF.

Uploaded by

lastna put4

CS170 - Introduction to Data Science (Jupyter Notebook)
Instructions
Answer each line item by replacing the blanks with the necessary operator or value. Make sure
the kernel is set to Python 3. Once done, right-click the notebook page and print it as a PDF.
The last part of the notebook is the code that prints a timestamp from your computer; run it.

# Import the necessary libraries: pandas and matplotlib


import pandas as pd
import matplotlib.pyplot as plt

#read the dataset


pokemon = pd.read_csv('AMCCATALAN - pokemon.csv')

pokemon.shape
#get the shape of the dataset

(801, 8)

pokemon.head(10)
# complete the syntax to display the first 10 rows of the dataset

   pokedex_num  sp_attack  sp_defense  p_speed  p_generation  is_legendary  p_published  p_stamina
0            1         43         135      105             1             0          YES         10
1            2         58         196       24             1             0          YES          5
2            3          8          77      199             1             0           NO          5
3            4         73          20       69             1             0          YES          1
4            5         11         143      193             1             0           NO          3
5            6        124         174      112             1             0           NO          1
6            7        172          91       56             1             0           NO          5
7            8        109          62       75             1             0          YES          9
8            9         11           3       76             1             0          YES          5
9           10         25          15       16             1             0           NO          3

# complete the syntax by creating a crosstab of the records based on stamina and generation
crosstab_01 = pd.crosstab(pokemon['p_stamina'], pokemon['p_generation'])

# plot a bar graph (frequency); make sure it is stacked
crosstab_01.plot(kind='bar', stacked=True, title='Pokemon stamina by generation')

<Axes: title={'center': 'Pokemon stamina by generation'}, xlabel='p_stamina'>

# create crosstab by using div and sum command.
crosstab_norm = crosstab_01.div(crosstab_01.sum(1), axis=0)
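
The div/sum idiom above divides every row of the count table by that row's total, so the segments of each stamina bar sum to 1. A minimal sketch on made-up data (the frame `toy` and its values are hypothetical, not taken from the Pokemon file) that compares the result with `pd.crosstab(..., normalize='index')`, which pandas offers for the same purpose:

```python
import pandas as pd

# Hypothetical toy data standing in for the p_stamina / p_generation columns
toy = pd.DataFrame({
    'p_stamina':    [1, 1, 1, 5, 5, 9],
    'p_generation': [1, 2, 2, 1, 1, 3],
})

counts = pd.crosstab(toy['p_stamina'], toy['p_generation'])

# Divide each row by its own total (sum(axis=1) adds across the columns)
row_props = counts.div(counts.sum(axis=1), axis=0)

# Built-in alternative: normalize over the index (rows)
row_props_builtin = pd.crosstab(toy['p_stamina'], toy['p_generation'],
                                normalize='index')

print(row_props.round(2))
```

Either form should give the same proportions; the explicit div/sum version makes the direction of the normalization easier to see.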

# plot the normalized crosstab data as a stacked bar graph of proportions
crosstab_norm.plot(kind='bar', stacked=True, title='Pokemon stamina by generation')

<Axes: title={'center': 'Pokemon stamina by generation'}, xlabel='p_stamina'>

# create a contingency table showing generation and legendary status
crosstab_02 = pd.crosstab(pokemon['p_generation'], pokemon['is_legendary'])
pokemon.head(7)

   pokedex_num  sp_attack  sp_defense  p_speed  p_generation  is_legendary  p_published  p_stamina
0            1         43         135      105             1             0          YES         10
1            2         58         196       24             1             0          YES          5
2            3          8          77      199             1             0           NO          5
3            4         73          20       69             1             0          YES          1
4            5         11         143      193             1             0           NO          3
5            6        124         174      112             1             0           NO          1
6            7        172          91       56             1             0           NO          5

# create a contingency table showing generation and legendary status as percentages
# (each column is divided by its own total, so every column sums to 100)
round(crosstab_02.div(crosstab_02.sum(0), axis=1) * 100, 1)

is_legendary 0 1
p_generation
1 20.0 7.1
2 12.9 8.6
3 17.1 14.3
4 12.9 18.6
5 19.6 18.6
6 9.0 8.6
7 8.6 24.3

# import the packages required for the second task


import numpy as np
import matplotlib.pyplot as plt

# then, using the percentage data, create a subset for each element of the overlay
# (legendary status overlaid on generation)
pok_y = pokemon[pokemon.is_legendary == 0]['p_generation']  # not legendary
pok_n = pokemon[pokemon.is_legendary == 1]['p_generation']  # legendary

# now create a histogram based on the two subsets, using 7 bins
plt.hist([pok_y, pok_n], bins=7, stacked=True)
plt.legend(['Not Legendary = 0', 'Legendary = 1'])
plt.title('Histogram of Legendary Pokemon Overlay')
plt.xlabel('Generation'); plt.ylabel('Frequency'); plt.show()

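# save the per-group counts from the non-normalized plot into variables
# (stacked=False here: with stacked=True, plt.hist returns cumulative heights,
# so n[1] would hold the bin totals rather than the legendary counts)
(n, bins, patches) = plt.hist([pok_y, pok_n], bins=7, stacked=False)
# create a table by combining the heights of both groups into a single array
n_table = np.column_stack((n[0], n[1]))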
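
The per-group bin counts can also be computed without drawing a figure. A minimal sketch, assuming made-up generation values (the arrays `not_legendary` and `legendary` are hypothetical stand-ins for `pok_y` and `pok_n`), using `np.histogram` on shared bin edges:

```python
import numpy as np

# Hypothetical generation values for the two groups
not_legendary = np.array([1, 1, 2, 3, 3, 3, 7])
legendary = np.array([1, 7, 7])

# Shared edges so both groups use identical bins (7 bins spanning 1..7)
edges = np.linspace(1, 7, 8)

counts_not, _ = np.histogram(not_legendary, bins=edges)
counts_leg, _ = np.histogram(legendary, bins=edges)

# One row per bin, one column per group
n_table = np.column_stack((counts_not, counts_leg))
print(n_table)
```

Because both calls use the same `edges`, each row of `n_table` describes one bin and the two columns are directly comparable.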

# divide each row by the sum of that row


# no revisions on this
n_norm = n_table / n_table.sum(axis=1)[:, None]
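
The division above relies on NumPy broadcasting: `n_table.sum(axis=1)[:, None]` turns the row totals into a column, so each row is divided by its own total. A small sketch with invented counts (the array values are hypothetical):

```python
import numpy as np

# Hypothetical per-bin counts: column 0 = not legendary, column 1 = legendary
n_table = np.array([[18.0, 2.0],
                    [9.0, 1.0],
                    [6.0, 4.0]])

row_totals = n_table.sum(axis=1)        # shape (3,)
n_norm = n_table / row_totals[:, None]  # (3, 2) / (3, 1) broadcasts row by row

print(n_norm)
```

Each row of `n_norm` then sums to 1. A bin with no observations at all would have a row total of 0 and produce NaN values, so empty bins may need separate handling.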

# determine the upper and lower bounds of each bin (use the number of bins)

ourbins = np.column_stack((bins[0:7], bins[1:8]))

# construct the normalized plot with plt.bar (p1 and p2)
# column 0 of n_norm comes from pok_y (is_legendary == 0), column 1 from pok_n (is_legendary == 1)
p1 = plt.bar(x=ourbins[:, 0], height=n_norm[:, 0], width=ourbins[:, 1] - ourbins[:, 0],
             label='Not Legendary = 0')
p2 = plt.bar(x=ourbins[:, 0], height=n_norm[:, 1], width=ourbins[:, 1] - ourbins[:, 0],
             bottom=n_norm[:, 0], label='Legendary = 1')

# plot the table
plt.legend(['Not Legendary = 0', 'Legendary = 1'])
plt.title('Normalized Histogram of Pokemon with Response Overlay')
plt.xlabel('Generation'); plt.ylabel('Proportion'); plt.show()
# use the cut function in pandas to create bins based on Pokemon special attack (sp_attack)
# bins should be: Under 50, 50 to 75, 75 to 100, and Over 100
pokemon['VAR'] = pd.cut(x=pokemon['sp_attack'],
                        bins=[-float("inf"), 50, 75, 100, float("inf")],
                        labels=["Under 50", "50 to 75", "75 to 100", "Over 100"],
                        right=True)
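
With `right=True`, each interval includes its upper edge, so a value of exactly 50 is labeled "Under 50". A minimal sketch, assuming a handful of invented `sp_attack` values (the series `attack` is hypothetical) chosen to sit on and around the bin edges:

```python
import pandas as pd

# Hypothetical sp_attack values placed on and next to the bin edges
attack = pd.Series([10, 50, 51, 75, 100, 101])

binned = pd.cut(
    x=attack,
    bins=[-float("inf"), 50, 75, 100, float("inf")],
    labels=["Under 50", "50 to 75", "75 to 100", "Over 100"],
    right=True,  # intervals are (lower, upper], so 50 falls in "Under 50"
)

print(binned.tolist())
# ['Under 50', 'Under 50', '50 to 75', '50 to 75', '75 to 100', 'Over 100']
```

If the edge values should belong to the higher bin instead, passing `right=False` makes the intervals closed on the left.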

# create a contingency table based on its type (legendary and non-legendary) and whether it is published or not
crosstab_02 = pd.crosstab(pokemon['VAR'], pokemon['is_legendary'])
crosstab_02.head(4)

is_legendary 0 1
VAR
Under 50 200 17
50 to 75 89 5
75 to 100 95 5
Over 100 347 43

# create a contingency table based on percentages


round(crosstab_02.div(crosstab_02.sum(0), axis = 1)*100 , 1 )
is_legendary 0 1
VAR
Under 50 27.4 24.3
50 to 75 12.2 7.1
75 to 100 13.0 7.1
Over 100 47.5 61.4

# then plot a binned bar graph of the crosstab data based on VAR (frequency)
crosstab_02.plot(kind='bar', stacked=True, title='Bar Graph of VAR (Binned / Frequency) with Response Overlay')

<Axes: title={'center': 'Bar Graph of VAR (Binned / Frequency) with Response Overlay'}, xlabel='VAR'>

crosstab_02 = pd.crosstab(pokemon['VAR'], pokemon['p_published'])


crosstab_02_norm = crosstab_02.div(crosstab_02.sum(0), axis=1)

# then plot a binned bar graph of the crosstab data based on VAR (normalized)
crosstab_02_norm.plot(kind='bar', stacked=True, title='Bar Graph of VAR (Binned / Proportion) with Response Overlay')

<Axes: title={'center': 'Bar Graph of VAR (Binned / Proportion) with Response Overlay'}, xlabel='VAR'>

import datetime
import socket

def get_Host_name_IP():
    # look up this machine's hostname and the IP address it resolves to
    try:
        host_name = socket.gethostname()
        host_ip = socket.gethostbyname(host_name)
        print("Hostname-7:", host_name)
        print("IP Address:", host_ip)
    except OSError:
        print("No visible IP Address")

get_Host_name_IP()
now = datetime.datetime.now()
print("Time Stamp:", now.strftime("%Y-%m-%d %H:%M:%S"))
Hostname-7: AMCCATALAN
IP Address: 192.168.0.115
Time Stamp: 2023-09-16 18:21:21
