Module 5 Programming Foundation and Exploratory Data Analysis

This module introduces Python programming and exploratory data analysis. It covers Python basics like types, expressions, variables, strings, selection and iteration. It also introduces commonly used Python data structures like lists, tuples, dictionaries and sets. The module then covers exploratory data analysis concepts like definition, motivation, steps in data exploration and basic tools. It concludes with introducing tools like Anaconda, Jupyter notebooks and real-world datasets for learning Python and data analysis.

Uploaded by

Farheen Nawazi

Module 5 Programming Foundation and
Exploratory Data Analysis
• Introduction to Python Programming, Types,
Expressions and Variables, String Operations,
selection, iteration, Data Structures- Strings,
Regular Expression, List and Tuples,
Dictionaries, Sets; Exploratory Data Analysis
(EDA) - Definition, Motivation, Steps in data
exploration, The basic datatypes, Data type
Portability, Basic Tools of EDA, Data Analytics
Life cycle Discovery
Tools

• Recommended: the Anaconda distribution, which packages together the Python 3 language interpreter, IPython libraries, and the Jupyter notebook environment.
• Complete introduction to all of these computational tools.
• Learn to write programs, generate images from data, and work with real-world data sets that are published online.
Brief introduction of Python
• Invented in the Netherlands, early 90s by Guido
van Rossum
• Open sourced from the beginning
• Considered a scripting language, but is much
more
– No compilation needed
– Scripts are evaluated by the interpreter, line by line
– Functions need to be defined before they are called
Different ways to run python
• Call python program via python interpreter from a Unix/windows command line
– $ python testScript.py
– Or make the script directly executable, with additional header lines in the script
• Using python console
– Typing in python statements. Limited functionality
>>> 3 +3
6
>>> exit()
• Using ipython console
– Typing in python statements. Very interactive.
In [167]: 3+3
Out [167]: 6
– Typing in %run testScript.py
– Many convenient “magic functions”
Anaconda for python3
• We’ll be using anaconda which includes python
environment and an IDE (spyder) as well as many
additional features
• Most python modules needed in data science are
already installed with the anaconda distribution
• Install with Python 3.6 (and install Python 2.7 as secondary from the Anaconda prompt)
• Key diff between Python 2 and python 3
Ipython magic functions
• who, whos, who_ls
• time, timeit
• debug
• pwd, ls, cd, etc.
• ?
• ??
Python programming in <2 hours
• This is not a comprehensive Python language class
• Will focus on the parts of the language that are worth attention and useful in data science
• Two parts:
– Basics - today
– More advanced – next week and/or as we go
• Comprehensive Python language reference and tutorial available in Anaconda Navigator under “Learning” and on python.org
Formatting
• Many languages use curly braces to delimit blocks of code.
Python uses indentation. Incorrect indentation causes error.
• Comments start with #
• Colons start a new block in many constructs, e.g. function
definitions, if-then clause, for, while
for i in [1, 2, 3, 4, 5]:
    print (i)                  # first line in "for i" block
    for j in [1, 2, 3, 4, 5]:
        print (j)              # first line in "for j" block
        print (i + j)          # last line in "for j" block
    print (i)                  # last line in "for i" block
print ("done looping")
• Whitespace is ignored inside parentheses and
brackets.
long_winded_computation = (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 +
9 + 10 + 11 + 12 + 13 + 14 +
15 + 16 + 17 + 18 + 19 + 20)

list_of_lists = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

easier_to_read_list_of_lists = [ [1, 2, 3],
                                 [4, 5, 6],
                                 [7, 8, 9] ]

Alternatively:
long_winded_computation = 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + \
9 + 10 + 11 + 12 + 13 + 14 + \
15 + 16 + 17 + 18 + 19 + 20
Modules
• Certain features of Python are not loaded by
default
• In order to use these features, you’ll need to
import the modules that contain them.
• E.g.
import matplotlib.pyplot as plt
import numpy as np
Variables and objects
• A variable is created the first time it is assigned a value
– No need to declare type
– Types are associated with objects, not variables
• X = 5
• X = [1, 3, 5]
• X = 'python'
– Assignment creates references, not copies
X = [1, 3, 5]
Y = X
X[0] = 2
print (Y) # Y is [2, 3, 5]
Assignment
• You can assign to multiple names at the same
time
x, y = 2, 3
• To swap values
x, y = y, x
• Assignments can be chained
x=y=z=3
• Accessing a name before it’s been created (by
assignment), raises an error
Arithmetic
• a = 5 + 2 # a is 7
• b = 9 - 3. # b is 6.0
• c = 5 * 2 # c is 10
• d = 5**2 # d is 25
• e = 5 % 2 # e is 1

Built in numerical types: int, float, complex

• f=7/2
# in python 2, f will be 3, unless “from __future__
import division”
• f = 7 / 2 # in python 3 f = 3.5
• f = 7 // 2 # f = 3 in both python 2 and 3
• f = 7 / 2. # f = 3.5 in both python 2 and 3

• f = 7 / float(2) # f is 3.5 in both python 2 and 3

• f = int(7 / 2) # f is 3 in both python 2 and 3
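The division rules above can be checked with a short sketch. It assumes Python 3 semantics; the variable names are illustrative only. It also shows a case the slide does not cover: floor division and int() disagree for negative results.

```python
# Python 3 semantics are assumed throughout this sketch.
true_div = 7 / 2          # 3.5, "/" always returns a float
floor_div = 7 // 2        # 3, "//" rounds down
neg_floor = -7 // 2       # -4, floor division rounds toward negative infinity
truncated = int(-7 / 2)   # -3, int() truncates toward zero
quotient, remainder = divmod(7, 2)   # (3, 1), same as (7 // 2, 7 % 2)
```

For non-negative operands, 7 // 2 and int(7 / 2) agree; they differ only when the result is negative.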
String - 1
• Strings can be delimited by matching single or double quotation
marks
single_quoted_string = 'data science'
double_quoted_string = "data science"
escaped_string = 'Isn\'t this fun'
another_string = "Isn't this fun"

real_long_string = 'this is a really long string. \
It has multiple parts, \
but all in one line.'

• Use triple quotes for multi line strings

multi_line_string = """This is the first line.
and this is the second line
and this is the third line"""
String - 2
• Use raw strings to output backslashes
tab_string = "\t" # represents the tab character
len(tab_string) # is 1

not_tab_string = r"\t" # represents the characters '\' and 't'

len(not_tab_string) # is 2

• Strings can be concatenated (glued together) with the + operator, and repeated
with *
s = 3 * 'un' + 'ium' # s is 'unununium'

• Two or more string literals (i.e. the ones enclosed between quotes) next to
each other are automatically concatenated
s1 = 'Py' 'thon'
s2 = s1 + '2.7'
real_long_string = ('this is a really long string. '
                    'It has multiple parts, '
                    'but all in one line.')
List - 1
integer_list = [1, 2, 3]
heterogeneous_list = ["string", 0.1, True]
list_of_lists = [ integer_list, heterogeneous_list, [] ]
list_length = len(integer_list) # equals 3
list_sum = sum(integer_list) # equals 6
• Get the i-th element of a list
x = [i for i in range(10)] # is the list [0, 1, ..., 9]
zero = x[0] # equals 0, lists are 0-indexed
one = x[1] # equals 1
nine = x[-1] # equals 9, 'Pythonic' for last element
eight = x[-2] # equals 8, 'Pythonic' for next-to-last element
• Get a slice of a list
one_to_four = x[1:5] # [1, 2, 3, 4]
first_three = x[:3] # [0, 1, 2]
last_three = x[-3:] # [7, 8, 9]
three_to_end = x[3:] # [3, 4, ..., 9]
without_first_and_last = x[1:-1] # [1, 2, ..., 8]
copy_of_x = x[:] # [0, 1, 2, ..., 9]
another_copy_of_x = x[:3] + x[3:] # [0, 1, 2, ..., 9]
List - 2
• Check for memberships
1 in [1, 2, 3] # True
0 in [1, 2, 3] # False
• Concatenate lists
x = [1, 2, 3]
y = [4, 5, 6]
x.extend(y) # x is now [1,2,3,4,5,6]

x = [1, 2, 3]
y = [4, 5, 6]
z = x + y # z is [1,2,3,4,5,6]; x is unchanged.
• List unpacking (multiple assignment)
x, y = [1, 2] # x is 1 and y is 2
[x, y] = 1, 2 # same as above
x, y = [1, 2] # same as above
x, y = 1, 2 # same as above
_, y = [1, 2] # y is 2, didn't care about the first element
List - 3
• Modify content of list
x = [0, 1, 2, 3, 4, 5, 6, 7, 8]
x[2] = x[2] * 2 # x is [0, 1, 4, 3, 4, 5, 6, 7, 8]
x[-1] = 0 # x is [0, 1, 4, 3, 4, 5, 6, 7, 0]
x[3:5] = [9, 12] # x is [0, 1, 4, 9, 12, 5, 6, 7, 0]; note that x[3:5] * 3 would repeat the slice, not triple its elements
x[5:6] = [] # x is [0, 1, 4, 9, 12, 6, 7, 0]
del x[:2] # x is [4, 9, 12, 6, 7, 0]
del x[:] # x is []
del x # referencing x hereafter raises a NameError

• Strings can also be sliced, but they cannot be modified (they are immutable)
s = 'abcdefg'
a = s[0] # 'a'
x = s[:2] # 'ab'
y = s[-3:] # 'efg'
s[:2] = 'AB' # this will cause an error
s = 'AB' + s[2:] # s is now 'ABcdefg'
The range() function
for i in range(5):
    print (i) # will print 0, 1, 2, 3, 4 (on separate lines)
for i in range(2, 5):
    print (i) # will print 2, 3, 4
for i in range(0, 10, 2):
    print (i) # will print 0, 2, 4, 6, 8
for i in range(10, 2, -2):
    print (i) # will print 10, 8, 6, 4
>>> a = ['Mary', 'had', 'a', 'little', 'lamb']
>>> for i in range(len(a)):
...     print(i, a[i])
...
0 Mary
1 had
2 a
3 little
4 lamb
Range() in python 2 and 3
• In Python 2, range(5) is equivalent to [0, 1, 2, 3, 4]
• In Python 3, range(5) is an object that can be iterated over, but it is not identical to [0, 1, 2, 3, 4] (its values are produced lazily)
print (range(3)) # in python 3, will see "range(0, 3)"
print (range(3)) # in python 2, will see "[0, 1, 2]"
print (list(range(3))) # will print [0, 1, 2] in python 3

x = range(5)
print (x[2]) # in python 2, will print "2"
print (x[2]) # in python 3, will also print "2"

x[2] = 5 # in python 2, x becomes [0, 1, 5, 3, 4]

x[2] = 5 # in python 3, will cause an error.
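A minimal sketch of the Python 3 behaviour described above, assuming Python 3.6 or later (for the underscore digit separators). The variable names are illustrative.

```python
# A large range is cheap to create: no list of ten million ints is built.
r = range(10_000_000)
tenth_thousand = r[10_000]        # indexing works: 10000
is_member = 9_999_999 in r        # membership test works: True
first_three = list(range(3))      # convert to a real list when needed

# A range object is read-only, so item assignment raises TypeError.
try:
    r[2] = 5
except TypeError as err:
    message = str(err)
```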
Ref to lists
• What is the expected output of the following code?

a = list(range(10))
b = a
b[0] = 100
print(a) # [100, 1, 2, 3, 4, 5, 6, 7, 8, 9]: b is another name for the same list

a = list(range(10))
b = a[:]
b[0] = 100
print(a) # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]: b is a copy, so a is unchanged
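The same distinction matters for nested lists. The sketch below is an illustration beyond the slide: it assumes the standard-library copy module and compares an alias, a slice copy (shallow), and a deep copy.

```python
import copy

a = [[1, 2], [3, 4]]
alias = a                  # same object as a
shallow = a[:]             # new outer list, but the inner lists are shared
deep = copy.deepcopy(a)    # fully independent copy

a[0][0] = 100              # mutate an inner list
a.append([5, 6])           # mutate the outer list

# alias   -> [[100, 2], [3, 4], [5, 6]]  (sees both changes)
# shallow -> [[100, 2], [3, 4]]          (sees only the inner change)
# deep    -> [[1, 2], [3, 4]]            (sees neither change)
```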
tuples
• Similar to lists, but are immutable
• a_tuple = (0, 1, 2, 3, 4)
• other_tuple = 3, 4
Note: a tuple is defined by the comma, not the parentheses, which are only used for convenience. So a = (1) is not a tuple, but a = (1,) is.

• another_tuple = tuple([0, 1, 2, 3, 4])

• heterogeneous_tuple = ('john', 1.1, [1, 2])

• Can be sliced, concatenated, or repeated

a_tuple[2:4] # will print (2, 3)
• Cannot be modified
a_tuple[2] = 5
TypeError: 'tuple' object does not support item assignment
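A small sketch of the points on this slide (the comma makes the tuple, and tuples cannot be modified in place). The names are illustrative only.

```python
not_a_tuple = (1)     # just the int 1; parentheses alone do not make a tuple
singleton = (1,)      # the trailing comma makes a one-element tuple
pair = 3, 4           # parentheses are optional

point = (2, 5)
try:
    point[0] = 9      # tuples do not support item assignment
except TypeError as err:
    error_text = str(err)

# To "change" a tuple, build a new one from pieces of the old one.
moved = (9,) + point[1:]
```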
Tuples - 2
• Useful for returning multiple values from
functions
def sum_and_product(x, y):
    return (x + y), (x * y)
sp = sum_and_product(2, 3) # equals (5, 6)
s, p = sum_and_product(5, 10) # s is 15, p is 50
• Tuples and lists can also be used for multiple
assignments
x, y = 1, 2
[x, y] = [1, 2]
(x, y) = (1, 2)
x, y = y, x
Dictionaries
• A dictionary associates values with unique keys
empty_dict = {} # Pythonic
empty_dict2 = dict() # less Pythonic
grades = { "Joel" : 80, "Tim" : 95 } # dictionary literal

• Access/modify value with key

joels_grade = grades["Joel"] # equals 80

grades["Tim"] = 99 # replaces the old value

try:
    kates_grade = grades["Kate"] # Kate has no entry yet, so this raises KeyError
except KeyError:
    print ("no grade for Kate!")

grades["Kate"] = 100 # adds a third entry
num_students = len(grades) # equals 3
Dictionaries - 2
• Check for existence of key
joel_has_grade = "Joel" in grades # True
kate_has_grade = "Kate" in grades # True, Kate was added above

• Use “get” to avoid KeyError and supply a default value

joels_grade = grades.get("Joel", 0) # equals 80
kates_grade = grades.get("Kate", 0) # equals 100
no_ones_grade = grades.get("No One") # default default is None

• Get all items
all_keys = grades.keys() # return a list of all keys
all_values = grades.values() # return a list of all values
all_pairs = grades.items() # a list of (key, value) tuples

• Which of the following is faster?
'Joel' in grades # faster: hash table lookup
'Joel' in all_keys # slower in Python 2, where all_keys is a list that must be scanned
• Get all items. In Python 3, the following do not return lists but iterable view objects
all_keys = grades.keys() # a list in Python 2, a view of the keys in Python 3
all_values = grades.values() # a list in Python 2, a view of the values in Python 3
all_pairs = grades.items() # (key, value) tuples
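One common use of get with a default is counting. The following is a minimal sketch with a made-up sentence; the names words and counts are illustrative.

```python
words = "the quick brown fox jumps over the lazy dog the end".split()

counts = {}
for word in words:
    # get() returns 0 for a word not seen before, so no KeyError handling is needed
    counts[word] = counts.get(word, 0) + 1

most_common = max(counts, key=counts.get)   # key with the largest count
```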
Difference between python 2 and python 3:
Iterable objects vs lists
• In Python 3, range() returns a lazy iterable object.
– Values are created when needed
x = range(10000000) # fast
– Can be accessed by index
x[10000] # allowed, fast

• Similarly, dict.keys(), dict.values(), and dict.items() return iterable objects (also map, filter, zip, see next)
– Values can NOT be accessed by index
– Can be converted to a list if really needed
– Can be iterated with a for loop
keys = grades.keys()
keys[0] # error
for key in keys: print (key) # ok
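A short sketch of these view objects, assuming Python 3.7 or later (where dictionaries keep insertion order). It also shows that a view tracks later changes to the dictionary, which the slide does not mention.

```python
grades = {"Joel": 80, "Tim": 95}
keys = grades.keys()

# Views cannot be indexed.
try:
    keys[0]
    indexable = True
except TypeError:
    indexable = False

first_key = list(keys)[0]      # convert to a list when an index is really needed

# A view is live: it reflects entries added after it was created.
grades["Kate"] = 100
keys_after_insert = list(keys)
```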
Control flow - 1
• if-else
if 1 > 2:
    message = "if only 1 were greater than two..."
elif 1 > 3:
    message = "elif stands for 'else if'"
else:
    message = "when all else fails use else (if you want to)"
print (message)
parity = "even" if x % 2 == 0 else "odd"

• Difference between print in Python 2 and Python 3

• In Python 2, print is a statement
• print(message) and print message are both valid
• In Python 3, print is a function
• Only print(message) is valid
Truthiness
• Keywords: True, False, None, and, or, not. All keywords are case sensitive.
• Built-in functions: any, all
• 0, 0.0, [], (), '', None are considered False. Most other values are True.
In [137]: print ("True") if '' else print ('False')
False

a = [0, 0, 0, 1]
any(a)
Out[135]: True

all(a)
Out[136]: False
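The truthiness rules can be checked directly. This is a small sketch; the lists of sample values are illustrative, not exhaustive.

```python
falsy_values = [0, 0.0, [], (), "", None]
all_falsy = not any(falsy_values)        # True: none of them counts as true

# Non-empty containers and non-empty strings are truthy, whatever they contain.
truthy_values = [1, -1, 0.1, [0], (0,), " ", "False"]
all_truthy = all(truthy_values)          # True

# "or" returns its first truthy operand, which gives a compact default value.
name = "" or "anonymous"
```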
Comparison
Operation    Meaning
<            strictly less than
<=           less than or equal
>            strictly greater than
>=           greater than or equal
==           equal
!=           not equal
is           object identity
is not       negated object identity

a = [0, 1, 2, 3, 4]
b = a
c = a[:]
a == b
Out[129]: True
a is b
Out[130]: True
a == c
Out[132]: True
a is c
Out[133]: False

Bitwise operators: & (AND), | (OR), ^ (XOR), ~ (NOT), << (Left Shift), >> (Right Shift)
Control flow - 2
• loops
x = 0
while x < 10:
    print (x, "is less than 10")
    x += 1

What happens if we forgot to indent?

Keyword pass in loops: does nothing, an empty statement placeholder
for x in range(10):
    pass

for x in range(10):
    if x == 3:
        continue # go immediately to the next iteration
    if x == 5:
        break # quit the loop entirely
    print (x)
Exceptions
try:
    print (0 / 0)
except ZeroDivisionError:
    print ("cannot divide by zero")

https://docs.python.org/3/tutorial/errors.html
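A slightly fuller sketch of the same pattern, wrapped in a function. The function name safe_divide and the choice to return None on failure are assumptions made for illustration.

```python
def safe_divide(numerator, denominator):
    """Return numerator / denominator, or None if the denominator is zero."""
    try:
        result = numerator / denominator
    except ZeroDivisionError:
        # Only the expected error is caught; anything else still propagates.
        return None
    else:
        # The else branch runs only when no exception was raised.
        return result

ok = safe_divide(7, 2)       # 3.5
failed = safe_divide(1, 0)   # None
```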
Functions - 1
• Functions are defined using def
def double(x):
    """this is where you put an optional docstring
    that explains what the function does.
    for example, this function multiplies its
    input by 2"""
    return x * 2
• You can call a function after it is defined
z = double(10) # z is 20
• You can give default values to parameters
def my_print(message="my default message"):
    print (message)

my_print("hello") # prints 'hello'

my_print() # prints 'my default message'
Functions - 2
• Sometimes it is useful to specify arguments by name

def subtract(a=0, b=0):
    return a - b

subtract(10, 5) # returns 5
subtract(0, 5) # returns -5
subtract(b = 5) # same as above
subtract(b = 5, a = 0) # same as above
Functions - 3
• Functions are objects too
In [12]: def double(x): return x * 2
    ...: DD = double;
    ...: DD(2)
    ...:
Out[12]: 4
In [16]: def apply_to_one(f):
    ...:     return f(1)
    ...: x=apply_to_one(DD)
    ...: x
    ...:
Out[16]: 2
Functions – lambda expression
• Small anonymous functions can be created
with the lambda keyword.
In [18]: y=apply_to_one(lambda x: x+4)

In [19]: y
Out[19]: 5

In [104]: def small_func(x): return x+4

      ...: apply_to_one(small_func)
Out[104]: 5
lambda expression - 2
• Small anonymous functions can be created
with the lambda keyword.
In [22]: pairs = [(2, 'two'), (3, 'three'), (1, 'one'), (4, 'four')]
    ...: pairs.sort(key=lambda pair: pair[0])
    ...: pairs
Out[22]: [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]

In [107]: def getKey(pair): return pair[0]
     ...: pairs.sort(key=getKey)
     ...: pairs
Out[107]: [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]
Sorting list
• sorted(list): keeps the original list intact and returns a new sorted list
• list.sort(): sorts the original list in place
x = [4,1,2,3]
y = sorted(x) # is [1,2,3,4], x is unchanged
x.sort() # now x is [1,2,3,4]

• Change the default behavior of sorted

# sort the list by absolute value from largest to smallest
x = [-4,1,-2,3]
y = sorted(x, key=abs, reverse=True) # is [-4,3,-2,1]
# sort the grades from highest to lowest
# using an anonymous function
# (pair is a (name, grade) tuple; Python 3 lambdas cannot unpack tuple parameters)
newgrades = sorted(grades.items(),
                   key=lambda pair: pair[1],
                   reverse=True)
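A runnable sketch of sorting dictionary items by value, assuming Python 3. The sample grades are made up, and operator.itemgetter is shown as an optional alternative to the lambda.

```python
from operator import itemgetter

grades = {"Joel": 80, "Tim": 95, "Kate": 100}

# Each item is a (name, grade) tuple; sort on the second element, highest first.
by_grade = sorted(grades.items(), key=lambda pair: pair[1], reverse=True)

# itemgetter(1) builds the same key function without a lambda.
by_grade_alt = sorted(grades.items(), key=itemgetter(1), reverse=True)
```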
List comprehension
• A very convenient way to create a new list

In [51]: squares = [x * x for x in range(5)]

In [52]: squares
Out[52]: [0, 1, 4, 9, 16]

In [64]: for x in range(5): squares[x] = x * x
    ...: squares
Out[64]: [0, 1, 4, 9, 16]
List comprehension - 2
• Can also be used to filter list
In [65]: even_numbers = [x for x in range(5) if x % 2 == 0]
In [66]: even_numbers
Out[66]: [0, 2, 4]

In [68]: even_numbers = []
In [69]: for x in range(5):
    ...:     if x % 2 == 0:
    ...:         even_numbers.append(x)
    ...: even_numbers
Out[69]: [0, 2, 4]
List comprehension - 3
• More complex examples:
# create 100 pairs (0,0) (0,1) ... (9,8), (9,9)
pairs = [(x, y)
         for x in range(10)
         for y in range(10)]

# only pairs with x < y,
# range(lo, hi) equals
# [lo, lo + 1, ..., hi - 1]
increasing_pairs = [(x, y)
                    for x in range(10)
                    for y in range(x + 1, 10)]
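A comprehension with two for clauses behaves like nested loops, with the leftmost for as the outer loop. The sketch below uses a smaller range (4 instead of 10) so the result is easy to read.

```python
increasing_pairs = [(x, y)
                    for x in range(4)
                    for y in range(x + 1, 4)]

# The equivalent nested loops: the first "for" is the outer loop.
loop_pairs = []
for x in range(4):
    for y in range(x + 1, 4):
        loop_pairs.append((x, y))

# Both give [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```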
Functools: map, reduce, filter
• Do not confuse with MapReduce in big data
• Convenient tools in python to apply function
to sequences of data
In [203]: def double(x): return 2*x
     ...: b=range(5)
     ...: list(map(double, b))
Out[203]: [0, 2, 4, 6, 8]

In [205]: [double(i) for i in range(5)]
Out[205]: [0, 2, 4, 6, 8]

In [204]: double(b)
Traceback (most recent call last):
…
TypeError: unsupported operand type(s) for *: 'int' and 'range'
Functools: map, reduce, filter
In [208]: def is_even(x): return x%2==0
     ...: a=[0, 1, 2, 3]
     ...: list(filter(is_even, a))
     ...:
Out[208]: [0, 2]

In [209]: [x for x in a if is_even(x)]

Out[209]: [0, 2]
Functools: map, reduce, filter
In [216]: from functools import reduce
In [217]: reduce(lambda x, y: x+y, range(10))
Out[217]: 45

In [220]: reduce(lambda x, y: x*y, [1, 2, 3, 4])

Out[220]: 24
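The three tools can be chained. The sketch below computes the sum of the squares of the even numbers below 10, once with filter, map and reduce and once with a generator expression. Note that in Python 3 only reduce needs an import; map and filter are built in.

```python
from functools import reduce

numbers = range(10)

evens = filter(lambda n: n % 2 == 0, numbers)         # 0, 2, 4, 6, 8 (lazy)
squares = map(lambda n: n * n, evens)                 # 0, 4, 16, 36, 64 (lazy)
total = reduce(lambda acc, n: acc + n, squares, 0)    # 120

# The same computation written as a generator expression.
total_alt = sum(n * n for n in numbers if n % 2 == 0)
```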
zip
• Useful for combining multiple lists into a list of tuples
In [238]: list(zip(['a', 'b', 'c'], [1, 2, 3], ['A', 'B', 'C']))
Out[238]: [('a', 1, 'A'), ('b', 2, 'B'), ('c', 3, 'C')]
In [245]: names = ['James', 'Tom', 'Mary']
     ...: grades = [100, 90, 95]
     ...: list(zip(names, grades))
     ...:
Out[245]: [('James', 100), ('Tom', 90), ('Mary', 95)]
Argument unpacking
• zip(*[a, b,c]) same as zip(a, b, c)
In [252]: gradeBook = [['James', 100],
     ...:              ['Tom', 90],
     ...:              ['Mary', 95]]
     ...: [names, grades]=zip(*gradeBook)
In [253]: names
Out[253]: ('James', 'Tom', 'Mary')
In [254]: grades
Out[254]: (100, 90, 95)

In [259]: list(zip(['James', 100], ['Tom', 90], ['Mary', 95]))

Out[259]: [('James', 'Tom', 'Mary'), (100, 90, 95)]
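A short round-trip sketch: zip pairs two lists, and zip with argument unpacking splits the pairs again. Building a dictionary from zip is shown as one further use; the variable names are illustrative.

```python
names = ["James", "Tom", "Mary"]
grades = [100, 90, 95]

grade_book = list(zip(names, grades))    # list of (name, grade) tuples
lookup = dict(zip(names, grades))        # name -> grade dictionary

# zip(*grade_book) is zip(('James', 100), ('Tom', 90), ('Mary', 95))
names_back, grades_back = zip(*grade_book)
```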
args and kwargs
• Convenient for taking a variable number of unnamed and named parameters
In [260]: def magic(*args, **kwargs):
     ...:     print ("unnamed args:", args)
     ...:     print ("keyword args:", kwargs)
     ...: magic(1, 2, key="word", key2="word2")
     ...:
unnamed args: (1, 2)
keyword args: {'key': 'word', 'key2': 'word2'}
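A typical use of *args and **kwargs is forwarding arguments to another function unchanged. This is a minimal sketch; forward and describe_call are hypothetical helper names.

```python
def describe_call(*args, **kwargs):
    """Return how many positional arguments and which keyword names were passed."""
    return len(args), sorted(kwargs)

def forward(func, *args, **kwargs):
    """Call func with exactly the arguments that were given to forward."""
    return func(*args, **kwargs)

summary = forward(describe_call, 1, 2, key="word")   # (2, ['key'])

# The same star syntax unpacks an existing tuple at the call site.
positional = (1, 2, 3)
largest = forward(max, *positional)                  # 3
```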
Useful methods and modules
• The Python Tutorial
– Input and Output
• The Python Standard Library Reference
– Common string methods
– Regular expression operations
– Numeric and Mathematical Modules
– CSV File Reading and Writing
Files - input
inflobj = open('data', 'r') # Open the file 'data' for input
S = inflobj.read() # Read whole file into one string
S = inflobj.read(N) # Read up to N characters (N >= 1); N bytes in binary mode
L = inflobj.readline() # Read one line
L = inflobj.readlines() # Returns a list of line strings

https://docs.python.org/3/tutorial/inputoutput.html
Files - output

outflobj = open('data', 'w') # Open the file 'data' for writing
outflobj.write(S) # Writes the string S to file
outflobj.writelines(L) # Writes each of the strings in list L to file
outflobj.close() # Closes the file

https://docs.python.org/3/tutorial/inputoutput.html
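The calls above can be combined with the with statement, which closes the file automatically even if an error occurs. The sketch writes to a temporary directory so it does not touch existing files; the file name data.txt is an assumption.

```python
import os
import tempfile

lines = ["first line\n", "second line\n"]
path = os.path.join(tempfile.mkdtemp(), "data.txt")

# Write: the file is closed when the with block ends.
with open(path, "w") as out_file:
    out_file.writelines(lines)

# Read the whole file into one string.
with open(path, "r") as in_file:
    whole = in_file.read()

# Iterate line by line, stripping the trailing newline from each line.
with open(path) as in_file:
    read_back = [line.rstrip("\n") for line in in_file]
```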
Module math
Command name           Description
math.ceil(value)       rounds up
math.cos(value)        cosine, in radians
math.fabs(value)       absolute value, as a float
math.floor(value)      rounds down
math.log(value)        logarithm, base e
math.log10(value)      logarithm, base 10
math.sin(value)        sine, in radians
math.sqrt(value)       square root

Built-in functions (no import needed, not part of math):
abs(value)             absolute value
max(value1, value2)    larger of two values
min(value1, value2)    smaller of two values
round(value)           nearest whole number

Constant               Description
math.e                 2.7182818...
math.pi                3.1415926...

# preferred.
import math
math.fabs(-0.5)

# This is fine
from math import fabs
fabs(-0.5)

# bad style. Many unknown names in name space.
from math import *
fabs(-0.5)
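A quick sketch of the preferred import style. It also confirms that abs is a built-in function rather than a member of the math module; the variable names are illustrative.

```python
import math

root = math.sqrt(16)                   # 4.0
up = math.ceil(2.1)                    # 3
down = math.floor(2.7)                 # 2
magnitude = math.fabs(-0.5)            # 0.5, always a float
builtin_magnitude = abs(-0.5)          # 0.5, built in, no import needed
has_math_abs = hasattr(math, "abs")    # False: the math module has no abs
circle_area = math.pi * 2 ** 2         # area of a circle of radius 2
```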
Module random
• Generating random numbers is important in statistics
In [75]: import random
    ...: four_uniform_randoms = [random.random() for _ in range(4)]
    ...: four_uniform_randoms
    ...:
Out[75]:
[0.5687302894847388,
0.6562738117250464,
0.3396960191199996,
0.016968446644451407]
• Other useful functions: seed(), randint, randrange, shuffle, etc.
• Type in “random” and then use tab completion to see available
functions and use “?” to see docstring of function.
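A sketch of the other functions mentioned above. Seeding makes a run reproducible, which helps when results need to be repeated; the seed value 42 is arbitrary.

```python
import random

random.seed(42)
first_run = [random.random() for _ in range(3)]

random.seed(42)                        # same seed, same sequence
second_run = [random.random() for _ in range(3)]

die = random.randint(1, 6)             # both endpoints are included
even_below_ten = random.randrange(0, 10, 2)   # like range(): the stop value is excluded

deck = list(range(5))
random.shuffle(deck)                   # shuffles in place and returns None
```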
Important python modules for data science

• Numpy
– Key module for scientific computing
– Convenient and efficient ways to handle multi dimensional arrays
• pandas
– DataFrame
– Flexible data structure of labeled tabular data
• Matplotlib: for plotting
• Scipy: solutions to common scientific computing problem
such as linear algebra, optimization, statistics, sparse
matrix
Module paths
• In order to find a module file called myscripts.py, the interpreter scans the list sys.path of directory names.
• The module must be in one of those directories.

>>> import sys
>>> sys.path
['C:\\Python26\\Lib\\idlelib', 'C:\\WINDOWS\\system32\\python26.zip',
'C:\\Python26\\DLLs', 'C:\\Python26\\lib', 'C:\\Python26\\lib\\plat-win',
'C:\\Python26\\lib\\lib-tk', 'C:\\Python26', 'C:\\Python26\\lib\\site-packages']
>>> import myscripts
Traceback (most recent call last):
File "<pyshell#2>", line 1, in <module>
import myscripts
ImportError: No module named myscripts
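One way to make such a module importable is to add its directory to sys.path. The sketch below creates a throwaway module in a temporary directory; the module name myscripts_demo and its contents are hypothetical.

```python
import importlib
import os
import sys
import tempfile

# Create a small module file in a fresh directory (hypothetical example module).
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "myscripts_demo.py"), "w") as module_file:
    module_file.write("GREETING = 'hello from myscripts_demo'\n")

# The directory is not on sys.path yet, so add it before importing.
sys.path.append(module_dir)
importlib.invalidate_caches()

# Note: the module is imported by name, without the .py extension.
demo = importlib.import_module("myscripts_demo")
```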
Appendix
Sequence types: Tuples,
Lists, and Strings
Sequence Types
1. Tuple: ('john', 32, [CMSC])
• A simple immutable ordered sequence of items
• Items can be of mixed types, including collection types
2. Strings: "John Smith"
– Immutable
– Conceptually very much like a tuple
3. List: [1, 2, 'john', ('up', 'down')]
• Mutable ordered sequence of items of mixed types
Similar Syntax
• All three sequence types (tuples, strings, and lists)
share much of the same syntax and functionality.
• Key difference:
– Tuples and strings are immutable
– Lists are mutable
• The operations shown in this section can be
applied to all sequence types
– most examples will just show the operation
performed on one
Defining Sequence
• Define tuples using parentheses and commas
>>> tu = (23, 'abc', 4.56, (2,3), 'def')
• Define lists using square brackets and commas
>>> li = ["abc", 34, 4.34, 23]
• Define strings using quotes (", ', or """).
>>> st = "Hello World"
>>> st = 'Hello World'
>>> st = """This is a multi-line
string that uses triple quotes."""
Accessing one element
• Access individual members of a tuple, list, or string
using square bracket “array” notation
• Note that all are 0 based…
>>> tu = (23, 'abc', 4.56, (2,3), 'def')
>>> tu[1] # Second item in the tuple.
'abc'
>>> li = ["abc", 34, 4.34, 23]
>>> li[1] # Second item in the list.
34
>>> st = "Hello World"
>>> st[1] # 2nd character in string. Still str type
'e'
Positive and negative indices

>>> t = (23, 'abc', 4.56, (2,3), 'def')
Positive index: count from the left, starting with 0
>>> t[1]
'abc'
Negative index: count from the right, starting with -1
>>> t[-3]
4.56
Slicing: return copy of a subset

>>> t = (23, 'abc', 4.56, (2,3), 'def')
Return a copy of the container with a subset of the
original members. Start copying at the first index,
and stop copying before the second.
>>> t[1:4]
('abc', 4.56, (2,3))
Negative indices count from the end
>>> t[1:-1]
('abc', 4.56, (2,3))
Slicing: return copy of a subset

>>> t = (23, 'abc', 4.56, (2,3), 'def')
Omit the first index to make a copy starting from the
beginning of the container
>>> t[:2]
(23, 'abc')
Omit the second index to make a copy starting at the first
index and going to the end
>>> t[2:]
(4.56, (2,3), 'def')
Copying the Whole Sequence
• [ : ] makes a copy of an entire sequence
>>> t[:]
(23, 'abc', 4.56, (2,3), 'def')
• Note the difference between these two lines for mutable sequences
>>> l2 = l1 # Both names refer to one object; changing one affects both
>>> l2 = l1[:] # Independent copies, two objects
The ‘in’ Operator
• Boolean test whether a value is inside a container:
>>> t = [1, 2, 4, 5]
>>> 3 in t
False
>>> 4 in t
True
>>> 4 not in t
False

• For strings, tests for substrings

>>> a = 'abcde'
>>> 'c' in a
True
>>> 'cd' in a
True
>>> 'ac' in a
False
The + Operator
The + operator produces a new tuple, list, or string whose value is
the concatenation of its arguments.

>>> (1, 2, 3) + (4, 5, 6)

(1, 2, 3, 4, 5, 6)

>>> [1, 2, 3] + [4, 5, 6]

[1, 2, 3, 4, 5, 6]

>>> "Hello" + " " + "World"

'Hello World'
The * Operator
• The * operator produces a new tuple, list, or string
that “repeats” the original content.
>>> (1, 2, 3) * 3
(1, 2, 3, 1, 2, 3, 1, 2, 3)

>>> [1, 2, 3] * 3
[1, 2, 3, 1, 2, 3, 1, 2, 3]

>>> "Hello" * 3
'HelloHelloHello'
Mutability:
Tuples vs. Lists
Lists are mutable
>>> li = ['abc', 23, 4.34, 23]
>>> li[1] = 45
>>> li
['abc', 45, 4.34, 23]
• We can change lists in place.
• Name li still points to the same memory reference when we’re done.
Tuples are immutable
>>> t = (23, 'abc', 4.56, (2,3), 'def')
>>> t[2] = 3.14
Traceback (most recent call last):
File "<pyshell#75>", line 1, in -toplevel-
t[2] = 3.14
TypeError: object doesn't support item assignment

• You can’t change a tuple.

• You can make a fresh tuple and assign its reference
to a previously used name.
>>> t = (23, 'abc', 3.14, (2,3), 'def')
• Because tuples are immutable, they can be somewhat faster and lighter than lists.
Operations on Lists Only

>>> li = [1, 11, 3, 4, 5]
>>> li.append('a') # Note the method syntax
>>> li
[1, 11, 3, 4, 5, 'a']
>>> li.insert(2, 'i')
>>> li
[1, 11, 'i', 3, 4, 5, 'a']
The extend method vs +
• + creates a fresh list with a new memory ref
• extend operates on list li in place.
>>> li.extend([9, 8, 7])
>>> li
[1, 11, 'i', 3, 4, 5, 'a', 9, 8, 7]
• Potentially confusing:
– extend takes a list as an argument.
– append takes a single element as an argument.
>>> li.append([10, 11, 12])
>>> li
[1, 11, 'i', 3, 4, 5, 'a', 9, 8, 7, [10, 11, 12]]
Operations on Lists Only
Lists have many methods, including index, count, remove, reverse,
sort
>>> li = ['a', 'b', 'c', 'b']
>>> li.index('b') # index of 1st occurrence
1
>>> li.count('b') # number of occurrences
2
>>> li.remove('b') # remove 1st occurrence
>>> li
['a', 'c', 'b']
Operations on Lists Only
>>> li = [5, 2, 6, 8]

>>> li.reverse() # reverse the list in place

>>> li
[8, 6, 2, 5]

>>> li.sort() # sort the list in place

>>> li
[2, 5, 6, 8]

>>> li.sort(key=some_function)
# sort in place using a user-defined key function (Python 2 also accepted a comparison function, as in li.sort(some_function))
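A sketch of user-defined sorting in Python 3, where sort takes a key function. An old-style comparison function can still be used through functools.cmp_to_key; compare_lengths is a hypothetical example function.

```python
from functools import cmp_to_key

words = ["pear", "fig", "banana"]
words.sort(key=len)        # sort in place by length: ['fig', 'pear', 'banana']

def compare_lengths(a, b):
    """Old-style comparison: negative, zero or positive, like Python 2's cmp."""
    return len(a) - len(b)

# cmp_to_key adapts a comparison function to the key= interface.
by_cmp = sorted(["pear", "fig", "banana"], key=cmp_to_key(compare_lengths))
```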
Tuple details
• The comma is the tuple creation operator, not parens
>>> 1,
(1,)

• Python shows parens for clarity (best practice)

>>> (1,)
(1,)

• Don't forget the comma!

>>> (1)
1

• Trailing comma is only required for singletons; it is optional for other tuples

• Empty tuples have a special syntactic form
>>> ()
()
>>> tuple()
()
Summary: Tuples vs. Lists
• Lists are slower but more powerful than tuples
– Lists can be modified, and they have lots of handy operations and methods
– Tuples are immutable and have fewer features
• To convert between tuples and lists use the list() and
tuple() functions:
li = list(tu)
tu = tuple(li)
Python for Data Analysis
Lecture Content
• Overview of Python Libraries for Data Scientists
• Reading Data; Selecting and Filtering the Data; Data manipulation, sorting, grouping, rearranging
• Plotting the data
• Descriptive statistics
• Inferential statistics
Python Libraries for Data Science
Many popular Python toolboxes/libraries:
– NumPy
– SciPy
– Pandas
– SciKit-Learn
– TensorFlow
– Keras
– PyTorch

Visualization libraries
– matplotlib
– Seaborn

and many more …

Python Libraries for Data Science
NumPy:
• introduces objects for multidimensional arrays and matrices, as well as functions that make it easy to perform advanced mathematical and statistical operations on those objects
• provides vectorization of mathematical operations on arrays and matrices, which significantly improves performance
Link: http://www.numpy.org/
Python Libraries for Data Science
SciPy:
• collection of algorithms for linear algebra, differential equations, numerical integration, optimization, statistics and more
• part of SciPy Stack
• built on NumPy
Link: https://www.scipy.org/scipylib/
Python Libraries for Data Science
Pandas:
• adds data structures and tools designed to work with table-like data (similar to Series and Data Frames in R)
• provides tools for data manipulation: reshaping, merging, sorting, slicing, aggregation etc.
• allows handling missing data
Link: http://pandas.pydata.org/
Python Libraries for Data Science
SciKit-Learn:
• provides machine learning algorithms: classification, regression, clustering, model validation etc.
• built on NumPy, SciPy and matplotlib
Link: http://scikit-learn.org/
TensorFlow

Features:
• Better computational graph visualizations
• Reduces error by 50 to 60 percent in neural machine learning
• Parallel computing to execute complex models
• Seamless library management backed by Google
• Quicker updates and frequent new releases to provide you with the latest features
TensorFlow is particularly useful for the following applications:
• Speech and image recognition
• Text-based applications
• Time-series analysis
• Video detection
Link: https://www.tensorflow.org
Keras
• Features:
• Keras provides a large collection of prelabeled datasets that can be imported and loaded directly.
• It contains various implemented layers and parameters that can be used for construction, configuration, training, and evaluation of neural networks
• Applications:
• One of the most significant applications of Keras is its set of deep learning models that are available with pretrained weights. You can use these models directly to make predictions or extract features without creating or training your own new model.
• Link: https://keras.io
Python Libraries for Data Science
matplotlib:
• Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats
• a set of functionalities similar to those of MATLAB
• line plots, scatter plots, bar charts, histograms, pie charts etc.
Link: https://matplotlib.org/
Python Libraries for Data Science
Seaborn:
• based on matplotlib
• provides a high level interface for drawing attractive statistical graphics
• Similar (in style) to the popular ggplot2 library in R
Link: https://seaborn.pydata.org/
Start Jupyter notebook
# On the Shared Computing Cluster
[scc1 ~] jupyter notebook

Loading Python Libraries
In [ ]: #Import Python Libraries
        import numpy as np
        import scipy as sp
        import pandas as pd
        import matplotlib as mpl
        import seaborn as sns

Press Shift+Enter to execute the Jupyter cell
Reading data using pandas
#Read csv file
In [ ]:
df =
pd.read_csv("http://rcs.bu.edu/examples/python/data_analysis/
Note: The above command has many optional arguments to fine-tune the
Salaries.csv")
data import process.

There is a number of pandas commands to read other data formats:

pd.read_excel('myfile.xlsx',sheet_name='Sheet1',
index_col=None, na_values=['NA'])
pd.read_stata('myfile.dta')
pd.read_sas('myfile.sas7bdat')
pd.read_hdf('myfile.h5','df')
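
A minimal sketch of the optional read_csv arguments, assuming a small made-up CSV held in memory instead of a real file. The parameters usecols, na_values and nrows are standard read_csv options; the data and the name df_small are only for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV text used in place of a real file
csv_text = """rank,discipline,phd,service,sex,salary
Prof,B,56,49,Male,186960
Prof,A,12,6,Male,93000
AsstProf,B,NA,2,Female,80225
"""

df_small = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["rank", "phd", "salary"],  # keep only these columns
    na_values=["NA"],                   # treat the string NA as missing
    nrows=3,                            # read at most 3 data rows
)
print(df_small)
```

The same keyword arguments work when the first argument is a file path or URL.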
92
Exploring data frames
#List first 5 records
In [3]:
df.head()

Out[3]:

93
Hands-on exercises

• Try to read the first 10, 20, 30 records.
• Can you guess how to view the last few records?

Hint:

94
Data Frame data types
Pandas Type | Native Python Type | Description
object | string | The most general dtype. Will be assigned to your column if the column has strings or mixed types (numbers and strings).
int64 | int | Integer values. 64 refers to the number of bits used to store each value.
float64 | float | Numbers with decimals. If a column contains numbers and NaNs (see below), pandas will default to float64, because NaN is itself a floating-point value.
datetime64, timedelta[ns] | N/A (but see the datetime module in Python's standard library) | Values meant to hold time data. Look into these for time series experiments.
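
The dtype rules in the table can be checked on tiny made-up Series. This is only a sketch; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

ints = pd.Series([1, 2, 3])
with_nan = pd.Series([1, 2, np.nan])   # a missing value forces a float column
mixed = pd.Series([1, "two", 3.0])     # mixed types fall back to object

print(ints.dtype, with_nan.dtype, mixed.dtype)
```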

95
Data Frame data types
#Check a particular column type
In [4]:
df['salary'].dtype

Out[4]: dtype('int64')

#Check types for all the columns

In [5]:
df.dtypes

Out[5]:
rank object
discipline object
phd int64
service int64
sex object
salary int64
dtype: object
96
Data Frames attributes
Python objects have attributes and methods.

df.attribute | description
dtypes | list the types of the columns
columns | list the column names
axes | list the row labels and column names
ndim | number of dimensions
size | number of elements
shape | return a tuple representing the dimensionality
values | numpy representation of the data
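
A short sketch of these attributes on a made-up three-row frame. The column names mirror the Salaries data, but the values are hypothetical.

```python
import pandas as pd

# Hypothetical toy frame
toy = pd.DataFrame({
    "rank": ["Prof", "AsstProf", "Prof"],
    "service": [49, 2, 20],
    "salary": [186960, 80225, 110515],
})

print(toy.shape)          # (rows, columns)
print(toy.ndim)           # number of dimensions
print(toy.size)           # rows * columns
print(list(toy.columns))  # column names
```

Attributes are read without parentheses because they are stored values, not function calls.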

97
Hands-on exercises

• Find how many records this data frame has.
• How many elements are there?
• What are the column names?
• What types of columns do we have in this data frame?

98
Data Frames methods
Unlike attributes, Python methods are called with parentheses.
All attributes and methods can be listed with the dir() function: dir(df)
df.method() | description
head([n]), tail([n]) | first/last n rows
describe() | generate descriptive statistics (for numeric columns only)
max(), min() | return max/min values for all numeric columns
mean(), median() | return mean/median values for all numeric columns
std() | standard deviation
sample([n]) | returns a random sample of the data frame
dropna() | drop all the records with missing values
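
A few of these methods applied to a made-up frame with one missing value. The data is hypothetical and the variable names are only for illustration.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "service": [49, 2, 20, np.nan],
    "salary": [186960, 80225, 110515, 97000],
})

first_two = toy.head(2)             # first 2 rows
summary = toy.describe()            # count, mean, std, min, quartiles, max
complete = toy.dropna()             # rows without missing values
mean_salary = toy["salary"].mean()  # mean of a single column

print(summary)
```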

99
Hands-on exercises

• Give the summary for the numeric columns in the dataset.
• Calculate standard deviation for all numeric columns.
• What are the mean values of the first 50 records in the dataset? Hint: use head() method to subset the first 50 records and then calculate the mean.

100
Selecting a column in a Data Frame
Method 1: Subset the data frame using the column name:
df['sex']

Method 2: Use the column name as an attribute:
df.sex

Note: pandas data frames already have a method named rank, so to select a column named "rank" we should use method 1.
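
The name clash can be seen on a made-up two-row frame. This is a sketch; only the column names come from the slides.

```python
import pandas as pd

toy = pd.DataFrame({"rank": ["Prof", "AsstProf"], "sex": ["Male", "Female"]})

by_name = toy["rank"]   # the column named "rank"
by_attr = toy.sex       # attribute access works when there is no name clash
clash = toy.rank        # the DataFrame.rank method, not the column

print(type(by_name).__name__, callable(clash))
```

Bracket access always refers to the column, so it is the safer habit when column names are not under your control.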
Hands-on exercises

• Calculate the basic statistics for the salary column.
• Find how many values there are in the salary column (use the count method).
• Calculate the average salary.

102
Data Frames groupby method

Using the "group by" method we can:

• Split the data into groups based on some criteria
• Calculate statistics (or apply a function) to each group
• Similar to the group_by() function in R's dplyr package
#Group data using rank
In [ ]:
df_rank = df.groupby(['rank'])

#Calculate mean value for each numeric column per each group
In [ ]:
df_rank.mean(numeric_only=True)

103
Data Frames groupby method

Once the groupby object is created, we can calculate various statistics for each group:
#Calculate mean salary for each professor rank:
In [ ]:
df.groupby('rank')[['salary']].mean()

Note: If single brackets are used to specify the column (e.g. salary), then the output is a pandas Series object. When double brackets are used, the output is a Data Frame.
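
The single versus double bracket difference, sketched on made-up salaries:

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "rank": ["Prof", "Prof", "AsstProf"],
    "salary": [100000, 120000, 80000],
})

as_series = toy.groupby("rank")["salary"].mean()    # single brackets
as_frame = toy.groupby("rank")[["salary"]].mean()   # double brackets

print(type(as_series).__name__, type(as_frame).__name__)
```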
Data Frames groupby method

groupby performance notes:

- no grouping/splitting occurs until it's needed. Creating the groupby object only verifies that you have passed a valid mapping
- by default the group keys are sorted during the groupby operation. You may want to pass sort=False for potential speedup:
#Calculate mean salary for each professor rank:
In [ ]:
df.groupby(['rank'], sort=False)[['salary']].mean()

105
Data Frame: filtering

To subset the data we can apply Boolean indexing. This indexing is commonly known as a filter. For example, if we want to subset the rows in which the salary value is greater than $120K:
#Select rows with salary greater than 120000:
In [ ]:
df_sub = df[ df['salary'] > 120000 ]

Any comparison operator can be used to subset the data:

> greater; >= greater or equal;
< less; <= less or equal;
== equal; != not equal;
#Select only those rows that contain female professors:
In [ ]:
df_f = df[ df['sex'] == 'Female' ]
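
A sketch of what the filter does internally, on made-up data: the comparison produces a Boolean Series, and that Series selects the rows. Combining two conditions with & is an addition not shown on the slide.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "sex": ["Male", "Female", "Female"],
    "salary": [130000, 125000, 90000],
})

mask = toy["salary"] > 120000   # Boolean Series, one value per row
high_paid = toy[mask]
# Conditions combine with & (and) and | (or); each needs its own parentheses
high_paid_f = toy[(toy["salary"] > 120000) & (toy["sex"] == "Female")]

print(mask.tolist())
```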
Data Frames: Slicing

There are a number of ways to subset the Data Frame:

• one or more columns
• one or more rows
• a subset of rows and columns

Rows and columns can be selected by their position or label

107
Data Frames: Slicing

When selecting one column, it is possible to use a single set of brackets, but the resulting object will be a Series (not a DataFrame):
#Select column salary:
In [ ]:
df['salary']

When we need to select more than one column and/or make the output a DataFrame, we should use double brackets:
#Select columns rank and salary:
In [ ]:
df[['rank','salary']]

108
Data Frames: Selecting rows

If we need to select a range of rows, we can specify the range using ":"
#Select rows by their position:
In [ ]:
df[10:20]

Notice that the first row has position 0, and the last value in the range is omitted. So for the 0:10 range the first 10 rows are returned, with positions starting at 0 and ending at 9.

109
Data Frames: method loc

If we need to select a range of rows using their labels, we can use the loc indexer. Unlike positional slicing, a label slice includes both end points:
#Select rows by their labels:
In [ ]:
df_sub.loc[10:20,['rank','sex','salary']]

Out[ ]:

110
Data Frames: method iloc

If we need to select a range of rows and/or columns using their positions, we can use method iloc:
#Select rows and columns by their positions:
In [ ]:
df_sub.iloc[10:20,[0, 3, 4, 5]]

Out[ ]:

111
Data Frames: method iloc (summary)
df.iloc[0] # First row of a data frame
df.iloc[i] # (i+1)th row
df.iloc[-1] # Last row
df.iloc[:, 0] # First column
df.iloc[:, -1] # Last column
df.iloc[0:7] # First 7 rows
df.iloc[:, 0:2] # First 2 columns
df.iloc[1:3, 0:2] # Second through third rows and first 2 columns
df.iloc[[0,5], [1,3]] # 1st and 6th rows and 2nd and 4th columns
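
The difference between label-based loc and position-based iloc is easiest to see when labels and positions disagree. A sketch on a made-up frame whose index labels start at 100:

```python
import pandas as pd

toy = pd.DataFrame(
    {"salary": [10, 20, 30, 40, 50]},
    index=[100, 101, 102, 103, 104],  # labels deliberately differ from positions
)

by_label = toy.loc[101:103]    # label slice: both ends included
by_position = toy.iloc[1:3]    # position slice: end excluded

print(by_label)
print(by_position)
```

This matters for filtered frames such as df_sub, where the surviving row labels are no longer consecutive.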
112
Data Frames: Sorting

We can sort the data by a value in the column. By default the sorting will occur in ascending order and a new data frame is returned.
# Create a new data frame from the original sorted by the column service
In [ ]:
df_sorted = df.sort_values( by ='service')
df_sorted.head()
Out[ ]:

113
Data Frames: Sorting

We can sort the data using 2 or more columns:

In [ ]:
df_sorted = df.sort_values( by =['service', 'salary'], ascending = [True, False])
df_sorted.head(10)
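
A sketch of the two-column sort on made-up values: rows are ordered by service ascending, and ties in service are broken by salary descending. The original frame is left unchanged.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "service": [3, 1, 3, 1],
    "salary": [90, 70, 95, 75],
})

ordered = toy.sort_values(by=["service", "salary"], ascending=[True, False])
print(ordered)
```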

Out[ ]:

114
Missing Values
Missing values are marked as NaN
# Read a dataset with missing values
In [ ]:
flights = pd.read_csv("http://rcs.bu.edu/examples/python/data_analysis/flights.csv")
# Select the rows that have at least one missing value
In [ ]:
flights[flights.isnull().any(axis=1)].head()

Out[ ]:

115
Missing Values
There are a number of methods to deal with missing values in
the data frame:
df.method() | description
dropna() | Drop missing observations
dropna(how='all') | Drop observations where all cells are NA
dropna(axis=1, how='all') | Drop column if all the values are missing
dropna(thresh = 5) | Drop rows that contain fewer than 5 non-missing values
fillna(0) | Replace missing values with zeros
isnull() | Returns True if the value is missing
notnull() | Returns True for non-missing values
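
The dropna variants behave quite differently on the same data. A sketch on a made-up 3x3 frame (thresh=2 is used instead of 5 because the frame is small):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column "c" is entirely missing
toy = pd.DataFrame({
    "a": [1.0, np.nan, np.nan],
    "b": [2.0, 5.0, np.nan],
    "c": [np.nan, np.nan, np.nan],
})

any_missing_dropped = toy.dropna()                  # no row is complete here
all_missing_dropped = toy.dropna(how="all")         # drops only the last row
empty_cols_dropped = toy.dropna(axis=1, how="all")  # drops column "c"
at_least_two = toy.dropna(thresh=2)                 # keeps rows with >= 2 values
zero_filled = toy.fillna(0)

print(at_least_two)
```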

116
Missing Values
• When summing the data, missing values will be treated as zero
• If all values are missing, the sum is 0 by default in current pandas (it was NaN before version 0.22); pass min_count=1 to get NaN instead
• cumsum() and cumprod() methods ignore missing values but preserve them in the resulting arrays
• Missing values in GroupBy method are excluded (just like in R)
• Many descriptive statistics methods have a skipna option to control if missing data should be excluded. This value is set to True by default (unlike R)
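
A sketch of how sums treat missing values, on made-up Series. The all-missing case reflects the default in current pandas releases, where min_count controls whether an empty sum is reported as 0 or NaN.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])
empty = pd.Series([np.nan, np.nan])

total = s.sum()                               # NaN is skipped
strict_total = s.sum(skipna=False)            # NaN propagates
running = s.cumsum()                          # NaN stays in place, sum continues after it
all_missing_total = empty.sum()               # default min_count=0
all_missing_strict = empty.sum(min_count=1)   # require at least one valid value

print(total, strict_total, running.tolist(), all_missing_total, all_missing_strict)
```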

117
Aggregation Functions in Pandas
Aggregation - computing a summary statistic about each
group, i.e.
• compute group sums or means
• compute group sizes/counts

Common aggregation functions:

min, max
count, sum, prod
mean, median, mode, mad (mad was removed in pandas 2.0)
std, var

118
Aggregation Functions in Pandas
The agg() method is useful when multiple statistics are computed per column:
In [ ]:
flights[['dep_delay','arr_delay']].agg(['min','mean','max'])
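
A sketch of the shape of the agg() result, using made-up delay values in place of the flights data: one row per statistic, one column per input column.

```python
import pandas as pd

# Hypothetical delay values; only the column names follow the flights data
toy = pd.DataFrame({
    "dep_delay": [2.0, 10.0, 6.0],
    "arr_delay": [5.0, 20.0, 11.0],
})

stats = toy[["dep_delay", "arr_delay"]].agg(["min", "mean", "max"])
print(stats)
```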

Out[ ]:

119
Basic Descriptive Statistics
df.method() | description
describe | Basic statistics (count, mean, std, min, quantiles, max)
min, max | Minimum and maximum values
mean, median, mode | Arithmetic average, median and mode
var, std | Variance and standard deviation
sem | Standard error of mean
skew | Sample skewness
kurt | Kurtosis

120
Graphics to explore the data

The Seaborn package is built on matplotlib but provides a high-level interface for drawing attractive statistical graphics, similar to the ggplot2 library in R. It specifically targets statistical data visualization.

To show graphs within a Python notebook, include the inline directive:
In [ ]:
%matplotlib inline

121
Graphics
function | description
distplot | histogram (deprecated since seaborn 0.11; use histplot or displot)
barplot | estimate of central tendency for a numeric variable
violinplot | similar to boxplot, also shows the probability density of the data
jointplot | scatterplot
regplot | regression plot
pairplot | pairplot
boxplot | boxplot
swarmplot | categorical scatterplot
factorplot | general categorical plot (renamed catplot in seaborn 0.9)

122
Basic statistical Analysis
statsmodels and scikit-learn both have a number of functions for statistical analysis.

The first one is mostly used for regular analysis using R-style formulas, while scikit-learn is more tailored for Machine Learning.

statsmodels:
• linear regressions
• ANOVA tests
• hypothesis testing
• many more ...

scikit-learn:
• kmeans
• support vector machines
• random forests
• many more ...
See examples in the Tutorial Notebook
Descriptive vs. Inferential Statistics
• Descriptive: e.g., Median; describes data you have
but can't be generalized beyond that
– We’ll talk about Exploratory Data Analysis
• Inferential: e.g., t-test, that enable inferences about
the population beyond our data
– These are the techniques we’ll leverage for
Machine Learning and Prediction
EDA Tools
• Python and R are the two languages most commonly used in data science to carry out EDA
• Perform k-means clustering. It is an unsupervised learning algorithm that assigns each data point to one of k clusters. K-means clustering is commonly used in market segmentation, image compression, and pattern recognition.
• EDA can inform predictive models such as linear regression, which are then used to predict outcomes.
• It is also used in univariate, bivariate, and multivariate visualization for summary statistics, establishing relationships between variables, and understanding how different fields in the data interact with each other.
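
To make the k-means bullet concrete, here is a minimal NumPy sketch of the usual assign-then-update loop. It is an illustration under simple assumptions (made-up 2D points, random starting centers, a fixed number of iterations), not the scikit-learn implementation; the function name kmeans and its parameters are hypothetical.

```python
import numpy as np

def kmeans(points, k, n_iter=20, seed=0):
    """Plain assign/update iterations: nearest center, then recompute centers."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every center, shape (n_points, k)
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its points (keep it if the cluster is empty)
        centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
    return labels, centers

# Two made-up, well-separated groups of points
points = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                   [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
labels, centers = kmeans(points, k=2)
print(labels)
```

In practice a library routine is preferable, since it handles initialization and convergence checks more carefully.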

125
Outline
• Exploratory Data Analysis
– Chart types
– Some important distributions
– Hypothesis Testing
Examples of Business Questions
• Simple (descriptive) Stats
– “Who are the most profitable customers?”
• Hypothesis Testing
– “Is there a difference in value to the company of these
customers?”
• Segmentation/Classification
– What are the common characteristics of these customers?
• Prediction
– Will this new customer become a profitable customer? If
so, how profitable?

adapted from Provost and Fawcett, “Data Science for Business”

Applying techniques
• Most business questions are causal: what would
happen if? (e.g. I show this ad)
• But it's easier to ask correlational questions (what
happened in the past when I showed this ad).
• Supervised Learning:
– Classification and Regression
• Unsupervised Learning:
– Clustering and Dimension reduction
• Note: Unsupervised Learning is often used inside a
larger Supervised learning problem.
– E.g. auto-encoders for image recognition neural nets.
Applying techniques
• Supervised Learning:
– kNN (k Nearest Neighbors)
– Naïve Bayes
– Logistic Regression
– Support Vector Machines
– Random Forests
• Unsupervised Learning:
– Clustering
– Factor analysis
– Latent Dirichlet Allocation
Exploratory Data Analysis 1977
• Based on insights developed at Bell Labs
in the 60’s
• Techniques for visualizing and
summarizing data
• What can the data tell us? (in contrast to
“confirmatory” data analysis)
• Introduced many basic techniques:
• 5-number summary, box plots, stem
and leaf diagrams,…
• 5 Number summary:
• extremes (min and max)
• median & quartiles
• More robust to skewed & longtailed
distributions
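
A sketch of the 5-number summary with NumPy, on a made-up sample that contains one extreme value. It shows why the median and quartiles are called robust: the extreme value moves the mean but not the quartiles.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 100])  # made-up sample with one extreme value

# Percentiles 0, 25, 50, 75, 100 give min, first quartile, median, third quartile, max
minimum, q1, median, q3, maximum = np.percentile(data, [0, 25, 50, 75, 100])
mean = data.mean()

print(minimum, q1, median, q3, maximum, mean)
```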
The Trouble with Summary Stats
Looking at Data
Data Presentation
• Data Art

133
The “R” Language
• An evolution of the “S” language developed at Bell
labs for EDA.
• Idea was to allow interactive exploration and
visualization of data.
• The preferred language for statisticians, used by
many other data scientists.
• Features:
– Probably the most comprehensive collection of statistical
models and distributions.
– CRAN: a very large resource of open source statistical
models.

Chart examples from Jeff Hammerbacher’s 2012 CS194 class

Chart types
• Single variable
– Dot plot
– Jitter plot
– Error bar plot
– Box-and-whisker plot
– Histogram
– Kernel density estimate
– Cumulative distribution function

(note: examples using qplot library from R)

Chart types

• Dot plot

136
Chart types
• Jitter plot
• Noise added to the y-axis to spread the points

137
Chart types
• Error bars: usually based on confidence intervals (CI).
95% CI means 95% of points are in the range,
so 2.5% of points are above the bar and 2.5% are below it.
• Not necessarily symmetric:

138
Chart types
• Box-and-whisker plot : a graphical form of 5-number
summary (Tukey)

139
Chart types
• Histogram

140
Chart types
• Kernel density estimate

141
Chart types
• Histogram and Kernel Density Estimates
– Histogram
• Proper selection of bin width is important
• Outliers should be discarded
– KDE (like a smooth histogram)
• Kernel function
– Box, Epanechnikov, Gaussian
• Kernel bandwidth
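
A minimal sketch of a Gaussian kernel density estimate written directly in NumPy, to show the role of the bandwidth. The data, the grid and the function name kde_gaussian are assumptions for illustration; libraries such as SciPy and seaborn provide ready-made estimators.

```python
import numpy as np

def kde_gaussian(samples, grid, bandwidth):
    """Average of Gaussian bumps, one centered on each sample."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return bumps.mean(axis=1) / bandwidth

samples = np.array([1.0, 1.5, 2.0, 4.0, 4.2])   # made-up data
grid = np.linspace(-5.0, 10.0, 1501)            # step of 0.01

density_narrow = kde_gaussian(samples, grid, bandwidth=0.2)  # spiky estimate
density_wide = kde_gaussian(samples, grid, bandwidth=1.0)    # smooth estimate

# Histogram of the same data; the number of bins plays the role of the bandwidth
counts, edges = np.histogram(samples, bins=4)
print(counts)
```

A small bandwidth follows individual points closely, while a large one smooths them into broad humps, much like narrow versus wide histogram bins.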

142
Chart types
• Cumulative distribution function
• Integral of the histogram – simpler to build than KDE
(don’t need smoothing)
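
An empirical cumulative distribution function needs only a sort, which is why no smoothing choice is involved. A sketch on made-up values that are all distinct:

```python
import numpy as np

samples = np.array([3.0, 1.0, 4.0, 2.5, 5.0])   # made-up data, all values distinct

x = np.sort(samples)
# With distinct values, the i-th sorted value has i/n of the samples at or below it
y = np.arange(1, len(x) + 1) / len(x)

print(list(zip(x.tolist(), y.tolist())))
```

Plotting y against x as a step function gives the cumulative distribution curve.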

143
Chart types
• Two variables
– Bar chart
– Scatter plot
– Line plot
– Log-log plot

144
Chart types
• Bar plot: one variable is discrete

145
Chart types
• Scatter plot

146
Chart types
• Line plot

147
Chart types
• Log-log plot: Very useful for power law data

[Figure: log-log plot of word frequency in tweets against word rank, most frequent to least (I, the, you, …); slope ~ -1]
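
Why a log-log plot suits power law data: taking logarithms turns y = c * x^a into a straight line whose slope is the exponent a. A sketch on synthetic data generated with exponent -1 (not the tweet data from the slide):

```python
import numpy as np

rank = np.arange(1, 101)
frequency = 1000.0 / rank   # synthetic power law data with exponent -1

# log y = log c + a * log x, so a straight-line fit recovers the exponent
slope, intercept = np.polyfit(np.log10(rank), np.log10(frequency), 1)
print(slope, intercept)
```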
148
Chart types
• More than two variables
– Stacked plots
– Parallel coordinate plot

149
Chart types
• Stacked plot: stack variable is discrete:

150
Chart types
• Parallel coordinate plot: one discrete variable, an
arbitrary number of other variables:

151
References
• The material is prepared by taking inputs from
various text books and other internet sources.
