Numpy
NumPy stands for ‘Numerical Python’ or ‘Numeric Python’. It is an open-source Python module
that provides fast mathematical computation on arrays and matrices. Since arrays and matrices
are an essential part of the machine learning ecosystem, NumPy, together with libraries such as
Scikit-learn, Pandas, Matplotlib, and TensorFlow, forms the core of the Python machine
learning ecosystem.
There are a number of ways to initialize new NumPy arrays, for example from Python lists or
with NumPy's array-generating functions.
From lists
For example, to create new vector and matrix arrays from Python lists we can use the
numpy.array function.
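As a minimal sketch of the idea above (the variable names v and M are illustrative, not taken from the original cells), a flat list gives a vector and a nested list gives a matrix:

```python
import numpy as np

# A vector (1-D array) built from a flat Python list
v = np.array([1, 2, 3, 4])

# A matrix (2-D array) built from a nested Python list
M = np.array([[1, 2], [3, 4]])

print(type(v), v.shape)  # <class 'numpy.ndarray'> (4,)
print(type(M), M.shape)  # <class 'numpy.ndarray'> (2, 2)
```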
So far the numpy.ndarray looks a lot like a Python list (or nested list). Why not simply use
Python lists for computations instead of creating a new array type?
For larger arrays it is impractical to initialize the data manually using explicit Python lists. Instead
we can use one of the many NumPy functions that generate arrays of different forms.
Most of the time, we use NumPy built-in methods to create arrays. These are much simpler and
faster.
arange()
linspace()
zeros()
ones()
eye()
diag()
full()
Random
rand()
random()
randn()
normal()
randint()
choice()
reshape()
a. arange()
arange() is very similar to Python's built-in range() function, but it returns an array.
Syntax: arange([start,] stop[, step,], dtype=None)
Return evenly spaced values within a given interval.
Out[8]: array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
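The input cell for the output above is not shown. The sketch below illustrates the three common call forms; the last one is only one possible call that yields floats from 0 to 9, and is an assumption about what the original cell did.

```python
import numpy as np

a = np.arange(10)                   # stop only: integers 0..9
b = np.arange(2, 10, 2)             # start, stop, step: 2, 4, 6, 8 (stop is excluded)
c = np.arange(0, 10, dtype=float)   # same range as a, but as floats

print(a)
print(b)
print(c)
```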
b. linspace()
Return evenly spaced numbers over a specified interval.
Press shift+tab for the documentation.
In [9]: # start at 1 and end at 15, with 15 evenly spaced points between 1 and 15
        print(np.linspace(1, 15, 15))
        type(np.linspace(1, 15, 15))
Out[9]: numpy.ndarray
In [10]: # Let's find the step size with "retstep", which returns the array and the step size
         my_linspace = np.linspace(5, 15, 9, retstep=True)
         my_linspace[1]
Out[10]: 1.25
In [11]: # my_linspace[1] gives the step size only; my_linspace itself is a tuple
         type(my_linspace)
Out[11]: tuple
In [12]: my_linspace[0]
Out[12]: array([ 5. , 6.25, 7.5 , 8.75, 10. , 11.25, 12.5 , 13.75, 15. ])
In [13]: my_linspace
Out[13]: (array([ 5. , 6.25, 7.5 , 8.75, 10. , 11.25, 12.5 , 13.75, 15. ]), 1.25)
Don't confuse them!
arange() takes the step size as its 3rd argument.
linspace() takes the number of points we want as its 3rd argument.
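A short side-by-side sketch of that difference (the names by_step and by_count are illustrative):

```python
import numpy as np

# arange: the third argument is the step; the stop value is excluded
by_step = np.arange(0, 10, 2)

# linspace: the third argument is the number of points; the stop value is included
by_count = np.linspace(0, 10, 3)

print(by_step)   # [0 2 4 6 8]
print(by_count)  # [ 0.  5. 10.]
```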
c. zeros()
Creates an array filled with zeros.
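The original cells for zeros() are not shown; a minimal sketch with illustrative shapes might look like this:

```python
import numpy as np

z1 = np.zeros(3)        # 1-D array of three zeros
z2 = np.zeros((2, 4))   # 2-D array: the shape is passed as a single tuple

print(z1)
print(z2)
```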
d. ones()
Creates an array filled with ones.
In [16]: np.ones(3)
e. eye()
Creates an identity matrix (which is always square), useful in several linear algebra
problems.
Returns a 2-D array with ones on the diagonal and zeros elsewhere.
In [18]: np.eye(5)
f. diag()
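No cell for diag() survives here, so the following is only a sketch of its two standard uses, with illustrative values: building a diagonal matrix from a 1-D input, and extracting the diagonal from a 2-D input.

```python
import numpy as np

# 1-D input: build a 2-D array with these values on the main diagonal
d = np.diag([1, 2, 3])

# 2-D input: extract the main diagonal as a 1-D array
e = np.diag(d)

print(d)
print(e)  # [1 2 3]
```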
g. full()
In [21]: np.full((3,3),'hello')
h. rand()
Create an array of the given shape and populate it with random samples from a uniform
continuous distribution over half-open interval [0, 1) .
In [23]: np.random.rand(3,2) # rows, cols: each dimension is a separate argument, not a tuple
In [24]: np.random.rand([3,2]) # passing a list instead raises a TypeError
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-24-509995c2e821> in <module>
----> 1 np.random.rand([3,2])
mtrand.pyx in mtrand.RandomState.rand()
mtrand.pyx in mtrand.RandomState.random_sample()
mtrand.pyx in mtrand.cont0_array()
i. random()
This returns random floats in the half-open interval [0, 1), following the “continuous uniform”
distribution.
np.random.random((4,3))
"The only difference is in how the arguments are handled. With numpy.random.rand(), the length of
each dimension of the output array is a separate argument. With numpy.random.random(), the
shape argument is a single tuple."
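The difference in argument handling can be sketched as follows; the shapes are illustrative and the values differ on every run:

```python
import numpy as np

a = np.random.rand(4, 3)        # dimensions passed as separate arguments
b = np.random.random((4, 3))    # shape passed as a single tuple

print(a.shape, b.shape)  # (4, 3) (4, 3)
```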
j. randn()
Returns a sample (or samples) from the "standard normal" (Gaussian) distribution. Unlike rand(),
which is uniform, it draws from the standardized normal distribution (mean 0 and
variance 1).
Press shift+tab for the documentation.
In [26]: np.random.randn(2)
In [28]: np.random.randn((7,7)) # like rand(), randn() does not accept a tuple
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-28-0701a3589c9f> in <module>
----> 1 np.random.randn((7,7))
mtrand.pyx in mtrand.RandomState.randn()
mtrand.pyx in mtrand.RandomState.standard_normal()
mtrand.pyx in mtrand.cont0_array()
k. normal()
numpy.random.normal(loc=0.0, scale=1.0, size=None)
Draw random samples from a normal (Gaussian) distribution.
Parameters :
loc : float -- Mean (“centre”) of the distribution.
scale : float -- Standard deviation (spread or “width”) of the distribution.
size : tuple of ints -- Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn.
In [29]: np.random.normal()
Out[29]: 1.0980093126932042
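To show how loc, scale and size interact, here is a small sketch with illustrative parameter values. The printed mean and standard deviation vary from run to run, so they should only be close to the requested values.

```python
import numpy as np

# 1000 x 3 samples from a normal distribution with mean 5 and standard deviation 2
samples = np.random.normal(loc=5.0, scale=2.0, size=(1000, 3))

print(samples.shape)                  # (1000, 3)
print(samples.mean(), samples.std())  # roughly 5 and 2
```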
l. randint()
Return random integers from low (inclusive) to high (exclusive).
Return random integers from the "discrete uniform" distribution of the specified dtype in the "half-
open" interval [ low , high ). If high is None (the default), then results are from [0, low ).
Out[31]: 73
Out[32]: array([15, 66, 91, 34, 83, 43, 64, 68, 15, 19])
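The input cells for the two outputs above are missing. The calls below are assumptions that would produce results of the same kind (a single integer, then an array of ten integers); the bounds 1 and 100 are illustrative.

```python
import numpy as np

one = np.random.randint(1, 100)        # a single integer in [1, 100)
many = np.random.randint(1, 100, 10)   # an array of 10 integers in [1, 100)
below = np.random.randint(5, size=4)   # only low given: integers in [0, 5)

print(one)
print(many)
print(below)
```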
m. choice()
Generates a random sample from a given 1-D array. The optional p argument gives the probability of each entry.
In [33]: y=np.random.choice(['a','b'],1000,p=[0.8,0.2])
         np.unique(y, return_counts=True)
n. reshape()
Gives a new shape to an array without changing its data.
Original array :
[0 1 2 3 4 5 6 7]
[[4 5]
[6 7]]]
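The cell that produced the partial output above is not shown. The sketch below assumes an eight-element array like the one printed and reshapes it to two and three dimensions; the target shapes are illustrative.

```python
import numpy as np

original = np.arange(8)                # [0 1 2 3 4 5 6 7]
as_2x4 = original.reshape(2, 4)        # 2 rows, 4 columns
as_2x2x2 = original.reshape(2, 2, 2)   # 3-D: two 2x2 blocks

print("Original array :", original)
print(as_2x4)
print(as_2x2x2)
```

The total number of elements must stay the same: 8 = 2 * 4 = 2 * 2 * 2.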
Attributes of a NumPy array:
ndim: the number of dimensions of the array
shape: a tuple of integers giving the size of the array along each dimension
size: the total number of elements in the array
dtype: the type of the elements in the array, e.g., int64 or a character (string) type
itemsize: the size in bytes of each element
nbytes: the total size (in bytes) of the array
reshape(): a method rather than an attribute; it reshapes the array
x3 ndim: 3
x3 shape: (3, 4, 5)
x3 size: 60
Each array has attributes ndim (the number of dimensions), shape (the size of each dimension),
and size (the total size of the array):
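A sketch that reproduces the x3 values printed above. How x3 was originally created is not shown, so filling it with zeros and fixing the dtype to int64 are assumptions.

```python
import numpy as np

# Assumed construction: any array of shape (3, 4, 5) gives the same ndim, shape and size
x3 = np.zeros((3, 4, 5), dtype=np.int64)

print("x3 ndim: ", x3.ndim)          # 3
print("x3 shape:", x3.shape)         # (3, 4, 5)
print("x3 size: ", x3.size)          # 60
print("dtype:   ", x3.dtype)         # int64
print("itemsize:", x3.itemsize)      # 8 bytes per element
print("nbytes:  ", x3.nbytes)        # 480 = size * itemsize
```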
In [40]: array_1d
In [41]: # In the simplest case, selecting one or more elements of a NumPy array looks like Python list indexing
         # Getting the value at a certain index
         array_1d[0]
Out[41]: -10
In [45]: # Getting values up to and from a certain index -- remember the index starts from '0'
         # (no need to give both start and stop indexes)
         array_1d[:2], array_1d[2:]
In [47]: array_1d
         # The first element is changed to -102
To access any single element of a 2-D NumPy array, the general format is:
array_2d[row][col]
or
array_2d[row,col] .
We will use [row,col] , since the comma makes the intent clearer. However, if we want to access
more than one element, these two expressions give different results.
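To make the difference concrete, here is a sketch using a 3x4 array with the same values that an_array has in the printed output further down (defining it here is an assumption about the missing cell):

```python
import numpy as np

an_array = np.array([[11, 12, 13, 14],
                     [21, 22, 23, 24],
                     [31, 32, 33, 34]])

# Single element: both forms agree
single_a = an_array[1][2]   # 23
single_b = an_array[1, 2]   # 23

# Several elements: the forms differ
pairs = an_array[[0, 1], [1, 0]]     # elements (0,1) and (1,0) -> [12 21]
chained = an_array[[0, 1]][[1, 0]]   # take rows 0 and 1, then reorder them as row 1, row 0

print(single_a, single_b)
print(pairs)
print(chained)
```

The comma form pairs row and column indices element by element, while the chained form applies the second index to the result of the first.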
In [49]: an_array
In [52]: xx[1:3]
In [53]: an_array[[1,2],[1,3]]
In [54]: an_array[[0,1],[1,0]]
In [55]: an_array[[0,1]][[1,0]]
[[11 12 13 14]
[21 22 23 24]
[31 32 33 34]]
Use array slicing to get a subarray consisting of the first 2 rows and columns 1 and 2.
[[12 13]
[22 23]]
When you modify a slice, you actually modify the underlying array.
Before: 12
After: 1000
[[11 12 13 14]
[21 22 23 24]
[31 32 33 34]]
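The cells behind the "Before" and "After" output are not shown. A sketch of the view behaviour, assuming the same an_array values and the hypothetical names a_slice and a_copy, could be:

```python
import numpy as np

an_array = np.array([[11, 12, 13, 14],
                     [21, 22, 23, 24],
                     [31, 32, 33, 34]])

a_slice = an_array[:2, 1:3]          # a view: [[12 13] [22 23]]
print("Before:", an_array[0, 1])     # 12
a_slice[0, 0] = 1000                 # writes through to an_array
print("After:", an_array[0, 1])      # 1000

# An explicit copy is independent of the original array
a_copy = an_array[:2, 1:3].copy()
a_copy[0, 0] = -1
print(an_array[0, 1])                # still 1000
```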
In [60]: # Using both integer indexing & slicing generates an array of lower rank
         row_rank1 = an_array[1, :] # Rank 1 view

         print(row_rank1, row_rank1.shape) # notice only a single []
In [61]: # Slicing alone generates an array of the same rank as an_array
         row_rank2 = an_array[1:2, :] # Rank 2 view

         print(row_rank2, row_rank2.shape) # notice the [[ ]]
[[12]
[22]
[32]] (3, 1)
Original Array:
[[11 12 13]
[21 22 23]
[31 32 33]
[41 42 43]]
In [65]: # Examine the pairings of row_indices and col_indices; each pair selects one element
         for row,col in zip(row_indices,col_indices):
             print(row, ", ",col)
0 , 0
1 , 1
2 , 2
3 , 0
In [66]: # Select one element from each row
         print('Values in the array at those indices: ',an_array[row_indices, col_indices])
In [67]: # Change one element from each row using the indices selected
         an_array[row_indices, col_indices] += 100000

         print('\nChanged Array:')
         print(an_array)
Changed Array:
[[100011 12 13]
[ 21 100022 23]
[ 31 32 100033]
[100041 42 43]]
In [69]: # Create a boolean filter marking whether each element meets the condition
         c=a > 2
         print(c)
[[False False]
[ True True]
[ True True]]
Notice that c is an ndarray of the same shape as a. It holds True wherever the corresponding
element of a is greater than 2, and False wherever that element is less than or equal to 2.
We can use these comparison expressions directly for indexing. The result contains only those
elements for which the expression evaluates to True.
In [70]: 1 print(a[c])
2 print(a[c].shape)
[3 4 5 6]
(4,)
Let's see whether this works with multiple conditions as well. In the process we'll also see that we
don't have to store the result in a variable before using it for subsetting. We can instead write the
conditional expression directly inside the brackets.
In [71]: a>2
In [72]: a<5
In [75]: print(a)
         print(a[(a>2) | (a<5)] )
         a[(a>2) & (a<5)]   # multiple conditions combined in one line
[[1 2]
[3 4]
[5 6]]
[1 2 3 4 5 6]
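A self-contained sketch of the same filtering, using the array a printed above. The parentheses around each comparison are required because & and | bind more tightly than > and <.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

# OR: every element is either > 2 or < 5, so all six are selected
either = a[(a > 2) | (a < 5)]

# AND: only elements strictly between 2 and 5 are selected
both = a[(a > 2) & (a < 5)]

print(either)  # [1 2 3 4 5 6]
print(both)    # [3 4]
```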
[[111 112]
[121 122]]
[[211.1 212.1]
[221.1 222.1]]
In [77]: # add
         print(x + y) # The plus sign works
         print()
         print(np.add(x, y)) # so does the numpy function "add"
[[322.1 324.1]
[342.1 344.1]]
[[322.1 324.1]
[342.1 344.1]]
In [78]: # subtract
         print(x - y)
         print()
         print(np.subtract(x, y))
[[-100.1 -100.1]
[-100.1 -100.1]]
[[-100.1 -100.1]
[-100.1 -100.1]]
In [79]: # multiply (element-wise)
         print(x * y)
         print()
         print(np.multiply(x, y))
[[23432.1 23755.2]
[26753.1 27096.2]]
[[23432.1 23755.2]
[26753.1 27096.2]]
In [80]: # divide (element-wise)
         print(x / y)
         print()
         print(np.divide(x, y))
[[0.52581715 0.52805281]
[0.54726368 0.54930212]]
[[0.52581715 0.52805281]
[0.54726368 0.54930212]]
[[10.53565375 10.58300524]
[11. 11.04536102]]
In [82]: # exponent (e ** x)
         print(np.exp(x))
[[1.60948707e+48 4.37503945e+48]
[3.54513118e+52 9.63666567e+52]]
In general, mathematical functions from NumPy (referred to as np here), when applied to an
array, return an array in which the function has been applied to each element. The functions
from the math package, on the other hand, raise an error when applied to arrays; they only
work on scalars.
In [83]: # square root
         import math
         math.sqrt(x)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-83-f63d9241fcd6> in <module>
1 # square root
2 import math
----> 3 math.sqrt(x)
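The contrast can be sketched as below, using the x values printed earlier. The names roots, math_ok and scalar_root are illustrative.

```python
import math
import numpy as np

x = np.array([[111, 112], [121, 122]])

# np.sqrt works element by element and returns an array of the same shape
roots = np.sqrt(x)
print(roots)

# math.sqrt expects a single number, so a 2x2 array raises TypeError
try:
    math.sqrt(x)
    math_ok = True
except TypeError as err:
    math_ok = False
    print("math.sqrt failed:", err)

# math.sqrt is fine for one element at a time
scalar_root = math.sqrt(x[1, 0])
print(scalar_root)  # 11.0
```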
np.dot() in NumPy
If both a and b are 1-D (one-dimensional) arrays: inner product of the two vectors (without
complex conjugation).
If both a and b are 2-D (two-dimensional) arrays: matrix multiplication.
If either a or b is 0-D (also known as a scalar): equivalent to element-wise multiplication;
numpy.multiply(a, b) or a * b is preferred.
If a is an N-D array and b is a 1-D array: sum product over the last axis of a and b.
If a is an N-D array and b is an M-D array with M>=2: sum product over the last
axis of a and the second-to-last axis of b.
If the last dimension of a is not the same size as the second-to-last dimension of b, a ValueError is raised.
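The cases listed above can be sketched with small arrays. The values of v, w and x match those printed elsewhere in this section; the remaining names are illustrative.

```python
import numpy as np

v = np.array([9, 10])
w = np.array([11, 12])
x = np.array([[111, 112], [121, 122]])

inner = np.dot(v, w)     # 1-D with 1-D: inner product, 9*11 + 10*12
matmat = np.dot(x, x)    # 2-D with 2-D: matrix multiplication, same as x @ x
scaled = np.dot(2, v)    # scalar with array: same as 2 * v
matvec = np.dot(x, v)    # 2-D with 1-D: sum product over the last axis of x and v

print(inner)   # 219
print(scaled)  # [18 20]
print(matvec)  # [2119 2309]

# Mismatched dimensions raise ValueError
try:
    np.dot(np.ones((2, 3)), np.ones((2, 3)))
    shapes_ok = True
except ValueError:
    shapes_ok = False
```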
Out[86]: 219
You can see that the result is a single number, not what you'd expect from matrix multiplication. This
happens because a one-dimensional array is not a matrix.
In [87]: print(v.shape)
         print(w.shape)
(2,)
(2,)
In [88]: v=v.reshape((1,2))
         w=w.reshape((1,2))
         v
Now if you simply try v.dot(w) or np.dot(v,w) [both are the same], you will get an error, because
you cannot multiply a matrix of shape 1X2 by another matrix of shape 1X2: the number of columns
of the first must equal the number of rows of the second.
In [89]: np.dot(v,w)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-89-efb51945670c> in <module>
----> 1 np.dot(v,w)
matrix v : [[ 9 10]]
matrix v Transpose: [[ 9]
[10]]
matrix w: [[11 12]]
matrix w Transpose: [[11]
[12]]
~~~~~~~~~ v multiply with transpose of w
[[219]]
~~~~~~~~~ transpose of v is multiply by w
[[ 99 108]
[110 120]]
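The cell that produced the output above is missing. A sketch that yields the same two products, assuming v and w hold the values shown, is:

```python
import numpy as np

v = np.array([9, 10]).reshape((1, 2))
w = np.array([11, 12]).reshape((1, 2))

# (1, 2) times (2, 1) gives a 1x1 matrix
row_by_col = v.dot(w.T)

# (2, 1) times (1, 2) gives a 2x2 matrix
col_by_row = v.T.dot(w)

print(row_by_col)  # [[219]]
print(col_by_row)  # [[ 99 108] [110 120]]
```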
If you leave v as a one-dimensional array, dot() does not multiply element-wise. It treats v as a
vector and returns the sum product over the last axis of x and v (a matrix-vector product) as a 1-D array.
Here is an example
In [91]: print(x)
         v=np.array([9,10])
         print("~~~~~")
         print(v)
         x.dot(v)
[[111 112]
[121 122]]
~~~~~
[ 9 10]
In [92]: print(x)
         print("~~~")
         print(y)
         x.dot(y)
[[111 112]
[121 122]]
~~~
[[211.1 212.1]
[221.1 222.1]]
Broadcasting
NumPy arrays differ from normal Python lists in their ability to broadcast. We will
only cover the basics; for further details on broadcasting rules, see the NumPy documentation
(https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).
Another good read on broadcasting:
https://jakevdp.github.io/PythonDataScienceHandbook/02.05-computation-on-arrays-broadcasting.html
Take a slice of the array and set it equal to some number, say 500:
array_1d[0:5] = 500
This will broadcast the value 500 to the first 5 elements of array_1d.
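A runnable sketch of that assignment. The contents of array_1d are not shown at this point, so np.arange(10) is used as a stand-in.

```python
import numpy as np

array_1d = np.arange(10)   # stand-in values: 0..9
array_1d[0:5] = 500        # the scalar 500 is broadcast to all five slots

print(array_1d)  # [500 500 500 500 500   5   6   7   8   9]
```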
In [99]: np.arange(0,4).shape
Out[99]: (4,)
In [100]: array_2d
In [104]: array_1
In [105]: array_2
Out[105]: array([[1],
[2],
[3]])
In [106]: from IPython.display import Image
          Image("newaxis.jpg")
Out[106]: [image: newaxis.jpg]
In [107]: # Official way of printing is used, format() and len() are used for revision
          print(array_1)
          print("Shape of the array is: {}, this is {}-D array".format(array_1.shape,len(array_1.shape)))
          # (3,) indicates that this is a one dimensional array (vector)
[1 2 3]
Shape of the array is: (3,), this is 1-D array
In [108]: # Official way of printing is used, format() and len() are used for revision
          print(array_2)
          print("Shape of the array is: {}, this is {}-D array".format(array_2.shape,len(array_2.shape)))
          # (3, 1) indicates that this is a 2-D array (matrix)
[[1]
[2]
[3]]
Shape of the array is: (3, 1), this is 2-D array
In [109]: # Broadcasting arrays
          array_1 + array_2
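The output of this last cell is not shown. The sketch below rebuilds array_1 and array_2 with the shapes printed above (using np.newaxis to create the column is an assumption suggested by the image file name) and shows what the sum broadcasts to.

```python
import numpy as np

array_1 = np.array([1, 2, 3])        # shape (3,)
array_2 = array_1[:, np.newaxis]     # shape (3, 1): a column

# (3,) and (3, 1) are both stretched to (3, 3) before adding
result = array_1 + array_2

print(result.shape)  # (3, 3)
print(result)
# [[2 3 4]
#  [3 4 5]
#  [4 5 6]]
```

Each entry result[i, j] equals array_2[i, 0] + array_1[j].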