Numpy

Uploaded by

Chirag Saraf
Copyright
© © All Rights Reserved

NumPy

NumPy stands for 'Numerical Python'.


What is NumPy?
NumPy is an open-source Python library that is used in almost every field of
science and engineering. It contains a powerful N-dimensional array object. An
N-dimensional array is simply an array with any number of dimensions.
An array with a single dimension is known as a vector, while a matrix refers to an
array with two dimensions. For 3-D or higher dimensional arrays, the term tensor
is also commonly used. In NumPy, dimensions are also called axes.

Differences between NumPy array and the standard Python list


❑ NumPy arrays have a fixed size at creation, unlike Python lists, which can
grow dynamically.
❑ All elements in a NumPy array are required to be of the same data
type, whereas a Python list can contain elements of any type.
❑ Numerical operations on NumPy arrays are usually much faster than the
equivalent loops over Python lists, because they run in compiled code.
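The first two differences can be seen in a short sketch. The variable names are illustrative only and do not come from the original notes.

```python
import numpy as np

# A Python list can hold mixed types and grows in place.
py_list = [1, "two", 3.0]
py_list.append(4)
print(len(py_list))              # 4

# An array has a single dtype: mixing ints with a float upcasts to float64.
arr = np.array([1, 2, 3.5])
print(arr.dtype)                 # float64

# np.append does not grow the array in place; it returns a new array.
bigger = np.append(arr, 4)
print(arr.shape, bigger.shape)   # (3,) (4,)
```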

How to install?
• On Windows, run pip install numpy in the command prompt.
• On Ubuntu, run sudo apt install python3-numpy in a terminal.

How to import NumPy


In order to start using NumPy and all of the functions available in NumPy,
you’ll need to import it. This can be easily done with this import statement:
>>> import numpy
Or you can shorten numpy to an alias version np.
>>> import numpy as np
1-Dimensional Array:

>>> x = np.array([10, 20, 30])


>>> type(x) # Type of object
<class 'numpy.ndarray'>
>>> len(x) # Length: number of elements
3
>>> sum(x) # Sum of elements
60

2-Dimensional Array:
>>> M = np.array([[1, 2, 3], [4, 5, 6]])
>>> M
array([[1, 2, 3],
[4, 5, 6]])
[The display is like a Matrix with 2 rows and 3 columns.]
>>> M.ndim
2
>>> M.size
6
>>> M.shape # [Shape of the 2D array as a tuple]
(2, 3)
Shape, Reshape, Size and Resize of arrays:
>>> A = np.array([2, 4, 6, 8, 10, 12])
>>> A.shape
(6,)
[1D array with 6 elements.]
# Converted to a 2D array (2 × 3)
>>> B = A.reshape(2, 3)
>>> B
array([[ 2, 4, 6],
[ 8, 10, 12]])
>>> A.reshape(3, 2)
array([[ 2, 4],
[ 6, 8],
[10, 12]])
>>> A
array([ 2, 4, 6, 8, 10, 12])
Thus, with reshape, the original array remains unchanged.
>>> A.size
6
>>> B.size
6
>>> A.resize(2, 3)
>>> A
array([[ 2, 4, 6],
[ 8, 10, 12]])
Thus, resize modifies the original array in place.
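One point worth illustrating: although reshape leaves the shape of the original array alone, the reshaped result normally shares the same data. The sketch below assumes a contiguous array like A above; the name C and the use of -1 are additions for illustration.

```python
import numpy as np

A = np.array([2, 4, 6, 8, 10, 12])
B = A.reshape(2, 3)      # new shape, same underlying data (a view here)
C = A.reshape(3, -1)     # -1 asks NumPy to infer the missing dimension

print(np.shares_memory(A, B))   # True
print(C.shape)                  # (3, 2)

B[0, 0] = 99             # writing through the view also changes A
print(A[0])              # 99
```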

# 1 Range of numbers
np.arange(start, stop, step)

# Examples
>>> np.arange(2, 10, 3)
array([2, 5, 8])
# Default step = 1
>>> np.arange(2, 10)
array([2, 3, 4, 5, 6, 7, 8, 9])
# Default start = 0
>>> np.arange(5)
array([0, 1, 2, 3, 4])
>>> np.arange(0.2, 2, 0.4)
array([0.2, 0.6, 1. , 1.4, 1.8])
>>> np.arange(5, 10, 2, dtype='f')
array([5., 7., 9.], dtype=float32)

# 2 Linear space
np.linspace(start, stop, num) # num = number of elements
# Examples
>>> np.linspace(10, 20, 5)
array([10., 12.5, 15., 17.5, 20.])
Note: Step size = (20 – 10)/4 = 2.5 [4 gaps]
>>> np.linspace(10, 20, 5, endpoint = True)
array([10., 12.5, 15., 17.5, 20.])
>>> np.linspace(10, 20, 5, endpoint = False)
array([10., 12., 14., 16., 18.])
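The step size in the note above does not have to be worked out by hand. As a small sketch, the retstep=True argument of np.linspace returns the spacing along with the array:

```python
import numpy as np

# Endpoint included: 5 points, 4 gaps.
values, step = np.linspace(10, 20, 5, retstep=True)
print(step)        # 2.5

# Endpoint excluded: the interval is split into 5 gaps.
values_open, step_open = np.linspace(10, 20, 5, endpoint=False, retstep=True)
print(step_open)   # 2.0
```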

Concatenating arrays:

1D:
>>> a = np.array([10, 20, 30])
>>> b = np.array([40, 50, 60])
>>> np.concatenate((a, b))
array([10, 20, 30, 40, 50, 60])

2D:
>>> x = np.array([[1, 2], [3, 4]])
>>> x
array([[1, 2],
       [3, 4]])
>>> y = np.array([[5, 6], [7, 8]])
>>> y
array([[5, 6],
       [7, 8]])
>>> np.concatenate((x, y), axis = 1)
array([[1, 2, 5, 6],
       [3, 4, 7, 8]])
>>> np.concatenate((x, y), axis = 0)
array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8]])
# By default axis = 0
>>> np.concatenate((x, y))
array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8]])
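For 2D arrays, the two concatenation directions also have shorthand functions. The sketch below uses np.vstack and np.hstack, which are not mentioned in the notes above, and checks that they agree with np.concatenate:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

rows_stacked = np.vstack((x, y))   # same as concatenating along axis 0
cols_stacked = np.hstack((x, y))   # same as concatenating along axis 1 for 2D arrays

print(np.array_equal(rows_stacked, np.concatenate((x, y), axis=0)))  # True
print(np.array_equal(cols_stacked, np.concatenate((x, y), axis=1)))  # True
```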
Some other NumPy functions:

>>> x = np.array([10, 20, 30, 40, 50,60, 70, 80, 90])


>>> np.sum(x)
450
>>> x.sum()
450
# Minimum and Maximum
>>> x.min() # Minimum
10
>>> x.max() # Maximum
90
>>> y = x.reshape(3, 3)
>>> y
array([[10, 20, 30],
[40, 50, 60],
[70, 80, 90]])
# Sum of each row
>>> np.sum(y, axis = 1) #row sum
array([ 60, 150, 240])
# The same operations are available as array methods
>>> y.sum(0) # Along axis = 0: column sums
array([120, 150, 180])
>>> y.sum(1) # Along axis = 1: row sums
array([ 60, 150, 240])
>>> y.sum() # No axis given: sum of all elements
450

# Mean, Median
>>> np.mean(y, axis = 0)
array([40., 50., 60.])
[Also available as the method y.mean(0).]
>>> np.mean(y, 1)
array([20., 50., 80.])
>>> np.mean(y)
50.0
>>> np.median(y, 0)
array([40., 50., 60.])
>>> np.median(y, 1)
array([20., 50., 80.])
>>> np.median(y)
50.0
# Sorting
>>> x = [-2, 0, 3, 1, 10, 5, -7]
# Ascending order
>>> np.sort(x)
array([-7, -2, 0, 1, 3, 5, 10])
# Descending order
>>> np.sort(x)[::-1]
array([10, 5, 3, 1, 0, -2, -7])
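np.sort returns a sorted copy and leaves its input alone. A short sketch of two related tools, np.argsort and the in-place ndarray.sort method (neither is covered in the notes above):

```python
import numpy as np

x = np.array([-2, 0, 3, 1, 10, 5, -7])

order = np.argsort(x)    # indices that would sort x
print(order)             # [6 0 1 3 2 5 4]
print(x[order])          # [-7 -2  0  1  3  5 10]

sorted_copy = np.sort(x) # x itself is unchanged here
x.sort()                 # the method sorts x in place and returns None
print(x)                 # [-7 -2  0  1  3  5 10]
```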

Matrix operations in NumPy:


>>> a1 = np.arange(6).reshape(2, 3)
>>> print(a1)
[[0 1 2]
 [3 4 5]]

>>> a2 = np.arange(6, 18, 2).reshape(2, 3)
>>> print(a2)
[[ 6  8 10]
 [12 14 16]]
Transpose:
>>> a1.transpose()
array([[0, 3],
       [1, 4],
       [2, 5]])

>>> a1.T
array([[0, 3],
       [1, 4],
       [2, 5]])

Element wise operations:

>>> print(a1 + a2)
[[ 6  9 12]
 [15 18 21]]

>>> print(a1 - a2)
[[ -6  -7  -8]
 [ -9 -10 -11]]

>>> print(a1 * a2)
[[ 0  8 20]
 [36 56 80]]

>>> print(a1 / a2)
[[0.         0.125      0.2       ]
 [0.25       0.28571429 0.3125    ]]

>>> print(a1**a2)
[[           0            1         1024]
 [      531441    268435456 152587890625]]

Calculations with scalar values are also possible:


>>> print(a1 * 100)
[[  0 100 200]
 [300 400 500]]
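The scalar case is a special case of what NumPy calls broadcasting. As a hedged sketch going slightly beyond the notes, a 1D array whose length matches the number of columns can be combined with each row in the same way (the name row is illustrative):

```python
import numpy as np

a1 = np.arange(6).reshape(2, 3)
row = np.array([10, 20, 30])

# row is applied to every row of a1, just as a scalar is applied to every element.
print(a1 + row)
# [[10 21 32]
#  [13 24 35]]
```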

Matrix multiplication: np.matmul(), np.dot()

>>> a1 = np.arange(4).reshape((2, 2))
>>> print(a1)
[[0 1]
 [2 3]]

>>> a2 = np.arange(6).reshape((2, 3))
>>> print(a2)
[[0 1 2]
 [3 4 5]]

>>> print(np.matmul(a1, a2))
[[ 3  4  5]
 [ 9 14 19]]

>>> print(np.dot(a1, a2))
[[ 3  4  5]
 [ 9 14 19]]

>>> print(a1.dot(a2))
[[ 3  4  5]
 [ 9 14 19]]
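Python also provides the @ operator as shorthand for np.matmul on arrays. The sketch below shows it, together with what happens when the inner dimensions do not match:

```python
import numpy as np

a1 = np.arange(4).reshape(2, 2)
a2 = np.arange(6).reshape(2, 3)

product = a1 @ a2        # same result as np.matmul(a1, a2)
print(product)

# (2, 3) @ (2, 2) is not defined: columns of the left must equal rows of the right.
try:
    a2 @ a1
except ValueError as exc:
    print("Shapes do not match:", exc)
```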

Determinant and inverse of matrix: np.linalg.det() and np.linalg.inv()

>>> a = np.array([[0, 1], [2, 3]])
>>> print(a)
[[0 1]
 [2 3]]

>>> print(np.linalg.det(a))
-2.0

>>> a = np.array([[2, 5], [1, 3]])
>>> print(a)
[[2 5]
 [1 3]]

>>> print(np.linalg.inv(a))
[[ 3. -5.]
 [-1.  2.]]

# For a singular matrix, np.linalg.inv() raises an error.
>>> a_singular = np.array([[0, 0], [1, 3]])
>>> print(a_singular)
[[0 0]
 [1 3]]

>>> print(np.linalg.inv(a_singular))
LinAlgError: Singular matrix
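A minimal sketch of how the inverse can be checked and how the singular case can be handled in a program rather than at the prompt. The try/except structure is an illustration, not part of the original notes.

```python
import numpy as np

a = np.array([[2, 5], [1, 3]])
a_inv = np.linalg.inv(a)

# A matrix times its inverse should give the identity (up to rounding).
print(np.allclose(a @ a_inv, np.eye(2)))   # True

a_singular = np.array([[0, 0], [1, 3]])
try:
    np.linalg.inv(a_singular)
except np.linalg.LinAlgError as exc:
    print("Cannot invert:", exc)
```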

Some NumPy arrays:

np.empty((3, 3))
# It returns a new array of the given shape and type without initializing its
# entries, so the values are arbitrary.
np.identity(4)
# It returns the identity matrix of the given order.
np.eye(4)
# It returns a 2-D array with ones on the diagonal and zeros elsewhere.
np.ones((3, 4))
# It returns an array of the given shape with all elements 1.
np.zeros((3, 3))
# It returns an array of the given shape with all elements zero.
np.diag((1, 2, 3))
# It returns a 2-D array with the given diagonal elements and zeros elsewhere.
##
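>>> A = np.array([[1, 2], [2, 3], [3, 4]])
>>> print(A.flatten())
[1 2 2 3 3 4]
>>> print(A)
[[1 2]
 [2 3]
 [3 4]]
>>> print(A.ravel())
[1 2 2 3 3 4]
>>> print(A)
[[1 2]
 [2 3]
 [3 4]]

Thus, neither flatten() nor ravel() changes the shape of the original array. The
difference is that flatten() always returns a copy, whereas ravel() returns a view
of the original data whenever possible, so changes made to the result of ravel()
can show up in the original array.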
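The practical difference between the two methods is copy versus view: flatten() always returns a copy, while ravel() returns a view when it can. A short sketch, assuming a contiguous array as above (the names flat_copy and flat_view are illustrative):

```python
import numpy as np

A = np.array([[1, 2], [2, 3], [3, 4]])

flat_copy = A.flatten()   # always a copy
flat_copy[0] = 100
print(A[0, 0])            # 1, A is unaffected

flat_view = A.ravel()     # a view when possible (it is for this array)
flat_view[0] = 100
print(A[0, 0])            # 100, the write went through to A
print(A.shape)            # (3, 2), the shape of A never changes
```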

Some problems:
Write a program to add two 2-dimensional NumPy arrays where the
elements of the arrays are user given.

import numpy as np

# Taking input for the dimensions of the arrays
rows = int(input("Enter the number of rows: "))
cols = int(input("Enter the number of columns: "))

# Taking input for the elements of the first array
print("Enter the elements of the first array (one row at a time):")
array1_elements = []
for i in range(rows):
    row = [float(x) for x in input().split()]
    array1_elements.append(row)

# Taking input for the elements of the second array
print("Enter the elements of the second array (one row at a time):")
array2_elements = []
for i in range(rows):
    row = [float(x) for x in input().split()]
    array2_elements.append(row)

# Creating NumPy arrays from the input elements
array1 = np.array(array1_elements)
array2 = np.array(array2_elements)

# Adding the arrays
result_array = array1 + array2

# Displaying the result
print("Resultant array after addition:")
print(result_array)

Output sample:
Enter the number of rows: 2
Enter the number of columns: 2
Enter the elements of the first array (one row at a time):
1 2
3 4
Enter the elements of the second array (one row at a time):
5 6
7 8
Resultant array after addition:
[[ 6.  8.]
 [10. 12.]]
Write a program to find the matrix-product of two
NumPy arrays.
import numpy as np

# Taking input for the dimensions of the arrays
rows1 = int(input("Enter the number of rows for array 1: "))
cols1 = int(input("Enter the number of columns for array 1: "))
rows2 = int(input("Enter the number of rows for array 2: "))
cols2 = int(input("Enter the number of columns for array 2: "))

# Taking input for the elements of the arrays
print("Enter the elements of array 1 (one row at a time):")
array1_elements = []
for i in range(rows1):
    row = [float(x) for x in input().split()]
    array1_elements.append(row)
print("Enter the elements of array 2 (one row at a time):")
array2_elements = []
for i in range(rows2):
    row = [float(x) for x in input().split()]
    array2_elements.append(row)

# Creating NumPy arrays from the input elements
array1 = np.array(array1_elements)
array2 = np.array(array2_elements)

# Multiplying the arrays (columns of array 1 must equal rows of array 2)
result_array = np.matmul(array1, array2)

# Displaying the result
print("Resultant array after multiplication:")
print(result_array)
Pandas

What is Pandas?
Pandas is an open-source, BSD-licensed Python library providing high-performance, easy-to-use data
structures and data analysis tools for the Python programming language.

Why Pandas?
Pandas simplifies many of the time-consuming, repetitive tasks involved in working with
tabular data, such as:
• Importing datasets - available in the form of spreadsheets, comma-separated values (CSV)
files, and more.
• Data cleansing - dealing with missing values and representing them as NaN, NA, or NaT.
• Size mutability - columns can be added and removed from DataFrame and higher-dimensional
objects.
• Data normalization - normalize the data into a suitable format for analysis.
• Reshaping and pivoting of datasets - datasets can be reshaped and pivoted as needed.
• Efficient manipulation and extraction - manipulation and extraction of specific parts of
large datasets using label-based slicing, indexing, and subsetting.
• Statistical analysis - perform statistical operations on datasets.
• Data visualization - visualize datasets and uncover insights.

Data structures in Pandas:


How to install?
• On Windows, run pip install pandas in the command prompt.
• On Ubuntu, run sudo apt install python3-pandas in a terminal.
pandas.Series
pandas.Series( data, index, dtype, copy)
# Series from list
# import the pandas library and alias it as pd
import pandas as pd
import numpy as np

data = np.array([1,2,3,4])
s = pd.Series(data, index=[0,2,5,8], dtype=int, copy=False)
print(s)
s.iloc[0] = 6
print(s)
print(data)

OUTPUT:
0    1
2    2
5    3
8    4
dtype: int64
0    6
2    2
5    3
8    4
dtype: int64
[6 2 3 4]

# Series from dictionary
# import the pandas library and alias it as pd
import pandas as pd
import numpy as np
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data)
print(s)

OUTPUT:
a    0.0
b    1.0
c    2.0
dtype: float64

# Series from scalar
# import the pandas library and alias it as pd
import pandas as pd
import numpy as np
s = pd.Series(5, index=[0, 1, 2, 3])
print(s)

OUTPUT:
0    5
1    5
2    5
3    5
dtype: int64
How to retrieve elements of series?
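import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])

# Retrieve one element by label (a list of labels returns a Series)
print(s[['b']])

# Retrieve multiple elements
print(s[['a','c','d']])

# A label that is not in the index raises KeyError
print(s[['f']])

OUTPUT:
b    2
dtype: int64
a    1
c    3
d    4
dtype: int64
KeyError (the label 'f' is not in the index)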
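Beyond square-bracket lookup, a Series offers a few other access patterns. The sketch below shows .loc, .iloc, a membership test, and .get with a default value (the default of 0 is an arbitrary choice for illustration) to avoid a KeyError:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

print(s.loc['b'])      # 2, lookup by label
print(s.iloc[0])       # 1, lookup by position
print('f' in s)        # False, membership tests the index labels
print(s.get('f', 0))   # 0, returns the default instead of raising KeyError
```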

DataFrame
A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in
rows and columns.
A pandas DataFrame can be created using the following constructor −
pandas.DataFrame( data, index, columns, dtype, copy)
A pandas DataFrame can be created using various inputs like −
Lists
dictionary
Series
Numpy ndarrays
Another DataFrame

Creating DataFrame:
# import the pandas library and alias it as pd
import pandas as pd
df = pd.DataFrame()
print(df)

OUTPUT:
Empty DataFrame
Columns: []
Index: []

# Data frame from list
import pandas as pd
data = [1,2,3,4,5]
df = pd.DataFrame(data)
print(df)

OUTPUT:
   0
0  1
1  2
2  3
3  4
4  5

# Data frame from a list of lists, with column names and row labels
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'],index=['a','b','c'])
print(df)

OUTPUT:
     Name  Age
a    Alex   10
b     Bob   12
c  Clarke   13

# Data frame from dictionary
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data)
print(df)

OUTPUT:
    Name  Age
0    Tom   28
1   Jack   34
2  Steve   29
3  Ricky   42
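The list of possible inputs above also mentions NumPy ndarrays and Series, which the examples do not cover. A minimal sketch of both; the column names and values are made up for illustration:

```python
import numpy as np
import pandas as pd

# From a 2D ndarray: each row of the array becomes a row of the DataFrame.
arr = np.arange(6).reshape(3, 2)
df_from_array = pd.DataFrame(arr, columns=['x', 'y'])
print(df_from_array.shape)          # (3, 2)

# From a Series: the Series index becomes the row labels.
ages = pd.Series([28, 34], index=['Tom', 'Jack'])
df_from_series = pd.DataFrame({'Age': ages})
print(list(df_from_series.index))   # ['Tom', 'Jack']
```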

Read a CSV/Excel file as a pandas DataFrame:


import pandas as pd
# Replace 'your_file.csv' with the actual path to your CSV file
file_path = 'your_file.csv'
# Read the CSV file into a DataFrame
df = pd.read_csv(file_path)
# Read the Excel file into a DataFrame
file_path2 = 'your_file.xlsx'
df1 = pd.read_excel(file_path2)
# Display the first few rows of the DataFrame
print(df.head())

Export pandas DataFrame to a CSV/Excel file:


import pandas as pd
# df is an existing DataFrame, e.g. one created in the examples above
# Replace 'your_file.csv' with the actual path to your CSV file
file_path = 'your_file.csv'
# Export DataFrame to CSV file
df.to_csv(file_path)
# Export DataFrame to Excel file
file_path2 = 'your_file.xlsx'
df.to_excel(file_path2)
# Export DataFrame to JSON file
file_path3 = 'your_file.json'
df.to_json(file_path3)
Fetch rows from the DataFrame based on a specific attribute
import pandas as pd
# Load the data and create a DataFrame (replace with your data loading mechanism)
# For demonstration, let's create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
        'Age': [30, 25, 35, 28, 32]}
df = pd.DataFrame(data)
# Print the original DataFrame
print("Original DataFrame:")
print(df)
# Example: Fetch rows where 'Age' is greater than or equal to 30
target_age = 30
filtered_df = df[df['Age'] >= target_age]
# Print the filtered DataFrame
print(f"\nFiltered DataFrame (Age >= {target_age}):")
print(filtered_df)

OUTPUT:
Original DataFrame:
      Name  Age
0    Alice   30
1      Bob   25
2  Charlie   35
3    David   28
4    Emily   32

Filtered DataFrame (Age >= 30):
      Name  Age
0    Alice   30
2  Charlie   35
4    Emily   32
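The same boolean-mask idea extends to several conditions. As a sketch, conditions are combined with & (and) or | (or), with each condition in its own parentheses; the age bounds below are arbitrary illustration values:

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
        'Age': [30, 25, 35, 28, 32]}
df = pd.DataFrame(data)

# Rows where 28 <= Age < 32; the parentheses around each condition are required.
mask = (df['Age'] >= 28) & (df['Age'] < 32)
selected = df[mask]
print(selected['Name'].tolist())   # ['Alice', 'David']

# .loc can filter rows and pick a column in one step.
names = df.loc[mask, 'Name']
```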

Find the mean and standard deviation of a specific column containing numeric data.
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
'Age': [30, 25, 35, 28, 32],
'Score': [85, 90, 78, 92, 88]}
df = pd.DataFrame(data)
# Print the original DataFrame
print("Original DataFrame:")
print(df)
# Specify the column for which to calculate the mean and standard deviation
target_column = 'Score'
# Calculate mean and standard deviation for the specified column
mean_value = df[target_column].mean()
std_value = df[target_column].std()
# Display the mean and standard deviation
print(f"\nMean of '{target_column}': {mean_value}")
print(f"Standard Deviation of '{target_column}': {std_value}")
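One detail worth knowing when comparing results: the pandas std() method computes the sample standard deviation (ddof=1) by default, whereas np.std() defaults to the population form (ddof=0). A short sketch using the Score values from the example above:

```python
import numpy as np
import pandas as pd

scores = pd.Series([85, 90, 78, 92, 88])

sample_std = scores.std()                # pandas default: ddof=1
population_std = scores.std(ddof=0)      # divide by n instead of n - 1
numpy_std = np.std(scores.to_numpy())    # NumPy default: ddof=0

print(round(float(sample_std), 3))       # 5.459
print(round(float(population_std), 3))   # 4.883
```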

Matplotlib
1. Introduction
When we want to convey some information to others, there are several ways to do so. The
process of conveying the information with the help of plots and graphics is called Data
Visualization.
• In Python, we will use the basic data visualization tool Matplotlib.
• It can create popular visualization types – line plot, scatter plot, histogram, bar chart,
error charts, pie chart, box plot, and many more types of plot.

• It can export visualizations to all of the common formats like PDF, SVG, JPG, PNG,
BMP etc.

How to install?
• On Windows, run pip install matplotlib in the command prompt.
• On Ubuntu, run sudo apt install python3-matplotlib in a terminal.
How to import Matplotlib?
We can import Matplotlib as follows:
import matplotlib
Most of the time, we work with the pyplot interface of Matplotlib, which is
imported as follows:
import matplotlib.pyplot as plt
# Code for line plot using matplotlib
# Changing line colors

Some predefined colors and its notations:

# Changing line types:

Scatter plot:
Here the points are represented individually with a dot or a circle rather than being
joined by a line. The marker used for the points can be changed (see Markers below).
Scatter Plot with plt.plot():
We have used plt.plot() to produce line plots. We can use the same function to
produce scatter plots as follows:
x = np.linspace(0, 10, 30)
y = np.sin(x)
plt.plot(x, y, 'o', color = 'black')
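Matplotlib also has a dedicated plt.scatter() function, which is not shown in the notes above. A minimal sketch with the same data; the marker, colour, and size values and the file name are arbitrary choices, and the Agg backend is used only so the code runs without a display:

```python
import matplotlib
matplotlib.use("Agg")            # non-interactive backend; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 30)
y = np.sin(x)

plt.figure()
# marker sets the symbol, c the colour, s the marker size.
points = plt.scatter(x, y, marker="^", c="blue", s=40)
plt.savefig("scatter_plot.png")  # hypothetical output file name
```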

Markers:

Some markers:
Marker   Description
"."      point
","      pixel
"o"      circle
"v"      triangle_down
"^"      triangle_up
">"      triangle_right
"s"      square
"p"      pentagon
"*"      star
"+"      plus
"d"      diamond

Adding a grid in plot:
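The original example for this heading is missing, so here is a small sketch using plt.grid(). The line style and transparency values and the file name are illustrative; the Agg backend is used only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")            # non-interactive backend; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

plt.figure()
lines = plt.plot(x, np.sin(x))
plt.grid(True, linestyle="--", alpha=0.5)   # turn the grid on and style it
plt.savefig("grid_plot.png")     # hypothetical output file name
```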

Histogram:
Histogram charts are a graphical display of frequencies. They
are represented as bars. They show what portion of the dataset
falls into each category, usually specified as non-overlapping
intervals. These categories are called bins.
The plt.hist() function can be used to plot a simple histogram.
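A minimal histogram sketch. The random data, the seed, the number of bins, and the file name are all assumptions made for illustration; the Agg backend is used only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")            # non-interactive backend; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)   # seeded generator so the data are reproducible
data = rng.normal(size=1000)

plt.figure()
# plt.hist returns the count in each bin, the bin edges, and the drawn bars.
counts, bin_edges, patches = plt.hist(data, bins=20)
plt.savefig("histogram.png")     # hypothetical output file name
```

With 20 bins there are 21 bin edges, and the counts add up to the number of data points.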
Bar Chart:
Bar charts display rectangular bars either in vertical or
horizontal form. Their length is proportional to the values they
represent. They are used to compare two or more values.
We can plot a bar chart using the plt.bar() function.
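A minimal bar chart sketch. The categories, values, colour, and file name are invented for illustration; the Agg backend is used only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")            # non-interactive backend; omit for interactive use
import matplotlib.pyplot as plt

categories = ["A", "B", "C", "D"]
values = [3, 7, 5, 2]

plt.figure()
bars = plt.bar(categories, values, color="steelblue")   # vertical bars
plt.savefig("bar_chart.png")     # hypothetical output file name
```

For horizontal bars, plt.barh(categories, values) takes the same two arguments.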

Pie chart:
Pie charts are circular representations, divided into sectors. The sectors are also
called wedges. The arc length of each sector is proportional to the quantity we
are describing. It is an effective way to represent information when we are
interested mainly in comparing the wedge against the whole pie, instead of
wedges against each other.
Matplotlib provides the plt.pie() function to plot pie charts from an array X.
Wedges are created proportionally, so that each value x of array X generates a
wedge proportional to x/sum(X).
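A minimal pie chart sketch showing the proportionality described above. The values, labels, and file name are made up; the Agg backend is used only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")            # non-interactive backend; omit for interactive use
import matplotlib.pyplot as plt

X = [40, 30, 20, 10]
labels = ["A", "B", "C", "D"]

plt.figure()
wedges, texts = plt.pie(X, labels=labels)
plt.axis("equal")                # equal aspect ratio so the pie is drawn as a circle

# Each wedge spans 360 * x / sum(X) degrees; for the first wedge that is about 144.
first_angle = wedges[0].theta2 - wedges[0].theta1
plt.savefig("pie_chart.png")     # hypothetical output file name
```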
