ML Lab File Vijay Kumar
THEORY - Python is a high-level, interpreted programming language known for its simplicity
and readability. It is widely used in various fields such as web development, data analysis,
machine learning, automation, and more. Python's syntax is designed to be easy to read and
write, making it an excellent choice for beginners and experienced programmers alike.
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def greet(self):
        return f'Hello, my name is {self.name} and I am {self.age} years old.'
2. Functions: Functions are blocks of reusable code that perform a specific task. They
allow for modular and organized code, making it easier to manage and debug. Python
functions are defined using the def keyword followed by the function name and
parameters.
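Below is a short illustration of defining and calling a function. The name `describe_score` and its `passing_mark` parameter are hypothetical. They show a positional parameter, a parameter with a default value, and a return value.

```python
def describe_score(name, score, passing_mark=40):
    """Return a message saying whether a student passed.

    passing_mark has a default value, so callers may omit it.
    """
    status = "passed" if score >= passing_mark else "failed"
    return f"{name} {status} with {score} marks"


# Calling the function with and without the optional argument
print(describe_score("Alice", 72))                  # Alice passed with 72 marks
print(describe_score("Bob", 35, passing_mark=30))   # Bob passed with 35 marks
```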
3. Data Structures: Python provides several built-in data structures to store and
manipulate data efficiently:
● Lists: Ordered collections of items that can be of different types. Lists are mutable,
meaning their elements can be changed.
# Creating a list
numbers = [1, 2, 3, 4, 5]
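● Dictionaries: Collections of key-value pairs. Keys must be unique, and values are looked up by key rather than by position.
# Creating a dictionary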
student_grades = {'Alice': 90, 'Bob': 85, 'Charlie': 92}
● Tuples: Tuples are similar to lists, but they are immutable, meaning their elements
cannot be changed once defined.
# Creating a tuple
fruits = ('apple', 'banana', 'cherry')
# Accessing elements
print(fruits[0]) # Output: apple
# Iterating over a tuple
for fruit in fruits:
    print(fruit)
# Output:
# apple
# banana
# cherry
● Sets: Sets are unordered collections of unique elements. They are useful for
membership testing and removing duplicates from a sequence.
# Creating a set
my_set = {1, 2, 3, 4, 5}
● Queues: Queues follow the First-In-First-Out (FIFO) principle. We can use the deque
(double-ended queue) from the collections module to implement a queue.
from collections import deque

# Creating a queue
queue = deque()
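The snippets above only create each structure. The sketch below shows a few typical operations on each one. The variable names and sample values are illustrative. For the queue, it uses `deque.append` and `deque.popleft` to get the FIFO behaviour described above.

```python
from collections import deque

# Lists are mutable: elements can be changed and appended
numbers = [1, 2, 3, 4, 5]
numbers[0] = 10
numbers.append(6)
print(numbers)                      # [10, 2, 3, 4, 5, 6]

# Dictionaries: add a key-value pair and look a value up by key
student_grades = {'Alice': 90, 'Bob': 85, 'Charlie': 92}
student_grades['Diana'] = 88
print(student_grades['Bob'])        # 85

# Tuples are immutable: assigning to an element raises TypeError
fruits = ('apple', 'banana', 'cherry')
try:
    fruits[0] = 'mango'
except TypeError as err:
    print('Tuples cannot be modified:', err)

# Sets remove duplicates and support fast membership tests
unique = set([1, 2, 2, 3, 3, 3])
print(unique)                       # {1, 2, 3}
print(2 in unique)                  # True

# Queues (FIFO): add at the right end, remove from the left end
queue = deque()
queue.append('first')
queue.append('second')
print(queue.popleft())              # first
print(queue)                        # deque(['second'])
```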
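4. Exception Handling: The try and except keywords catch errors raised while a program runs, so the program can respond to them instead of crashing.
try:
    # Try to open a file that does not exist
    with open('non_existent_file.txt', 'r') as file:
        content = file.read()
except FileNotFoundError:
    print('The file was not found.')

try:
    result = 10 / 0
except ZeroDivisionError:
    print('Cannot divide by zero.')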
EXPERIMENT - 2
AIM - To use Python libraries (Pandas, NumPy, Matplotlib, SciPy, and Scikit-Learn) to
load, clean, visualize, analyze, and make predictions on data, demonstrating a
straightforward data analysis workflow.
THEORY –
1. NumPy stands for "Numerical Python." It's a powerful library in Python for
numerical and scientific computing. At its core, NumPy provides support for arrays
(grids of values) and a collection of functions to operate on these arrays.
Creating Arrays - Arrays are the basic data structure in NumPy. They can be one-
dimensional (like a list) or multi-dimensional (like a matrix).
import numpy as np

# Creating a 1-dimensional array
array_1d = np.array([1, 2, 3, 4, 5])
print("1D array:", array_1d)

# Creating a 2-dimensional array
array_2d = np.array([[1, 2, 3], [4, 5, 6]])
print("2D array:\n", array_2d)
OUTPUT –
Array Operations - NumPy provides functions to create arrays filled with zeros, ones,
or a range of numbers. These operations are useful for initializing arrays.
# Array of zeros
zeros_array = np.zeros((3, 3))
print("Zeros array:\n", zeros_array)

# Array of ones
ones_array = np.ones((2, 4))
print("Ones array:\n", ones_array)

# Array with a range of values (start 0, stop 10 exclusive, step 2)
range_array = np.arange(0, 10, 2)
print("Range array:", range_array)

# Array with 5 evenly spaced values from 0 to 1 inclusive
linspace_array = np.linspace(0, 1, 5)
print("Linspace array:", linspace_array)
OUTPUT –
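Reshaping Arrays - The reshape method returns the same data arranged in a new shape. The total number of elements must stay the same (here 12 = 3 x 4).
original_array = np.arange(12)
reshaped_array = original_array.reshape((3, 4))
print("Original array:", original_array)
print("Reshaped array:\n", reshaped_array)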
OUTPUT –
Basic Arithmetic Operations - NumPy allows for element-wise arithmetic operations
on arrays, which means you can perform operations like addition, subtraction,
multiplication, and division on corresponding elements of arrays.
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Element-wise addition
print("Addition:", a + b)

# Element-wise subtraction
print("Subtraction:", a - b)

# Element-wise multiplication
print("Multiplication:", a * b)

# Element-wise division
print("Division:", a / b)
OUTPUT –
Statistical Operations - NumPy provides functions to calculate statistical measures
such as mean, sum, standard deviation, minimum, and maximum on arrays.
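The code for this step is missing from the file, so here is a minimal sketch of those functions on a small sample array. The array values are made up for illustration. It also shows the `axis` argument, which computes a statistic per column (`axis=0`) or per row (`axis=1`) instead of over the whole array.

```python
import numpy as np

data = np.array([[4, 8, 15], [16, 23, 42]])

total = np.sum(data)              # 108
mean = np.mean(data)              # 18.0
std_dev = np.std(data)            # population standard deviation (ddof=0)
minimum = np.min(data)            # 4
maximum = np.max(data)            # 42
print("Sum:", total, "Mean:", mean, "Std:", std_dev, "Min:", minimum, "Max:", maximum)

# Statistics along one dimension
column_means = np.mean(data, axis=0)   # values 10.0, 15.5, 28.5
row_sums = np.sum(data, axis=1)        # values 27, 81
print("Column means:", column_means)
print("Row sums:", row_sums)
```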
OUTPUT –
Indexing and Slicing - Indexing and slicing in NumPy allows accessing and
modifying specific elements or subsets of an array. This is similar to list indexing and
slicing in Python but more powerful for multi-dimensional arrays.
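This step's code is not shown in the file. The sketch below illustrates indexing and slicing on a hypothetical 3 x 4 array. It covers a single element, a row, a column, a 2-D sub-block, boolean indexing, and in-place modification through a slice.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4],
                [5, 6, 7, 8],
                [9, 10, 11, 12]])

print(arr[0, 2])        # element at row 0, column 2: 3
print(arr[1])           # entire second row: 5, 6, 7, 8
print(arr[:, 1])        # entire second column: 2, 6, 10
print(arr[0:2, 1:3])    # rows 0-1, columns 1-2: [[2, 3], [6, 7]]

# Boolean indexing selects the elements that satisfy a condition
print(arr[arr > 6])     # 7, 8, 9, 10, 11, 12

# Assigning to a slice modifies the original array in place
arr[2, :] = 0
print(arr[2])           # 0, 0, 0, 0
```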
OUTPUT –
OUTPUT –
Linear Algebra - NumPy supports various linear algebra operations, such as matrix
multiplication, transpose, inverse, and determinant. These operations are essential for
scientific computing and machine learning.
# Creating matrices
matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])

# Matrix multiplication
matrix_product = np.dot(matrix1, matrix2)
print("Matrix product:\n", matrix_product)

# Transpose of a matrix
transpose_matrix = np.transpose(matrix1)
print("Transpose of matrix1:\n", transpose_matrix)
OUTPUT –
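The theory above also lists the inverse and determinant, which the matrix example does not cover. A minimal sketch using NumPy's `np.linalg` module on a 2 x 2 matrix follows. Because the computation is done in floating point, the determinant may print as a value very close to, but not exactly, -2.

```python
import numpy as np

matrix1 = np.array([[1, 2], [3, 4]])

# Determinant: 1*4 - 2*3 = -2 (up to floating-point rounding)
determinant = np.linalg.det(matrix1)
print("Determinant:", determinant)

# Inverse: multiplying matrix1 by its inverse gives the identity matrix
inverse = np.linalg.inv(matrix1)
print("Inverse:\n", inverse)
print("matrix1 @ inverse:\n", matrix1 @ inverse)

# For 2-D arrays, the @ operator gives the same result as np.dot
print(np.array_equal(matrix1 @ matrix1, np.dot(matrix1, matrix1)))  # True
```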
2. Pandas provides two primary data structures: Series (1-dimensional) and
DataFrame (2-dimensional). These structures are optimized for handling and
analyzing data, making it easier to perform operations like filtering, grouping, and
statistical analysis.
1. Series: A one-dimensional array-like object that can hold any data type (e.g.,
integers, strings, floats). It’s similar to a column in a spreadsheet.
import pandas as pd

# Creating a Series
series = pd.Series([10, 20, 30, 40, 50])
print("Series:\n", series)

# Creating a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
}
df = pd.DataFrame(data)
print("DataFrame:\n", df)
# Selecting a column
print("Selecting 'Age' column:\n", df['Age'])

# Adding a new column
df['Salary'] = [50000, 60000, 70000, 80000]
print("DataFrame with new 'Salary' column:\n", df)

# Filtering rows based on a condition
filtered_df = df[df['Age'] > 30]
print("Filtered DataFrame (Age > 30):\n", filtered_df)
# Selecting a column
print("Selecting 'Age' column:\n", df['Age'])

# Selecting multiple columns
print("Selecting 'Name' and 'City' columns:\n", df[['Name', 'City']])

# Selecting a row by position
print("Selecting the first row:\n", df.iloc[0])

# Selecting rows by condition
print("Selecting rows where Age > 30:\n", df[df['Age'] > 30])
OUTPUT –
This part demonstrates how to select specific columns and rows. You can select
single or multiple columns and filter rows based on conditions using boolean indexing.
Here, we cover adding and removing columns, and handling missing data. You can
introduce NaN (missing) values, fill them with a specific value, or drop rows containing
NaN values.
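The code for this part is not included in the file. The sketch below shows those operations on a small hypothetical DataFrame; the `Bonus` and `Score` columns and their values are made up. It uses `drop` to remove a column, `isna` to count missing values, `fillna` to replace them, and `dropna` to discard incomplete rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
})

# Adding a column, then removing it again with drop()
df['Bonus'] = [1000, 2000, 3000, 4000]
df = df.drop(columns=['Bonus'])

# Introducing missing values (NaN) in a new column
df['Score'] = [88.0, np.nan, 75.0, np.nan]
print("Missing values per column:\n", df.isna().sum())

# Option 1: replace NaN in 'Score' with a fixed value
filled_df = df.fillna({'Score': 0})
print("After fillna:\n", filled_df)

# Option 2: drop every row that contains a NaN
dropped_df = df.dropna()
print("After dropna:\n", dropped_df)
```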
OUTPUT –
Sorting and Grouping Data
OUTPUT –
This section shows how to sort and group data. You can sort data by single or multiple
columns and group data to calculate aggregate functions like mean or count.
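Here is a minimal sketch of sorting and grouping, since the original code is missing. The employee data is hypothetical. `sort_values` takes one column or a list of columns. `groupby` followed by `mean` or a named aggregation with `agg` computes per-group summaries.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'City': ['New York', 'Chicago', 'New York', 'Chicago', 'New York'],
    'Age': [25, 30, 35, 40, 28],
    'Salary': [50000, 60000, 70000, 80000, 55000],
})

# Sort by a single column, largest Age first
print(df.sort_values(by='Age', ascending=False))

# Sort by multiple columns: City first, then Salary within each city
sorted_df = df.sort_values(by=['City', 'Salary'])
print(sorted_df)

# Group by City and compute aggregate values per group
avg_salary = df.groupby('City')['Salary'].mean()
print(avg_salary)

summary = df.groupby('City').agg(avg_age=('Age', 'mean'), n_people=('Name', 'count'))
print(summary)
```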
OUTPUT –
Here, we cover merging and joining DataFrames. You can merge DataFrames based
on a common column using the merge method, or join DataFrames using the join
method when they share a common index.
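The merge and join code is not shown in the file, so here is a minimal sketch on two hypothetical tables. They share an `EmpID` column, but the IDs only partly overlap. `pd.merge` combines on that column, and `how` controls which rows are kept. `join` works on the index, so `EmpID` is set as the index first.

```python
import pandas as pd

employees = pd.DataFrame({'EmpID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
salaries = pd.DataFrame({'EmpID': [1, 2, 4], 'Salary': [50000, 60000, 70000]})

# Inner merge: keep only EmpIDs present in both tables (1 and 2)
inner = pd.merge(employees, salaries, on='EmpID', how='inner')
print(inner)

# Left merge: keep every employee; a missing salary becomes NaN
left = pd.merge(employees, salaries, on='EmpID', how='left')
print(left)

# join() aligns on the index (a left join by default)
joined = employees.set_index('EmpID').join(salaries.set_index('EmpID'))
print(joined)
```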
Reading and Writing Data
This part demonstrates how to read data from and write data to CSV files using
read_csv and to_csv methods. These methods are useful for importing and exporting
data.
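A minimal round-trip sketch with `to_csv` and `read_csv` follows. To keep the example self-contained, it writes to a temporary directory under a hypothetical file name (`people.csv`). In a lab setting you would normally pass an ordinary path such as `'data.csv'`. `index=False` stops pandas from writing the row index as an extra column.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# A temporary directory is used only so no file is left behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'people.csv')

    # Write the DataFrame to a CSV file without the index column
    df.to_csv(path, index=False)

    # Read the file back into a new DataFrame
    loaded = pd.read_csv(path)
    print(loaded)
```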
1. Figure: The overall window or page that everything is drawn on. It can contain
multiple plots.
2. Axes: The area on which data is plotted. A single figure can have multiple axes
(plots) arranged in a grid.
3. Plot: The actual visual representation of data, such as a line plot, scatter plot,
bar chart, etc.
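To connect the three terms above, here is a minimal sketch using `plt.subplots`. It creates one Figure containing a 1 x 2 grid of Axes and draws a different plot on each. The data and titles are illustrative.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

# One Figure holding two Axes side by side
fig, axes = plt.subplots(1, 2, figsize=(8, 3))

# Each Axes gets its own plot, labels, and title
axes[0].plot(x, y, marker='o')
axes[0].set_title('Line plot')
axes[0].set_xlabel('x')
axes[0].set_ylabel('x squared')

axes[1].bar(x, y)
axes[1].set_title('Bar chart')
axes[1].set_xlabel('x')

fig.suptitle('One Figure, two Axes')
fig.tight_layout()
plt.show()
```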
OUTPUT –
Creating a Scatter Plot
import matplotlib.pyplot as plt

# Creating data
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]

# Creating a scatter plot
plt.scatter(x, y)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Simple Scatter Plot')
plt.show()
A scatter plot is created using the plt.scatter() function. It is useful for visualizing the
relationship between two variables.
OUTPUT –
Creating a Bar Chart
import matplotlib.pyplot as plt

# Creating data
categories = ['A', 'B', 'C', 'D']
values = [4, 7, 1, 8]

# Creating a bar chart
plt.bar(categories, values)
plt.xlabel('Categories')
plt.ylabel('Values')
plt.title('Simple Bar Chart')
plt.show()
A bar chart is created using the plt.bar() function. It is useful for comparing different
categories of data.
OUTPUT –
Creating a Histogram
import matplotlib.pyplot as plt

# Creating data
data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5]

# Creating a histogram with 5 bins
plt.hist(data, bins=5)
plt.xlabel('Data')
plt.ylabel('Frequency')
plt.title('Simple Histogram')
plt.show()
A histogram is created using the plt.hist() function. It is useful for visualizing the distribution of
a dataset.
OUTPUT –
4. SciPy stands for "Scientific Python" and is an open-source Python library used for scientific
and technical computing. It builds on NumPy and provides a large collection of mathematical
algorithms and convenience functions, making it easier to perform scientific and engineering
tasks. Here are a few key components of SciPy:
1. Linear Algebra: Provides functions for matrix operations, solving linear systems,
eigenvalue problems, and more.
2. Optimization: Contains functions for finding the minimum or maximum of a function, including linear programming and curve fitting.
3. Integration: Offers methods for calculating integrals numerically, as well as solvers for ordinary differential equations (ODEs).
4. Statistics: Includes functions for statistical distributions, hypothesis testing, and
descriptive statistics.
5. Signal Processing: Provides tools for filtering, signal analysis, and Fourier transforms.
Linear Algebra
import numpy as np
from scipy import linalg

# Creating a matrix
A = np.array([[1, 2], [3, 4]])

# Computing the determinant
det = linalg.det(A)
print("Determinant:", det)

# Solving the linear system A @ x = b
b = np.array([5, 6])
x = linalg.solve(A, b)
print("Solution:", x)
This code demonstrates how to compute the determinant of a matrix and solve a linear system of equations using SciPy's linear algebra module.
OUTPUT –
Optimization
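import numpy as np
from scipy.optimize import minimize

# Defining the objective function
def objective(x):
    return x**2 + 5*np.sin(x)

# Finding a minimum, starting the search at x0 = 0
result = minimize(objective, x0=0)
print("x at the minimum:", result.x)      # location of the minimum
print("Minimum value:", result.fun)       # objective value at that point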
This code demonstrates how to find the minimum of a function using SciPy's optimization
module.
OUTPUT –
Integration
from scipy.integrate import quad

# Defining the function to integrate
def integrand(x):
    return x**2

# Integrating from 0 to 1; quad also returns an estimate of the absolute error
integral, error = quad(integrand, 0, 1)
print("Integral:", integral)
This code demonstrates how to perform numerical integration using SciPy's integration module.
OUTPUT –
Statistics
import numpy as np
from scipy import stats

# Creating a dataset
data = np.array([1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5])

# Computing descriptive statistics
mean = np.mean(data)
std_dev = np.std(data)
median = np.median(data)
print("Mean:", mean)
print("Standard Deviation:", std_dev)
print("Median:", median)

# Performing a one-sample t-test against a hypothesized mean of 3
t_stat, p_value = stats.ttest_1samp(data, 3)
print("T-statistic:", t_stat)
print("P-value:", p_value)
This code computes descriptive statistics with NumPy and then uses SciPy's statistics module to run a one-sample t-test, which checks whether the sample mean differs from 3.
OUTPUT –
Signal Processing
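import numpy as np
import matplotlib.pyplot as plt
from scipy import signal

# Creating a signal: 50 Hz and 120 Hz sine waves, 500 samples over 1 second.
# The variable is named sig so it does not shadow the scipy.signal module.
t = np.linspace(0, 1, 500)
sig = np.sin(2 * np.pi * 50 * t) + np.sin(2 * np.pi * 120 * t)

# Applying a 3rd-order low-pass Butterworth filter. The cutoff 0.3 is a
# fraction of the Nyquist frequency (about 250 Hz here), i.e. roughly 75 Hz,
# so the 50 Hz component passes and the 120 Hz component is attenuated.
b, a = signal.butter(3, 0.3)
filtered_signal = signal.filtfilt(b, a, sig)

plt.plot(t, sig, label='Original Signal')
plt.plot(t, filtered_signal, label='Filtered Signal')
plt.xlabel('Time (s)')
plt.ylabel('Amplitude')
plt.legend()
plt.show()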
This code demonstrates how to create and filter a signal using SciPy's signal processing
module.
OUTPUT –
OUTPUT –
Unsupervised Learning: Clustering
This code demonstrates how to load the Iris dataset, train a KMeans clustering model, and
visualize the clusters.
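The clustering code itself is missing from the file. Here is a minimal sketch using scikit-learn's `KMeans`. A few choices are assumptions, not taken from the original: 3 clusters (matching the three Iris species), `n_init=10` with `random_state=42` for repeatable results, and plotting only the first two features. The species labels are not used for training, because clustering is unsupervised.

```python
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.cluster import KMeans

# Load the Iris features (4 measurements per flower)
iris = datasets.load_iris()
X = iris.data

# Fit KMeans and assign each sample to one of 3 clusters
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)
print("Cluster centers:\n", kmeans.cluster_centers_)

# Visualize the clusters on the first two features, with centers marked
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            c='red', marker='x', s=100, label='Centers')
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
plt.title('KMeans Clusters on Iris')
plt.legend()
plt.show()
```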
OUTPUT –
Model Selection and Evaluation: Cross-Validation
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier

# Load the dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Train a Random Forest classifier with 5-fold cross-validation
clf = RandomForestClassifier(n_estimators=100, random_state=42)
scores = cross_val_score(clf, X, y, cv=5)

# Print the cross-validation scores
print("Cross-Validation Scores:", scores)
print("Mean Cross-Validation Score:", scores.mean())
This code demonstrates how to use cross-validation to evaluate the performance of a Random
Forest classifier on the Iris dataset.
OUTPUT –
Preprocessing: Feature Scaling
from sklearn.preprocessing import MinMaxScaler
import numpy as np

# Create a sample dataset
data = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])

# Apply Min-Max scaling
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(data)

# Print the scaled data
print("Scaled Data:\n", scaled_data)
This code demonstrates how to apply Min-Max scaling to a sample dataset, rescaling each feature (column) to the range [0, 1].
OUTPUT –