Exp3a
Develop a program to implement Principal Component Analysis (PCA) for reducing the
dimensionality of the Iris dataset from 4 features to 2.
Objective:
- To reduce a large attribute set in a data set to a smaller, more informative set
- To apply the concept of Principal Component Analysis to reduce the dimensionality
Description:
Principal Component Analysis (PCA) is a statistical procedure that converts a set of
observations of possibly correlated variables into a set of values of linearly uncorrelated
variables, called principal components (PCs).
PCA is widely used for dimensionality reduction, data visualization and feature extraction.
The goal of PCA is to reduce the number of variables in a dataset while preserving as much
information as possible.
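The two claims above (correlated inputs become uncorrelated components, and most of the information survives) can be checked directly on the Iris data. The sketch below assumes scikit-learn and NumPy are installed; it compares the correlation matrix of the standardized features with that of the principal component scores, and reports how much of the total variance the first two components keep.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Standardize the four Iris measurements (150 samples x 4 features)
X_std = StandardScaler().fit_transform(load_iris().data)

# Correlation between the original features (e.g. petal length vs petal width is high)
feature_corr = np.corrcoef(X_std, rowvar=False)
print(np.round(feature_corr, 2))

# Fit PCA with all 4 components and transform the data into PC scores
pca = PCA(n_components=4)
scores = pca.fit_transform(X_std)

# Correlation between the PC scores: off-diagonal entries are (numerically) zero
score_corr = np.corrcoef(scores, rowvar=False)
print(np.round(score_corr, 2))

# Fraction of the total variance retained by the first two components
print("Variance kept by PC1 + PC2:", pca.explained_variance_ratio_[:2].sum())
```

Because the scores are uncorrelated, dropping the last components discards only the variance those components carry, which is why keeping two of four loses relatively little information here.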
PCA reduces the attributes of a data set in the following steps:
1. Standardization: Before applying PCA, it is important to standardize the data
(subtract each variable's mean and divide by its standard deviation), especially if the
variables have different units or scales.
2. Covariance Matrix Computation: The covariance matrix captures the relationships
between the different variables in the dataset.
3. Eigenvalues and Eigenvectors: The covariance matrix is then decomposed into
eigenvalues and eigenvectors. Eigenvalues represent the amount of variance captured
by each principal component, and eigenvectors represent the directions of the PCs.
4. Selecting Principal Components: The eigenvalues are sorted in descending order.
The top eigenvalues correspond to the principal components that capture the most
variance in the data. A subset of these components is chosen based on the desired
level of variance retention (for example, enough components to retain 95% of the variance).
5. Transforming the Data: Finally, the original data is transformed onto the new
principal component axes. This results in a reduced-dimensionality dataset, where the
axes are now the selected principal components.
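The five steps above can be sketched from scratch with NumPy. This is a minimal illustration, not the library's implementation; the function name pca_from_scratch and the 95% retention threshold are assumptions chosen for the example. np.linalg.eigh is used because a covariance matrix is symmetric.

```python
import numpy as np
from sklearn.datasets import load_iris


def pca_from_scratch(X, k):
    """Hypothetical helper: reduce X (n_samples x n_features) to k dimensions."""
    # Step 1: standardize each column to zero mean and unit variance
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # Step 2: covariance matrix of the features (n_features x n_features)
    cov = np.cov(X_std, rowvar=False)
    # Step 3: eigen-decomposition of the symmetric covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Step 4: sort eigenvalues (and matching eigenvector columns) in descending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()
    # Step 5: project the standardized data onto the top-k eigenvectors
    X_reduced = X_std @ eigvecs[:, :k]
    return X_reduced, explained


X = load_iris().data
X_2d, explained = pca_from_scratch(X, k=2)
print(X_2d.shape)  # → (150, 2)
print("Explained variance ratio:", np.round(explained, 3))

# Step 4 alternative: smallest k whose cumulative variance reaches 95% (assumed threshold)
k_95 = int(np.searchsorted(np.cumsum(explained), 0.95) + 1)
print("Components needed for 95% variance:", k_95)
```

The standardization uses the population standard deviation while np.cov uses the sample estimate; this only rescales all eigenvalues by the same factor, so the sorted order and the explained-variance ratios are unaffected. The signs of individual components may differ from scikit-learn's PCA, since an eigenvector is only defined up to sign.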
Input: Iris data set
Attributes: SEPAL_LENGTH, SEPAL_WIDTH, PETAL_LENGTH and PETAL_WIDTH
The measurements are used to identify the variety of each iris flower; the three varieties
are setosa, versicolor and virginica (50 samples each).
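To inspect this input before reducing it, the dataset can be loaded as a pandas DataFrame. This sketch assumes a scikit-learn version that supports load_iris(as_frame=True) and that pandas is installed; note that scikit-learn names the attributes "sepal length (cm)" and so on rather than SEPAL_LENGTH, and stores the variety as an integer target column. The added column name variety is an illustrative choice.

```python
from sklearn.datasets import load_iris

# as_frame=True makes iris.frame a pandas DataFrame (features + 'target' column)
iris = load_iris(as_frame=True)
df = iris.frame

# Four numeric attributes plus the integer 'target' column (0, 1, 2)
print(iris.feature_names)
print(df.shape)  # → (150, 5)

# Map the integer target codes to variety names (hypothetical column name 'variety')
df["variety"] = df["target"].map(dict(enumerate(iris.target_names)))
print(df["variety"].value_counts())
```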
Code
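import pandas as pd
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
# Load the Iris dataset (150 samples x 4 features)
iris = load_iris()
X = iris.data
# Standardize each feature to zero mean and unit variance
X_std = StandardScaler().fit_transform(X)
# Project the 4 standardized features onto 2 principal components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_std)
# Tabulate the reduced data together with the variety of each flower
df = pd.DataFrame(X_pca, columns=["PC1", "PC2"])
df["variety"] = iris.target_names[iris.target]
print(df.head())
print("Explained variance ratio:", pca.explained_variance_ratio_)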
Output