CIOT-701 Lab Manual DATA SCIENCE
LAB MANUAL
(2024-25)
Name:
Year: Semester:
Class Roll No.: Enrollment No.:
CONTENTS
To be the fountainhead of novel ideas & innovations in science & technology & persist to
be a foundation of pride for all Indians.
• To provide value-based, broad education in Engineering, Technology and Science where
students are urged to develop their professional skills.
• To inculcate dedication, hard work, sincerity, integrity and ethics in building up the overall
professional personality of our students and faculty.
Provide quality undergraduate and postgraduate education, in both the theoretical and applied
foundations of computer science, and train students to effectively apply this education to solve
real-world problems, thus amplifying their potential for lifelong high-quality careers.
1. To prepare students for successful careers in software industry that meet the needs
of Indian and multinational companies.
2. To develop the skills among students to analyse real-world problems and implement
computer engineering solutions, including in multidisciplinary projects.
4. To develop the ability to work with the core competencies of computer science &
engineering, i.e. software engineering, hardware structure & networking concepts, so
that one can find feasible solutions to real-world problems.
PO5. Modern tool usage: Create, select, and apply appropriate techniques, resources, and
modern engineering and IT tools including prediction and modeling to complex engineering
activities with an understanding of the limitations.
PO6. The engineer and society: Apply reasoning informed by the contextual knowledge to
assess societal, health, safety, legal and cultural issues and the consequent responsibilities
relevant to the professional engineering practice.
PO7. Environment and sustainability: Understand the impact of the professional engineering
solutions in societal and environmental contexts, and demonstrate the knowledge of, and need
for sustainable development.
PO8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities and
norms of the engineering practice.
PO12. Life-long learning: Recognize the need for, and have the preparation and ability to
engage in independent and life-long learning in the broadest context of technological change.
INDEX
R is an open-source programming language that is widely used as a statistical software and data analysis
tool. R generally comes with a command-line interface and is available across widely used platforms
like Windows, Linux, and macOS. Also, the R programming language is the latest cutting-edge tool.
It was designed by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is
currently developed by the R Development Core Team. The R programming language is an implementation
of the S programming language and combines lexical scoping semantics inspired by Scheme.
The project was conceived in 1992, with an initial version released in 1995 and a stable beta version
in 2000.
R programming is used as a leading tool for machine learning, statistics, and data analysis.
Objects, functions, and packages can easily be created in R.
Statistical Features of R:
Basic Statistics: The most common basic statistics terms are the mean, mode, and median.
These are all known as “Measures of Central Tendency.” Using the R language we can
measure central tendency very easily.
Static graphics: R is rich with facilities for creating and developing interesting static
graphics. R contains functionality for many plot types including graphic maps, mosaic plots,
biplots, and the list goes on.
Probability distributions: Probability distributions play a vital role in statistics, and by using
R we can easily handle various types of probability distribution such as the Binomial distribution,
Normal distribution, Chi-squared distribution and many more.
Data analysis: It provides a large, coherent and integrated collection of tools for data analysis.
Programming Features of R:
R Packages: One of the major features of R is the wide availability of libraries. R has
CRAN (Comprehensive R Archive Network), a repository holding more than 10,000
packages.
Distributed Computing: Distributed computing is a model in which components of a
software system are shared among multiple computers to improve efficiency and performance.
Two new packages, ddR and multidplyr, used for distributed programming in R were released
in November 2015.
Programming in R:
Since R is syntactically similar to other widely used languages, it is easy to learn and code in.
Programs can be written in R in any of the widely used IDEs like RStudio, Rattle, Tinn-R, etc. After
writing the program, save the file with the extension .r. To run the program, use the following command
on the command line:
Rscript file_name.r
Advantages of R:
R is the most comprehensive statistical analysis package, and new technology and concepts
often appear first in R.
Since the R programming language is open source, you can run R anywhere and at any
time.
Disadvantages of R:
In the R programming language, the standard of some packages is less than perfect.
R commands pay little attention to memory management, so the R programming
language may consume all available memory.
In R there is basically nobody to complain to if something doesn’t work.
The R programming language is much slower than other programming languages such as Python
and MATLAB.
Applications of R:
We use R for Data Science. It gives us a broad variety of libraries related to statistics. It also
provides the environment for statistical computing and design.
R is used by many quantitative analysts as their programming tool. Thus, it helps in data
importing and cleaning.
R is a very prevalent language, so many data analysts and research programmers use it.
Hence, it is used as a fundamental tool in finance.
Tech giants like Google, Facebook, Bing, Twitter, Accenture, Wipro and many more use R
nowadays.
Mean
Median
Distribution
Covariance
Regression
Non-linear
Mixed Effects
GLM
GAM, etc.
The popular data visualization tools that are available are Tableau, Plotly, R, Google Charts, Infogram,
and Kibana. The various data visualization platforms have different capabilities, functionality, and use
cases.
They also require a different skill set. This article discusses the use of R for data visualization.
R is a language that is designed for statistical computing, graphical data analysis, and scientific research. It
is usually preferred for data visualization as it offers flexibility and minimum required coding through its
packages.
Bar Plot
There are two types of bar plots, horizontal and vertical, which represent data points as horizontal or
vertical bars of lengths proportional to the value of the data item. They are generally used for
continuous and categorical variable plotting. By setting the horiz parameter to TRUE or FALSE, we can get
horizontal and vertical bar plots respectively.
Histogram
A histogram is like a bar chart as it uses bars of varying height to represent data distribution. However, in a
histogram values are grouped into consecutive intervals called bins. In a Histogram, continuous values are
grouped and displayed in these bins whose size can be varied.
data(airquality)
hist(airquality$Temp)  # column choice for the histogram is an illustrative assumption
Output
Box Plot
The statistical summary of the given data is presented graphically using a boxplot. A boxplot depicts
information like the minimum and maximum data point, the median value, first and third quartile, and
interquartile range.
data(airquality)
boxplot(airquality$Wind)  # column choice for the boxplot is an illustrative assumption
Output
A scatter plot is composed of many points on a Cartesian plane. Each point denotes the value taken by two
parameters and helps us easily identify the relationship between them.
plot(airquality$Ozone, airquality$Month,
     main = "Scatterplot Example",
     xlab = "Ozone Concentration in parts per billion",
     ylab = "Month of observation", pch = 19)
Output
Q 1) Explain RStudio.
Ans. RStudio is an integrated development environment which allows us to interact with R more
readily. RStudio is similar to the standard RGui, but it is considered more user-friendly. This IDE has
various drop-down menus, windows with multiple tabs, and many customization options. The first
time we open RStudio, we will see three windows; the fourth window is hidden by default.
K-Means Clustering is an unsupervised learning algorithm that is used to solve clustering problems in
machine learning or data science. In this topic, we will learn what the K-means clustering algorithm is and how
it works, along with a Python implementation of K-means clustering.
K-Means Clustering is an Unsupervised Learning algorithm, which groups the unlabeled dataset into different
clusters. Here K defines the number of pre-defined clusters that need to be created in the process, as if K=2,
there will be two clusters, and for K=3, there will be three clusters, and so on.
It is an iterative algorithm that divides the unlabeled dataset into k different clusters in such a way that each
data point belongs to only one group with similar properties.
It allows us to cluster the data into different groups and is a convenient way to discover the categories of groups
in the unlabeled dataset on its own, without the need for any training.
It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of this
algorithm is to minimize the sum of distances between the data point and their corresponding clusters.
The algorithm takes the unlabeled dataset as input, divides the dataset into k clusters, and repeats
the process until it finds the best clusters. The value of k should be predetermined in this algorithm.
Hence each cluster has data points with some commonalities and is away from the other clusters.
The below diagram explains the working of the K-means Clustering Algorithm:
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points or centroids. (They can be points other than those from the input dataset.)
Step-3: Assign each data point to its closest centroid, which will form the predefined K clusters.
Step-4: Calculate the variance and place a new centroid for each cluster.
Step-5: Repeat the third step, i.e. reassign each data point to the new closest centroid of each cluster.
Step-6: If any reassignment occurs, then go to Step-4; otherwise, the clustering is finished.
Step-7: The model is ready.
Suppose we have two variables M1 and M2. The x-y axis scatter plot of these two variables is given below:
o Let's take the number k of clusters, i.e., K=2, to identify the dataset and put the points into different clusters.
It means here we will try to group these data points into two different clusters.
o Now we will assign each data point of the scatter plot to its closest K-point or centroid. We will
compute it by applying the mathematics we have studied to calculate the distance between two
points. So, we will draw a median between both the centroids. Consider the below image:
From the above image, it is clear that points on the left side of the line are near the K1 or blue centroid, and points to
the right of the line are close to the yellow centroid. Let's color them blue and yellow for clear visualization.
o Next, we will reassign each data point to the new centroid. For this, we will repeat the same process of
finding a median line. The median will be like the below image:
As reassignment has taken place, so we will again go to the step-4, which is finding new centroids or K-points.
o We will repeat the process by finding the center of gravity of the centroids, so the new centroids will be as
shown in the below image:
o We can see in the above image that there are no dissimilar data points on either side of the line, which
means our model is formed. Consider the below image:
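A minimal Python sketch of the above procedure using scikit-learn's KMeans (the data points and parameters below are illustrative assumptions, not taken from this manual):
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# two illustrative variables, playing the role of M1 and M2 above
X = np.array([[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)   # K = 2 pre-defined clusters
labels = kmeans.fit_predict(X)                             # assign each point to its closest centroid

plt.scatter(X[:, 0], X[:, 1], c=labels)                    # points coloured by cluster
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], marker='x')   # final centroids
plt.show()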
VIVA QUESTIONS
Q) What is K-Means Clustering?
Ans. K-Means Clustering is an unsupervised learning algorithm which groups the unlabeled dataset into
different clusters. Here K defines the number of pre-defined clusters that need to be created in the process:
if K=2, there will be two clusters; for K=3, there will be three clusters; and so on.
Q) Explain the working of the K-Means algorithm.
Ans. The working of the K-Means algorithm is explained in the below steps:
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points or centroids. (They can be points other than those from the input dataset.)
Step-3: Assign each data point to its closest centroid, which will form the predefined K clusters.
Step-4: Calculate the variance and place a new centroid for each cluster.
Step-5: Repeat the third step, i.e. reassign each data point to the new closest centroid of each
cluster.
Program:
import pandas as pd

df = pd.DataFrame({
    'name': ['rohit', 'rahul', 'virat', 'shreyas', 'rishabh', 'ravindra', 'shardul', 'axar', 'harshal', 'yuzi', 'bumrah'],
    'age': [34, 29, 33, 27, 24, 33, 30, 27, 31, 31, 28],
    'test': [43, 40, 96, 0, 25, 56, 4, 4, 0, 0, 24],
    'odi': [227, 38, 254, 21, 18, 168, 15, 38, 0, 56, 67],
    't20': [119, 55, 91, 29, 39, 50, 22, 15, 0, 50, 51]
})  # creating a DataFrame
df['limited_overs'] = df['odi'] + df['t20']   # total limited-overs matches
print(df)
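The analysis referred to in the next question can be done with a short sketch like the one below (assuming total experience is measured as the sum of test, ODI and T20 matches):
df['total'] = df['test'] + df['odi'] + df['t20']      # total matches across all formats
print(df.loc[df['total'].idxmax(), 'name'])           # most experienced player
print(df.loc[df['total'].idxmin(), 'name'])           # least experienced player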
Q.3) According to the above analysis, which player is the most experienced and which is the least experienced in
cricket?
Ans. According to our analysis, Virat Kohli is the most experienced player, while Harshal Patel is the least
experienced player, who is yet to make his debut.
Linear regression is one of the easiest and most popular machine learning algorithms. It is a statistical
method that is used for predictive analysis. Linear regression makes predictions for continuous/real or
numeric variables such as sales, salary, age, product price, etc.
The linear regression algorithm shows a linear relationship between a dependent (y) variable and one or more
independent (x) variables, hence it is called linear regression. Since linear regression shows a linear
relationship, it finds how the value of the dependent variable changes with the value of the
independent variable.
The linear regression model provides a sloped straight line representing the relationship between the
variables. Consider the below image:
y = a0 + a1x + ε
Here,
y is the dependent (target) variable, x is the independent (predictor) variable, a0 is the intercept of the line,
a1 is the linear regression coefficient (the slope), and ε is the random error.
The values for the x and y variables are training datasets for the linear regression model representation.
Linear regression can be further divided into two types of the algorithm:
o Simple Linear Regression:
If a single independent variable is used to predict the value of a numerical dependent variable,
then such a linear regression algorithm is called Simple Linear Regression.
o Multiple Linear Regression:
If more than one independent variable is used to predict the value of a numerical dependent
variable, then such a Linear Regression algorithm is called Multiple Linear Regression.
A linear line showing the relationship between the dependent and independent variables is called
a regression line. A regression line can show two types of relationship:
o Positive Linear Relationship:
If the dependent variable increases on the Y-axis as the independent variable increases on the X-axis,
then such a relationship is termed a Positive linear relationship.
o Negative Linear Relationship:
If the dependent variable decreases on the Y-axis as the independent variable increases on the X-axis,
then such a relationship is termed a Negative linear relationship.
When working with linear regression, our main goal is to find the best fit line that means the error
between predicted values and actual values should be minimized. The best fit line will have the least
error.
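A minimal Python sketch (assuming scikit-learn is installed; the data values are illustrative) showing how a best fit line is obtained:
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])        # independent variable x
y = np.array([3, 5, 7, 9, 11])                 # dependent variable y

model = LinearRegression().fit(X, y)           # finds a0 (intercept) and a1 (slope) with the least error
print(model.intercept_, model.coef_)           # a0 and a1 of the best fit line
print(model.predict([[6]]))                    # prediction for a new value of x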
Different values for the weights or coefficients of the line (a0, a1) give a different line of regression,
so we need to calculate the best values for a0 and a1 to find the best fit line; to calculate this we use the
cost function.
Cost function-
o Different values for the weights or coefficients of the line (a0, a1) give different lines of
regression, and the cost function is used to estimate the values of the coefficients for the best fit
line.
o The cost function optimizes the regression coefficients or weights. It measures how a linear
regression model is performing.
o We can use the cost function to find the accuracy of the mapping function, which maps the
input variable to the output variable. This mapping function is also known as the Hypothesis
function.
For linear regression, we use the Mean Squared Error (MSE) cost function, which is the average of the
squared error between the predicted values and actual values. It can be written as:
MSE = (1/N) Σ (yi - (a0 + a1xi))²
where N is the total number of observations, yi is the actual value of the i-th observation, and (a0 + a1xi) is its predicted value.
Residuals: The distance between the actual value and the predicted value is called the residual. If the observed
points are far from the regression line, the residuals will be high and so will the cost function. If
the scatter points are close to the regression line, the residuals will be small and hence the cost
function will be small.
Gradient Descent:
o Gradient descent is used to minimize the MSE by calculating the gradient of the cost function.
o A regression model uses gradient descent to update the coefficients of the line by reducing the
cost function.
o This is done by a random selection of coefficient values and then iteratively updating the values to
reach the minimum of the cost function, as the sketch below shows.
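A minimal NumPy sketch of gradient descent for simple linear regression (the learning rate, iteration count and data are illustrative assumptions):
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([3, 5, 7, 9, 11], dtype=float)

a0, a1, lr = 0.0, 0.0, 0.01                    # starting coefficients and learning rate
for _ in range(5000):                          # iteratively update the coefficients
    error = (a0 + a1 * x) - y                  # predicted minus actual
    a0 -= lr * 2 * error.mean()                # gradient of the MSE with respect to a0
    a1 -= lr * 2 * (error * x).mean()          # gradient of the MSE with respect to a1

print(a0, a1)                                  # approaches the best fit values (here about 1 and 2)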
Model Performance:
The Goodness of fit determines how the line of regression fits the set of observations. The process of
finding the best model out of various models is called optimization. It can be achieved by below
method:
1. R-squared method: R-squared is a statistical measure that determines the goodness of fit, i.e. how well the regression predictions approximate the actual data points, on a scale from 0 to 1 (0% to 100%).
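A small hedged example of the R-squared method using scikit-learn's r2_score (the values are illustrative):
from sklearn.metrics import r2_score

y_actual = [3, 5, 7, 9, 11]
y_predicted = [2.8, 5.1, 7.0, 9.2, 10.9]

print(r2_score(y_actual, y_predicted))         # a value close to 1 indicates a good fit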
Below are some important assumptions of Linear Regression. These are formal checks while
building a Linear Regression model, which ensure we get the best possible result from the given dataset:
a linear relationship between the features and the target, little or no multicollinearity between the features,
homoscedasticity (equal error variance), a normal distribution of the error terms, and little or no
autocorrelation in the errors.
Q) Can overfitting occur in a linear regression model?
Ans. Yes, overfitting is possible even with linear regression. This happens when multiple linear
regression is used to fit an extremely high-degree polynomial. When the parameters of such a model
are learned, they fit too closely to the training data, fitting even the noise, and thereby fail to
generalize on test data.
Q) Is it necessary to remove outliers before fitting a linear regression model?
Ans. Yes, it is necessary to remove outliers as they can have a huge impact on the model's predictions.
Take, for instance, plots 3 and 4 of Anscombe's quartet referred to above. It is apparent from those
plots that the outliers caused a significant change in the best fit line in comparison to what it would
have been in their absence.
AIM: Apply pre-processing techniques to the Boston Housing dataset using Python with
various operations, and use clustering techniques.
Program:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load the data (file and column names assumed; adjust to your copy of the dataset)
data = pd.read_csv('housing.csv')
prices = data['MEDV']
features = data.drop('MEDV', axis=1)

# Success
print('Boston housing dataset has {0} data points with {1} variables each'.format(*data.shape))

# There are other statistics you can calculate too, like quartiles
first_quartile = np.percentile(prices, 25)
third_quartile = np.percentile(prices, 75)
inter_quartile = third_quartile - first_quartile

# Using pyplot
plt.figure(figsize=(20, 5))

# i: index
for i, col in enumerate(features.columns):
    plt.subplot(1, len(features.columns), i + 1)   # one panel per feature
    plt.plot(data[col], prices, 'o')               # feature vs. price scatter
    plt.title(col)
    plt.xlabel(col)
    plt.ylabel('prices')
plt.show()
• Density-Based Methods: These methods consider the clusters as dense regions having some
similarities and differences from the lower-density regions of the space. These methods have good
accuracy and the ability to merge two clusters. Examples: DBSCAN (Density-Based Spatial
Clustering of Applications with Noise), OPTICS (Ordering Points To Identify Clustering
Structure), etc.
• Hierarchical Based Methods: The clusters formed in this method form a tree-type structure based
on the hierarchy. New clusters are formed using the previously formed ones. It is divided into
two categories:
• Agglomerative (bottom-up approach)
• Divisive (top-down approach)
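A minimal sketch of the two families of methods described above, applied to the features prepared in the program (assuming scikit-learn is installed; eps and the cluster counts are illustrative assumptions):
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN, AgglomerativeClustering

X = StandardScaler().fit_transform(features)                       # scaling is a common pre-processing step

db_labels = DBSCAN(eps=1.5, min_samples=5).fit_predict(X)          # density-based clustering
agg_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)  # hierarchical (agglomerative) clustering

print('DBSCAN clusters found:', len(set(db_labels)) - (1 if -1 in db_labels else 0))
print('Agglomerative cluster sizes:', pd.Series(agg_labels).value_counts().to_dict())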
The Naïve Bayes algorithm is comprised of two words, Naïve and Bayes, which can be described as:
o Naïve: It is called Naïve because it assumes that the occurrence of a certain feature is independent of the
occurrence of other features. For example, if a fruit is identified on the basis of colour, shape, and taste, then
a red, spherical, and sweet fruit is recognized as an apple. Hence each feature individually contributes to
identifying that it is an apple, without depending on the others.
o Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem.
Bayes' Theorem:
o Bayes' theorem is also known as Bayes' Rule or Bayes' Law, and is used to determine the probability
of a hypothesis with prior knowledge. It depends on conditional probability.
o The formula for Bayes' theorem is given as:
P(A|B) = P(B|A) P(A) / P(B)
Where,
P(A|B) is Posterior probability: the probability of hypothesis A given the observed evidence B.
P(B|A) is Likelihood probability: the probability of the evidence given that the hypothesis is true.
P(A) is Prior probability: the probability of the hypothesis before observing the evidence.
P(B) is Marginal probability: the probability of the evidence.
Working of Naïve Bayes' Classifier can be understood with the help of the below example:
Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this
dataset we need to decide whether we should play or not on a particular day according to the weather
conditions. To solve this problem, we need to follow the below steps:
Problem: If the weather is sunny, then the Player should play or not?
P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
P(Sunny) = 0.35
P(Yes) = 0.71
P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)
P(Sunny|No) = 2/4 = 0.5
P(No) = 0.29
P(Sunny) = 0.35
So P(No|Sunny) = 0.5 * 0.29 / 0.35 ≈ 0.41.
There are three types of Naive Bayes Model, which are given below:
o Gaussian: The Gaussian model assumes that the features follow a normal distribution. This means that if
the predictors take continuous values instead of discrete ones, the model assumes that these values are
sampled from the Gaussian distribution.
o Multinomial: The Multinomial Naïve Bayes classifier is used when the data are multinomially distributed.
It is primarily used for document classification problems, i.e. deciding which category a particular document
belongs to, such as Sports, Politics, Education, etc.
The classifier uses the frequency of words as the predictors.
o Bernoulli: The Bernoulli classifier works similarly to the Multinomial classifier, but the predictor
variables are independent Boolean variables, such as whether a particular word is present or not in a
document. This model is also popular for document classification tasks.
It is a supervised classification algorithm. Naive Bayes also assumes that all the features have an equal effect on
the outcome.
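A minimal Python sketch of a Gaussian Naïve Bayes classifier using scikit-learn (the tiny numeric dataset below is illustrative, not the weather table from the example above):
import numpy as np
from sklearn.naive_bayes import GaussianNB

# illustrative numeric features (e.g. temperature and humidity) and Play labels
X = np.array([[30, 85], [27, 90], [21, 70], [24, 80], [18, 65]])
y = np.array([0, 0, 1, 1, 1])                  # 0 = No, 1 = Yes

model = GaussianNB().fit(X, y)                 # assumes each feature follows a normal distribution per class
print(model.predict([[25, 75]]))               # predicted class for a new day
print(model.predict_proba([[25, 75]]))         # posterior probabilities from Bayes' theorem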
Decision Tree
o A Decision Tree is a supervised learning technique that can be used for both Classification and
Regression problems, but mostly it is preferred for solving Classification problems. It is a tree-structured
classifier, where internal nodes represent the features of a dataset, branches represent the decision
rules, and each leaf node represents the outcome.
o In a decision tree, there are two kinds of nodes: the Decision Node and the Leaf Node. Decision nodes
are used to make a decision and have multiple branches, whereas leaf nodes are the outputs of those
decisions and do not contain any further branches.
o The decisions or tests are performed on the basis of the features of the given dataset.
o It is a graphical representation for getting all the possible solutions to a problem/decision based on
given conditions.
o It is called a decision tree because, similar to a tree, it starts with the root node, which expands into
further branches and constructs a tree-like structure.
o In order to build a tree, we use the CART algorithm, which stands for Classification and Regression
Tree algorithm.
o A decision tree simply asks a question and, based on the answer (Yes/No), further splits the tree into
subtrees.
o The below diagram explains the general structure of a decision tree:
Note: A decision tree can contain categorical data (YES/NO) as well as numeric data.
There are various algorithms in Machine learning, so choosing the best algorithm for the given dataset and
problem is the main point to remember while creating a machine learning model. Below are the two reasons for
using the Decision tree:
o Decision trees usually mimic human thinking ability while making a decision, so they are easy to
understand.
o The logic behind the decision tree can be easily understood because it shows a tree-like structure.
Root Node: Root node is from where the decision tree starts. It represents the entire dataset, which further
gets divided into two or more homogeneous sets.
Leaf Node: Leaf nodes are the final output node, and the tree cannot be segregated further after getting a
leaf node.
Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes according to the
given conditions.
Pruning: Pruning is the process of removing the unwanted branches from the tree.
Parent/Child node: The root node of the tree is called the parent node, and other nodes are called the child
nodes.
In a decision tree, for predicting the class of the given dataset, the algorithm starts from the root node of the
tree. The algorithm compares the value of the root attribute with the record's (real dataset's) attribute and, based on
the comparison, follows the branch and jumps to the next node.
For the next node, the algorithm again compares the attribute value with the other sub-nodes and moves further.
It continues the process until it reaches a leaf node of the tree. The complete process can be better understood
using the below algorithm:
o Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
o Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
o Step-3: Divide S into subsets that contain the possible values of the best attribute.
o Step-4: Generate the decision tree node which contains the best attribute.
o Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3.
Continue this process until a stage is reached where you cannot classify the nodes any further; the final
nodes are then called leaf nodes.
Example: Suppose there is a candidate who has a job offer and wants to decide whether he should accept the
offer or not. To solve this problem, the decision tree starts with the root node (the Salary attribute, chosen by ASM). The
root node splits further into the next decision node (distance from the office) and one leaf node based on the
corresponding labels. The next decision node further splits into one decision node (cab facility) and one leaf
node. Finally, the decision node splits into two leaf nodes (Accepted offer and Declined offer). Consider the
below diagram:
While implementing a decision tree, the main issue is how to select the best attribute for the root node
and for the sub-nodes. To solve such problems there is a technique called the Attribute Selection
Measure, or ASM. With this measurement, we can easily select the best attribute for the nodes of the tree. There
are two popular techniques for ASM, which are:
o Information Gain
o Gini Index
1. Information Gain:
o Information gain is the measurement of the change in entropy after the segmentation of a dataset based on
an attribute.
o It calculates how much information a feature provides us about a class.
o According to the value of information gain, we split the node and build the decision tree.
o A decision tree algorithm always tries to maximize the value of information gain, and the node/attribute
having the highest information gain is split first. It can be calculated using the below formula:
Information Gain = Entropy(S) - [(Weighted Average) × Entropy(each feature)]
Entropy: Entropy is a metric to measure the impurity in a given attribute. It specifies the randomness in the data.
Entropy can be calculated as:
Entropy(S) = -P(yes) log2 P(yes) - P(no) log2 P(no)
Where S is the total number of samples, P(yes) is the probability of yes, and P(no) is the probability of no.
2. Gini Index:
o The Gini index is a measure of impurity or purity used while creating a decision tree in the
CART (Classification and Regression Tree) algorithm.
o An attribute with a low Gini index should be preferred over one with a high Gini index.
o It only creates binary splits, and the CART algorithm uses the Gini index to create binary splits.
o The Gini index can be calculated using the below formula:
Gini Index = 1 - Σj (Pj)²
Pruning is the process of deleting unnecessary nodes from a tree in order to get the optimal decision tree.
A too-large tree increases the risk of overfitting, while a small tree may not capture all the important features of
the dataset. A technique that decreases the size of the learning tree without reducing accuracy is therefore
known as pruning. There are mainly two types of tree-pruning technology used:
o Cost Complexity Pruning
o Reduced Error Pruning
o It is simple to understand, as it follows the same process that a human follows while making a
decision in real life.
o It can be very useful for solving decision-related problems.
o It helps to think about all the possible outcomes of a problem.
o There is less requirement for data cleaning compared to other algorithms.
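A minimal Python sketch of a decision tree classifier with scikit-learn, using its built-in iris dataset (an illustrative choice, not the job-offer example above); the Gini index is used as the ASM and max_depth acts as a simple form of pruning:
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(criterion='gini', max_depth=3, random_state=0)   # Gini index as the attribute selection measure
tree.fit(iris.data, iris.target)

print(export_text(tree, feature_names=list(iris.feature_names)))   # text view of the decision rules
print(tree.predict(iris.data[:2]))                                 # follow the branches for two records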
Q) What is a Decision Tree?
Ans. A Decision Tree is a supervised machine learning algorithm that can be used for both Regression and
Classification problem statements. It divides the complete dataset into smaller subsets while, at the same time,
an associated decision tree is incrementally developed.
The final output of a decision tree is a tree having decision nodes and leaf nodes. A decision tree
can operate on both categorical and numerical data.
Q) Explain the CART algorithm. Is it guaranteed to find the optimal tree?
Ans. CART, which stands for Classification and Regression Trees, is a greedy algorithm that greedily searches
for an optimum split at the top level and then repeats the same process at each of the subsequent levels.
It does not verify whether the split will lead to the lowest possible impurity several levels down, so the solution
provided by the greedy algorithm is not guaranteed to be optimal; however, it often produces a solution that is
reasonably good, since finding the optimal tree is an NP-Complete problem that requires exponential time
complexity.
As a result, finding the optimal tree is intractable even for small training sets. This is why we must settle for a
“reasonably good” solution instead of an optimal one.
Program:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
# Load the dataset
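The rest of the program is not shown in the manual; a minimal completion sketch using seaborn's built-in 'tips' dataset (an illustrative assumption) would be:
df = sns.load_dataset('tips')                          # built-in example dataset
sns.boxplot(x='day', y='total_bill', data=df)          # distribution of the bill amount per day
plt.title('Total bill by day')
plt.show()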
Q) What is a boxplot?
Ans. A boxplot is a graph that gives a good indication of how the values in the data are spread out.
Although boxplots may seem primitive in comparison to a histogram or density plot, they have the
advantage of taking up less space, which is useful when comparing distributions between many groups
or datasets.
Q) What is Seaborn?
Ans. Seaborn is an open-source Python library built on top of matplotlib. It is used for data visualization
and exploratory data analysis. Seaborn works easily with DataFrames and the pandas library, and the graphs
created can also be customized easily. Below are a few benefits of data visualization.
Graphs can help us find data trends that are useful in any machine learning or forecasting project.
Program:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load the Titanic passenger data (file name assumed; adjust to your copy of the dataset)
a = pd.read_csv('titanic.csv')

# procedure for plotting a pie chart of the port of embarkation
b = a.loc[a['Embarked'] == 'Q']
c = a.loc[a['Embarked'] == 'S']
d = a.loc[a['Embarked'] == 'C']
label = np.array(['Queenstown', 'Southampton', 'Cherbourg'])
e = np.array([len(b.index), len(c.index), len(d.index)])
plt.pie(e, labels=label)
plt.show()
# procedure for plotting a triple bar graph based on survival status
f = len(a.loc[(a['Survived'] == 1) & (a['Sex'] == 'male')])
g = len(a.loc[(a['Survived'] == 1) & (a['Sex'] == 'female')])
h = len(a.loc[(a['Survived'] == 0) & (a['Sex'] == 'male')])
i = len(a.loc[(a['Survived'] == 0) & (a['Sex'] == 'female')])
s = len(a.loc[(a['Survived'].isnull()) & (a['Sex'] == 'male')])
t = len(a.loc[(a['Survived'].isnull()) & (a['Sex'] == 'female')])
p = np.array([f, g])
q = np.array([h, i])
r = np.array([s, t])
print(p, q, r)
gender = np.arange(len(['male', 'female']))
plt.bar(gender - 0.2, p, 0.2, label='Survivors')
plt.bar(gender, q, 0.2, label='Succumbers')
plt.bar(gender + 0.2, r, 0.2, label='Unknown')
plt.xticks(gender, ['male', 'female'])
plt.title('Survivors vs Succumbers vs Unknown')
plt.xlabel('Gender')
plt.ylabel('No. of people')
plt.legend()
plt.show()
# procedure for plotting a bar graph of passenger class
j = len(a.loc[a['Pclass'] == 1])
k = len(a.loc[a['Pclass'] == 2])
l = len(a.loc[a['Pclass'] == 3])
m = np.array([j, k, l])
nlabel = [1, 2, 3]
plt.bar(nlabel, m, width=0.4)
plt.show()
Supervised learning is a type of machine learning method in which we provide sample labeled data to the
machine learning system in order to train it, and on that basis, it predicts the output.
Unsupervised learning is a learning method in which a machine learns without any supervision. The training is
provided to the machine with the set of data that has not been labeled, classified, or categorized, and the algorithm
needs to act on that data without any supervision.
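A minimal Python sketch (assuming scikit-learn; the toy data are illustrative) contrasting the two approaches: supervised learning fits on features and labels, while unsupervised learning sees the features only:
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [8, 8], [9, 10]])        # sample feature vectors
y = np.array([0, 0, 1, 1])                             # labels, available only in the supervised case

clf = LogisticRegression().fit(X, y)                   # supervised: learns the mapping from X to y
km = KMeans(n_clusters=2, n_init=10).fit(X)            # unsupervised: groups X without any labels

print(clf.predict([[2, 3]]), km.labels_)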
Reinforcement learning is a feedback-based learning method, in which a learning agent gets a reward for each
right action and gets a penalty for each wrong action. The agent learns automatically with these feedbacks and
improves its performance.