Unit - I Chap-4 Model Evaluation and Development

Model Design and Development

Model Evaluation
 Model evaluation is the process of understanding a model's performance, as well as its strengths and weaknesses, using different evaluation metrics.
 It helps us find the model that best represents our data.
 It is an important part of the model development process.
 Model evaluation is important for assessing the efficiency of a model.
A model is evaluated through:
1. Hold-Out
2. Cross-Validation
Model Evaluation Methods
1. Hold-Out
2. Cross-Validation.
1. Hold-Out
 The dataset is divided into a training set and a validation set.
 The model is “trained” on the training set, and its “performance” is “validated” on the validation set.
 A large dataset is randomly divided into three subsets:
♦ Training data is the subset of the dataset used to build predictive models.
♦ Validation data is the subset of the dataset used to assess the performance of models and to select the best-performing model.
♦ Test data is used to assess the likely future performance of a model. If a model fits the training set much better than it fits the test set, it is probably overfitting.
Example: If there are 20 data items, 12 are used in the training set and the remaining 8 are used in the test set.
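A minimal sketch of a hold-out split, assuming scikit-learn is available; the 20 items and the 12/8 split mirror the example above:

```python
# Hold-out split sketch: 20 hypothetical items, 12 for training and 8 for testing.
from sklearn.model_selection import train_test_split

X = [[i] for i in range(20)]     # 20 hypothetical data items (one feature each)
y = [i % 2 for i in range(20)]   # hypothetical binary labels

# 40% of the data is held out for testing, the rest is used for training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42)
print(len(X_train), len(X_test))   # 12 8
```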
Model Evaluation Methods
2. Cross-Validation
 Cross-validation is a technique used to evaluate the performance of a model on unseen data.
 Several types of cross-validation techniques are: k-fold cross-validation, leave-one-out cross-validation, and stratified cross-validation.
 When only a limited amount of data is available, we use k-fold cross-validation to achieve an unbiased estimate of the model performance.
 In k-fold cross-validation, we divide the dataset into ‘k’ subsets of equal size.
 We build models ‘k’ times, each time leaving out one of the subsets from training and using it as the test set. If ‘k’ equals the sample size, this is called "leave-one-out".
The main purpose of cross-validation is to prevent overfitting, which occurs when a model is trained too well on the training data and performs poorly on new, unseen data.
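A minimal sketch of k-fold cross-validation, assuming scikit-learn and its bundled iris dataset; the model choice and k = 5 are illustrative:

```python
# 5-fold cross-validation sketch: each fold is used once as the test set.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

scores = cross_val_score(model, X, y, cv=5)   # one accuracy score per fold
print(scores, scores.mean())
```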
Model Evaluation Metrics
 Classification Evaluation
 Regression Evaluation
Model Evaluation Metrics
 Classification Evaluation
To evaluate the performance of a model, different types of classification metrics are used: accuracy, precision, confusion matrix, log-loss, and AUC (area under the ROC curve).
1. Confusion Matrix
 A confusion matrix is a tabular summary of the actual and predicted values.
 It is used to describe the performance of a classification model on a set of test data for which the true values are known.
Model Evaluation Metrics
Example: Suppose we are trying to create a model that can predict whether a patient has cancer or not. In this example, the confusion matrix contains 100 true positives, 10 false positives, 5 false negatives, and 50 true negatives.

2. Accuracy:
 The number of correct predictions divided by the total number of predictions.
Example:
Accuracy = (True Positives + True Negatives) / (True Positives + False Positives + False Negatives + True Negatives)
Accuracy = (100 + 50) / (100 + 10 + 5 + 50) = 150/165 = 0.91 = 91%
Accuracy is useful when the target classes are well balanced, but it is not a good measure for unbalanced classes.
Model Evaluation Metrics
3. Precision
 The number of true positives divided by the number of predicted positives.
Example:
Precision = True Positives / (True Positives + False Positives)
Precision = 100 / (100 + 10) = 100/110 = 0.91 = 91%
When our model predicts that a patient has cancer, it is correct 91 percent of the time.
4. Recall or Sensitivity
 The number of true positives divided by the total number of true positives and false negatives.
 Recall is also known as TPR (True Positive Rate).
Example:
Recall = True Positives / (True Positives + False Negatives)
Recall = 100 / (100 + 5) = 100/105 = 0.95 = 95%
95 percent of all cancer patients are correctly predicted by the model to have cancer.
 Increasing precision decreases recall and vice versa; this is known as the precision/recall trade-off.
Model Evaluation Metrics
5. Specificity:
 The number of true negatives divided by the total number of true negatives and false positives.
 Specificity is the opposite of sensitivity; it is also known as TNR (True Negative Rate).
Example:
Specificity = True Negatives / (True Negatives + False Positives)
Specificity = 50 / (50 + 10) = 50/60 = 0.83 = 83%
A specificity of 0.83 means 83 percent of all patients who didn’t have cancer are predicted correctly.
6. F1-Score
 When two models have low precision and high recall or vice versa, it becomes hard to compare them; therefore, to solve this issue, we can use the F-score.
 “F-score is the harmonic mean of Precision and Recall”.
 By calculating the F-score, we can evaluate recall and precision at the same time. The F-score is maximized when recall equals precision.
Example:
F1-Score = (2 × Recall × Precision) / (Recall + Precision)
F1-Score = (2 × 0.95 × 0.91) / (0.95 + 0.91) = 1.729 / 1.86 = 0.93 = 93%
 A high F1 score indicates that both the precision and the recall of the classifier are good.
Model Evaluation Metrics
7. Error Rate (Misclassification Rate):
 The number of incorrect predictions divided by the total number of predictions.
 The best error rate is 0.0 and the worst error rate is 1.0.
Error Rate = 1 - Accuracy, or Error Rate = (False Negatives + False Positives) / Total
Error Rate = 1 - 0.91 = (10 + 5)/165 = 0.09
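A minimal sketch that recomputes the metrics of the cancer example from its confusion-matrix counts (TP = 100, FP = 10, FN = 5, TN = 50), in plain Python:

```python
# Recompute accuracy, precision, recall, specificity, F1 and error rate
# from the confusion-matrix counts used in the example above.
TP, FP, FN, TN = 100, 10, 5, 50

accuracy    = (TP + TN) / (TP + FP + FN + TN)                  # ≈ 0.91
precision   = TP / (TP + FP)                                   # ≈ 0.91
recall      = TP / (TP + FN)                                   # ≈ 0.95
specificity = TN / (TN + FP)                                   # ≈ 0.83
f1          = 2 * precision * recall / (precision + recall)    # ≈ 0.93
error_rate  = 1 - accuracy                                     # ≈ 0.09

print(accuracy, precision, recall, specificity, f1, error_rate)
```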
8. AUC-ROC curve
Receiver Operating Characteristic (ROC)
 The ROC curve is a graphical plot of the True Positive Rate (y-axis) against the False Positive Rate (x-axis).
 It shows the performance of a classification model at different classification thresholds.
 It was originally developed to test military radar receivers.
Model Evaluation Metrics
Area Under Curve (AUC)
 AUC represents the area under the ROC curve.
 It measures the overall performance of a binary classification model.
 Both TPR and FPR range between 0 and 1, so the area also lies between 0 and 1; a greater AUC denotes better model performance.
 Our main goal is to maximize this area in order to have the highest TPR and lowest FPR at the given threshold.
 The AUC measures the probability that the model will assign a randomly chosen positive instance a higher predicted probability than a randomly chosen negative instance.
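A minimal sketch of computing an ROC curve and its AUC, assuming scikit-learn; the labels and predicted probabilities below are hypothetical:

```python
# ROC curve points and AUC for a small set of hypothetical predictions.
from sklearn.metrics import roc_curve, roc_auc_score

y_true   = [0, 0, 1, 1, 0, 1, 1, 0]                      # actual classes
y_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.3]     # predicted probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_scores)       # points of the ROC curve
auc = roc_auc_score(y_true, y_scores)                    # area under that curve
print(auc)
```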
Model Evaluation Metrics
 Regression Evaluation Metrics
 A regression model almost never predicts the actual value exactly; its predictions are somewhat lower or higher than the actual values.
 These metrics aim to show us the prediction error of our model.
 An error is defined as the difference between the predicted and the actual value.
 The lower the error, the better the performance of the model; the higher the error, the worse the performance of the model.
 Linear regression
▲ It finds the linear relationship between the dependent and independent variables using a best-fit straight line.
▲ Linear regression adjusts the line through the data to make accurate predictions.
▲ The objective of linear regression is to find a line that minimizes the prediction errors.
▲ The linear equation is $y = mx + c$, where y is the dependent variable and x is the independent variable given in your dataset.
Model Evaluation Metrics
Example: Implementation of Linear Regression
Age (X)   Glucose Level (Y)   x − x̄      y − ȳ    (x − x̄)²     (x − x̄)(y − ȳ)
43        99                    1.833     18         3.361        33
21        65                  −20.167    −16       406.694       322.667
25        79                  −16.167     −2       261.361        32.333
42        75                    0.833     −6         0.694        −5
57        87                   15.833      6       250.694        95
59        81                   17.833      0       318.028         0
x̄ = 41.167   ȳ = 81           Sum = 0   Sum = 0   Sum = 1240.833  Sum = 478

Slope: $m = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{478}{1240.833} = 0.3852$
Intercept: from $y = mx + c$, 81 = (0.3852 × 41.167) + c, so c = 65.1416
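A minimal sketch that reproduces the slope and intercept for the Age/Glucose table above, assuming numpy is available:

```python
# Least-squares slope m and intercept c for the Age / Glucose Level data.
import numpy as np

x = np.array([43, 21, 25, 42, 57, 59])   # Age
y = np.array([99, 65, 79, 75, 87, 81])   # Glucose Level

m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
c = y.mean() - m * x.mean()
print(m, c)   # ≈ 0.3852 and ≈ 65.14
```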
Model Evaluation Metrics
1. Mean Absolute Error
 Mean Absolute Error (MAE) is the average of the absolute differences between the actual values and the predicted values. This difference is known as the prediction error.
 It measures the average of the residuals in the dataset.
 It doesn’t give any idea about the direction of the error.
 It measures how far the predictions made by a model are from the actual output.
 Sum all the absolute errors and divide them by the total number of observations:
$MAE = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|$

Age (X)   Glucose Level (Y)   Predicted (ŷ)   Error (y − ŷ)   abs(Error)
43        99                  81.7056          17.2944         17.2944
21        65                  73.2312          −8.2312          8.2312
25        79                  74.7720           4.2280          4.2280
42        75                  81.3204          −6.3204          6.3204
57        87                  87.0984          −0.0984          0.0984
59        81                  87.8688          −6.8688          6.8688

MAE = (17.2944 + 8.2312 + 4.228 + 6.3204 + 0.0984 + 6.8688) / 6 = 7.1735
Prediction for a new value, Age = 55: ŷ = 86.328
Model Evaluation Metrics
2. Mean Square Error (MSE)
 It measures the average squared difference between the predicted values and the actual values.
 It is similar to the mean absolute error, except that the errors are squared instead of taking their absolute value.
 MSE is typically larger than MAE because the errors are squared.
 MSE punishes the model for large errors.
 Squaring the errors removes their signs, so positive and negative errors cannot cancel out.
 Less error decreases the MSE and produces more accurate predictions.
MSE = 1/6 × [(99 − 81.7056)² + (65 − 73.2312)² + (79 − 74.772)² + (75 − 81.3204)² + (87 − 87.0984)² + (81 − 87.8688)²]
    = [(17.2944)² + (8.2312)² + (4.228)² + (6.3204)² + (0.0984)² + (6.8688)²] / 6
The mean square error (MSE) for the above dataset is 78.6437.
Model Evaluation Metrics
3. Root Mean Square Error (RMSE)
 Root Mean Squared Error is the square root of the mean of the squared errors, i.e., the square root of the average squared difference between the predicted values and the actual values.
 RMSE assesses the amount of error in a regression model.
 Low RMSE values indicate that the model’s predictions are more accurate. Higher values suggest more error and less accurate predictions.
 RMSE is more sensitive to outliers than MAE.
RMSE = sqrt{[(17.2944)² + (8.2312)² + (4.228)² + (6.3204)² + (0.0984)² + (6.8688)²] / 6} = sqrt(78.6437) = 8.8681

Age (X)   Glucose Level (Y)   Predicted (ŷ)   Error (y − ŷ)   Squared Error (y − ŷ)²
43        99                  81.7056          17.2944         299.0963
21        65                  73.2312          −8.2312          67.7527
25        79                  74.7720           4.2280          17.8760
42        75                  81.3204          −6.3204          39.9475
57        87                  87.0984          −0.0984           0.0097
59        81                  87.8688          −6.8688          47.1804

Mean Squared Error = 78.6437;  RMSE = √78.6437 = 8.8681
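A minimal sketch that reproduces MAE, MSE and RMSE for the Age/Glucose example, assuming numpy and scikit-learn are available:

```python
# MAE, MSE and RMSE for the actual vs. predicted glucose values above.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([99, 65, 79, 75, 87, 81])
y_pred = np.array([81.7056, 73.2312, 74.772, 81.3204, 87.0984, 87.8688])

mae  = mean_absolute_error(y_true, y_pred)   # ≈ 7.17
mse  = mean_squared_error(y_true, y_pred)    # ≈ 78.64
rmse = np.sqrt(mse)                          # ≈ 8.87
print(mae, mse, rmse)
```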
Model Evaluation Metrics
4. Root Mean Square Log Error (RMSLE)
 Root Mean Squared Logarithmic Error is calculated by applying the log to the actual and the predicted values and then taking their differences.
 It punishes the model more when the predicted value is less than the actual value, and punishes it less when the predicted value is more than the actual value. Because of the log transform, large errors are not punished as heavily, so RMSE > RMSLE.

Age (X)  Glucose (y)  Predicted (ŷ)  y + 1   ŷ + 1    log(y + 1)  log(ŷ + 1)  |log(y+1) − log(ŷ+1)|  Squared Error
43       99           81.706         100     82.706   2.000       1.918       0.082                   0.007
21       65           73.231          66     74.231   1.820       1.871       0.051                   0.003
25       79           74.772          80     75.772   1.903       1.880       0.024                   0.001
42       75           81.320          76     82.320   1.881       1.916       0.035                   0.001
57       87           87.098          88     88.098   1.944       1.945       0.000                   0.000
59       81           87.869          82     88.869   1.914       1.949       0.035                   0.001

Column means: 0.038 (log errors) and 0.002 (squared log errors)
RMSLE = √0.002 ≈ 0.045
(A constant 1 is added to the actual and predicted values before taking the log, base 10, in this worked example.)
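A minimal sketch of RMSLE for the same example, assuming numpy is available. Note that the table above uses base-10 logarithms, while RMSLE is more commonly defined with the natural log (as in scikit-learn's mean_squared_log_error); the pattern is the same either way:

```python
# RMSLE: log-transform (after adding 1), take differences, square, average, root.
import numpy as np

y_true = np.array([99, 65, 79, 75, 87, 81], dtype=float)
y_pred = np.array([81.706, 73.231, 74.772, 81.320, 87.098, 87.869])

log_errors = np.log10(y_true + 1) - np.log10(y_pred + 1)
rmsle = np.sqrt(np.mean(log_errors ** 2))
print(rmsle)   # ≈ 0.045, matching the table above
```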
Model Evaluation using Visualization
 Data visualization is the graphical representation of data in charts, graphs, and maps.
 This visualization technique aims to identify the patterns, trends, correlations, and outliers of data sets.
Why Use Data Visualization?
• To make data easier to understand and remember.
• To discover unknown facts, outliers, and trends.
• To visualize relationships and patterns quickly.
• To ask better questions and make better decisions.
• To analyze competitors.
• To improve insights.
Residual Plot
 A residual is the difference between the actual value and the predicted value. Residuals are also known as errors, i.e., Residual = Actual value − Predicted value; that is, e = y − ŷ.
 It measures how far data points are from the regression line.
 It is used for assessing the quality of a model.
Residual Plot
 A residual plot is used to identify the underlying patterns in the residual values.
 A residual plot is a scatterplot in which:
• X-axis: the predicted values or the independent variable
• Y-axis: the residual values
 If the points in a residual plot are randomly scattered around the horizontal axis, a linear regression model is appropriate for the data; otherwise, a nonlinear model is more appropriate.
x     y     ŷ         e = y − ŷ
60    70    65.411     4.589
70    65    71.849    −6.849
80    70    78.288    −8.288
85    95    81.507    13.493
95    85    87.945    −2.945
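A minimal sketch of a residual plot for the small table above, assuming matplotlib is available:

```python
# Residual plot: predicted values on the x-axis, residuals (y - ŷ) on the y-axis.
import matplotlib.pyplot as plt

y_pred    = [65.411, 71.849, 78.288, 81.507, 87.945]
residuals = [4.589, -6.849, -8.288, 13.493, -2.945]

plt.scatter(y_pred, residuals)
plt.axhline(0, linestyle="--")       # reference line at residual = 0
plt.xlabel("Predicted value")
plt.ylabel("Residual (y - ŷ)")
plt.title("Residual plot")
plt.show()
```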
Residual Plot
 Types of Residual Plots
1. Random Pattern
 Residual values are randomly distributed, and there is no visible pattern in the values. The
developed model is considered a good fit.
 This random pattern indicates that a linear model provides a decent fit to the data.
2. U-Shaped Pattern
 If the residual plot follows a U-shaped curve, the model is not considered a good fit, and a non-linear model might be required.
Residual Plot
 Residual Plot Analysis
 Residual plot analysis is used to assess the validity of linear regression models by plotting the
residuals and checking whether the assumptions of linear regression models are met.
 A linear regression model can be viewed as a combination of a deterministic part and a stochastic part. Using linear equation models, we try to predict the deterministic part, and the remaining part is treated as errors or residuals. These error terms or residuals must be independent and normally distributed, i.e., stochastic.
 Characteristics of a Good Residual Plot
o A high density of points near the X-axis, i.e., points should be more concentrated near the
horizontal axis and less dense away from the horizontal axis.
o It should be symmetric around the X-axis.
Distribution Plot
 Distribution plots show the distribution of data or a numerical variable. They are also known as Kernel Density Plots or Density Trace Graphs.
 They are used to visually assess the distribution by comparing the actual and predicted data.
 They show how the data points are distributed throughout the set.
 These distributions show the spread (dispersion, variability, and scatter) of the data.
 The spread may be stretched over a wider range or squeezed into a narrower range.
 A histogram can be used to represent the actual data, overlaid with a density curve that represents the predicted result.
 distplot displays a histogram overlapped with a density curve.
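A minimal sketch of a distribution plot, assuming seaborn and matplotlib are available; seaborn's older distplot is deprecated, so histplot with kde=True is used here to get a histogram overlapped with a density curve, on hypothetical data:

```python
# Distribution plot: histogram of the data overlapped with a smoothed density curve.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

data = np.random.normal(loc=80, scale=10, size=200)   # hypothetical numerical variable

sns.histplot(data, kde=True)
plt.xlabel("Value")
plt.show()
```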
Distribution Plot
 Distribution plots are important for exploratory data analysis.
 They help us detect outliers and skewness, or get an overview of the measures of central
tendency (mean, median, and mode).
 This chart is a variation of a Histogram that uses kernel smoothing to plot values, allowing for
smoother distributions by smoothing out the noise.

 The horizontal axis of the histogram represents the entire range of data values. The vertical
axis represents the relative frequency.
Distribution Plot
 Use distribution plots in addition to more formal hypothesis tests to determine whether the sample data come from a specified distribution.
 Normal Probability Plots: Use normplot to assess whether a data set is approximately normally distributed. Use probplot to create probability plots for distributions other than the normal.
 Quantile-Quantile Plots: Use qqplot to determine whether two data sets come from populations with a common distribution. It plots the quantiles of the first data set against the quantiles of the second data set.
 Cumulative Distribution Plots: Use cdfplot or ecdf to display the empirical cumulative distribution function (cdf) of the sample data for visual comparison with the theoretical cdf of a specified distribution.
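normplot, probplot, qqplot, cdfplot and ecdf are MATLAB functions; a rough Python counterpart of a normal probability plot, assuming scipy and matplotlib are available, is sketched below:

```python
# Normal probability plot: sample quantiles plotted against normal quantiles;
# points close to a straight line suggest the data are roughly normal.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

data = np.random.normal(loc=0, scale=1, size=100)   # hypothetical sample

stats.probplot(data, dist="norm", plot=plt)
plt.show()
```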
Generalization error
 Generalization error is the out-of-sample error that measures how accurately a model can predict values for previously unseen data, i.e., it determines the model’s ability to react to new, unseen data.
 Generalization error is a composition of bias and variance and is commonly measured by MSE: E = bias + variance (plus irreducible noise).
 The difference between predicted and actual data comes from model inaccuracy, sampling error, and noise.
• Noise - unnecessary or irrelevant data that can reduce our model’s performance. It is the irreducible error, the lower bound of the generalization error.
• Bias -
The difference between the expected (average) predicted values and the actual values.
Bias is the prediction error that is introduced in the model due to oversimplifying it.
It represents how much a model’s predictions differ from the correct values.
High bias indicates that the model is underfitting the data because it cannot capture meaningful patterns.
Low Bias: The model’s average predictions are very close to the actual values. When bias is low, the model fits the training data well and makes accurate predictions on new or unseen data.
High Bias: The model’s average predictions are far from the actual values, and the model is unable to capture the underlying patterns in the data.
Generalization error
Variance -
• The amount by which the predictions would change if different training data were used.
• High variance indicates that the model is overfitting the data.
• Variance errors are either low variance or high variance.
Low variance - a very small change in predictions when we change the input dataset.
High variance - a large difference in the predictions when we change the input data.
Overfitting
• Overfitting occurs when the trained model performs well on the training data and poorly on the testing dataset (new data), i.e., when a model tries to cover all the data points present in the given dataset.
• The model becomes more complex in the presence of noise in the data set.
• Overfitting happens due to low bias and high variance.
Reasons for Overfitting
• The training data contains noise.
• The model has high variance.
• The training dataset size is insufficient.
• The model is too complex.
Techniques to reduce Overfitting
• Using K-fold cross-validation
• Using Regularization techniques such as Lasso and Ridge
• Training model with sufficient data
• Adopting ensembling techniques
• Increase training data.
• Reduce model complexity.
Underfitting
• Underfitting occurs when a model is not able to make accurate predictions based on the training data.
• Underfitting is just the opposite of overfitting.
• Underfitted models have poor performance on both the training and testing sets.
• Underfitting occurs due to high bias and low variance.
Reasons for Underfitting
• High bias and low variance
• The size of the training dataset used is not enough.
• The model is too simple.
• Training data is not cleaned and also contains noise in it.

Techniques to reduce Underfitting
• Increase model complexity.
• Increase the number of features by performing feature engineering.
• Remove noise from the data.
• Increase the duration of training to get better results.
Prediction by using Ridge Regression
 Multicollinearity: when there is a high correlation between the independent variables, the model can become too complex and prone to overfitting.
 Ridge regression is an extension of linear regression that addresses multicollinearity in multiple regression data. It is also known as L2 regularization.
 The main goal of ridge regression is to find a new line that doesn’t fit the training data as tightly but reduces the error on the testing data.

• The best-fit line passes through the 3 training dataset points, so the sum of squared residuals (training error) is 0.
• But for the testing dataset, the sum of squared residuals is large, so the line has high variance.
• Variance is the difference in fit between the training dataset and the testing dataset.
• This regression model is overfitting the training dataset.
Prediction by using Ridge Regression
• Ridge regression changes the slope of the line.
• The model will then perform consistently well on both the training and testing datasets.
 It aims to reduce the sum of squared errors between the actual and predicted values by adding a penalty term to the regression estimates that shrinks the coefficients and brings them closer to zero, i.e., it penalizes large coefficient values.
 After adding the hyperparameter, the model will also change the best-fit line.
Prediction by using Ridge Regression

Ridge cost function = sum of squared residuals + λ × slope²
λ (lambda): the regularization parameter
slope²: the sum of the squared coefficients
 Ridge regression is a powerful technique that avoids overfitting and reduces model complexity.
 Ridge regression produces more stable and accurate predictions, effectively mitigating the problems associated with multicollinearity.
 Example:
• Consider a housing dataset where you want to predict the price of a house based on various features such as size, number of bedrooms, location, and age.
• Some of these features are highly correlated with each other.
• The size of the house and the number of bedrooms might be strongly correlated, i.e., multicollinearity.
• A ridge regression model adds a penalty term that shrinks the coefficients of the correlated features, producing a more stable model.
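A minimal sketch of ridge regression with scikit-learn (assumed available); the tiny housing table below, with its feature names and prices, is purely hypothetical:

```python
# Ridge regression: ordinary least squares plus an L2 penalty (alpha plays the role of λ).
import numpy as np
from sklearn.linear_model import Ridge

X = np.array([[120, 3, 7, 10],     # size (m²), bedrooms, location score, age
              [ 80, 2, 5, 25],
              [200, 4, 8,  5],
              [150, 3, 6, 15]])
y = np.array([300, 180, 520, 360])  # hypothetical prices (in thousands)

model = Ridge(alpha=1.0)            # larger alpha shrinks the coefficients more
model.fit(X, y)
print(model.coef_, model.intercept_)
```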
Model Selection
 Model selection is the process of choosing one of the models as the final model that addresses
the problem.
 All models have some predictive error, given the statistical noise in the data, the
incompleteness of the data sample, and the limitations of each different model type.
 The best approach to model selection requires “sufficient” data, which may be nearly infinite
depending on the complexity of the problem.
 Two main classes of techniques to model selection:
 Probabilistic Measures: Choose a model via in-sample error and complexity.
 Resampling Methods: Choose a model via estimated out-of-sample error.
Testing Multiple Parameters by using Grid Search
 A model parameter is an internal characteristic of the model, and its value can be estimated from data. Examples: slope, intercept, weights, bias.
 A model hyperparameter is an external characteristic of a model, and its value cannot be estimated from data. Examples: learning rate, number of hidden layers, number of epochs.
 Hyperparameters are parameters that are set by the user before training the model.
 Grid search is a hyperparameter tuning technique used to identify the optimal hyperparameters for a model by trying all possible combinations, making its predictions more accurate.
 The model performance is measured by a scoring metric such as accuracy or F1 score.
 Hyperparameter values cannot be learned from the training data, yet they affect the performance of a model. So, to find the optimal hyperparameters, we create a grid of candidate values.
 The main goal is to find the combination of hyperparameters that minimizes the error and maximizes the accuracy of our model, i.e., to try out different values of each hyperparameter and pick the values that give the best result.
 The models are trained and evaluated through cross-validation.
 Cross-validation measures how well a model generalizes to an independent dataset.
Testing Multiple Parameters by using Grid Search

 Circles in the grid represent different hyperparameter combinations. We begin with one combination of values and train the model.
 We keep training the model with different hyperparameter values until all the candidate values have been tried.
 Every model produces an error. We pick the hyperparameter combination that minimizes the error.
Testing Multiple Parameters by using Grid Search
Grid Search performs hyperparameter tuning through the following process:
 Define the hyperparameter space: Specify the hyperparameters and their possible values to be
considered in the search.
 Evaluate combinations: Evaluate all possible combinations of hyperparameter values using a
predefined evaluation metric, typically employing cross-validation to estimate model
performance.
 Select the best combination: Identify the combination of hyperparameter values that yields the
best performance according to the evaluation metric.
Benefits of Grid Search:
 Optimal model performance: helps to identify the best hyperparameters, resulting in improved model performance.
 Automated tuning: Grid Search automates the hyperparameter tuning process, reducing
manual effort and improving productivity.
 Robust evaluation: Grid Search often employs cross-validation, providing a robust estimate of
model performance and reducing the risk of overfitting.
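A minimal sketch of grid search with scikit-learn (assumed available), tuning two hyperparameters of a support vector classifier with 5-fold cross-validation on the bundled iris data:

```python
# Grid search: evaluate every combination in param_grid via cross-validation
# and keep the combination with the best score.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

param_grid = {                         # the hyperparameter space to search
    "C": [0.1, 1, 10],
    "kernel": ["linear", "rbf"],
}

grid = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```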
