ML Unit 1
Learning by Induction, Reinforcement Learning, Types of Data, Matching, Stages in Machine Learning, Data Acquisition,
Feature Engineering, Data Representation, Model Selection, Model Learning, Model Evaluation, Model Prediction, Search
and Learning, Data Sets.
The evolution of Machine Learning (ML) can be traced back to the 1950s, marking its formal beginning. Here’s a timeline of key
milestones:
1950s: Early Foundations
1950: Alan Turing proposed the Turing Test, a framework for evaluating a machine's ability to exhibit intelligent behavior.
1952: Arthur Samuel developed the first machine learning program, a checkers-playing program.
1960s-1980s: Symbolic AI and Early Algorithms. Research focused on symbolic reasoning and rule-based systems, alongside early learning algorithms such as nearest neighbors and the perceptron (a simple neural network).
1980s-1990s: Shift to Statistical Learning. The field moved from symbolic AI to data-driven approaches such as decision trees, Support Vector Machines (SVMs), and the backpropagation algorithm for neural networks, emphasizing pattern recognition and learning from data.
2000s: Big Data and Practical Applications. The explosion of digital data and computational power made methods such as Random Forests and Gradient Boosting practical in healthcare, finance, and e-commerce.
2010s: Deep Learning Revolution. Convolutional Neural Networks (CNNs) for image recognition, Recurrent Neural Networks (RNNs) for sequence data, and Transformers for NLP (e.g., BERT, GPT) advanced rapidly, enabled by GPUs and cloud computing for large-scale training.
2020s and Beyond: Modern Era. Focus areas include unsupervised and self-supervised learning, reinforcement learning applications, ethical AI, and explainable ML. Large-scale models such as GPT-4 and BERT dominate language applications, and large-scale models are also prominent in image processing.
Learning:
1. Data: Structured data is organized in rows and columns (tabular format), as in spreadsheets and databases.
2. Algorithms: A set of mathematical instructions that guide the learning process (e.g., a decision tree).
3. Training and testing: Training feeds data to the model so it can learn patterns (teaching the model); testing evaluates the trained model.
4. Model: The output of the training process, representing the learned patterns.
5. Features and labels: Features are the input variables used to make predictions; the label is the target variable the model aims to predict.
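The five components above can be tied together in a small sketch. It assumes a hypothetical table of students (hours studied, attendance percentage) with a pass/fail label, and it uses a nearest-centroid classifier instead of a decision tree only because it fits in a few lines; the names `train` and `predict` are illustrative.

```python
# Hypothetical structured data: each row is ([hours_studied, attendance_pct], label).
rows = [
    ([2.0, 60.0], "fail"),
    ([8.0, 90.0], "pass"),
    ([3.0, 70.0], "fail"),
    ([9.0, 85.0], "pass"),
    ([7.0, 80.0], "pass"),
    ([1.0, 50.0], "fail"),
]

# Features (inputs) and labels (targets).
X = [features for features, _ in rows]
y = [label for _, label in rows]

# Fixed split: first four rows for training, the rest for testing.
X_train, y_train = X[:4], y[:4]
X_test, y_test = X[4:], y[4:]


def train(X, y):
    """Learning algorithm: compute the mean feature vector (centroid) of each class.

    The returned dict is the model, i.e. the learned patterns."""
    model = {}
    for label in set(y):
        members = [x for x, t in zip(X, y) if t == label]
        model[label] = [sum(column) / len(members) for column in zip(*members)]
    return model


def predict(model, x):
    """Predict the label whose centroid is closest (squared Euclidean distance)."""
    return min(model, key=lambda label: sum((a - b) ** 2 for a, b in zip(x, model[label])))


model = train(X_train, y_train)                    # training
predictions = [predict(model, x) for x in X_test]  # testing
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print(model, predictions, accuracy)
```

With this toy data the learned centroids are [2.5, 65.0] for "fail" and [8.5, 87.5] for "pass", and both test rows are classified correctly.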
Reinforcement Learning is a type of machine learning where an agent learns to make decisions by interacting with an
environment. The agent receives feedback in the form of rewards or penalties based on its actions and aims to maximize
cumulative rewards over time.
1. Positive Reinforcement
Definition: Encouraging desired behavior by providing a positive reward.
Objective: Increase the likelihood of repeating actions that yield positive outcomes.
Examples:
o A robot learning to move correctly by receiving points for each correct movement.
2. Negative Reinforcement
Definition: Encouraging desired behavior by removing negative consequences.
Objective: Reinforce behavior that helps avoid unfavorable outcomes.
Examples:
o An autonomous vehicle avoiding obstacles to minimize penalties.
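A minimal sketch of an agent learning from rewards and penalties, using tabular Q-learning (one standard RL algorithm, not the only one). The corridor environment, reward values, and hyperparameters (`alpha`, `gamma`, `epsilon`) are hypothetical choices for illustration: the agent earns +1 for reaching the goal and a small penalty for every other step, so it learns to reach the goal while avoiding accumulated penalties.

```python
import random

N_STATES = 5          # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
GOAL_REWARD = 1.0     # positive reward for reaching the goal
STEP_PENALTY = -0.01  # small penalty for every other step


def step(state, action):
    """Environment dynamics: return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    if next_state == N_STATES - 1:
        return next_state, GOAL_REWARD, True
    return next_state, STEP_PENALTY, False


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            # Epsilon-greedy: explore occasionally, otherwise take the best-known action.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            # Move Q toward reward + discounted value of the best next action.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(policy)  # greedy action in each non-goal state
```

After training, `policy` should pick "move right" (action 1) in every non-goal state, because that reaches the reward with the fewest penalties.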
Learning by Rote is a method of memorization that relies on repetition. Instead of focusing on understanding the underlying
principles or concepts, it emphasizes repeatedly going over the material until it can be recalled automatically. This approach is
often used for foundational knowledge that requires quick recall, such as facts, formulas, vocabulary, or sequences.
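Example: A child memorizes a phone number.
1. Repeating It Aloud: The child says the number aloud several times.
2. Writing It Down: The child writes the number multiple times in a notebook to reinforce memory.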
3. Testing Recall: The child covers the written number and tries to recite it from memory. If they forget a digit, they start
over and continue until they can recite it perfectly.
In this case, the child has memorized the number but may not understand other details, like how area codes work or how phone
systems are structured.
A child learns the multiplication table (e.g., 7 × 8 = 56) by repeatedly chanting the table until it is memorized.
Process:
1. Listening to or reading the table repeatedly.
2. Writing the table multiple times for reinforcement.
3. Testing recall by covering the answers and reciting them.
Benefits of Rote Learning:
Quick Recall: It's effective for remembering basic information (e.g., multiplication tables, dates, or formulas).
Limitations:
Lack of Understanding: The learner might not grasp why something is true or how it works.
Poor Application: It doesn't promote critical thinking or applying knowledge to new situations.
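In machine learning terms, rote learning corresponds to storing seen examples verbatim and recalling them by exact lookup. The hypothetical `RoteLearner` class below sketches this with a Python dictionary; it shows both the benefit (instant recall of memorized items) and the limitation (no answer at all for an input it has never seen).

```python
class RoteLearner:
    """Learning by rote: memorize input -> output pairs exactly, with no generalization."""

    def __init__(self):
        self.memory = {}

    def learn(self, question, answer):
        self.memory[question] = answer  # store exactly what was seen

    def recall(self, question):
        # Quick recall for memorized items; None for anything never seen.
        return self.memory.get(question)


learner = RoteLearner()
for a in range(1, 11):
    learner.learn((7, a), 7 * a)  # memorize the 7 times table up to 7 x 10

print(learner.recall((7, 8)))   # memorized: 56
print(learner.recall((7, 12)))  # never seen, so no answer
```

The learner "knows" 7 × 8 but cannot answer 7 × 12 or even 8 × 7, which mirrors the lack of understanding described above.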
Learning from examples (inductive learning) involves a collection of labeled examples, which the ML system uses to make predictions on new data patterns. In learning from examples, we deal with two ML problems:
Classification: Consider the handwritten digits shown in Figure 1.3, where each row has 15 examples of one digit. The problem is to learn an ML model from such data that assigns a new data pattern to the correct digit class. Because the model is learned from labeled examples, this is a form of supervised learning.
Regression: Unlike classification, regression deals with cases where labels come from a possibly infinite set. For example, the share value of a stock can be represented as a positive real number, and it takes different values at different times. Predicting the share value of a stock at a future time is a typical regression, or curve-fitting, problem.
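Regression as curve fitting can be sketched with an ordinary least-squares straight line. The daily prices below are made-up numbers, and a straight line is only the simplest possible curve; real share prices are far noisier.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = w * x + b; returns (w, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx            # slope
    b = mean_y - w * mean_x  # intercept
    return w, b


# Hypothetical closing prices (positive real numbers) on days 1..5.
days = [1, 2, 3, 4, 5]
prices = [101.0, 103.0, 105.0, 107.0, 109.0]
w, b = fit_line(days, prices)
forecast_day_6 = w * 6 + b  # extrapolate the fitted curve to a future time
print(w, b, forecast_day_6)
```

Here the fitted line is y = 2x + 99, so the predicted value for day 6 is 111; unlike a class label, this prediction can be any real number.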
The image displays handwritten digits labeled from 0 to 9 (Figure 1.3). Each row contains samples of individual digits. These are
used as training examples in supervised learning for classification tasks.
By considering the handwritten digit dataset of classes 0, 1, and 3, and clustering patterns separately for each class, we obtain
three distinct clusters.
When clustering is performed on the same dataset (digits 0, 1, and 3), but without using the class labels, the algorithm produces 9
clusters.
These clusters have different centroids compared to when class labels are used.
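Clustering without labels can be sketched with k-means (Lloyd's algorithm), one common clustering algorithm; the text does not say which algorithm produced the clusters above. This version assumes small numeric vectors and, for reproducibility, uses the first `k` points as initial centroids, whereas practical implementations usually use random or k-means++ initialization.

```python
def kmeans(points, k, max_iter=100):
    """Plain k-means (Lloyd's algorithm). Initial centroids: the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (squared Euclidean distance).
        new_assignment = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return assignment, centroids


# Hypothetical unlabeled 2-D points forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
labels, centers = kmeans(points, k=2)
print(labels, centers)
```

The three low points end up in cluster 0 and the three high points in cluster 1. Running the same function separately on each class's patterns would correspond to clustering with class labels.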
Types of Data:
In general, data can be categorical or numerical.
Categorical: This type of data can be nominal or ordinal. In nominal data, there is no order among the domain elements. For example, the domain of hair color is {brown, black, red}; this data is categorical, and the elements of the domain are not ordered. In ordinal data, by contrast, there is an order among the values of the domain. For example, the domain of the variable employee number could be {1, 2, ..., 1011} if there are 1011 employees in an organization.
Numerical: In numerical data, the domain of values could be a set/subset of integers or a set/subset of real numbers. For example, in a medical dataset with attributes ID Number, Area Mean, and Diagnosis, the domain of ID Number is a subset of integers in the range (8670, 917897), and the domain of Area Mean is a collection of floating point numbers (an interval) in the range [143.5, 2501]. The domain of Diagnosis, the class label, is a binary set with values Malignant and Benign. It is possible to have binary values in the domain for categorical or numerical data. For example, the domain of Status could be {Pass, Fail}, and this variable is nominal; an example of a binary ordinal type is {short, tall} for humans based on their height. A very popular binary numerical type is {0, 1}.
Matching is carried out using a proximity measure, which can be a distance/dissimilarity measure or a similarity measure. Two data items u and v, represented as l-dimensional vectors, match better when the distance between them is smaller or when the similarity between them is larger.
Popular distance measures include the Manhattan, Hamming, and Euclidean distances, and a popular similarity measure is the cosine of the angle between vectors.
$\cos(u, v) = \dfrac{u \cdot v}{\|u\|\,\|v\|}$, where $u \cdot v$ is the dot product between vectors u and v, and $\|u\|$ is the Euclidean distance between u and the origin, also called the Euclidean norm.
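The Manhattan distance, Euclidean distance, and cosine similarity can be written directly from their definitions. A minimal sketch in plain Python, assuming the vectors are equal-length sequences of numbers:

```python
import math


def manhattan(u, v):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def euclidean(u, v):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    """cos(u, v) = (u . v) / (||u|| ||v||); assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))  # Euclidean norm: distance from u to the origin
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


u, v = (1.0, 2.0, 2.0), (2.0, 0.0, 1.0)
print(manhattan(u, v), euclidean(u, v), cosine_similarity(u, v))
```

For these vectors the Manhattan distance is 4, the Euclidean distance is √6 ≈ 2.449, and the cosine similarity is 4 / (3√5) ≈ 0.596.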
Finding the Nearest Neighbor of a Pattern: Let x be an l-dimensional pattern vector, and let $X = \{x_1, x_2, \ldots, x_n\}$ be a collection of n data vectors. The nearest neighbor of x in X, denoted by $NN_x(X)$, is $x_j$ if $d(x, x_j) \le d(x, x_i)\ \forall x_i \in X$.
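The nearest-neighbor definition translates to a single `min` over the collection. This sketch assumes d is the Euclidean distance (any of the distances above could be substituted) and uses `math.dist`, available in Python 3.8 and later; `nearest_neighbor` is an illustrative name.

```python
import math


def nearest_neighbor(x, X):
    """Return NN_x(X): the x_j in X with d(x, x_j) <= d(x, x_i) for every x_i in X.

    d is the Euclidean distance here; on ties, the earliest vector in X is returned."""
    return min(X, key=lambda xi: math.dist(x, xi))


X = [(1.0, 1.0), (4.0, 5.0), (0.5, 3.0)]
x = (1.0, 2.5)
print(nearest_neighbor(x, X))
```

The distances from x are 1.5, about 3.9, and about 0.71, so the nearest neighbor is (0.5, 3.0).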
Stages in Machine Learning
Application Domain:
o The specific field or industry where the machine learning solution will be applied (e.g., healthcare, finance, e-commerce).
o Domain knowledge is crucial to understand the problem and collect relevant data.
Data Acquisition:
o The process of gathering raw data from various sources like databases, APIs, sensors, or user interactions.
o The quality and relevance of the data are essential for building effective models.
Feature Engineering:
o The process of transforming raw data into a format suitable for machine learning.
Preprocessing: Cleaning, transforming, and normalizing data (e.g., handling missing values, encoding
categorical variables).
Representation: Creating meaningful features from raw data (e.g., deriving age from date of birth,
generating embeddings for text).
Data Preprocessing
Feature Scaling
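Feature scaling brings all numerical features to a comparable scale, so that features with large ranges do not dominate those with small ranges; this can improve model performance.
Methods:
1. Min-Max Scaling: x' = (x - min(x)) / (max(x) - min(x)), which maps the feature to [0, 1]
2. Standardization: z = (x - μ) / σ, where μ is the mean and σ is the standard deviation of the feature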
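Both scaling methods can be sketched with the standard library. The income values are hypothetical; `min_max_scale` assumes the feature is not constant (max > min), and `standardize` uses the population standard deviation, which is one common convention.

```python
import statistics


def min_max_scale(values):
    """x' = (x - min) / (max - min); assumes max > min."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """z = (x - mu) / sigma, using the population standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


incomes = [30_000, 45_000, 60_000, 90_000]  # hypothetical Income column
scaled = min_max_scale(incomes)
z_scores = standardize(incomes)
print(scaled)
print(z_scores)
```

Min-max scaling gives [0.0, 0.25, 0.5, 1.0]; the standardized values have mean 0 and standard deviation 1.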
Data representation refers to how data is organized, stored, and presented in a way that can be processed by
computers or understood by humans. This can include visual formats (charts, graphs, tables) and computational
formats (binary, text, vectors, etc.).
In the context of machine learning, data representation often involves transforming raw data into a format that a
model can process effectively. For example:
Text Data: Words transformed into vectors (e.g., using embeddings like Word2Vec or BERT).
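Learned embeddings such as Word2Vec or BERT require trained models, so as a simpler stand-in this sketch uses a bag-of-words representation: each document becomes a vector of word counts over a shared vocabulary. It is a different and much cruder technique than embeddings, shown only to make the idea of turning text into vectors concrete; the documents are hypothetical.

```python
def build_vocabulary(documents):
    """Sorted list of the distinct lowercase words across all documents."""
    return sorted({word for doc in documents for word in doc.lower().split()})


def bag_of_words(doc, vocabulary):
    """Represent a document as a vector of word counts over the vocabulary."""
    words = doc.lower().split()
    return [words.count(term) for term in vocabulary]


docs = ["The loan was approved", "The loan was denied"]
vocab = build_vocabulary(docs)
vectors = [bag_of_words(d, vocab) for d in docs]
print(vocab)
print(vectors)
```

The two sentences share most words, so their vectors differ only in the "approved" and "denied" positions.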
The peaking phenomenon refers to the behavior observed in machine learning models, particularly during the
training process, where increasing the model's complexity initially improves performance but eventually leads to a
decline in performance. This decline is often due to overfitting, where the model captures noise and irrelevant
patterns in the training data rather than generalizing well to unseen data.
When the number of training objects is smaller than the dimensionality of the data, adding more data to the training set may first increase the error rate before decreasing it. This possibly counterintuitive phenomenon is also known as peaking.
Underfitting: The model is too simple to capture the underlying structure of the data.
Optimal fit: The model achieves the best balance between capturing patterns in the data and avoiding noise.
Overfitting: The model becomes overly complex and starts to memorize noise or irrelevant details.
Let’s explore feature extraction, feature selection, feature transformation, feature creation, handling missing or
sparse data, and feature dimensionality reduction using an example.
1. Feature Extraction
Feature extraction involves creating new features from raw data to better represent the information in the dataset.
Example:
Extract Age_Group from the Age column (e.g., Young: <30, Middle: 30–50, Senior: >50).
Convert Income into buckets (e.g., Low: <60k, Medium: 60k–100k, High: >100k).
Age_Group Income_Bucket
Young Low
Middle Medium
Middle NaN
Senior High
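The two extraction rules can be sketched as plain functions. The applicant records are hypothetical, chosen to reproduce the table above; following the stated boundaries, ages 30 and 50 fall in Middle, and incomes of 60k and 100k fall in Medium.

```python
def age_group(age):
    # Young: < 30, Middle: 30-50, Senior: > 50
    if age < 30:
        return "Young"
    if age <= 50:
        return "Middle"
    return "Senior"


def income_bucket(income):
    # Low: < 60k, Medium: 60k-100k, High: > 100k; a missing income stays missing.
    if income is None:
        return None
    if income < 60_000:
        return "Low"
    if income <= 100_000:
        return "Medium"
    return "High"


# Hypothetical (Age, Income) records; None marks a missing income.
applicants = [(25, 40_000), (35, 75_000), (45, None), (60, 120_000)]
extracted = [(age_group(a), income_bucket(i)) for a, i in applicants]
print(extracted)
```

The None in the third row corresponds to the NaN in the table and is handled later, when dealing with missing values.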
2. Feature Selection
Feature selection identifies the most relevant features for predicting the target variable. It helps reduce noise and
improves model performance.
Methods:
Example:
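No specific method is listed here, so the sketch below shows one common filter method: rank each feature by the absolute value of its Pearson correlation with the target, then keep the top k. The column names (including an ID column that carries no real meaning), the values, and the Defaulted target are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def select_features(columns, target, k):
    """Filter method: rank features by |correlation with the target| and keep the top k."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


columns = {
    "Income": [30, 45, 60, 90, 120],              # thousands
    "Credit_Score": [580, 620, 700, 720, 800],
    "Customer_ID": [5, 3, 1, 4, 2],
}
defaulted = [1, 1, 0, 0, 0]  # hypothetical target
selected = select_features(columns, defaulted, k=2)
print(selected)
```

On this toy data Credit_Score and Income are kept. With so few rows even a meaningless ID column shows some correlation, which is why such scores are normally computed on much larger samples.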
3. Feature Transformation
Feature transformation changes the scale, distribution, or format of the data to improve model performance.
Scaling:
Normalize Income, Loan_Amount, and Credit_Score to bring them to the same scale (e.g., Min-Max Scaling).
Encoding:
Convert categorical variables like Marital_Status into numerical values using One-Hot Encoding, which creates one binary (0/1) column per category.
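One-hot encoding can be sketched in a few lines: each distinct category becomes its own 0/1 column. The Marital_Status values are hypothetical, and the categories are sorted only to make the column order deterministic.

```python
def one_hot(values):
    """One-hot encode a categorical column: one 0/1 column per distinct category."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


marital_status = ["Single", "Married", "Divorced", "Married"]
categories, encoded = one_hot(marital_status)
print(categories)
print(encoded)
```

Unlike mapping categories to 1, 2, 3, this encoding does not impose a false order on nominal values.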
4. Feature Creation
Example: Create new features such as Debt_to_Income_Ratio and Risk_Factor from existing columns:
Debt_to_Income_Ratio Risk_Factor
0.4 High
0.4 Medium
NaN Medium
0.42 Low
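A created feature is computed from existing columns. This sketch assumes hypothetical Monthly_Debt and Monthly_Income columns (the text does not say how Debt_to_Income_Ratio was derived) and keeps the ratio missing when an input is missing, which reproduces the Debt_to_Income_Ratio column above.

```python
def debt_to_income(monthly_debt, monthly_income):
    """Create Debt_to_Income_Ratio from two existing (hypothetical) columns.

    Returns None when an input is missing or income is zero."""
    if monthly_debt is None or monthly_income in (None, 0):
        return None
    return round(monthly_debt / monthly_income, 2)


# Hypothetical (Monthly_Debt, Monthly_Income) pairs; None marks a missing value.
rows = [(2_000, 5_000), (1_200, 3_000), (800, None), (2_100, 5_000)]
ratios = [debt_to_income(d, i) for d, i in rows]
print(ratios)
```

A ratio like this often carries more signal than debt or income alone, because it relates the two.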
5. Handling Missing or Sparse Data
Dealing with missing or sparse values is essential for ensuring clean data.
Example:
For the categorical field Missing_Field, impute the mode or create a new category such as "Unknown".
For NaN values in Debt_to_Income_Ratio, impute the mean or create a flag feature (Has_Missing_Data).
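Both strategies can be sketched with the `statistics` module. The column values are hypothetical (the Debt_to_Income_Ratio values mirror the table above), and `impute_categorical` and `impute_numeric_with_flag` are illustrative names.

```python
import statistics


def impute_categorical(values, fill=None):
    """Replace None with the mode of the observed values, or with an explicit category such as "Unknown"."""
    if fill is None:
        fill = statistics.mode([v for v in values if v is not None])
    return [fill if v is None else v for v in values]


def impute_numeric_with_flag(values):
    """Replace None with the mean of the observed values and build a Has_Missing_Data flag column."""
    mean = statistics.mean([v for v in values if v is not None])
    filled = [mean if v is None else v for v in values]
    has_missing_data = [1 if v is None else 0 for v in values]
    return filled, has_missing_data


# Hypothetical columns; None marks a missing value.
status = ["Married", None, "Single", "Married"]
dti = [0.4, 0.4, None, 0.42]

print(impute_categorical(status))             # fill with the mode
print(impute_categorical(status, "Unknown"))  # fill with a new category
filled, flags = impute_numeric_with_flag(dti)
print(filled, flags)
```

Keeping the flag lets a model learn whether "was missing" is itself informative, even after the gap is filled.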
6. Feature Dimensionality Reduction
This reduces the number of features while preserving most of the information.
For example, if the data has 100 features but most of its information (variance) can be captured in 10 dimensions, Principal Component Analysis (PCA) can reduce it from 100 features to 10 principal components.
Combine correlated features like Income, Loan_Amount, and Debt_to_Income_Ratio into principal components.
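PCA finds directions of maximum variance. The sketch below computes only the first principal component, using power iteration on the covariance matrix rather than a full eigendecomposition, and assumes two hypothetical, perfectly correlated features (Income and Loan_Amount, both in thousands) so that one component captures all of the variance. In practice features are usually standardized first, and a linear algebra library would be used.

```python
import math


def pca_first_component(rows, iterations=200):
    """Project rows onto the first principal component (power iteration on the covariance matrix)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)] for i in range(d)]
    # Power iteration: repeated multiplication converges to the eigenvector with the
    # largest eigenvalue (assumes the start vector is not orthogonal to it).
    vec = [1.0] * d
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    scores = [sum(c[j] * vec[j] for j in range(d)) for c in centered]
    return vec, scores


# Hypothetical (Income, Loan_Amount) rows in thousands; Loan_Amount = Income / 3.
rows = [(30.0, 10.0), (45.0, 15.0), (60.0, 20.0), (90.0, 30.0)]
component, scores = pca_first_component(rows)
print(component, scores)
```

The two original columns are replaced by one column of `scores`; here the component points along (3, 1)/√10, because Loan_Amount is exactly one third of Income.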
Feature Elimination:
Model Selection
Involves choosing the most appropriate machine learning algorithm or model for a specific problem.
Examples: Decision Trees, Support Vector Machines (SVMs), Neural Networks, etc.
Model Learning
The process where a machine learning algorithm is trained on a dataset to adjust its parameters.
It involves:
Goal: Learn patterns and relationships within the data to generalize well to unseen examples.
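Adjusting parameters to fit data can be sketched with gradient descent on a one-feature linear model. The data (generated from y = 2x + 1), the learning rate, and the number of epochs are assumptions chosen so the sketch converges.

```python
def train_linear_model(xs, ys, learning_rate=0.05, epochs=2000):
    """Learn w and b for y ~ w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # Step against the gradient to reduce the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # hypothetical data generated from y = 2x + 1
w, b = train_linear_model(xs, ys)
print(w, b)
```

Each epoch nudges w and b in the direction that lowers the error, and they approach the generating values 2 and 1. Too large a learning rate would make the updates diverge instead.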
Model Evaluation
o Cross-validation.
o Confusion matrix for classification.
Metrics:
Ensures the model is neither underfitting nor overfitting and performs well on unseen data.
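A sketch of the two evaluation tools listed above: a binary confusion matrix (with accuracy, precision, and recall computed from it as examples of common metrics) and a helper that splits example indices into k folds for cross-validation. The labels are hypothetical, and the folds are contiguous without shuffling for simplicity.

```python
def confusion_matrix(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds; each fold serves once as validation data."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
folds = k_fold_indices(len(y_true), 4)
print(tp, fp, fn, tn, accuracy, precision, recall, folds)
```

Here TP = 3, FP = 1, FN = 1, and TN = 3, so accuracy, precision, and recall are all 0.75.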
Model Prediction
After training and evaluation, the model is used to make predictions on new or unseen data.
This is the end-use of a model, where it provides insights, classifications, or forecasts based on its learned parameters.
Search:
o Refers to exploring the solution space to find the best model, hyperparameters, or other configurations.
Learning:
o The broader process encompassing how models are trained and updated to capture data patterns.
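Search over hyperparameters can be sketched as a grid search: train with each candidate value, score it on held-out validation data, and keep the best. This sketch assumes a hypothetical 1-D k-nearest-neighbors classifier and searches over odd k only, which avoids voting ties for two classes.

```python
def knn_predict(train, x, k):
    """Majority label among the k training points closest to x (1-D feature, absolute distance)."""
    neighbors = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = [label for _, label in neighbors]
    return max(set(votes), key=votes.count)


def grid_search(train, validation, candidate_ks):
    """Try each candidate k and keep the one with the best validation accuracy."""
    best_k, best_accuracy = None, -1.0
    for k in candidate_ks:
        correct = sum(knn_predict(train, x, k) == label for x, label in validation)
        accuracy = correct / len(validation)
        if accuracy > best_accuracy:  # ties keep the earlier (smaller) k
            best_k, best_accuracy = k, accuracy
    return best_k, best_accuracy


# Hypothetical (feature, label) data; the point (2.5, 1) acts as label noise.
train = [(1.0, 0), (2.0, 0), (3.0, 0), (2.5, 1), (4.0, 1), (6.0, 1), (7.0, 1), (8.0, 1)]
validation = [(2.4, 0), (2.6, 0), (2.9, 0), (6.5, 1)]
print(grid_search(train, validation, candidate_ks=[1, 3, 5]))
```

With k = 1 the noisy point misleads two validation predictions, while k = 3 classifies all four correctly, so the search returns (3, 1.0).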
Model explanation
o Interpreting the model's predictions to ensure they are meaningful and justifiable
o Using domain knowledge and experts' feedback to validate the predictions
o Building trust and ensuring the model aligns with real-world expectations
o Example: explaining why a credit scoring model denied a loan application by highlighting the most influential features, such as low income or a high debt-to-income ratio
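For a linear scoring model, one simple explanation is to break the score into per-feature contributions (weight × feature value) and list the most negative ones first. The weights, bias, threshold, feature names, and applicant below are entirely hypothetical; more complex models need dedicated explanation techniques.

```python
# Hypothetical linear credit-scoring model: score = BIAS + sum(weight * feature value).
WEIGHTS = {"Income_k": 0.02, "Debt_to_Income_Ratio": -3.0, "Years_Employed": 0.1}
BIAS = 0.5
THRESHOLD = 0.0  # approve when score >= THRESHOLD


def explain(applicant):
    """Return the decision, the score, and per-feature contributions sorted most negative first."""
    contributions = {name: weight * applicant[name] for name, weight in WEIGHTS.items()}
    score = BIAS + sum(contributions.values())
    decision = "approve" if score >= THRESHOLD else "deny"
    ranked = sorted(contributions.items(), key=lambda item: item[1])
    return decision, score, ranked


applicant = {"Income_k": 25, "Debt_to_Income_Ratio": 0.6, "Years_Employed": 1}
decision, score, ranked = explain(applicant)
print(decision, round(score, 2))
for name, contribution in ranked:
    print(f"{name}: {contribution:+.2f}")
```

Here the high debt-to-income ratio contributes -1.80 and outweighs the small positive contributions from income and employment, so it is the feature the explanation would highlight for the denial.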
Data Sets