ML Unit 1

UNIT-I: Introduction to Machine Learning: Evolution of Machine Learning, Paradigms for ML, Learning by Rote,

Learning by Induction, Reinforcement Learning, Types of Data, Matching, Stages in Machine Learning, Data Acquisition,
Feature Engineering, Data Representation, Model Selection, Model Learning, Model Evaluation, Model Prediction, Search
and Learning, Data Sets.

Introduction to Machine Learning


Machine Learning (ML) is a branch of artificial intelligence (AI) that enables systems to learn and improve from
experience without being explicitly programmed. ML focuses on building algorithms that can process data, identify patterns, and
make decisions or predictions.
Why Machine Learning?
 To handle complex problems where traditional programming is infeasible.
 To enable automation in industries like healthcare, finance, and e-commerce.
 To make predictions and recommendations in real time.

Applications of Machine Learning

 Healthcare: Disease prediction, personalized treatment.
 Finance: Fraud detection, stock market analysis.
 E-commerce: Recommendation systems, customer segmentation.
 Transportation: Autonomous vehicles, traffic prediction.
 NLP: Chatbots, language translation.
Evolution of Machine Learning

The evolution of Machine Learning (ML) can be traced back to the 1950s, marking its formal beginning. Here’s a timeline of key
milestones:


 1950s: Early Foundations - 1950: Alan Turing proposed the Turing Test, a framework for evaluating a machine's ability to exhibit intelligent behavior.
 1952: Arthur Samuel developed the first machine learning program, a checkers-playing program.
 1960s-1980s: Symbolic AI and Early Algorithms - the focus was on symbolic reasoning and rule-based systems, alongside early learning algorithms such as nearest neighbors and the perceptron (a simple neural network).
 1980s-1990s: Shift to Statistical Learning - a move from symbolic AI to data-driven approaches; Decision Trees, Support Vector Machines (SVMs), and the backpropagation algorithm for neural networks enabled pattern recognition and learning from data.
 2000s: Big Data and Practical Applications - the explosion of digital data and computational power; methods such as Random Forests and Gradient Boosting found use in healthcare, finance, and e-commerce.
 2010s: Deep Learning Revolution - Convolutional Neural Networks (CNNs) for image recognition, Recurrent Neural Networks (RNNs) for sequence data, and Transformers for NLP (e.g., BERT, GPT); enhanced by GPUs and cloud computing, enabling large-scale training.
 2020s and Beyond: Modern Era - unsupervised and self-supervised learning, reinforcement learning applications, ethical AI, and explainable ML; large-scale models like GPT-4 and BERT dominate applications in language and image processing.

Learning:

1. Data - Structured: data organized in rows and columns (tabular format), e.g., spreadsheets and databases.
   Unstructured: data without a predefined structure, e.g., text, images, audio, and videos.
2. Algorithms - a set of mathematical instructions that guide the learning process, e.g., a decision tree.
3. Training - feeding data to the model so it can learn patterns (training = teaching the model); Testing - evaluating the trained model.
4. Model - the output of the training process, representing the learned patterns.
5. Features & labels - the input variables used to make predictions, and the target variable the model aims to predict.

Paradigms for ML:


Supervised learning is a type of machine learning where the model learns from labeled data. Each data point in the
training dataset has input features (independent variables) and corresponding labels (dependent variables). The goal is
to learn a mapping function that maps inputs to outputs, enabling the model to predict new, unseen data.
1. Classification

 Definition: Assigning input data to predefined categories or classes.
 Goal: Predict discrete labels.
 Examples:
o Email classification: Spam or Not Spam.
o Medical diagnosis: Disease type identification.
2. Regression
 Definition: Predicting continuous numerical values based on input features.
 Goal: Model the relationship between variables to predict quantitative outcomes.
 Examples:
o Predicting house prices based on location and size.
o Estimating stock prices.
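
The two supervised tasks above can be sketched in a few lines of Python (a minimal sketch assuming scikit-learn is available; the tiny datasets are invented purely for illustration):

from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LinearRegression

# Classification: predict a discrete label (1 = spam, 0 = not spam)
X_cls = [[2, 0], [8, 1], [1, 0], [9, 1]]   # made-up features, e.g. [num_links, has_attachment]
y_cls = [0, 1, 0, 1]                       # labels for each email
clf = DecisionTreeClassifier().fit(X_cls, y_cls)
print(clf.predict([[7, 1]]))               # -> [1], i.e. spam

# Regression: predict a continuous value (house price from size in sq ft)
X_reg = [[1000], [1500], [2000], [2500]]
y_reg = [200000, 280000, 360000, 440000]
reg = LinearRegression().fit(X_reg, y_reg)
print(reg.predict([[1800]]))               # -> about 328000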
Unsupervised Learning is a type of machine learning where the model learns patterns, structures, or relationships in the
data without labeled outputs. It is used to explore the underlying structure of data, identify groups, or reduce dimensionality.
1. Clustering
 Definition: Grouping similar data points into clusters based on their characteristics.
 Objective: Find natural groupings in the data.
 Examples:
o Customer segmentation in marketing.
o Grouping genes with similar expression patterns in biology.
2. Dimensionality Reduction
 Definition: Reducing the number of input features while preserving important information.
 Objective: Simplify data for visualization, reduce computational complexity, or remove noise.
 Examples:
o Visualizing high-dimensional data in 2D or 3D.
o Compressing image data.
3. Anomaly Detection
 Definition: Identifying rare or unusual data points that differ significantly from the majority.
 Objective: Detect outliers in the data.
 Examples:
o Fraud detection in banking.
o Identifying defective products in manufacturing.
4. Association Rule Learning
 Definition: Discovering relationships or associations between variables in large datasets.
 Objective: Identify patterns of co-occurrence.
 Examples:
o Market basket analysis (e.g., products frequently bought together).
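
As a minimal illustration of clustering, the sketch below groups made-up 2-D points with k-means (assumes scikit-learn and numpy):

import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],   # points near (1, 1)
              [8.0, 8.0], [8.3, 7.9], [7.8, 8.2]])  # points near (8, 8)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # cluster index of each point, e.g. [0 0 0 1 1 1]
print(km.cluster_centers_)  # each cluster is summarized by its centroid (mean)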

Reinforcement Learning is a type of machine learning where an agent learns to make decisions by interacting with an
environment. The agent receives feedback in the form of rewards or penalties based on its actions and aims to maximize
cumulative rewards over time.
1. Positive Reinforcement
 Definition: Encouraging desired behavior by providing a positive reward.
 Objective: Increase the likelihood of repeating actions that yield positive outcomes.
 Examples:
o A robot learning to move correctly by receiving points for each correct movement.
2. Negative Reinforcement
 Definition: Encouraging desired behavior by removing negative consequences.
 Objective: Reinforce behavior that helps avoid unfavorable outcomes.
 Examples:
o An autonomous vehicle avoiding obstacles to minimize penalties.
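
The reward-driven loop can be illustrated with tabular Q-learning on a made-up one-dimensional corridor (states 0 to 4, goal at state 4); this is a sketch of the general idea, not any specific system mentioned above:

import random

n_states, n_actions = 5, 2                 # actions: 0 = left, 1 = right
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.5, 0.9, 0.1      # learning rate, discount, exploration

for _ in range(500):                       # training episodes
    s = 0
    while s != 4:                          # state 4 is the goal
        if random.random() < epsilon:      # explore occasionally
            a = random.randrange(n_actions)
        else:                              # otherwise act greedily
            a = max(range(n_actions), key=lambda act: Q[s][act])
        s2 = max(0, s - 1) if a == 0 else min(4, s + 1)
        r = 1.0 if s2 == 4 else 0.0        # positive reward only at the goal
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

print([max(range(n_actions), key=lambda act: Q[s][act]) for s in range(4)])
# learned policy: move right (action 1) in every non-goal state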
Learning by Rote is a method of memorization that relies on repetition. Instead of focusing on understanding the underlying
principles or concepts, it emphasizes repeatedly going over the material until it can be recalled automatically. This approach is
often used for foundational knowledge that requires quick recall, such as facts, formulas, vocabulary, or sequences.

Real-World Example: Memorizing a Phone Number


Imagine a child who wants to remember their parent's phone number, 9876543210, for emergencies. They use rote learning by:

1. Repeating the Number: They say the number aloud repeatedly:


"9876543210, 9876543210, 9876543210..."

2. Writing It Down: They write the number multiple times in their notebook to reinforce memory.

3. Testing Recall: The child covers the written number and tries to recite it from memory. If they forget a digit, they start
over and continue until they can recite it perfectly.

In this case, the child has memorized the number but may not understand other details, like how area codes work or how phone
systems are structured.

Example 2: Memorizing Mathematical Tables

 A child learns the multiplication table (e.g., 7 × 8 = 56) by repeatedly chanting the table until it is memorized.

 Process:
1. Listening to or reading the table repeatedly.
2. Writing the table multiple times for reinforcement.
3. Testing recall by covering the answers and reciting them.
Benefits of Rote Learning:

 Quick Recall: It's effective for remembering basic information (e.g., multiplication tables, dates, or formulas).

 Efficiency in Repetition: It is useful when comprehension isn't immediately required.

 Foundational Knowledge: It provides a base for further learning and understanding.

Limitations:

 Lack of Understanding: The learner might not grasp why something is true or how it works.

 Poor Application: It doesn't promote critical thinking or applying knowledge to new situations.

 Forgetfulness: Without periodic reinforcement, memorized information can fade quickly.
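
In ML terms, rote learning amounts to pure memorization: the system stores training pairs verbatim and can answer only queries it has already seen. A minimal Python sketch (the lookup-table design here is illustrative, not a standard library API):

memory = {}

def train(x, y):
    memory[x] = y            # memorize the pair verbatim; no generalization

def predict(x):
    return memory.get(x)     # unseen inputs return None: rote learning
                             # cannot go beyond what it has stored

train("7 x 8", 56)
print(predict("7 x 8"))      # 56   (seen before, recalled instantly)
print(predict("7 x 9"))      # None (never memorized, cannot be derived)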


Learning by Induction
Induction is a popular and effective form of Machine Learning (ML). Here, learning is achieved with the help of examples or
observations. It may be categorized as follows:

1. Learning from Examples

This involves a collection of labeled examples, which the ML system uses to make predictions on a new data pattern. For learning
from examples, we deal with two ML problems:

Classification: Consider the handwritten digits shown in Figure 1.3. Here, each row has 15 examples of each digit. The problem
is to learn an ML model using such data to classify a new data pattern. This is also called supervised learning.

Regression: Contrary to classification, regression deals with cases where labels come from a possibly infinite set. For example,
the share value of a stock could be represented as a positive real number, and the stock may take different values at different
times. Predicting the share value of a stock at a future time is a typical regression or curve-fitting problem.

2. Learning from Observations


Observations need not be labeled. In this case, clustering algorithms are used to group observations into smaller numbers of
clusters. Each cluster is represented by its centroid or mean.

Examples of Handwritten Digits

The image displays handwritten digits labeled from 0 to 9 (Figure 1.3). Each row contains samples of individual digits. These are
used as training examples in supervised learning for classification tasks.

Clustering Using Class Labels

By considering the handwritten digit dataset of classes 0, 1, and 3, and clustering patterns separately for each class, we obtain
three distinct clusters.

The centroids of these clusters are derived using the class labels.

Clustering Without Class Labels

When clustering is performed on the same dataset (digits 0, 1, and 3), but without using the class labels, the algorithm produces 9
clusters.
These clusters have different centroids compared to when class labels are used.

Types Of Data:-
In general, data can be categorical or numerical

Categorical: This type of data can be nominal or ordinal. In the case of nominal data, there is no order among the domain
elements. For example, the domain of hair color is {brown, black, red}. This data is of categorical type and the elements of
the domain are not ordered. On the contrary, in ordinal data, there is an order among the values of the domain. For example, the
domain of the variable employee number could be {1, 2, ..., 1011} if there are 1011 employees in an organization.

Numerical: In the case of numerical data, the domain of values of the data type could be a set/subset of integers or a set/subset of
real numbers. The domain of Diagnosis, the class label, is a binary set with the values Malignant and Benign. The domain of ID
Number is a subset of integers in the range [8670, 917897], and the domain of Area Mean is a collection of floating-point numbers
(an interval) in the range [143.5, 2501]. It is possible to have binary values in the domain for categorical or numerical data. For
example, the domain of Status could be {Pass, Fail}, and this variable is nominal; an example of a binary ordinal type is {short,
tall} for humans based on their height. A very popular binary numerical type is {0, 1}.

Feature Number  Attribute         Type of Data  Domain
1               Diagnosis         Nominal       {Malignant, Benign}
2               ID_Number         Ordinal       [8670, 917897]
3               Perimeter_Mean    Numerical     [43.79, 188.5]
4               Area_Mean         Numerical     [143.5, 2501]
5               Smoothness_Mean   Numerical     [0.0562, 0.25]
Matching:-

Matching is carried out by using a proximity measure, which can be a distance/dissimilarity measure or a similarity measure. Two
data items, u and v, represented as l-dimensional vectors, match better when the distance between them is smaller or when the
similarity between them is larger.

Popular distance measures include the Manhattan, Hamming, and Euclidean distances, and a popular similarity measure is the
cosine of the angle between the vectors.

The Euclidean distance is given by

d(u, v) = sqrt( (u1 - v1)^2 + (u2 - v2)^2 + ... + (ul - vl)^2 )

The cosine similarity is given by

cos(u, v) = (u . v) / (||u|| ||v||)

where u . v is the dot product between vectors u and v, and ||u|| is the Euclidean distance between u and the origin; it is
also called the Euclidean norm.

Some of the important applications of matching in ML are in:

Finding the Nearest Neighbor of a Pattern: Let x be an l-dimensional pattern vector and let X = {x1, x2, ..., xn} be a collection of n
data vectors. The nearest neighbor of x in X, denoted by NN(x, X), is xj if d(x, xj) ≤ d(x, xi) for all xi in X.
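
A small Python sketch of these ideas (assumes numpy; the sample vectors are made up):

import numpy as np

def euclidean(u, v):
    return np.sqrt(np.sum((u - v) ** 2))        # straight-line distance

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def nearest_neighbor(x, X):
    return min(X, key=lambda xi: euclidean(x, xi))   # smallest distance wins

X = [np.array([0, 0]), np.array([3, 4]), np.array([6, 8])]
x = np.array([2, 3])
print(nearest_neighbor(x, X))                        # -> [3 4]
print(cosine(np.array([3, 4]), np.array([6, 8])))    # -> 1.0 (same direction)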
Stages In Machine Learning
1. Data acquisition: collect data from the application domain.
2. Feature engineering = Preprocessing + Representation.
3. Model selection: choose a model (uses domain knowledge).
4. Model learning: train the model (uses training data).
5. Model evaluation: validate the model (uses evaluation data, also called validation data).
6. Model prediction: apply the model to new data (uses test data).
7. Model explanation: explain the model (uses expert feedback).

1. Application Domain and Data Acquisition

 Application Domain:

o The specific field or industry where the machine learning solution will be applied (e.g., healthcare, finance, e-
commerce).

o Domain knowledge is crucial to understand the problem and collect relevant data.

 Data Acquisition:

o The process of gathering raw data from various sources like databases, APIs, sensors, or user interactions.

o The quality and relevance of the data are essential for building effective models.

2. Feature Engineering = Preprocessing + Representation

 Feature Engineering:

o The process of preparing raw data into a suitable format for machine learning.

 Preprocessing: Cleaning, transforming, and normalizing data (e.g., handling missing values, encoding
categorical variables).

 Representation: Creating meaningful features from raw data (e.g., deriving age from date of birth,
generating embeddings for text).

Data Preprocessing

Handling Missing Values


Example Dataset:

ID Age Salary City Purchased

1 25 50000 New York Yes

2 30 54000 Los Angeles No

3 nan 58000 Chicago Yes


4 35 62000 Chicago No

5 40 nan New York Yes

First, fill missing values for Age and Salary:

 Replace the missing Age with the mean: (25 + 30 + 35 + 40) / 4 = 32.5

 Replace the missing Salary with the mean: (50000 + 54000 + 58000 + 62000) / 4 = 56000
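
A minimal pandas sketch of this mean imputation (column names follow the example table above):

import pandas as pd

df = pd.DataFrame({
    "Age":    [25, 30, None, 35, 40],
    "Salary": [50000, 54000, 58000, 62000, None],
})
df["Age"] = df["Age"].fillna(df["Age"].mean())           # (25+30+35+40)/4 = 32.5
df["Salary"] = df["Salary"].fillna(df["Salary"].mean())  # 224000/4 = 56000
print(df)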

Feature Scaling
Feature scaling ensures all numerical features have the same scale to improve model performance.

Methods:

1. Min-Max Scaling: x_scaled = (x - x_min) / (x_max - x_min)

2. Standardization: z = (x - μ) / σ

1. Min-Max Scaling

Min-Max scaling maps each feature to the range [0, 1] using x_scaled = (x - x_min) / (x_max - x_min).

 For Age: Min = 25, Max = 40

 For Salary: Min = 50000, Max = 62000
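
Applying the formula to the Age column (with the imputed value 32.5 included) gives, as a quick Python check:

ages = [25, 30, 32.5, 35, 40]                  # 32.5 is the imputed value
scaled = [(a - 25) / (40 - 25) for a in ages]
print(scaled)                                  # [0.0, 0.33..., 0.5, 0.66..., 1.0]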


Data Representation

Data representation refers to how data is organized, stored, and presented in a way that can be processed by
computers or understood by humans. This can include visual formats (charts, graphs, tables) and computational
formats (binary, text, vectors, etc.).

In the context of machine learning, data representation often involves transforming raw data into a format that a
model can process effectively. For example:

Numerical Data: Continuous values like age, weight, or price.

Categorical Data: Labels or categories encoded as numbers.

Text Data: Words transformed into vectors (e.g., using embeddings like Word2Vec or BERT).

Image Data: Pixels represented as arrays or tensors.

The peaking phenomenon refers to the behavior observed in machine learning models, particularly during the
training process, where increasing the model's complexity initially improves performance but eventually leads to a
decline in performance. This decline is often due to overfitting, where the model captures noise and irrelevant
patterns in the training data rather than generalizing well to unseen data.

Stages of the Peaking Phenomenon

When the number of training objects is smaller than the dimensionality of the data, adding more data to the training
set may first increase the error rate before decreasing it. This possibly counterintuitive phenomenon is known as
peaking.

Underfitting (Low Complexity):

The model is too simple to capture the underlying structure of the data.

Training and testing performance are both poor.

Optimal Fit (At the Peak):

The model achieves the best balance between capturing patterns in the data and avoiding noise.

This is the "sweet spot" for model complexity.

Overfitting (Beyond the Peak):

The model becomes overly complex and starts to memorize noise or irrelevant details.

Training performance remains high, but test performance declines.

Data representation refers to how data is organized, stored, and presented in a way that can be processed by
computers or understood by humans. This can include visual formats (charts, graphs, tables) and computational
formats (binary, text, vectors, etc.).

Let’s explore feature extraction, feature selection, feature transformation, feature creation, handling missing or
sparse data, and feature dimensionality reduction using an example.

Example Dataset: Customer Data for Loan Approval

1. Feature Extraction

Feature extraction involves creating new features from raw data to represent the information in the dataset better

Customer_ID  Age  Income   Marital_Status  Loan_Amount  Credit_Score  Missing_Field  Loan_Status
1            25   50000    Single          20000        720           Yes            Approved
2            40   100000   Married         40000        680           No             Denied
3            30   NaN      Single          30000        710           Yes            Approved
4            45   120000   Divorced        50000        670           NaN            Denied
5            25   50000    Single          20000        720           Yes            Approved

Example:

Extract Age_Group from the Age column (e.g., Young: <30, Middle: 30–44, Senior: ≥45).

Convert Income into buckets (e.g., Low: <60k, Medium: 60k–100k, High: >100k).

Age_Group Income_Bucket

Young Low

Middle Medium

Middle NaN

Senior High
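
A pandas sketch of this bucketing (bin edges follow the rule stated above; pd.cut with right=False makes each bin closed on the left):

import pandas as pd

ages = pd.Series([25, 40, 30, 45])
groups = pd.cut(ages, bins=[0, 30, 45, 120], right=False,
                labels=["Young", "Middle", "Senior"])
print(groups.tolist())     # ['Young', 'Middle', 'Middle', 'Senior']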
2. Feature Selection

Feature selection identifies the most relevant features for predicting the target variable. It helps reduce noise and
improves model performance.

Methods:

Correlation matrix to check relationships with Loan_Status.

Feature importance using models like Random Forest.

Example:

Features selected: Age, Income, Credit_Score, Loan_Amount.

Dropped: Customer_ID, Marital_Status, Missing_Field (irrelevant or redundant).

3. Feature Transformation

Feature transformation changes the scale, distribution, or format of the data to improve model performance.

Scaling:

Normalize Income, Loan_Amount, and Credit_Score to bring them to the same scale (e.g., Min-Max Scaling).

Encoding:

Convert categorical variables like Marital_Status into numerical values using One-Hot Encoding:

Single → [1, 0, 0], Married → [0, 1, 0], Divorced → [0, 0, 1].
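
A minimal pandas sketch of this encoding (pd.get_dummies produces one indicator column per category):

import pandas as pd

df = pd.DataFrame({"Marital_Status": ["Single", "Married", "Single", "Divorced"]})
print(pd.get_dummies(df["Marital_Status"]))
# one indicator column per category: Divorced, Married, Single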

4. Feature Creation

Feature creation involves deriving new features from existing ones.

Example:

Create Debt_to_Income_Ratio = Loan_Amount / Income.

Create Risk_Factor by combining Credit_Score and Debt_to_Income_Ratio.

Debt_to_Income_Ratio Risk_Factor

0.4 High

0.4 Medium

NaN Medium

0.42 Low
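
The ratio column can be derived in one line with pandas (values match the table above; the missing income simply propagates as NaN):

import pandas as pd

df = pd.DataFrame({"Loan_Amount": [20000, 40000, 30000, 50000],
                   "Income":      [50000, 100000, None, 120000]})
df["Debt_to_Income_Ratio"] = df["Loan_Amount"] / df["Income"]
print(df["Debt_to_Income_Ratio"].round(2).tolist())   # [0.4, 0.4, nan, 0.42]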

5. Handling Missing or Sparse Data

Dealing with missing or sparse values is essential for ensuring clean data.

Example:

Fill in the missing Income with the median (e.g., 75,000).

For categorical Missing_Field, use mode or create a new category like "Unknown".

For NaN in Debt_to_Income_Ratio, set it to the mean or create a flag feature (Has_Missing_Data).

6. Feature Dimensionality Reduction

This reduces the number of features while preserving most of the information.
If a dataset has a hundred features but the useful information can be captured in ten, PCA can reduce it from 100
features to just 10 components.

A small exercise on feature creation with missing values, using Engagement_Score = Att% - Ass_Missing × 5:

Name    Att%  Ass_Missing  Gender  Final  Engagement_Score
John    85    2            M       78     Find out?
Sarah   75    NaN          F       87
Priya   NaN   3            F       NaN
Sarath  89    0            M       76

Principal Component Analysis (PCA):

Combine correlated features like Income, Loan_Amount, and Debt_to_Income_Ratio into principal components.

Feature Elimination:

Drop less impactful features based on low variance or importance scores.
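
A minimal PCA sketch with scikit-learn (the four rows reuse the loan example's numeric features; in practice the features should be standardized first):

import numpy as np
from sklearn.decomposition import PCA

X = np.array([[ 50000, 20000, 0.40],      # Income, Loan_Amount, Debt_to_Income_Ratio
              [100000, 40000, 0.40],
              [ 75000, 30000, 0.40],
              [120000, 50000, 0.42]])
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                    # (4, 2): 3 features -> 2 components
print(pca.explained_variance_ratio_)      # fraction of information each keeps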

Final Transformed Dataset (Example)

Age_Group Income_Bucket Loan_Amount Credit_Score Debt_to_Income_Ratio Risk_Factor Loan_Status


Young Low 20000 720 0.4 High Approved
Middle Medium 40000 680 0.4 Medium Denied
Middle Medium 30000 710 0.4 Medium Approved
Senior High 50000 670 0.42 Low Denied
Young Low 20000 720 0.4 High Approved

Model Selection

 Involves choosing the most appropriate machine learning algorithm or model for a specific problem.

 Factors influencing model selection:

o Type of problem (classification, regression, clustering, etc.).

o Size and quality of the dataset.

o Interpretability vs. complexity tradeoff.

o Performance metrics (e.g., accuracy, precision, recall).

 Examples: Decision Trees, Support Vector Machines (SVMs), Neural Networks, etc.

Model Learning

 The process where a machine learning algorithm is trained on a dataset to adjust its parameters.

 It involves:

o Optimizing an objective or loss function.

o Using methods like gradient descent, backpropagation, or evolutionary algorithms.

o Dividing the data into training, validation, and testing sets.

 Goal: Learn patterns and relationships within the data to generalize well to unseen examples.
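
As a minimal illustration of learning by optimizing a loss, the sketch below fits y = w * x to made-up points with plain gradient descent on the mean squared error:

# Fit y = w * x by gradient descent on mean squared error (made-up data).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.1]        # roughly y = 2x
w, lr = 0.0, 0.01                # initial weight, learning rate

for _ in range(200):
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad               # step against the gradient
print(w)                         # converges near 2.0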

Model Evaluation

 Refers to assessing the performance of a trained model.

 Common evaluation techniques:

o Cross-validation.
o Confusion matrix for classification.

o Mean Squared Error (MSE) for regression.

 Metrics:

o Precision, recall, F1-score, ROC-AUC, etc.

 Ensures the model is neither underfitting nor overfitting and performs well on unseen data.
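
A short evaluation sketch with scikit-learn, using its built-in iris toy dataset for 5-fold cross-validation and a confusion matrix:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix

X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(random_state=0)
print(cross_val_score(model, X, y, cv=5).mean())    # average accuracy over 5 folds

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model.fit(X_tr, y_tr)
print(confusion_matrix(y_te, model.predict(X_te)))  # rows = true, cols = predicted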

Model Prediction

 After training and evaluation, the model is used to make predictions on new or unseen data.

 This is the end-use of a model, where it provides insights, classifications, or forecasts based on its learned parameters.

 Example: A spam detection model predicts whether an email is spam or not.

Search and Learning

 Search:

o Refers to exploring the solution space to find the best model, hyperparameters, or other configurations.

o Techniques: Grid search, Random search, Bayesian optimization, Genetic algorithms.

 Learning:

o The broader process encompassing how models are trained and updated to capture data patterns.

o Includes supervised, unsupervised, and reinforcement learning paradigms.
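
A minimal search sketch with scikit-learn: grid search over a decision tree's depth, scored by cross-validation on the built-in iris dataset:

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      param_grid={"max_depth": [2, 3, 4, 5]}, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)      # best depth and its CV accuracy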

Model explanation
o Interpreting the model's predictions to ensure they are meaningful and justifiable.
o Using domain knowledge and expert feedback to validate the predictions.
o Builds trust and ensures the model aligns with real-world expectations.
o Example: explaining why a credit-scoring model denied a loan application by highlighting the most influential features,
such as low income or a high debt-to-income ratio.

Data Sets
