
CS 9633 Machine Learning

Concept Learning
References:
Machine Learning by Tom Mitchell, 1997, Chapter 2
Artificial Intelligence: A Modern Approach, by Russell and Norvig,
Second Edition, 2003, pages 678 – 686
Elements of Machine Learning, by Pat Langley, 1996, Chapter 2

Concept Learning

• Inferring a Boolean-valued function from training examples
• Training examples are labeled as members or non-members of the concept

Concept Learning Task Defined By

• Set of instances over which target function is defined
• Target function
• Set of candidate hypotheses considered by the learner
• Set of available training examples

Example Concept

• Days when you would enjoy water sports

Instances X

Possible days, each described by the attributes:
– Sky (Sunny, Cloudy, Rainy)
– AirTemp (Warm, Cold)
– Humidity (Normal, High)
– Wind (Strong, Weak)
– Water (Warm, Cold)
– Forecast (Same, Change)

Hypotheses H

• Each hypothesis is a vector of 6 constraints, specifying the values
of the 6 attributes
• For each attribute, the constraint is:
– ? if any value is acceptable for this attribute
– A single required value for the attribute
– 0 if no value is acceptable
• Sample hypothesis:
(Rainy, ?, ?, ?, Warm, ?)
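
To make this concrete, here is a minimal Python sketch (mine, not from the slides): a hypothesis is a tuple of six constraints, and satisfies() tests an instance against it.

```python
# Minimal sketch of the hypothesis representation described above.
# Each constraint is '?' (anything), a required value, or '0' (nothing).

def satisfies(instance, hypothesis):
    """True iff the instance meets every attribute constraint."""
    return all(c == '?' or c == v for c, v in zip(hypothesis, instance))

h = ('Rainy', '?', '?', '?', 'Warm', '?')   # the sample hypothesis above
x = ('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change')
print(satisfies(x, h))                      # True
```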
General and Specific Hypotheses

• Most general hypothesis: (?, ?, ?, ?, ?, ?)
• Most specific hypothesis: (0, 0, 0, 0, 0, 0)

Target Concept c

EnjoySport : X → {0, 1}

Training Examples D

Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cool   Change    Yes

Determine:

A hypothesis h in H such that h(x) = c(x) for all x in X

Inductive Learning Hypothesis
Any hypothesis found to approximate
the target function well over a
sufficiently large set of training
examples will also approximate the
target function well over other
unobserved examples.

Concept Learning as Search
• Concept learning can be viewed as
searching through a large space of
hypotheses implicitly defined by the
hypothesis representation.

Sample and hypothesis space size
How many instances?
How many hypotheses?
How many semantically distinct hypotheses?
• Sky (Sunny, Cloudy, Rainy)
• AirTemp (Warm, Cold)
• Humidity (Normal, High)
• Wind (Strong, Weak)
• Water (Warm, Cold)
• Forecast (Same, Change)
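
One way to answer these questions is to count directly. A quick sketch (the numbers follow from the attribute value counts listed above):

```python
# Count instances and hypotheses for the EnjoySport attribute sets above.
sizes = [3, 2, 2, 2, 2, 2]      # values per attribute

instances = 1
syntactic = 1
semantic = 1
for n in sizes:
    instances *= n              # distinct instances:           3*2*2*2*2*2
    syntactic *= n + 2          # each attribute adds '?', '0': 5*4*4*4*4*4
    semantic *= n + 1           # '?' or a value; any '0' collapses to one
semantic += 1                   # ...single always-negative hypothesis

print(instances, syntactic, semantic)   # 96 5120 973
```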
Searching hypothesis space
• Goal is to efficiently search hypothesis
space to find the hypothesis that best
fits the training data
• Hypothesis space is potentially
– Very large
– Possibly infinite

General to Specific Ordering of Hypotheses
• It is often possible to use a natural
general-to-specific ordering of the
hypothesis space to organize the
search
• Can often exhaustively search all of the
space without explicitly enumerating all
hypotheses

Example

h1 = <Sunny, ?, ?, Strong, ?, ?>
h2 = <Sunny, ?, ?, ?, ?, ?>
Which is more general?

Notation

• For any instance x in X and hypothesis h in H, we say that x
satisfies h if and only if h(x) = 1.

Definition
• Let hj and hk be Boolean-valued functions defined over X. Then hj is
more general than or equal to hk iff

  hj ≥g hk iff (∀x ∈ X) [(hk(x) = 1) → (hj(x) = 1)]

• Definition of strictly more general than:

  hj >g hk iff (hj ≥g hk) ∧ ¬(hk ≥g hj)

Partially Ordered Sets

• Properties of a partial order
– Reflexive
– Transitive
– Antisymmetric
• Form a lattice

Important Point

• The ≥g and >g relations depend only on which instances satisfy the
hypotheses, not on the target concept.
• We will now consider algorithms that take advantage of this partial
order among hypotheses to organize the search space.

FIND-S Algorithm

• Approach: start with the most specific hypothesis, then generalize
it whenever it fails to cover a positive training example
• A hypothesis "covers" a training example if it correctly classifies
the example as positive

Computer Science Department


CS 9633 KDD
FIND-S

1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x
     For each attribute constraint ai in h
       If the constraint ai is satisfied by x then
         Do nothing
       Else
         Replace ai in h by the next more general
         constraint that is satisfied by x
3. Output hypothesis h
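
A direct Python rendering of these steps (a sketch using the tuple representation from earlier; the training data is the EnjoySport table):

```python
def find_s(examples, n_attrs=6):
    """FIND-S: examples is a list of (instance, label) pairs."""
    h = ['0'] * n_attrs                     # most specific hypothesis
    for x, positive in examples:
        if not positive:
            continue                        # negatives are ignored
        for i, v in enumerate(x):
            if h[i] == '0':
                h[i] = v                    # first covering value
            elif h[i] != v:
                h[i] = '?'                  # minimal generalization
    return tuple(h)

D = [(('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), True),
     (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'), True),
     (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
     (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True)]
print(find_s(D))    # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```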

Apply to Training Examples D

Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cool   Change    Yes

Traversing the lattice

[Diagram: FIND-S traverses the hypothesis lattice from the specific end
toward the general end.]
Properties of FIND-S

• Hypothesis space is described as a conjunction of attribute
constraints
• Guaranteed to output the most specific hypothesis within H that is
consistent with the positive training examples
• Final hypothesis is also consistent with the negative examples if:
– Correct target concept is contained in H
– Training examples are correct

Consider this example

Attribute 1      Attribute 2        Label
(values X, Y)    (values A, B, C)
X                A                  Yes
X                B                  Yes
X                C                  No

Issues

• Has the learner converged to the correct target concept? Are there
other consistent hypotheses?
• Why prefer the most specific hypothesis?
• Are the training examples consistent?
• What if there are several maximally specific consistent hypotheses?
Candidate Elimination Algorithm
• Goal is to output a description of all
hypotheses consistent with the training
examples.
• Computes description without explicitly
enumerating all members.
• Is also called Least Commitment Search.
• Like FIND-S, it uses the more-general-than partial ordering

Definition

• A hypothesis h is consistent with a set of training examples D iff
h(x) = c(x) for each example <x, c(x)> in D.

  Consistent(h, D) ≡ (∀<x, c(x)> ∈ D) h(x) = c(x)
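
In code this predicate is one line (a sketch reusing satisfies() and the (instance, label) pairs D from the FIND-S example):

```python
def consistent(h, D):
    """Consistent(h, D): h classifies every example in D correctly."""
    return all(satisfies(x, h) == label for x, label in D)
```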

Definition

• The version space, denoted VS_H,D, with respect to hypothesis space H
and training examples D, is the subset of hypotheses from H consistent
with the training examples in D.

  VS_H,D ≡ {h ∈ H | Consistent(h, D)}

List-Then-Eliminate Algorithm
1. VersionSpace ← a list containing every hypothesis in H
2. For each training example <x, c(x)>
     Remove from VersionSpace any hypothesis h
     for which h(x) ≠ c(x)
3. Output the list of hypotheses in VersionSpace
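
For a space this small, List-Then-Eliminate can actually be run. A sketch that enumerates the 5120 syntactic hypotheses and filters them with consistent():

```python
from itertools import product

# Attribute values as listed earlier (assumed order: Sky, AirTemp,
# Humidity, Wind, Water, Forecast).
values = [('Sunny', 'Cloudy', 'Rainy'), ('Warm', 'Cold'),
          ('Normal', 'High'), ('Strong', 'Weak'),
          ('Warm', 'Cold'), ('Same', 'Change')]

# Every syntactic hypothesis: '?', '0', or one value per attribute.
H = list(product(*[('?', '0') + v for v in values]))    # 5120 hypotheses

def list_then_eliminate(H, D):
    return [h for h in H if consistent(h, D)]

version_space = list_then_eliminate(H, D)   # D from the FIND-S sketch
print(len(version_space))                   # 6 for the EnjoySport data
```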

More compact representation of VS
• Candidate-elimination algorithm uses a
more compact representation of VS
• VS represented by most general and
specific members.
• These members form a boundary that
delimits the version space within the
partially ordered hypothesis space.
• Also called Least Commitment Search.
S: {<Sunny, Warm, ?, Strong, ?, ?>}

   <Sunny, ?, ?, Strong, ?, ?>  <Sunny, Warm, ?, ?, ?, ?>  <?, Warm, ?, Strong, ?, ?>

G: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}

[Diagram: the version space lies between the general boundary G-set
{G1, G2, G3, ..., Gm} and the specific boundary S-set {S1, S2, S3, ..., Sn};
hypotheses outside these boundaries fall in the inconsistent regions.]
Definitions of general and specific boundaries
• Definition: The general boundary G, with respect to hypothesis space
H and training data D, is the set of maximally general members of H
consistent with D.

  G ≡ {g ∈ H | Consistent(g, D) ∧ (¬∃g' ∈ H)[(g' >g g) ∧ Consistent(g', D)]}

• Definition: The specific boundary S, with respect to hypothesis space
H and training data D, is the set of minimally general (maximally
specific) members of H consistent with D.

  S ≡ {s ∈ H | Consistent(s, D) ∧ (¬∃s' ∈ H)[(s >g s') ∧ Consistent(s', D)]}


Theorem 2.1

Version space representation theorem:

Let X be an arbitrary set of instances and H a set of Boolean-valued
hypotheses defined over X. Let c : X → {0, 1} be an arbitrary target
concept defined over X, and let D be an arbitrary set of training
examples {<x, c(x)>}. For all X, H, c, and D such that S and G are
well defined,

  VS_H,D = {h ∈ H | (∃s ∈ S)(∃g ∈ G)(g ≥g h ≥g s)}

Candidate-Elimination Learning Algorithm

• Initialize G to most general and S to most specific
• Use examples to refine

Initialize G to the set of maximally general hypotheses in H
Initialize S to the set of maximally specific hypotheses in H
For each training example d, do
  • If d is a positive example
    • Remove from G any hypothesis inconsistent with d
    • For each hypothesis s in S that is not consistent with d
      • Remove s from S
      • Add to S all minimal generalizations h of s such that
        h is consistent with d, and some member of G is more general than h
      • Remove from S any hypothesis that is more general than another
        hypothesis in S
  • If d is a negative example
    • Remove from S any hypothesis inconsistent with d
    • For each hypothesis g in G that is not consistent with d
      • Remove g from G
      • Add to G all minimal specializations h of g such that
        h is consistent with d, and some member of S is more specific than h
      • Remove from G any hypothesis that is less general than another
        hypothesis in G
Training Example 1: <Sunny, Warm, Normal, Strong, Warm, Same>, EnjoySport = Yes
S0: {<0, 0, 0, 0, 0, 0>}  →  S1: {<Sunny, Warm, Normal, Strong, Warm, Same>}
G0: {<?, ?, ?, ?, ?, ?>}  →  G1: {<?, ?, ?, ?, ?, ?>}

Training Example 2: <Sunny, Warm, High, Strong, Warm, Same>, EnjoySport = Yes
S1: {<Sunny, Warm, Normal, Strong, Warm, Same>}  →  S2: {<Sunny, Warm, ?, Strong, Warm, Same>}
G1: {<?, ?, ?, ?, ?, ?>}  →  G2: {<?, ?, ?, ?, ?, ?>}

Training Example 3: <Rainy, Cold, High, Strong, Warm, Change>, EnjoySport = No
S2: {<Sunny, Warm, ?, Strong, Warm, Same>}  →  S3: {<Sunny, Warm, ?, Strong, Warm, Same>}
G2: {<?, ?, ?, ?, ?, ?>}  →  G3: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}

Training Example 4: <Sunny, Warm, High, Strong, Cool, Change>, EnjoySport = Yes
S3: {<Sunny, Warm, ?, Strong, Warm, Same>}  →  S4: {<Sunny, Warm, ?, Strong, ?, ?>}
G3: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}  →  G4: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}
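
The trace above ends with S4 and G4. A compact Python sketch that reproduces it, reusing satisfies(), more_general_or_equal(), values, and D from the earlier sketches (min_generalize and min_specialize are my names for the two "minimal revision" helpers):

```python
def min_generalize(s, x):
    """Minimal generalization of s that covers the positive instance x."""
    return tuple(v if c == '0' else (c if c == v else '?')
                 for c, v in zip(s, x))

def min_specialize(g, x, values):
    """All minimal specializations of g that exclude the negative x."""
    return [g[:i] + (v,) + g[i+1:]
            for i, c in enumerate(g) if c == '?'
            for v in values[i] if v != x[i]]

def candidate_elimination(D, values):
    n = len(values)
    S, G = [('0',) * n], [('?',) * n]
    for x, positive in D:
        if positive:
            G = [g for g in G if satisfies(x, g)]
            # generalize S just enough to cover x, staying below some g in G
            S = [h for h in (min_generalize(s, x) for s in S)
                 if any(more_general_or_equal(g, h) for g in G)]
        else:
            S = [s for s in S if not satisfies(x, s)]
            new_G = []
            for g in G:
                if not satisfies(x, g):
                    new_G.append(g)
                else:
                    new_G += [h for h in min_specialize(g, x, values)
                              if any(more_general_or_equal(h, s) for s in S)]
            # keep only maximally general members
            G = [g for g in new_G
                 if not any(g2 != g and more_general_or_equal(g2, g)
                            for g2 in new_G)]
    return S, G

S, G = candidate_elimination(D, values)
print(S)   # [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
print(G)   # [('Sunny', '?', '?', '?', '?', '?'),
           #  ('?', 'Warm', '?', '?', '?', '?')]
```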
Questions

• Does the order of presentation of examples matter?
• How will you know when the concept has been learned?
• How do you know when you have presented enough training data?
• What happens if incorrectly labeled examples are presented?
• If the learner can request examples, which example should be
requested next?

S: {<Sunny, Warm, ?, Strong, ?, ?>}

   <Sunny, ?, ?, Strong, ?, ?>  <Sunny, Warm, ?, ?, ?, ?>  <?, Warm, ?, Strong, ?, ?>

G: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}

Select an example that would be classified as positive by some
hypotheses and negative by others.
Training Example Possibility:
<Sunny, Warm, Normal, Light, Warm, Same>  Positive
Partially Learned Concepts

• Can we classify unseen examples even though we still have multiple
hypotheses?
• The answer is yes for some.

Optimal strategy

• Generate instances that satisfy exactly half of the hypotheses in
the current VS.
• The correct target concept can then be found in log2 |VS| experiments.

Given the same version space (S, the three intermediate hypotheses, and
G shown above), how should each of these instances be classified?

A: <Sunny, Warm, Normal, Strong, Cool, Change> ?
B: <Rainy, Cold, Normal, Light, Warm, Same> ?
C: <Sunny, Warm, Normal, Light, Warm, Same> ?
D: <Sunny, Cold, Normal, Strong, Warm, Same> ?


Inductive Bias

• Candidate Elimination converges toward the true concept if
– It is given enough training examples
– Its initial hypothesis space contains the target concept
– There is no noisy data
• Fundamental questions
– What if the target concept is not contained in the hypothesis space?
– Can we include every possible hypothesis?
– How does the size of the hypothesis space influence the ability to
generalize?
– How does the size of the hypothesis space influence the number of
examples we need?

Biased Hypothesis Space

• The hypothesis space we have been considering biases the concepts we
can learn.
• An alternative is to use a hypothesis space that can represent every
teachable concept (all possible subsets of X).

Unbiased Hypothesis Space

• We need to be able to represent all possible subsets of instances.
• If set X has |X| elements, what is the size of the power set of X?
• For our example, the size of X is 96. What is the size of the power
set?
• What was the size of the hypothesis space we have been considering?
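
Worked out for this example (a sketch; the 96 and 973 figures come from the counting snippet earlier):

```python
n_instances = 96                 # |X| from the earlier count
print(2 ** n_instances)          # |power set of X| = 2**96, about 7.9e28
print(973)                       # conjunctive hypotheses we searched
```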
Consider an Alternate Definition of EnjoySport
• Want to be able to represent every subset of
instances with new hypothesis space H’.
• Let H’ be the power set of X.
• One method: allow arbitrary disjunctions,
conjunctions, and negations of earlier
hypotheses.
• The target concept Sky = Sunny or Sky = Cloudy would be:
<Sunny, ?, ?, ?, ?, ?> ∨ <Cloudy, ?, ?, ?, ?, ?>

Alternate representation

Positive examples: x1, x2, and x3
Negative examples: x4 and x5
Most general hypotheses: {¬(x4 ∨ x5)}
Most specific hypotheses: {(x1 ∨ x2 ∨ x3)}
What can we classify unambiguously?

The Conclusion: Problem of Induction
• Generalizing from any set of observations is never
logically justified, since there always exist many
hypotheses that could account for the observed data.
• Learning system MUST limit or direct its search
through the space of possible knowledge structures.
• This is called the bias of the system
• Two kinds of bias
– Representational bias (limits the hypotheses)
– Search bias (considers all possible concept descriptions but
examines some earlier than others)

Inductive Learning

• All inductive learning systems use inductive bias
• Learning algorithms can be characterized by the bias they use
• This provides a useful point of comparison among learning algorithms

Notation

y z z inductively follows from y

yz z deductively follows from y

Notation

• L is an arbitrary learning algorithm
• Dc is an arbitrary set of training data for an arbitrary concept c
• After learning, L is asked to classify a new instance xi
• Let L(xi, Dc) be the classification
• Inductive inference step:

  (Dc ∧ xi) ≻ L(xi, Dc)

Inductive Bias Definition

Consider a concept learning algorithm L for the set of instances X.
Let c be an arbitrary concept defined over X, and let Dc = {<x, c(x)>}
be an arbitrary set of training examples of c. Let L(xi, Dc) denote the
classification assigned to the instance xi by L after training on data
Dc. The inductive bias of L is any minimal set of assertions B such
that for any target concept c and corresponding training examples Dc:

  (∀xi ∈ X) [(B ∧ Dc ∧ xi) ⊢ L(xi, Dc)]
Inductive System

[Diagram: the candidate elimination algorithm, using hypothesis space
H, takes the training examples and a new instance as input and outputs
a classification of the new instance, or "don't know".

An equivalent deductive system feeds the training examples, the new
instance, and the assertion "H contains the target concept" to a
theorem prover, which likewise outputs a classification of the new
instance, or "don't know".]
Conclusions

• Concept learning can be cast as search
• The general-to-specific ordering of hypotheses provides a useful
structure for guiding search
• Version spaces and candidate elimination provide a useful framework
for studying machine learning issues
• Inductive learning must use representational or search bias
