
Data Mining - Lecture 4


Data Mining and Business Intelligence

Lecture 4
Mining Frequent Patterns, Associations, & Correlations
(Apriori, FP-Growth, Evaluation Methods)

By Dr. Nora Shoaip

Damanhour University
Faculty of Computers & Information Sciences
Department of Information Systems

2024 - 2025
Outline
➢ The Basics
   • Market Basket Analysis
   • Frequent Itemsets
   • Association Rules

➢ Frequent Itemset Mining Methods
   • Apriori Algorithm
   • Generating Association Rules from Frequent Itemsets
   • FP-Growth

➢ Pattern Evaluation Methods
The Basics: What Is Frequent Pattern Analysis?

• Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set

• First proposed by Agrawal, Imielinski, and Swami [AIS93] in the context of frequent itemsets and association rule mining
The Basics

Motivation: Finding inherent regularities in data
• What products were often purchased together? Beer and diapers?!
• What are the subsequent purchases after buying a PC?
• What kinds of DNA are sensitive to this new drug?
• Can we automatically classify web documents?

Applications
• Basket data analysis, cross-marketing, catalog design, sale campaign analysis, Web log (click stream) analysis, and DNA sequence analysis
The Basics: Frequent Itemsets
Itemset X = {x1, …, xk}, e.g. X = {A, B, C, D, E, F}
Find all the rules X → Y with minimum support and confidence:
• support, s: probability that a transaction contains X ∪ Y
• confidence, c: conditional probability that a transaction having X also contains Y, i.e. c = P(Y | X) = support_count(X ∪ Y) / support_count(X)
The Basics: Association Rules
Ex: Let supmin = 50%, confmin = 50%

Transaction-id   Items bought
10               A, B, D
20               A, C, D
30               A, D, E
40               B, E, F
50               B, C, D, E, F

Freq. Pat.: {A:3, B:3, D:4, E:3, AD:3}

Association rules (support, confidence):
A → D (60%, 100%)
D → A (60%, 75%)
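The two measures can be checked against the five-transaction example with a short Python sketch. The function names `support_count`, `support`, and `confidence` are illustrative choices, not part of the lecture; only the transactions come from the slide.

```python
# Transactions from the slide's example (transaction id -> items bought).
transactions = {
    10: {"A", "B", "D"},
    20: {"A", "C", "D"},
    30: {"A", "D", "E"},
    40: {"B", "E", "F"},
    50: {"B", "C", "D", "E", "F"},
}

def support_count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for items in transactions.values() if itemset <= items)

def support(itemset):
    """Fraction of transactions that contain `itemset`."""
    return support_count(itemset) / len(transactions)

def confidence(antecedent, consequent):
    """Estimate P(consequent | antecedent) from support counts."""
    both = set(antecedent) | set(consequent)
    return support_count(both) / support_count(antecedent)

print(support({"A", "D"}))        # 0.6  -> 60% support for A -> D and D -> A
print(confidence({"A"}, {"D"}))   # 1.0  -> 100% confidence for A -> D
print(confidence({"D"}, {"A"}))   # 0.75 -> 75% confidence for D -> A
```

Confidence is computed from integer counts rather than from the two support fractions, which avoids floating-point rounding in the division.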
The Basics: Association Rules

➢ If the frequency of itemset I satisfies the min_support count, then I is a frequent itemset
➢ If a rule satisfies both the min_support and min_confidence thresholds, it is said to be strong
➢ The problem of mining association rules is thus reduced to mining frequent itemsets
➢ Association rule mining becomes a two-step process:
   • Find all frequent itemsets, i.e. those that occur at least as frequently as a predetermined min_support count
   • Generate strong association rules from the frequent itemsets, i.e. rules that satisfy min_support and min_confidence
Mining Frequent Itemsets: Apriori
Apriori proceeds level by level:
➢ Find the frequent 1-itemsets → L1
➢ Use L1 to find the frequent 2-itemsets → L2
➢ … and so on, until no more frequent k-itemsets can be found

Finding each Lk requires a full scan of the dataset.

To improve efficiency, use the Apriori property:
➢ "All nonempty subsets of a frequent itemset must also be frequent." Equivalently, if a set cannot pass a test, all of its supersets will fail the same test as well: if P(I) < min_support, then P(I ∪ A) < min_support.
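The Apriori property holds because every transaction that contains a superset also contains each of its subsets, so support counts can only shrink as an itemset grows. A minimal check on a small hypothetical dataset (the `toy` transactions below are made up for illustration and are not from the lecture):

```python
# Hypothetical toy transactions, used only to illustrate the property.
toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]

def count(itemset):
    """Support count: number of toy transactions containing `itemset`."""
    return sum(1 for t in toy if itemset <= t)

# A chain of growing itemsets: each one is a superset of the previous one.
chain = [{"a"}, {"a", "b"}, {"a", "b", "c"}]
counts = [count(s) for s in chain]
print(counts)  # [3, 2, 1]: the count never increases along the chain
```

With a min_support count of 2 in this toy setting, {a, b, c} fails, and so would any superset of it.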
Mining Frequent Itemsets: Apriori

Transactional data example (N = 9, min_supp count = 2)

TID    List of items
T100   I1, I2, I5
T200   I2, I4
T300   I2, I3
T400   I1, I2, I4
T500   I1, I3
T600   I2, I3
T700   I1, I3
T800   I1, I2, I3, I5
T900   I1, I2, I3

Scan the dataset for the count of each candidate (C1), then compare each candidate's support count with min_support to obtain L1.

C1                        L1
Itemset  Support count    Itemset  Support count
{I1}     6                {I1}     6
{I2}     7                {I2}     7
{I3}     6                {I3}     6
{I4}     2                {I4}     2
{I5}     2                {I5}     2
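The first scan can be sketched in a few lines of Python. The variable names (`transactions`, `c1`, `l1`) are illustrative; the data and the threshold are those of the example above.

```python
from collections import Counter

# The nine transactions of the running example.
transactions = {
    "T100": ["I1", "I2", "I5"],
    "T200": ["I2", "I4"],
    "T300": ["I2", "I3"],
    "T400": ["I1", "I2", "I4"],
    "T500": ["I1", "I3"],
    "T600": ["I2", "I3"],
    "T700": ["I1", "I3"],
    "T800": ["I1", "I2", "I3", "I5"],
    "T900": ["I1", "I2", "I3"],
}
MIN_SUPPORT_COUNT = 2

# C1: one pass over the dataset, counting every single item.
c1 = Counter(item for items in transactions.values() for item in items)

# L1: keep the candidates that meet the minimum support count.
l1 = {item: n for item, n in c1.items() if n >= MIN_SUPPORT_COUNT}

print(sorted(l1.items()))
# [('I1', 6), ('I2', 7), ('I3', 6), ('I4', 2), ('I5', 2)]
```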
Mining Frequent Itemsets: Apriori
Generate C2 candidates from L1 by joining L1 ⋈ L1, scan the dataset for the count of each candidate, then compare each candidate's support count with min_supp.

L1
Itemset  Support count
{I1}     6
{I2}     7
{I3}     6
{I4}     2
{I5}     2

C2 (after the scan)
Itemset    Support count
{I1, I2}   4
{I1, I3}   4
{I1, I4}   1
{I1, I5}   2
{I2, I3}   4
{I2, I4}   2
{I2, I5}   2
{I3, I4}   0
{I3, I5}   1
{I4, I5}   0

L2
Itemset    Support count
{I1, I2}   4
{I1, I3}   4
{I1, I5}   2
{I2, I3}   4
{I2, I4}   2
{I2, I5}   2
Mining Frequent Itemsets: Apriori
Generate C3 candidates from L2 by joining L2 ⋈ L2:
C3 = L2 ⋈ L2 = {{I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5}, {I2, I4, I5}}

Join rule: two (lexicographically ordered) k-itemsets must share their first k-1 items to be joined → {I1, I2} is not joined with {I2, I4}.

Not all subsets are frequent → Prune (Apriori property). {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5}, and {I2, I4, I5} each contain a 2-itemset that is missing from L2, so they are dropped.

L2
Itemset    Support count
{I1, I2}   4
{I1, I3}   4
{I1, I5}   2
{I2, I3}   4
{I2, I4}   2
{I2, I5}   2

Scan the dataset for the count of each remaining candidate, then compare with min_supp.

C3 (after pruning and scan)      L3
Itemset        Support count     Itemset        Support count
{I1, I2, I3}   2                 {I1, I2, I3}   2
{I1, I2, I5}   2                 {I1, I2, I5}   2
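The join and prune steps for C3 can be reproduced with the sketch below. The function name `apriori_gen` and the use of sorted tuples to represent itemsets are assumptions made for this example; L2 is copied from the table above.

```python
from itertools import combinations

# L2 from the running example, each itemset stored as a sorted tuple.
L2 = [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
      ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]

def apriori_gen(prev_frequent):
    """Return (joined, pruned) candidate lists built from frequent itemsets.

    Join: two sorted itemsets are merged only when they agree on every
    item except the last one. Prune: a candidate survives only if all of
    its subsets with one item removed are themselves frequent.
    """
    prev = sorted(prev_frequent)
    prev_set = set(prev)
    joined, pruned = [], []
    for i, a in enumerate(prev):
        for b in prev[i + 1:]:
            if a[:-1] != b[:-1]:
                continue  # different prefix, e.g. (I1, I2) and (I2, I4)
            cand = a + (b[-1],)
            joined.append(cand)
            if all(s in prev_set for s in combinations(cand, len(cand) - 1)):
                pruned.append(cand)
    return joined, pruned

joined, pruned = apriori_gen(L2)
print(len(joined))  # 6 candidates after the join
print(pruned)       # [('I1', 'I2', 'I3'), ('I1', 'I2', 'I5')]
```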
Mining Frequent Itemsets: Apriori

L3
Itemset        Support count
{I1, I2, I3}   2
{I1, I2, I5}   2

Joining L3 ⋈ L3 gives the single candidate {I1, I2, I3, I5}.
Not all of its subsets are frequent → Prune

C4 = ∅ → Terminate
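Apriori Algorithm
Generate Ck using Lk-1 to find Lk:
• Join: build the candidates Ck by joining Lk-1 with itself
• Prune: drop any candidate that has an infrequent (k-1)-subset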
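Putting the join, prune, and scan steps into one loop gives the following minimal sketch of the level-wise search. It is an illustrative implementation under simple assumptions (itemsets as sorted tuples, one full scan per level, a hypothetical function name `apriori`), not the lecture's own pseudocode.

```python
from itertools import combinations

TRANSACTIONS = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

def apriori(transactions, min_count):
    """Return {itemset as sorted tuple: support count} for frequent itemsets."""
    def scan(candidates):
        # One full pass over the dataset per level, then the support filter.
        counts = {c: sum(1 for t in transactions if set(c) <= t)
                  for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_count}

    items = sorted({item for t in transactions for item in t})
    level = scan([(item,) for item in items])        # L1
    frequent = {}
    while level:
        frequent.update(level)
        prev = sorted(level)
        candidates = []
        for i, a in enumerate(prev):
            for b in prev[i + 1:]:
                if a[:-1] == b[:-1]:                  # join step
                    cand = a + (b[-1],)
                    subsets = combinations(cand, len(cand) - 1)
                    if all(s in level for s in subsets):  # prune step
                        candidates.append(cand)
        level = scan(candidates)                      # next Lk
    return frequent

frequent = apriori(TRANSACTIONS, min_count=2)
print(len(frequent))                     # 13 frequent itemsets in total
print(frequent[("I1", "I2", "I5")])      # 2
```

On the running example this yields the 5 itemsets of L1, the 6 of L2, and the 2 of L3, and it stops when C4 comes out empty.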
Mining Frequent Itemsets:
Generating Association Rules from Frequent Itemsets
L3
Itemset        Support count
{I1, I2, I3}   2
{I1, I2, I5}   2

Rules generated from the frequent itemset {I1, I2, I5}:

Nonempty subset   Association rule    Confidence
{I1, I2}          {I1, I2} → I5       2/4 = 50%
{I1, I5}          {I1, I5} → I2       2/2 = 100%
{I2, I5}          {I2, I5} → I1       2/2 = 100%
{I1}              I1 → {I2, I5}       2/6 = 33%
{I2}              I2 → {I1, I5}       2/7 = 29%
{I5}              I5 → {I1, I2}       2/2 = 100%

For a min_confidence = 70%, only the three rules with 100% confidence are strong.
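The confidence column can be recomputed from the support counts already found. In this sketch the dictionary `support_count` is filled by hand from the L1, L2, and L3 tables, and `rules_from` is a hypothetical helper name.

```python
from itertools import combinations

# Support counts copied from the L1, L2 and L3 tables of the example.
support_count = {
    frozenset(["I1"]): 6, frozenset(["I2"]): 7, frozenset(["I5"]): 2,
    frozenset(["I1", "I2"]): 4, frozenset(["I1", "I5"]): 2,
    frozenset(["I2", "I5"]): 2, frozenset(["I1", "I2", "I5"]): 2,
}

def rules_from(itemset, min_conf):
    """List (antecedent, consequent, confidence) for every strong rule."""
    itemset = frozenset(itemset)
    strong = []
    for size in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), size):
            lhs = frozenset(lhs)
            # confidence(lhs -> rest) = count(itemset) / count(lhs)
            conf = support_count[itemset] / support_count[lhs]
            if conf >= min_conf:
                strong.append((sorted(lhs), sorted(itemset - lhs), conf))
    return strong

for lhs, rhs, conf in rules_from({"I1", "I2", "I5"}, min_conf=0.7):
    print(lhs, "->", rhs, f"{conf:.0%}")
```

With min_conf = 0.7 this keeps the three rules whose antecedents are {I5}, {I1, I5}, and {I2, I5}.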
Mining Frequent Itemsets:
FP-Growth

➢ Goal: avoid costly candidate generation
➢ Divide-and-conquer strategy:
   • Compress the database of frequent items into a frequent pattern tree (FP-tree); this takes 2 passes over the dataset
   • Divide the compressed database (FP-tree) into conditional databases, then mine each one for frequent itemsets by traversing the FP-tree
Mining Frequent Itemsets:
FP-Growth

Transactional data example (N = 9, min_supp count = 2)

TID    List of items
T100   I1, I2, I5
T200   I2, I4
T300   I2, I3
T400   I1, I2, I4
T500   I1, I3
T600   I2, I3
T700   I1, I3
T800   I1, I2, I3, I5
T900   I1, I2, I3

Scan the dataset for the count of each candidate (C1), compare each candidate's support count with min_supp, then reorder L1 by descending support count.

C1                        L1 - Reordered
Itemset  Support count    Itemset  Support count
{I1}     6                {I2}     7
{I2}     7                {I1}     6
{I3}     6                {I3}     6
{I4}     2                {I4}     2
{I5}     2                {I5}     2
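The first pass and the reordering can be sketched as follows. Breaking ties by item name (so that I1 comes before I3, and I4 before I5) is an assumption chosen to reproduce the order shown in the table; the lecture does not state a tie-breaking rule.

```python
from collections import Counter

transactions = {
    "T100": ["I1", "I2", "I5"], "T200": ["I2", "I4"], "T300": ["I2", "I3"],
    "T400": ["I1", "I2", "I4"], "T500": ["I1", "I3"], "T600": ["I2", "I3"],
    "T700": ["I1", "I3"], "T800": ["I1", "I2", "I3", "I5"],
    "T900": ["I1", "I2", "I3"],
}
MIN_SUPPORT_COUNT = 2

# Pass 1: count items and keep the frequent ones.
counts = Counter(item for items in transactions.values() for item in items)
frequent = [item for item, n in counts.items() if n >= MIN_SUPPORT_COUNT]

# Descending support count; ties broken by item name (an assumption).
order = sorted(frequent, key=lambda item: (-counts[item], item))
rank = {item: position for position, item in enumerate(order)}

# Rewrite every transaction in that global order, dropping infrequent items.
ordered = {
    tid: sorted((i for i in items if i in rank), key=rank.get)
    for tid, items in transactions.items()
}

print(order)            # ['I2', 'I1', 'I3', 'I4', 'I5']
print(ordered["T800"])  # ['I2', 'I1', 'I3', 'I5']
```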
Mining Frequent Itemsets:
FP-Growth – FP-tree Construction

Header table (L1 - Reordered); each entry also keeps a node link into the FP-tree:

Itemset  Support count
{I2}     7
{I1}     6
{I3}     6
{I4}     2
{I5}     2

The FP-tree starts as a single root node, null { }.

Insert T100 (I1, I2, I5; ordered as I2, I1, I5):

null { }
  I2:1
    I1:1
      I5:1

Order of items is kept throughout path construction, with common prefixes shared whenever applicable.
Insert T200 (I2, I4): the prefix I2 is shared, so a new child I4:1 is added under I2 and the count of I2 is incremented to 2.

null { }
  I2:2
    I1:1
      I5:1
    I4:1
Insert T300 (I2, I3): the prefix I2 is shared again, so a new child I3:1 is added under I2 and the count of I2 is incremented to 3.

null { }
  I2:3
    I1:1
      I5:1
    I3:1
    I4:1
FP-tree after all nine transactions have been inserted:

null { }
  I2:7
    I1:4
      I5:1
      I3:2
        I5:1
      I4:1
    I3:2
    I4:1
  I1:2
    I3:2

The node links of the header table connect all nodes that carry the same item; they are used for tree traversal.

Mining is a bottom-up algorithm: start from the leaves and go up to the root.
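The construction steps above can be sketched with a small tree class. This is a simplified illustration: the names `Node`, `build_fp_tree`, and `render` are hypothetical, the transactions are given already in L1 order, and the header table with its node links is left out.

```python
class Node:
    """One FP-tree node: an item, its count, and its children by item."""
    def __init__(self, item, parent=None):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}

def build_fp_tree(ordered_transactions):
    """Insert each ordered transaction as a path, sharing common prefixes."""
    root = Node(None)
    for items in ordered_transactions:
        node = root
        for item in items:
            if item not in node.children:
                node.children[item] = Node(item, node)
            node = node.children[item]
            node.count += 1
    return root

def render(node, depth=0):
    """Return the tree as indented 'item:count' lines (root omitted)."""
    lines = []
    for child in node.children.values():
        lines.append("  " * depth + f"{child.item}:{child.count}")
        lines.extend(render(child, depth + 1))
    return lines

# T100..T900 rewritten in the order I2, I1, I3, I4, I5.
ORDERED = [
    ["I2", "I1", "I5"], ["I2", "I4"], ["I2", "I3"], ["I2", "I1", "I4"],
    ["I1", "I3"], ["I2", "I3"], ["I1", "I3"],
    ["I2", "I1", "I3", "I5"], ["I2", "I1", "I3"],
]

root = build_fp_tree(ORDERED)
print("\n".join(render(root)))
```

Children are printed in insertion order, so siblings may appear in a different order than in the drawing above, but the counts on every path are the same.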
Mining Frequent Itemsets:
FP-Growth – Conditional FP-tree Construction

For I5:
• Eliminate transactions not including I5 (only T100 and T800 remain)
• Eliminate I5 from the remaining transactions

Inserting T100 (I2, I1) and then T800 (I2, I1, I3) gives:

null { }
  I2:2
    I1:2
      I3:1
For I4:
• Eliminate transactions not including I4 (only T200 and T400 remain)
• Eliminate I4 from the remaining transactions

null { }
  I2:2
    I1:1
For I3:
• Eliminate transactions not including I3 (T300, T500, T600, T700, T800, and T900 remain)
• Eliminate I3 from the remaining transactions

null { }
  I2:4
    I1:2
  I1:2
Mining Frequent Itemsets:
FP-Growth

Item  Conditional Pattern Base            Conditional FP-tree    Frequent Patterns Generated
I5    {{I2, I1: 1}, {I2, I1, I3: 1}}      <I2:2, I1:2>           {I2, I5: 2}, {I1, I5: 2}, {I2, I1, I5: 2}
I4    {{I2, I1: 1}, {I2: 1}}              <I2:2>                 {I2, I4: 2}
I3    {{I2, I1: 2}, {I2: 2}, {I1: 2}}     <I2:4, I1:2>, <I1:2>   {I2, I3: 4}, {I1, I3: 4}, {I2, I1, I3: 2}
I1    {{I2: 4}}                           <I2:4>                 {I2, I1: 4}

Conditional pattern base: the prefixes of the paths ending with the item.
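The table can be reproduced with a compact pattern-growth sketch. To stay short, it works on weighted prefix paths (a tuple of items plus a count) instead of a linked FP-tree, so it illustrates the divide-and-conquer recursion rather than the compressed data structure. The function names are hypothetical.

```python
from collections import Counter

MIN_COUNT = 2

# T100..T900 in L1 order (I2, I1, I3, I4, I5), each with weight 1.
PATHS = [(("I2", "I1", "I5"), 1), (("I2", "I4"), 1), (("I2", "I3"), 1),
         (("I2", "I1", "I4"), 1), (("I1", "I3"), 1), (("I2", "I3"), 1),
         (("I1", "I3"), 1), (("I2", "I1", "I3", "I5"), 1),
         (("I2", "I1", "I3"), 1)]

def conditional_pattern_base(paths, item):
    """Prefixes (with summed counts) of the paths that contain `item`."""
    base = Counter()
    for path, n in paths:
        if item in path:
            prefix = path[:path.index(item)]
            if prefix:
                base[prefix] += n
    return list(base.items())

def fp_growth(paths, suffix=()):
    """Yield (pattern, count) for each frequent pattern ending in `suffix`."""
    counts = Counter()
    for path, n in paths:
        for item in path:
            counts[item] += n
    for item, n in counts.items():
        if n >= MIN_COUNT:
            pattern = (item,) + suffix
            yield pattern, n
            # Recurse into the conditional database of this pattern.
            yield from fp_growth(conditional_pattern_base(paths, item), pattern)

patterns = dict(fp_growth(PATHS))
print(conditional_pattern_base(PATHS, "I5"))
print(patterns[("I2", "I1", "I5")])  # 2
print(len(patterns))                 # 13, including the 5 single items
```

The 13 patterns are the same frequent itemsets that Apriori finds on this dataset, obtained here without generating and testing candidate sets level by level.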
Pattern Evaluation Methods
