
CSE-463

Machine Learning
Mining Frequent Patterns, Associations,
and Correlations
Md. Rashadur Rahman
Department of CSE
CUET
Frequent Patterns

➢ Frequent patterns are patterns (e.g., itemsets, subsequences, or substructures) that appear frequently in a data set.

➢ For example, a set of items, such as milk and bread, that appear frequently together in a transaction data set is a frequent itemset.

➢ Frequent pattern mining searches for recurring relationships in a given data set.

24/05/2023 Department of CSE, Chittagong University of Engineering & Technology 2


Market Basket Analysis



Frequent Patterns

Let I = {I1, I2, ..., Im} be an itemset.

• Let D, the task-relevant data, be a set of database transactions where each transaction T is a nonempty itemset such that T ⊆ I. Each transaction is associated with an identifier, called a TID.
• Let A be a set of items. A transaction T is said to contain A if A ⊆ T.
• An association rule is an implication of the form
A ⇒ B
where A ⊂ I, B ⊂ I, A ≠ ∅, B ≠ ∅, and A ∩ B = ∅.



Frequent Patterns
The rule A ⇒ B holds in the transaction set D with support s, where s is the percentage of transactions in D that contain A ∪ B (i.e., the union of sets A and B, that is, both A and B). This is taken to be the probability P(A ∪ B).

The rule A ⇒ B has confidence c in the transaction set D, where c is the percentage of transactions in D containing A that also contain B. This is taken to be the conditional probability P(B|A).

support(A ⇒ B) = P(A ∪ B)
confidence(A ⇒ B) = P(B|A)
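These two formulas can be checked directly on data. A minimal Python sketch, using a small hypothetical transaction set (the items below are illustrative, not from the slides):

```python
# Minimal sketch of the support and confidence definitions above,
# on a hypothetical transaction set D.
D = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread"},
    {"milk", "butter"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(A, B, transactions):
    """P(B|A): of the transactions containing A, the fraction that also contain B."""
    return support(A | B, transactions) / support(A, transactions)

A, B = {"milk"}, {"bread"}
print(support(A | B, D))    # support(A ⇒ B): 2 of 4 transactions -> 0.5
print(confidence(A, B, D))  # confidence(A ⇒ B): 2 of the 3 containing milk
```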



Basic Terms
Strong or Interesting Rules
Rules that satisfy both a minimum support threshold (min sup) and a minimum confidence threshold (min conf) are called strong or interesting.
• By convention, support and confidence values are written as percentages between 0% and 100%, rather than as values between 0 and 1.0.

k-itemset
An itemset that contains k items is a k-itemset.
The set {computer, antivirus software} is a 2-itemset.



Basic Terms
Occurrence frequency of an itemset
The occurrence frequency of an itemset is the number of transactions that
contain the itemset. This is also known, simply, as the frequency, support
count, or count of the itemset.
Note that the itemset support defined previously is sometimes referred to as
relative support, whereas the occurrence frequency is called the absolute
support.

The set of frequent k-itemsets is commonly denoted by Lk



Relationship between Support and Confidence



Frequent Patterns
In general, association rule mining can be viewed as a two-step process:

1. Find all frequent itemsets: By definition, each of these itemsets will occur
at least as frequently as a predetermined minimum support count, min
sup.
2. Generate strong association rules from the frequent itemsets: By
definition, these rules must satisfy minimum support and minimum
confidence.



Apriori Algorithm: Finding Frequent Itemsets

• Apriori is a seminal algorithm proposed by R. Agrawal and R. Srikant in 1994 for mining frequent itemsets.

• The name of the algorithm is based on the fact that the algorithm uses prior knowledge of frequent itemset properties, as we shall see later.

• Apriori employs an iterative approach known as a level-wise search, where k-itemsets are used to explore (k + 1)-itemsets.



Frequent Patterns
• First, the set of frequent 1-itemsets is found by scanning the database to accumulate the count for each item, and collecting those items that satisfy minimum support.

• The resulting set is denoted by L1. Next, L1 is used to find L2, the set of frequent 2-itemsets, which is used to find L3, and so on, until no more frequent k-itemsets can be found.

• The finding of each Lk requires one full scan of the database.

• To improve the efficiency of the level-wise generation of frequent itemsets, an important property called the Apriori property is used to reduce the search space.



Apriori property

All nonempty subsets of a frequent itemset must also be frequent



The join step

For efficient implementation, Apriori assumes that items within a transaction or itemset are sorted in lexicographic order.

To construct the set of candidate k-itemsets, Ck, the join Lk−1 ⋈ Lk−1 is performed, where members of Lk−1 are joinable if their first (k − 2) items are in common.
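The join step can be sketched in Python, assuming itemsets are kept as lexicographically sorted tuples (as the slide requires). The L2 below is chosen to be consistent with the "Expected C3" list on the next slide:

```python
# Sketch of the Apriori join step: Lk−1 ⋈ Lk−1.
def apriori_join(L_prev, k):
    """Join L(k-1) with itself: two (k-1)-itemsets are joinable when their
    first k-2 items agree; each join produces one candidate k-itemset."""
    candidates = set()
    members = sorted(L_prev)
    for i in range(len(members)):
        for j in range(i + 1, len(members)):
            a, b = members[i], members[j]
            if a[:k - 2] == b[:k - 2]:      # first k-2 items in common
                candidates.add(tuple(sorted(set(a) | set(b))))
    return candidates

# L2 consistent with the slides' running example
L2 = [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
      ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]
print(sorted(apriori_join(L2, 3)))   # the six candidates of Expected C3
```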



The join step
Expected C3:
{I1, I2, I3}
{I1, I2, I5}
{I1, I3, I5}
{I2, I3, I4}
{I2, I3, I5}
{I2, I4, I5}



The prune step

• To reduce the size of Ck, the Apriori property is used as follows. Any (k − 1)-itemset that is not frequent cannot be a subset of a frequent k-itemset.

• Hence, if any (k − 1)-subset of a candidate k-itemset is not in Lk−1, then the candidate cannot be frequent either and so can be removed from Ck.

• This subset testing can be done quickly by maintaining a hash tree of all frequent itemsets.
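The prune step can be sketched the same way. The slides use a hash tree for the subset test; a plain Python set lookup is the simplest stand-in:

```python
# Sketch of the prune step: discard any candidate with a (k − 1)-subset
# that is missing from L(k−1).
from itertools import combinations

def apriori_prune(Ck, L_prev):
    L_prev = set(L_prev)
    return {c for c in Ck
            if all(sub in L_prev for sub in combinations(c, len(c) - 1))}

# L2 and C3 as in the slides' running example
L2 = {("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
      ("I2", "I3"), ("I2", "I4"), ("I2", "I5")}
C3 = {("I1", "I2", "I3"), ("I1", "I2", "I5"), ("I1", "I3", "I5"),
      ("I2", "I3", "I4"), ("I2", "I3", "I5"), ("I2", "I4", "I5")}
print(sorted(apriori_prune(C3, L2)))   # keeps only {I1,I2,I3} and {I1,I2,I5}
```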



Pruning
Expected C3: ✔ {I1, I2, I3}, ✔ {I1, I2, I5}, ✘ {I1, I3, I5}, ✘ {I2, I3, I4}, ✘ {I2, I3, I5}, ✘ {I2, I4, I5}
C3 (after pruning): {I1, I2, I3}, {I1, I2, I5}

The 2-item subsets of {I1, I2, I3} are {I1, I2}, {I1, I3}, and {I2, I3}
The 2-item subsets of {I1, I2, I5} are {I1, I2}, {I1, I5}, and {I2, I5}
The 2-item subsets of {I1, I3, I5} are {I1, I3}, {I1, I5}, and {I3, I5}
The 2-item subsets of {I2, I3, I4} are {I2, I3}, {I2, I4}, and {I3, I4}
The 2-item subsets of {I2, I3, I5} are {I2, I3}, {I2, I5}, and {I3, I5}
The 2-item subsets of {I2, I4, I5} are {I2, I4}, {I2, I5}, and {I4, I5}
Therefore, C3 = {{I1, I2, I3}, {I1, I2, I5}} after pruning



Example
Minimum Support count = 2 or Minimum support = 22%



Example (cont.)



Example (cont.)



Example (cont.)

Expected C4:
{I1, I2, I3, I5}

The 3-item subsets of {I1, I2, I3, I5} are {I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5} ✘, and {I2, I3, I5} ✘

Itemset {I1, I2, I3, I5} is pruned because its subsets {I1, I3, I5} and {I2, I3, I5} are not frequent. Thus, C4 = ∅, and the algorithm terminates, having found all of the frequent itemsets.



Algorithm Apriori



Algorithm Apriori
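The pseudocode image for this slide did not survive the export. As a rough stand-in, here is a compact Python sketch of the level-wise loop described earlier (join, prune, then one full scan per level); the transaction set is hypothetical:

```python
from itertools import combinations

def apriori(transactions, min_sup_count):
    """Level-wise search: returns {frequent itemset (sorted tuple): support count}."""
    # L1: scan the database once to count each item
    counts = {}
    for t in transactions:
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    L = {s: c for s, c in counts.items() if c >= min_sup_count}
    frequent = dict(L)
    k = 2
    while L:
        # Join step: (k-1)-itemsets whose first k-2 items agree
        prev = sorted(L)
        Ck = {tuple(sorted(set(a) | set(b)))
              for i, a in enumerate(prev) for b in prev[i + 1:]
              if a[:k - 2] == b[:k - 2]}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        Ck = {c for c in Ck
              if all(s in L for s in combinations(c, k - 1))}
        # Count step: one full scan of the database per level
        counts = {c: sum(set(c) <= t for t in transactions) for c in Ck}
        L = {c: n for c, n in counts.items() if n >= min_sup_count}
        frequent.update(L)
        k += 1
    return frequent

# Hypothetical transaction set (not the slides' example database)
D = [{"milk", "bread"}, {"milk", "bread", "butter"}, {"bread"}, {"milk", "butter"}]
print(apriori(D, 2))
```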



Generating Association Rules from Frequent Itemsets
Association rules can be generated as follows:

1. For each frequent itemset l, generate all nonempty proper subsets of l.

2. For every nonempty proper subset s of l, output the rule "s ⇒ (l − s)" if the confidence of the rule ≥ min conf, where min conf is the minimum confidence threshold.
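A sketch of these two steps in Python, using the fact (which follows from the definitions) that confidence(s ⇒ l − s) = support_count(l) / support_count(s). Here `frequent` maps each frequent itemset, as a sorted tuple, to its support count; the counts below are hypothetical:

```python
from itertools import combinations

def generate_rules(frequent, min_conf):
    rules = []
    for l, l_count in frequent.items():
        if len(l) < 2:
            continue                              # no nonempty proper split exists
        for r in range(1, len(l)):
            for s in combinations(l, r):          # every nonempty proper subset
                conf = l_count / frequent[s]      # s is frequent by the Apriori property
                if conf >= min_conf:
                    rules.append((set(s), set(l) - set(s), conf))
    return rules

frequent = {("bread",): 3, ("milk",): 3, ("butter",): 2,
            ("bread", "milk"): 2, ("butter", "milk"): 2}
print(generate_rules(frequent, 0.7))   # only {butter} ⇒ {milk} reaches 70%
```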



Generating Association Rules from Frequent Itemsets
If l = {I1, I2, I5},
the nonempty proper subsets of l are {I1, I2}, {I1, I5}, {I2, I5}, {I1}, {I2}, and {I5}.
The candidate rules are therefore {I1, I2} ⇒ {I5}, {I1, I5} ⇒ {I2}, {I2, I5} ⇒ {I1}, {I1} ⇒ {I2, I5}, {I2} ⇒ {I1, I5}, and {I5} ⇒ {I1, I2}, each evaluated against the minimum confidence threshold.



Thank You



Home Study
1. Improving the efficiency of Apriori
2. Pattern growth approach for mining frequent itemsets



References

[1] Han, J., Kamber, M., & Pei, J. (2012). Data Mining: Concepts and Techniques (3rd ed.). Morgan Kaufmann.

