Frequent Pattern Analysis - Apriori
Machine Learning
Mining Frequent Patterns, Associations,
and Correlations
Md. Rashadur Rahman
Department of CSE
CUET
Frequent Patterns
➢ A frequent pattern is a pattern (such as an itemset) that appears frequently in a
data set. For example, a set of items, such as milk and bread, that appear frequently
together in a transaction data set is a frequent itemset.
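As a quick illustration, support counting over a toy transaction set can be sketched as follows (the transactions and the `support_count` helper are invented for this example):

```python
# Toy transaction data set (assumed for illustration).
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "eggs"},
    {"milk", "bread", "eggs"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

# {milk, bread} appears together in 3 of the 4 transactions,
# so it is frequent for any minimum support count up to 3.
print(support_count({"milk", "bread"}, transactions))  # -> 3
```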
K-itemset
An itemset that contains k items is a k-itemset.
The set {computer, antivirus software} is a 2-itemset.
1. Find all frequent itemsets: By definition, each of these itemsets will occur
at least as frequently as a predetermined minimum support count, min_sup.
2. Generate strong association rules from the frequent itemsets: By
definition, these rules must satisfy minimum support and minimum
confidence.
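The two steps above can be sketched in plain Python. This is a minimal, unoptimized level-wise sketch rather than the book's exact pseudocode, and all function names are my own:

```python
from itertools import combinations

def apriori(transactions, min_sup):
    """Step 1: level-wise search — find L1, then build Lk from L(k-1)
    until no candidate survives. Returns {itemset: support count}."""
    transactions = [frozenset(t) for t in transactions]
    def count(c):
        return sum(1 for t in transactions if c <= t)
    items = {i for t in transactions for i in t}
    Lk = {frozenset([i]) for i in items if count(frozenset([i])) >= min_sup}
    frequent = {}
    k = 1
    while Lk:
        frequent.update({c: count(c) for c in Lk})
        k += 1
        # Join: merge pairs of frequent (k-1)-itemsets into k-candidates.
        Ck = {a | b for a in Lk for b in Lk if len(a | b) == k}
        # Prune: drop candidates with an infrequent (k-1)-subset.
        Ck = {c for c in Ck
              if all(frozenset(s) in Lk for s in combinations(c, k - 1))}
        Lk = {c for c in Ck if count(c) >= min_sup}
    return frequent

def strong_rules(frequent, min_conf):
    """Step 2: emit rules A => B with confidence = sup(A∪B)/sup(A)."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for a in map(frozenset, combinations(itemset, r)):
                conf = sup / frequent[a]
                if conf >= min_conf:
                    rules.append((set(a), set(itemset - a), conf))
    return rules
```

Minimum support of a rule is automatic here: rules are only expanded from itemsets that already passed the support threshold, so only the confidence test remains.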
• The name of the algorithm is based on the fact that the algorithm uses prior
knowledge of frequent itemset properties, as we shall see later.
• To reduce the size of Ck, the Apriori property is used as follows: any (k − 1)-itemset
that is not frequent cannot be a subset of a frequent k-itemset.
• This subset testing can be done quickly by maintaining a hash tree of all
frequent itemsets.
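The prune test described above can be written as a small membership check; a Python set stands in for the hash tree here (same purpose — fast lookups of known frequent itemsets), and the function name is my own:

```python
from itertools import combinations

def has_infrequent_subset(candidate, prev_frequent):
    """Apriori prune test: a k-itemset candidate can only be frequent
    if every one of its (k-1)-subsets is already known to be frequent."""
    k = len(candidate)
    return any(frozenset(s) not in prev_frequent
               for s in combinations(candidate, k - 1))
```

Any candidate for which this returns True is removed from Ck before its support is ever counted, which is where Apriori saves its work.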
The 2-item subsets of {I1, I2, I3} are {I1, I2}, {I1, I3}, and {I2, I3}
The 2-item subsets of {I1, I2, I5} are {I1, I2}, {I1, I5}, and {I2, I5}
The 2-item subsets of {I1, I3, I5} are {I1, I3}, {I1, I5}, and {I3, I5}
The 2-item subsets of {I2, I3, I4} are {I2, I3}, {I2, I4}, and {I3, I4}
Therefore, C3 = {{I1, I2, I3}, {I1, I2, I5}} after pruning, since {I3, I5} and {I3, I4}
are not frequent 2-itemsets.
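The subset checks above can be replayed in code. L2 below is taken from the textbook example these slides follow (its six frequent 2-itemsets); treat it as an assumption if your data differs:

```python
from itertools import combinations

# L2 from the worked example: the frequent 2-itemsets.
L2 = {frozenset(s) for s in [
    ("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
    ("I2", "I3"), ("I2", "I4"), ("I2", "I5"),
]}

# The four join-step candidates enumerated above.
candidates = [frozenset(c) for c in [
    ("I1", "I2", "I3"), ("I1", "I2", "I5"),
    ("I1", "I3", "I5"), ("I2", "I3", "I4"),
]]

# Keep a candidate only if all of its 2-item subsets are in L2.
C3 = [c for c in candidates
      if all(frozenset(s) in L2 for s in combinations(c, 2))]
```

Only {I1, I2, I3} and {I1, I2, I5} survive: {I1, I3, I5} fails on {I3, I5}, and {I2, I3, I4} fails on {I3, I4}.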
Expected C4 candidate itemset: {I1, I2, I3, I5}
The 3-item subsets of {I1, I2, I3, I5} are {I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, and {I2, I3, I5}.
The itemset {I1, I2, I3, I5} is pruned because its subsets {I1, I3, I5} and {I2, I3, I5}
are not frequent. Thus, C4 = ∅, and the algorithm terminates, having found all of the
frequent itemsets.
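The final prune can be checked the same way, with L3 being the two frequent 3-itemsets found in the previous step:

```python
from itertools import combinations

# L3 from the worked example: the frequent 3-itemsets.
L3 = {frozenset(("I1", "I2", "I3")), frozenset(("I1", "I2", "I5"))}

candidate = frozenset(("I1", "I2", "I3", "I5"))
missing = [set(s) for s in combinations(sorted(candidate), 3)
           if frozenset(s) not in L3]
# missing holds {I1, I3, I5} and {I2, I3, I5}; because the candidate has
# infrequent subsets it is pruned, C4 is empty, and the search stops.
```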