## Association Mining

Objective.

To use the Apriori algorithm to find frequent itemsets by candidate generation, and to generate association rules from those frequent itemsets.

Assumption:

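Consider the following transaction table:

| TID | List of item IDs |
|------|------------------|
| T-1 | I1, I2, I5 |
| T-2 | I2, I4 |
| T-3 | I2, I3 |
| T-4 | I1, I2, I4 |
| T-5 | I1, I3 |
| T-6 | I2, I3 |
| T-7 | I1, I3 |
| T-8 | I1, I2, I3, I5 |
| T-9 | I1, I2, I3 |
| T-10 | I4, I5 |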

In the first iteration of the algorithm, each item is a member of the set of candidate 1-itemsets, C1. The algorithm simply scans all of the transactions to count the number of occurrences of each item.

| Itemset | Support count |
|---------|---------------|
| I1 | 6 |
| I2 | 7 |
| I3 | 6 |
| I4 | 3 |
| I5 | 3 |

Suppose that the minimum support count required is 2. The set of frequent 1-itemsets, L1, can then be determined. It consists of the candidate 1-itemsets that satisfy minimum support. In our example, all of the candidates in C1 satisfy minimum support.
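The first pass can be sketched in a few lines of Python. This is only an illustrative sketch: the names `transactions`, `MIN_SUPPORT_COUNT`, `c1`, and `l1` are assumptions chosen for this example, and the data is copied from the transaction table above.

```python
from collections import Counter

# Transactions copied from the transaction table (TID -> set of item IDs).
transactions = {
    "T-1": {"I1", "I2", "I5"},
    "T-2": {"I2", "I4"},
    "T-3": {"I2", "I3"},
    "T-4": {"I1", "I2", "I4"},
    "T-5": {"I1", "I3"},
    "T-6": {"I2", "I3"},
    "T-7": {"I1", "I3"},
    "T-8": {"I1", "I2", "I3", "I5"},
    "T-9": {"I1", "I2", "I3"},
    "T-10": {"I4", "I5"},
}
MIN_SUPPORT_COUNT = 2  # minimum support count used in this exercise

# C1: a single scan over all transactions, counting each item.
c1 = Counter(item for items in transactions.values() for item in items)

# L1: keep only the candidates that meet the minimum support count.
l1 = {item: count for item, count in c1.items() if count >= MIN_SUPPORT_COUNT}

for item in sorted(l1):
    print(item, l1[item])
```

Sets are used for each transaction so that later passes can test whether an itemset is contained in a transaction with a simple subset check.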

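To discover the set of frequent 2-itemsets, L2, the algorithm uses the join L1 ⋈ L1 to generate a candidate set of 2-itemsets, C2. The transactions are then scanned again to count the support of each candidate in C2, and the candidates that meet the minimum support count form L2.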

C2

| Itemset |
|---------|
| I1, I2 |
| I1, I3 |
| I1, I4 |
| I1, I5 |
| I2, I3 |
| I2, I4 |
| I2, I5 |
| I3, I4 |
| I3, I5 |
| I4, I5 |

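C2 with support counts

| Itemset | Support count |
|---------|---------------|
| I1, I2 | 4 |
| I1, I3 | 4 |
| I1, I4 | 1 |
| I1, I5 | 2 |
| I2, I3 | 4 |
| I2, I4 | 2 |
| I2, I5 | 2 |
| I3, I4 | 0 |
| I3, I5 | 1 |
| I4, I5 | 1 |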

L2

| Itemset | Support count |
|---------|---------------|
| I1, I2 | 4 |
| I1, I3 | 4 |
| I1, I5 | 2 |
| I2, I3 | 4 |
| I2, I4 | 2 |
| I2, I5 | 2 |
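The second pass (join, count, filter) might look like the following sketch. The helper `support_count` and the variable names are assumptions introduced for illustration. For k = 2 the join of L1 with itself is just every unordered pair of frequent items.

```python
from itertools import combinations

# Transactions T-1 .. T-10 from the transaction table.
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
    {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
    {"I1", "I2", "I3"}, {"I4", "I5"},
]
MIN_SUPPORT_COUNT = 2
l1 = ["I1", "I2", "I3", "I4", "I5"]  # frequent 1-itemsets from the first pass


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


# Join step for k = 2: every unordered pair of frequent items.
c2 = [frozenset(pair) for pair in combinations(l1, 2)]

# Scan the transactions to count each candidate, then filter.
c2_counts = {cand: support_count(cand, transactions) for cand in c2}
l2 = {cand: n for cand, n in c2_counts.items() if n >= MIN_SUPPORT_COUNT}

for itemset, n in l2.items():
    print(sorted(itemset), n)
```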

To generate the set of candidate 3-itemsets, C3, the join step first gives

C3 = L2 ⋈ L2 = {{I1,I2,I3}, {I1,I2,I5}, {I1,I3,I5}, {I2,I3,I4}, {I2,I3,I5}, {I2,I4,I5}}.

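Based on the Apriori property that all subsets of a frequent itemset must also be frequent, we can determine that the latter four candidates cannot possibly be frequent: each contains a 2-item subset that is not in L2 ({I3,I5}, {I3,I4}, {I3,I5}, and {I4,I5}, respectively). They are pruned from C3, and the transactions are scanned to count the support of the remaining candidates, which gives L3.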

C3 (after pruning)

| Itemset |
|---------|
| I1, I2, I3 |
| I1, I2, I5 |

L3

| Itemset | Support count |
|---------|---------------|
| I1, I2, I3 | 2 |
| I1, I2, I5 | 2 |
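The join and prune steps for the 3-itemsets can be sketched as follows. The function name `apriori_gen` and its return shape (candidates after the join, candidates after the prune) are assumptions made for this illustration. Itemsets are kept as sorted tuples so that two itemsets are joined only when their first k-2 items match.

```python
from itertools import combinations

# Transactions T-1 .. T-10 from the transaction table.
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
    {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
    {"I1", "I2", "I3"}, {"I4", "I5"},
]
MIN_SUPPORT_COUNT = 2
l2 = [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
      ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]


def apriori_gen(prev_frequent, k):
    """Build candidate k-itemsets from frequent (k-1)-itemsets: join, then prune."""
    prev = sorted(tuple(sorted(s)) for s in prev_frequent)
    prev_set = set(prev)
    joined, pruned = [], []
    for a, b in combinations(prev, 2):
        # Join: merge two itemsets that share their first k-2 items.
        if a[:k - 2] == b[:k - 2]:
            cand = a + (b[-1],)
            joined.append(cand)
            # Prune: keep the candidate only if every (k-1)-subset is frequent.
            if all(sub in prev_set for sub in combinations(cand, k - 1)):
                pruned.append(cand)
    return joined, pruned


joined, c3 = apriori_gen(l2, 3)

# Scan the transactions to count the surviving candidates.
l3 = {}
for cand in c3:
    count = sum(1 for t in transactions if set(cand) <= t)
    if count >= MIN_SUPPORT_COUNT:
        l3[cand] = count

print("after join :", joined)
print("after prune:", c3)
print("L3         :", l3)
```

Pruning before counting avoids a database scan for candidates that the Apriori property already rules out.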

Thus we have generated the candidate itemsets and the frequent itemsets for a minimum support count of 2.

Generating Association rules from frequent itemsets

Once the frequent itemsets from the transactions have been found, it is straightforward to generate strong association rules from them. This can be done using the equation for confidence, shown here for completeness:

$$
\text{confidence}(A \Rightarrow B) = P(B \mid A) = \frac{\text{support\_count}(A \cup B)}{\text{support\_count}(A)}
$$
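As a quick numeric check of the confidence formula, the sketch below plugs in support counts read off the tables above. The dictionary `support_count` and the function `confidence` are hypothetical names used only for this example.

```python
# Support counts taken from the L2 and L3 tables (keys are frozensets of item IDs).
support_count = {
    frozenset({"I1", "I2"}): 4,
    frozenset({"I1", "I5"}): 2,
    frozenset({"I1", "I2", "I5"}): 2,
}


def confidence(antecedent, consequent):
    """confidence(A => B) = support_count(A union B) / support_count(A)."""
    a = frozenset(antecedent)
    return support_count[a | frozenset(consequent)] / support_count[a]


conf_a = confidence({"I1", "I2"}, {"I5"})  # 2 / 4
conf_b = confidence({"I1", "I5"}, {"I2"})  # 2 / 2
print(f"{conf_a:.0%}")
print(f"{conf_b:.0%}")
```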

Suppose that the data contain the frequent itemset I = {I1, I2, I5}. What are the association rules that can be generated from I? The nonempty proper subsets of I are

{I1, I2}, {I1, I5}, {I2, I5}, {I1}, {I2}, and {I5}.

The resulting association rules are shown below, each listed with its confidence.

I1 ∧ I2 ⇒ I5, confidence = 2/4 = 50%

I1 ∧ I5 ⇒ I2, confidence = 2/2 = 100%

I2 ∧ I5 ⇒ I1, confidence = 2/2 = 100%

I1 ⇒ I2 ∧ I5, confidence = 2/6 = 33%

I2 ⇒ I1 ∧ I5, confidence = 2/7 = 29%

I5 ⇒ I1 ∧ I2, confidence = 2/3 = 67%
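The rule list can be reproduced by enumerating every nonempty proper subset of the frequent itemset as an antecedent and computing its confidence from the transaction table. This is a minimal sketch; `rules_from` and `support_count` are hypothetical helpers introduced for the example.

```python
from itertools import combinations

# Transactions T-1 .. T-10 from the transaction table.
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
    {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
    {"I1", "I2", "I3"}, {"I4", "I5"},
]


def support_count(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def rules_from(itemset):
    """Yield (antecedent, consequent, confidence) for each nonempty proper subset."""
    items = sorted(itemset)
    full = support_count(set(items))
    for size in range(len(items) - 1, 0, -1):
        for antecedent in combinations(items, size):
            consequent = tuple(i for i in items if i not in antecedent)
            yield antecedent, consequent, full / support_count(set(antecedent))


results = list(rules_from({"I1", "I2", "I5"}))
for a, c, conf in results:
    print(f"{' ^ '.join(a)} => {' ^ '.join(c)}: {conf:.0%}")
```

Note that I5 appears in three transactions (T-1, T-8, and T-10), so the denominator for the rule with antecedent {I5} is 3.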

Result

Thus we have generated the candidate itemsets and the frequent itemsets (with a minimum support count of 2), and finally derived the association rules.