Association Mining

Objective.

To use the Apriori algorithm to find frequent itemsets through candidate generation, and then to generate association rules from those frequent itemsets.

Assumption:

Consider the following transaction table:

TID List of item IDs

T-1 I1,I2,I5
T-2 I2,I4
T-3 I2,I3
T-4 I1,I2,I4
T-5 I1,I3
T-6 I2,I3
T-7 I1,I3
T-8 I1,I2,I3,I5
T-9 I1,I2,I3
T-10 I4,I5

In the first iteration of the algorithm, each item is a member of the set of candidate 1-itemsets, C1. The algorithm simply scans all of the transactions to count the number of occurrences of each item.

C1

ITEMSET Sup. count

I1 6
I2 7
I3 6
I4 3
I5 3

Suppose that the minimum support count required is 2. The set of frequent 1-itemsets, L1, can then be determined: it consists of the candidate 1-itemsets that satisfy minimum support. In our example, all of the candidates in C1 satisfy minimum support.
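As a rough illustration of this first pass, the counting and filtering can be sketched in Python. The variable names (transactions, MIN_SUPPORT_COUNT, c1, l1) are chosen for this example only; the data is copied from the transaction table above.

```python
from collections import Counter

# Transactions T-1 .. T-10, copied from the table above.
transactions = [
    {"I1", "I2", "I5"},
    {"I2", "I4"},
    {"I2", "I3"},
    {"I1", "I2", "I4"},
    {"I1", "I3"},
    {"I2", "I3"},
    {"I1", "I3"},
    {"I1", "I2", "I3", "I5"},
    {"I1", "I2", "I3"},
    {"I4", "I5"},
]
MIN_SUPPORT_COUNT = 2

# C1: a single scan over the transactions counts every individual item.
c1 = Counter(item for t in transactions for item in t)

# L1: keep only the candidates that meet the minimum support count.
l1 = {item: count for item, count in c1.items() if count >= MIN_SUPPORT_COUNT}

for item in sorted(l1):
    print(item, l1[item])
```

Because every item occurs at least twice, l1 ends up holding the same five entries as c1.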

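To discover the set of frequent 2-itemsets, L2, the algorithm uses the join L1 ⋈ L1 to generate a candidate set of 2-itemsets, C2. The transactions are then scanned again to count the support of each candidate, and L2 keeps the candidates whose support count is at least 2.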

C2

ITEMSET

I1,I2
I1,I3
I1,I4
I1,I5
I2,I3
I2,I4
I2,I5
I3,I4
I3,I5
I4,I5

C2

ITEMSET Sup. count

I1,I2 4
I1,I3 4
I1,I4 1
I1,I5 2
I2,I3 4
I2,I4 2
I2,I5 2
I3,I4 0
I3,I5 1
I4,I5 1

L2

ITEMSET Sup. count

I1,I2 4
I1,I3 4
I1,I5 2
I2,I3 4
I2,I4 2
I2,I5 2
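The second pass can be sketched the same way. This is only an illustration under the assumption that the frequent items from the first pass are already known; the names frequent_items, c2, c2_counts and l2 are made up for the example.

```python
from itertools import combinations

# Transactions T-1 .. T-10, copied from the table above.
transactions = [
    {"I1", "I2", "I5"},
    {"I2", "I4"},
    {"I2", "I3"},
    {"I1", "I2", "I4"},
    {"I1", "I3"},
    {"I2", "I3"},
    {"I1", "I3"},
    {"I1", "I2", "I3", "I5"},
    {"I1", "I2", "I3"},
    {"I4", "I5"},
]
MIN_SUPPORT_COUNT = 2

# L1 from the first pass: all five items are frequent.
frequent_items = ["I1", "I2", "I3", "I4", "I5"]

# Join step: joining L1 with itself yields every pair of frequent items.
c2 = list(combinations(frequent_items, 2))

# Count step: a candidate is supported by each transaction that contains it.
c2_counts = {c: sum(1 for t in transactions if set(c) <= t) for c in c2}

# L2: candidates that meet the minimum support count.
l2 = {c: n for c, n in c2_counts.items() if n >= MIN_SUPPORT_COUNT}

for itemset, count in l2.items():
    print(",".join(itemset), count)
```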

To generate the set of candidate 3-itemsets, C3, the join step first gives

C3 = L2 ⋈ L2 = {{I1,I2,I3}, {I1,I2,I5}, {I1,I3,I5}, {I2,I3,I4}, {I2,I3,I5}, {I2,I4,I5}}.

Based on the Apriori property that every subset of a frequent itemset must also be frequent, we can determine that the four latter candidates cannot possibly be frequent: each of them contains a 2-itemset that is not in L2 ({I3,I5}, {I3,I4}, {I3,I5} and {I4,I5}, respectively). These four candidates are pruned, which leaves the following C3.

C3

ITEMSET

I1,I2,I3
I1,I2,I5
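The join and prune steps for the 3-itemsets could be sketched as follows. The helper names join and prune are hypothetical, and itemsets are assumed to be stored as sorted tuples so that two itemsets can be joined when they agree on everything except their last item.

```python
from itertools import combinations

# L2 from the previous step, stored as sorted tuples.
l2 = [
    ("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
    ("I2", "I3"), ("I2", "I4"), ("I2", "I5"),
]

def join(frequent):
    """Join step: merge two sorted itemsets that differ only in their last item."""
    return [
        a + (b[-1],)
        for a, b in combinations(sorted(frequent), 2)
        if a[:-1] == b[:-1]
    ]

def prune(candidates, frequent):
    """Prune step: drop any candidate that has an infrequent (k-1)-subset."""
    known = set(frequent)
    return [
        c for c in candidates
        if all(subset in known for subset in combinations(c, len(c) - 1))
    ]

joined = join(l2)        # six candidates before pruning
c3 = prune(joined, l2)   # only the first two survive

print(joined)
print(c3)
```

Pruning before the counting scan is what saves work here: only two candidates have to be checked against the transactions instead of six.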

L3

ITEMSET Sup. count

I1,I2,I3 2
I1,I2,I5 2

Thus we have generated the candidate itemsets and the frequent itemsets for a minimum support count of 2.
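Putting the passes together, the whole level-wise search might look like the sketch below. It is a minimal illustration, not a tuned implementation: the function name apriori and its parameters are assumptions made for this example, and itemsets are again kept as sorted tuples.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {itemset (sorted tuple): support count} for all frequent itemsets."""

    def frequent_among(candidates):
        # Scan the transactions and keep candidates meeting the minimum count.
        counts = {c: sum(1 for t in transactions if set(c) <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_count}

    items = sorted({item for t in transactions for item in t})
    level = frequent_among([(item,) for item in items])  # L1
    frequent = {}
    k = 2
    while level:
        frequent.update(level)
        previous = sorted(level)
        # Join: merge itemsets that agree on everything but their last item.
        candidates = [
            a + (b[-1],)
            for a, b in combinations(previous, 2)
            if a[:-1] == b[:-1]
        ]
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = [
            c for c in candidates
            if all(s in level for s in combinations(c, k - 1))
        ]
        level = frequent_among(candidates)  # Lk
        k += 1
    return frequent

# Transactions T-1 .. T-10, copied from the table above.
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
    {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
    {"I1", "I2", "I3"}, {"I4", "I5"},
]

frequent = apriori(transactions, min_count=2)
for itemset in sorted(frequent, key=lambda s: (len(s), s)):
    print(",".join(itemset), frequent[itemset])
```

The loop stops at the fourth level: the only joined candidate, {I1,I2,I3,I5}, is pruned because its subset {I1,I3,I5} is not frequent.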

Generating Association rules from frequent itemsets

Once the frequent itemsets from the transactions have been found, it is straightforward to generate strong association rules from them. This can be done using the equation for confidence, which we show here for completeness:

Confidence(A => B) = P(B|A) = support_count(A ∪ B) / support_count(A)

Suppose that the data contain the frequent itemset I = {I1, I2, I5}. What are the association rules that can be generated from I? The nonempty proper subsets of I are

{I1, I2}, {I1, I5}, {I2, I5}, {I1}, {I2} and {I5}.

The resulting association rules are shown below, each listed with its confidence. Note that I5 appears in three transactions (T-1, T-8 and T-10), so the last rule has a confidence of 2/3.

I1 ^ I2 => I5     confidence = 2/4 = 50%

I1 ^ I5 => I2     confidence = 2/2 = 100%

I2 ^ I5 => I1     confidence = 2/2 = 100%

I1 => I2 ^ I5     confidence = 2/6 = 33%

I2 => I1 ^ I5     confidence = 2/7 = 29%

I5 => I1 ^ I2     confidence = 2/3 = 67%
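The rule generation can be sketched in Python by recomputing each support count directly from the transaction table. The helper support_count and the rules list are names invented for this example.

```python
from itertools import combinations

# Transactions T-1 .. T-10, copied from the table above.
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
    {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
    {"I1", "I2", "I3"}, {"I4", "I5"},
]

def support_count(itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)

frequent = ("I1", "I2", "I5")

# Each nonempty proper subset A of the itemset gives one rule A => (I - A).
rules = []
for size in range(len(frequent) - 1, 0, -1):
    for antecedent in combinations(frequent, size):
        consequent = tuple(i for i in frequent if i not in antecedent)
        confidence = support_count(frequent) / support_count(antecedent)
        rules.append((antecedent, consequent, confidence))

for antecedent, consequent, confidence in rules:
    left = " ^ ".join(antecedent)
    right = " ^ ".join(consequent)
    print(f"{left} => {right}  confidence = {confidence:.0%}")
```

To keep only strong rules, one would additionally filter the list by a minimum confidence threshold of one's choosing, for example `[r for r in rules if r[2] >= 0.7]`.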

Result

Thus we have generated the candidate itemsets and the frequent itemsets (with a minimum support count of 2), and finally derived the association rules.


2 Comments

  1. Amu wrote
    at 6:24 AM - 28th September 2010 Permalink

    Respected Sir,
    I am S.Amudaria.. Currently doing my ME..I have selected the project Semantic based information retrieval. So i need SVD , LSI and K-means algorithm. I am unable to run your codings.. The main thing is i am new to java..

  2. shakthydoss wrote
    at 12:43 PM - 28th September 2010 Permalink

    Hello Amudaria , you need to understand that i am also a student and not a teacher so no need to call me as sir . Secondly i have attached a jar file for LSI (Semantic based information retrieval) . You can download the jar and crack the code or execute to see the result.
