Mining Associations with Apriori using R – Part 1

Prologue: I have been practicing various skills and algorithms as part of my road map to becoming a mature data scientist. As part of this expedition, I have decided to document everything I work through. So whatever you read in this column will be either a summary of my understanding or a post explaining the details of an experiment.

My previous expedition was on association mining, so the post below explains my experiment and my understanding of it.

Basket analysis 

Association mining is a technique that helps us find hidden relationships (expressed as association rules) among the items present in a data set. You can start appreciating association mining only once you understand where it is used in the real world. For example, it is used by:

  1. Shopping centers, to increase product sales.
  2. Pandora.com, which uses association rules to recommend music that suits your taste.
  3. Google, which uses association techniques to auto-complete the text you type in the search box.

WalMart Beer-Diaper story

A popular case study, the “Beer-Diapers Wal-Mart story”, will give you even more intuition about association mining. WalMart decided to study its data and applied an association mining technique to its database. After the analysis, they found that young American males who buy diapers also tend to buy beer. So the WalMart sales team rearranged the shelves and placed the beer next to the diapers. As a result, male customers bought more beer than usual, which increased sales for WalMart.

This story is famous because no one would have predicted such a relationship between beer and diapers, but association mining helped identify it and helped the business improve its sales.

Apriori algorithm

Today there are many algorithms that apply association mining techniques to find association rules. One of the best known, and a classic, is the Apriori algorithm, which is discussed in this post. The complexity and performance of mining algorithms is an active research area, and I have not discussed it here.

Apriori works iteratively, using an approach known as level-wise search, where k-itemsets are used to explore (k+1)-itemsets. First, the set of frequent 1-itemsets is found by scanning the database to accumulate the count for each item and collecting those items that satisfy minimum support (Prune step). The resulting set is denoted L1. Next, L1 is used to find L2 (Join step), the set of frequent 2-itemsets, which is used to find L3, and so on, until no more frequent k-itemsets can be found.
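The level-wise search described above can be sketched in a few lines. This is only a minimal illustration in plain Python (chosen to keep the sketch self-contained, it is not the R code of Part 2). The function name `apriori`, the toy `transactions` and the 0.6 threshold are all assumptions made up for the example, and the subset check inside the prune is the usual Apriori shortcut, added on top of the description above.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset (illustrative sketch)."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    # L1: scan the data and keep the single items that meet min_support.
    all_items = {item for t in transactions for item in t}
    current = {}
    for item in all_items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join step: merge pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: skip candidates with an infrequent (k-1)-subset,
        # then keep only those that meet min_support.
        next_level = {}
        for cand in candidates:
            if all(frozenset(sub) in current for sub in combinations(cand, k - 1)):
                s = support(cand)
                if s >= min_support:
                    next_level[cand] = s
        current = next_level
    return frequent

# Hypothetical toy data, invented for this example.
transactions = [
    {"beer", "diaper", "milk"},
    {"beer", "diaper"},
    {"diaper", "milk"},
    {"beer", "bread"},
    {"beer", "diaper", "bread"},
]
frequent = apriori(transactions, min_support=0.6)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With this toy data, L1 is {beer} and {diaper}, L2 is {beer, diaper}, and the loop stops because no 3-itemset candidate can be built from a single 2-itemset.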

Terms & Terminologies
1-itemsets means {a} , {b} , {c}
2-itemsets means {a, b} , {b, c} , {a, c}
k-itemsets means {i1, i2, i3, … ik}, {j1, j2, j3, … jk}

Join step : The frequent (k-1)-itemsets are self-joined to generate candidate k-itemsets; for example, the 1-itemsets are self-joined to generate 2-itemsets.
Prune step : The resulting set from the Join step is filtered with the minimum support threshold.
Cardinality set : The resulting set from each Prune step.

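Support = No. of transactions containing both ‘a’ and ‘b’ / total no. of transactions.
Support => supp (a, b) => P (a U b), i.e. the probability that a transaction contains both ‘a’ and ‘b’ (the union of the two itemsets, not either one alone).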

Confidence = No. of transactions containing both ‘a’ and ‘b’ / no. of transactions containing ‘a’.
Confidence => conf (a, b) => P (b|a), which is nothing but the conditional probability of ‘b’ given ‘a’.
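To make the two formulas concrete, here is a small sketch in plain Python that computes them by counting transactions. The helper names (`count`, `support`, `confidence`) and the toy basket data are assumptions invented for this illustration, and the confidence helper assumes the antecedent appears in at least one transaction.

```python
# Hypothetical toy data, invented for this example.
baskets = [
    {"beer", "diaper", "milk"},
    {"beer", "diaper"},
    {"diaper", "milk"},
    {"beer", "bread"},
    {"beer", "diaper", "bread"},
]

def count(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t))

def support(transactions, a, b):
    """Transactions containing both a and b, divided by all transactions."""
    return count(transactions, set(a) | set(b)) / len(transactions)

def confidence(transactions, a, b):
    """Transactions containing both a and b, divided by those containing a.

    Assumes `a` occurs in at least one transaction (otherwise this divides by zero).
    """
    return count(transactions, set(a) | set(b)) / count(transactions, a)

supp_beer_diaper = support(baskets, {"beer"}, {"diaper"})
conf_beer_diaper = confidence(baskets, {"beer"}, {"diaper"})
print("support:", supp_beer_diaper)
print("confidence:", conf_beer_diaper)
```

Here beer and diaper appear together in 3 of the 5 baskets, so the support is 3/5 = 0.6, and beer appears in 4 baskets, so the confidence of "beer implies diaper" is 3/4 = 0.75.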

When using algorithms like Apriori, we end up finding a large number of association rules. To select only the rules that are interesting, we use constraint measures. The most popular such constraints are minimum thresholds on support and confidence.
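As a rough sketch of how such thresholds weed out rules, the plain-Python snippet below enumerates every single-item rule "a implies b" on a toy data set and keeps only those that meet both minimums. The data, the variable names and the threshold values (0.5 and 0.7) are assumptions chosen for illustration, not recommended settings.

```python
from itertools import permutations

# Hypothetical toy data and thresholds, invented for this example.
orders = [
    {"beer", "diaper", "milk"},
    {"beer", "diaper"},
    {"diaper", "milk"},
    {"beer", "bread"},
    {"beer", "diaper", "bread"},
]
min_support = 0.5
min_confidence = 0.7

def count(items):
    """Number of orders that contain every item in `items`."""
    return sum(1 for t in orders if set(items) <= t)

n = len(orders)
items = sorted({item for t in orders for item in t})

rules = []
for a, b in permutations(items, 2):
    both = count({a, b})
    supp = both / n            # share of orders containing a and b
    conf = both / count({a})   # share of orders with a that also contain b
    # Keep the rule only if it clears both thresholds.
    if supp >= min_support and conf >= min_confidence:
        rules.append((a, b, supp, conf))

for a, b, supp, conf in rules:
    print(f"{a} => {b}  support={supp}  confidence={conf}")
```

Only the two rules between beer and diaper survive. A rule such as "bread implies beer" has a confidence of 1.0 but is still dropped, because bread and beer occur together in only 2 of the 5 orders, which is why both thresholds are applied together.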

Continue with Part 2: Mining Associations with Apriori using R

