Mining Associations with Apriori using R – Part 2

Prologue: I have been practicing various skills and algorithms as part of my road map to becoming a mature data scientist. As part of this expedition, I have decided to document everything I work through. So whatever you read under this column will be either a summary of my understanding or a post explaining the details of an experiment.

In Part 1 we covered the introduction, terminology, algorithm, and applications of Apriori. Now let's discuss implementing Apriori in R.

In R, association mining relies on two important packages, arules and arulesViz, written by Michael Hahsler. The arules package is used for mining transaction records and extracting association rules. The arulesViz package serves as a presentation layer for visualizing the association rules.

The data set used in the experiment contains 1 lakh (100,000) records.
Each record represents a transaction containing different items.
To keep it simple, the items in each transaction record are represented as numeric values.
The data set and the entire source code of the program can be found on my GitHub repository.

Usually in R we use a data.frame to hold data for any data mining activity. For association mining with the arules package, however, the data must be held in transactional form. So the first step is to read the data into a transactions object.

trans <- read.transactions(file = "transaction-dataset.txt", format = "basket", sep = " ")

The input file can be in either normalized (single) form or flat-file (basket) form. In basket form, each record represents one transaction. In single form, each record represents a single item together with the id of the transaction it belongs to.
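To make the difference between the two forms concrete, here is a minimal sketch that writes the same three transactions in both layouts and reads each one back. The toy data, the temporary files, and the transaction ids t1 to t3 are made up for illustration and are not part of the data set used in this post.

```r
library(arules)

# Hypothetical toy data: three transactions with numeric item labels.
basket_file <- tempfile(fileext = ".txt")
writeLines(c("1 2 3",
             "1 3",
             "2 4"), basket_file)

# The same transactions in 'single' form: one "transaction-id item" pair per line.
single_file <- tempfile(fileext = ".txt")
writeLines(c("t1 1", "t1 2", "t1 3",
             "t2 1", "t2 3",
             "t3 2", "t3 4"), single_file)

trans_basket <- read.transactions(basket_file, format = "basket", sep = " ")
trans_single <- read.transactions(single_file, format = "single", sep = " ",
                                  cols = c(1, 2))  # column 1 = id, column 2 = item

# Both objects should describe 3 transactions over 4 distinct items.
dim(trans_basket)
dim(trans_single)
```

For the single form, the cols argument tells read.transactions which column holds the transaction id and which holds the item.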

Once the transactional data is loaded into the workspace, we call the apriori function to extract association rules.

rules <- apriori(trans, parameter=list(support=0.01, confidence=0.5))

In the above snippet we tell the algorithm to return only the rules with support of at least 0.01 and confidence of at least 0.5. The resulting rules are stored in the variable rules. To take a look at the generated rules, we can use the inspect function.

inspect(head(sort(rules, by = "lift")))
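The support, confidence, and lift values shown by inspect can be checked by hand on a small example. The sketch below uses plain base R and five made-up transactions, and the helper name support is hypothetical. It only illustrates the definitions from Part 1 and is not how arules computes the measures internally.

```r
# Hypothetical toy data: five transactions over items "1".."4".
transactions <- list(
  c("1", "2", "3"),
  c("1", "2"),
  c("1", "3"),
  c("2", "4"),
  c("1", "2", "4")
)

# Fraction of transactions that contain every item in 'items'.
support <- function(items) {
  mean(sapply(transactions, function(tr) all(items %in% tr)))
}

lhs <- "1"
rhs <- "2"

rule_support    <- support(c(lhs, rhs))          # share of transactions with both sides
rule_confidence <- rule_support / support(lhs)   # share of lhs transactions that also contain rhs
rule_lift       <- rule_confidence / support(rhs)

c(support = rule_support, confidence = rule_confidence, lift = rule_lift)
```

For the rule {1} => {2} this gives support 0.6, confidence 0.75, and lift 0.9375, so the rule would pass the thresholds used above even though its lift is slightly below 1.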

Visualization of rules

Even after filtering the rules using the constraint measures discussed in Part 1, we might end up with a large list of interesting rules returned by the apriori function, and it is not viable to go through all of them one by one. We can use visualization techniques to get deeper insight into the rules. The arulesViz package gives us the ability to draw different charts and graphs without getting our hands too dirty with code.

Scatter plot: The default plot in the arulesViz package is a scatter plot, in which association rules are plotted against two axes. Usually the X-axis corresponds to support and the Y-axis to confidence. However, these can be changed through the measure argument.

plot(rules, measure = c("support", "confidence"), shading = "lift")

Shading represents an additional dimension that can be shown in this two-dimensional space. Here the color of each point represents the lift value of the rule.

Two-key plot: The two-key plot is similar to the scatter plot. Here the X-axis corresponds to support and the Y-axis to confidence, and the color of each point represents the order, i.e. the number of items in the rule.

plot(rules, shading = "order", control = list(main = "Two-key plot"))
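The order used for coloring in the two-key plot can also be looked at directly. The following is a small sketch that assumes the Groceries data set bundled with arules as a stand-in for the data set used in this post, so the counts will differ from the ones you get on the original data.

```r
library(arules)

# Assumption: the bundled Groceries data stands in for the author's data set.
data("Groceries")
rules <- apriori(Groceries,
                 parameter = list(support = 0.01, confidence = 0.5),
                 control = list(verbose = FALSE))

# size() gives the number of items (LHS plus RHS) in each rule, i.e. its order.
rule_order <- size(rules)

# How many rules there are of each order.
table(rule_order)
```

A quick table like this tells us how many colors to expect in the two-key plot before drawing it.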

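Matrix-based plot: In these plots we can see the two parts of an association rule: the antecedent (IF) and the consequent (THEN). Antecedents are laid out along one axis and consequents along the other, and each cell is shaded by the chosen measure (lift in the calls below) for the corresponding rule. The rules themselves come from frequent patterns in the data that satisfy the support and confidence thresholds.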

plot(rules, method = "matrix", measure = "lift")
plot(rules, method = "matrix3D", measure = c("lift", "confidence"))
plot(rules, method = "matrix", measure = "lift", interactive = TRUE)

Grouped matrix: The grouped matrix is similar to the matrix-based plot. Here rules are grouped and presented as aggregates in the matrix, which can be explored interactively by zooming into and out of groups. The matrix-based plot and the grouped matrix plot are mostly used in interactive mode to explore the data points in the visualization area.

plot(rules, method = "grouped")
plot(rules, method = "grouped", interactive = TRUE)

Graph plot: The graph plot visualizes association rules using vertices and edges, where vertices typically represent items or item sets and edges indicate relationships in the rules.

plot(rules, method = "graph")
plot(rules, method = "graph", control = list(type = "items"))
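Graph plots get cluttered quickly when there are many rules, so it often helps to plot only a handful of them. Here is a minimal sketch that keeps the ten rules with the highest lift before drawing the graph. It again assumes the bundled Groceries data as a stand-in, and the cutoff of ten is an arbitrary choice for illustration.

```r
library(arules)
library(arulesViz)

# Assumption: the bundled Groceries data stands in for the author's data set.
data("Groceries")
rules <- apriori(Groceries,
                 parameter = list(support = 0.01, confidence = 0.5),
                 control = list(verbose = FALSE))

# Keep only the ten rules with the highest lift so the graph stays legible.
top_rules <- head(sort(rules, by = "lift"), n = 10)

plot(top_rules, method = "graph")
```

The same subset can be passed to any of the other plot methods shown above.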

You can download the data set and the entire source code of the program from my GitHub repository. You are also welcome to modify the code to suit your needs. Thank you.


