Building on the Arules Infrastructure for Analyzing Transaction Data with R
The free and extensible statistical computing environment R, with its enormous number of extension packages, already provides many state-of-the-art techniques for data analysis. Support for association rule mining, a popular exploratory method that can be used, among other purposes, to uncover cross-selling opportunities in market baskets, has recently become available with the R extension package arules. After a brief introduction to transaction data and association rules, we present the formal framework implemented in arules and demonstrate how clustering and association rule mining can be applied together, using a market basket data set from a typical retailer. This paper shows that implementing a basic infrastructure with formal classes in R provides an extensible basis that can be employed very efficiently for developing new applications (such as clustering transactions) in addition to association rule mining.