Building on the arules Infrastructure for Analyzing Transaction Data with R

  • Michael Hahsler
  • Kurt Hornik
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)

Abstract

The free and extensible statistical computing environment R with its enormous number of extension packages already provides many state-of-the-art techniques for data analysis. Support for association rule mining, a popular exploratory method which can be used, among other purposes, for uncovering cross-selling opportunities in market baskets, has become available recently with the R extension package arules. After a brief introduction to transaction data and association rules, we present the formal framework implemented in arules and demonstrate how clustering and association rule mining can be applied together using a market basket data set from a typical retailer. This paper shows that implementing a basic infrastructure with formal classes in R provides an extensible basis which can very efficiently be employed for developing new applications (such as clustering transactions) in addition to association rule mining.

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Michael Hahsler (1)
  • Kurt Hornik (2)
  1. Department of Information Systems and Operations, Wirtschaftsuniversität Wien, Austria
  2. Department of Statistics and Mathematics, Wirtschaftsuniversität Wien, Austria
