Abstract
The field of market basket analysis, the search for meaningful associations in customer purchase data, is one of the oldest areas of data mining. The typical solution involves the mining and analysis of association rules, which take the form of statements such as “people who buy diapers are likely to buy beer”. It is well-known, however, that typical transaction datasets can support hundreds or thousands of obvious association rules for each interesting rule, and filtering through the rules is a non-trivial task (Klemettinen et al. In: Proceedings of CIKM, pp 401–407, 1994). One may use an interestingness measure to quantify the usefulness of various rules, but there is no single agreed-upon measure and different measures can result in very different rankings of association rules. In this work, we take a different approach to mining transaction data. By modeling the data as a product network, we discover expressive communities (clusters) in the data, which can then be targeted for further analysis. We demonstrate that our network based approach can concisely isolate influence among products, mitigating the need to search through massive lists of association rules. We develop an interestingness measure for communities of products and show that it isolates useful, actionable communities. Finally, we build upon our experience with product networks to propose a comprehensive analysis strategy by combining both traditional and network-based techniques. This framework is capable of generating insights that are difficult to achieve with traditional analysis methods.
Similar content being viewed by others
Notes
A matter of notation: Throughout the paper, as we discuss insights from our data, it will be necessary to mention a number of specific products sold in the store. Whenever we do so, we will denote them in ALL_CAPS to distinguish specific products from concepts or classes of items. Classes of items are typed in normal text. Thus, throughout the paper, WATER_DASANI_20_OZ refers to a specific type of water, whereas “water” refers to a general class of products.
References
Adomavicius G, Tuzhilin A (1999) User profiling in personalization applications through rule discovery and validation. In: Proceedings of KDD. ACM, New York, pp 377–381
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in very large databases. In: Proceedings of the 20th International Conference on VLDB. Santiago, Chile, pp 487–499
Asur S, Ucar D, Parthasarathy S (2007) An ensemble framework for clustering protein-protein interaction networks. In: ISMB/ECCB, pp 29–40
Barabasi A, Bonabeau E (2003) Scale-free networks. Sci Am 288(5):50–59
Blanchard J, Guillet F, Briand H (2003) Exploratory visualization for association rule rummaging. In: KDD-03 workshop on multimedia data mining (MDM-03)
Blondel VD, Guillaume J-L, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks
Brijs T, Vanhoof K, Wets G (2003) Defining interestingness for association rules. Int J Inf Theor Appl 10(4):370–376
Brin S, Motwani R, Page L, Winograd T (1998) What can you do with a Web in your Pocket? Data Eng Bull 21(2):37–47
Brin S, Motwani R, Silverstein C (1997) Beyond market baskets: generalizing association rules to correlations. In: Proceedings of the ACM SIGMOD, pp 265–276
Brin S, Motwani R, Ullman J, Tsur S (1997) Dynamic itemset counting and implication rules for market basket data. ACM SIGMOD Record 26(2):255–264
Cavique L (2007) A scalable algorithm for the market basket analysis. J Retail Consumer Serv 14(6):400–407
Chawla S, Arunasalam B, Davis J (2003) Mining open source software (oss) data using association rules network. PAKDD 461–466
Chawla S, Davis J, Pandey G (2004) On local pruning of association rules using directed hypergraphs. In: 20th international conference on data engineering
Cho Y, Kim J, Kim S (2002) A personalized recommender system based on web usage mining and decision tree induction. Expert Syst Appl 23(3):329–342
Clauset A, Newman M, Moore C (2004) Finding community structure in very large networks. Phys Rev E 70(066111)
Clauset A, Shalizi C, Newman M (2007) Power-law distributions in empirical data. axriv, 706
Du N, Wu B, Pei X, Wang B, Xu L (2007) Community detection in large-scale social networks. In: Proceedings of WebKDD. ACM, pp 16–25
DuMouchel W, Pregibon D (2001) Empirical bayes screening for multi-item associations. In: Proceedings of KDD, pp 67–76
Fonseca B, Golgher P, Pôssas B, Ribeiro-Neto B, Ziviani N (2005) Concept-based interactive query expansion. In: Proceedings of CIKM. ACM, p 703
Gouda K, Zaki M (2001) Efficiently mining maximal frequent itemsets. In: Proceedings of ICDM. IEEE Computer Society, pp 163–170
Han J, Pei J (2000) Mining frequent patterns by pattern-growth: methodology and implications. ACM SIGKDD Explor Newslett 2(2):14–20
Hao M, Dayal U, Hsu M, Sprenger T, Gross M (2001) Visualization of directed associations in e-commerce transaction data. In: Proceedings of VisSym, vol 1, pp 185–192
Hipp J, Güntzer U, Nakhaeizadeh G (2000) Algorithms for association rule mininga general survey and comparison. ACM SIGKDD Explor Newslett 2(1):58–64
Kleinberg J, Lawrence S (2001) The structure of the web. Science 294:1849–1850
Klemettinen M, Mannila H, Ronkainen P, Toivonen H, Verkamo A (1994) Finding interesting rules from large sets of discovered association rules. In: Proceedings of CIKM, pp 401–407
Massen C, Doye J (2005) Identifying communities within energy landscapes. Phys Rev E 71(4):46101
Mauri C (2003) Card loyalty. A new emerging issue in grocery retailing. Journal of Retailing and Consumer Serv 10(1):13–25
McGarry K (2005) A survey of interestingness measures for knowledge discovery. Knowl Eng Rev 20(01):39–61
Newman M (2004) Detecting community structure in networks. Eur Phys J B Condens Matter Complex Syst 38(2):321–330
Newman M (2006) Finding community structure in networks using the eigenvectors of matrices. Phys Rev E 74(3):36104
Newman M, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69(2):26113
Palmer C, Faloutsos C (2003) Electricity based external similarity of categorical attributes. Lecture notes in computer science, pp 486–500
Pandey G, Chawla S, Poon S, Arunasalam B, Davis J (2009) Association rules network: definition and applications. Statistical analysis and data mining 1(4)
Steinhaeuser K, Chawla N (2008) Community detection in a large-scale real world social network. In: LNCS. Springer, Berlin
Tan P, Kumar V, Srivastava J (2004) Selecting the right objective measure for association analysis. Inf Syst 29(4):293–313
Tong H, Faloutsos C (2006) Center-piece subgraphs: problem definition and fast solutions. In: Proceedings of KDD. ACM New York, pp 404–413
Tong H, Faloutsos C, Pan J (2006) Fast random walk with restart and its applications. In: Proceedings of ICDM, pp 613–622
Wong P, Whitney P, Thomas J (1999) Visualizing association rules for text mining. In: 1999 IEEE Symposium on Information Visualization, 1999 (Info Vis’ 99) Proceedings, pp 120–123
Xiong H, Tan P, Kumar V (2006) Hyperclique pattern discovery. Data Mining Knowl Discov 13(2):219–242
Zaki M (2000) Generating non-redundant association rules. In: Proceedings of KDD. ACM New York, pp 34–43
Zaki M, Hsiao C (2002) CHARM: An efficient algorithm for closed itemset mining. In: 2nd SIAM International Conference on Data Mining, pp 457–473
Zaki M, Parthasarathy S, Ogihara M, Li W et al (1997) New algorithms for fast discovery of association rules. In: Proceedings of KDD, vol 20
Zaki MJ (1999) Parallel and distributed association mining: a survey. IEEE Concurr 7(4):14–25
Zheng Z, Kohavi R, Mason L (2001) Real world performance of association rule algorithms. In: Proceedings of KDD. ACM, New York, pp 401–406
Acknowledgments
This work partially supported by the National Science Foundation under grant NSF 0826958, the NET Institute, and the Arthur J. Schmitt Foundation.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Raeder, T., Chawla, N.V. Market basket analysis with networks. Soc. Netw. Anal. Min. 1, 97–113 (2011). https://doi.org/10.1007/s13278-010-0003-7
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s13278-010-0003-7