Skip to main content
Log in

Market basket analysis with networks

  • Original Article
  • Published:
Social Network Analysis and Mining Aims and scope Submit manuscript

Abstract

The field of market basket analysis, the search for meaningful associations in customer purchase data, is one of the oldest areas of data mining. The typical solution involves the mining and analysis of association rules, which take the form of statements such as “people who buy diapers are likely to buy beer”. It is well-known, however, that typical transaction datasets can support hundreds or thousands of obvious association rules for each interesting rule, and filtering through the rules is a non-trivial task (Klemettinen et al. In: Proceedings of CIKM, pp 401–407, 1994). One may use an interestingness measure to quantify the usefulness of various rules, but there is no single agreed-upon measure and different measures can result in very different rankings of association rules. In this work, we take a different approach to mining transaction data. By modeling the data as a product network, we discover expressive communities (clusters) in the data, which can then be targeted for further analysis. We demonstrate that our network based approach can concisely isolate influence among products, mitigating the need to search through massive lists of association rules. We develop an interestingness measure for communities of products and show that it isolates useful, actionable communities. Finally, we build upon our experience with product networks to propose a comprehensive analysis strategy by combining both traditional and network-based techniques. This framework is capable of generating insights that are difficult to achieve with traditional analysis methods.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11

Similar content being viewed by others

Notes

  1. A matter of notation: Throughout the paper, as we discuss insights from our data, it will be necessary to mention a number of specific products sold in the store. Whenever we do so, we will denote them in ALL_CAPS to distinguish specific products from concepts or classes of items. Classes of items are typed in normal text. Thus, throughout the paper, WATER_DASANI_20_OZ refers to a specific type of water, whereas “water” refers to a general class of products.

References

  • Adomavicius G, Tuzhilin A (1999) User profiling in personalization applications through rule discovery and validation. In: Proceedings of KDD. ACM, New York, pp 377–381

  • Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in very large databases. In: Proceedings of the 20th International Conference on VLDB. Santiago, Chile, pp 487–499

  • Asur S, Ucar D, Parthasarathy S (2007) An ensemble framework for clustering protein-protein interaction networks. In: ISMB/ECCB, pp 29–40

  • Barabasi A, Bonabeau E (2003) Scale-free networks. Sci Am 288(5):50–59

    Article  Google Scholar 

  • Blanchard J, Guillet F, Briand H (2003) Exploratory visualization for association rule rummaging. In: KDD-03 workshop on multimedia data mining (MDM-03)

  • Blondel VD, Guillaume J-L, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks

  • Brijs T, Vanhoof K, Wets G (2003) Defining interestingness for association rules. Int J Inf Theor Appl 10(4):370–376

    Google Scholar 

  • Brin S, Motwani R, Page L, Winograd T (1998) What can you do with a Web in your Pocket? Data Eng Bull 21(2):37–47

    Google Scholar 

  • Brin S, Motwani R, Silverstein C (1997) Beyond market baskets: generalizing association rules to correlations. In: Proceedings of the ACM SIGMOD, pp 265–276

  • Brin S, Motwani R, Ullman J, Tsur S (1997) Dynamic itemset counting and implication rules for market basket data. ACM SIGMOD Record 26(2):255–264

    Article  Google Scholar 

  • Cavique L (2007) A scalable algorithm for the market basket analysis. J Retail Consumer Serv 14(6):400–407

    Article  Google Scholar 

  • Chawla S, Arunasalam B, Davis J (2003) Mining open source software (oss) data using association rules network. PAKDD 461–466

  • Chawla S, Davis J, Pandey G (2004) On local pruning of association rules using directed hypergraphs. In: 20th international conference on data engineering

  • Cho Y, Kim J, Kim S (2002) A personalized recommender system based on web usage mining and decision tree induction. Expert Syst Appl 23(3):329–342

    Article  Google Scholar 

  • Clauset A, Newman M, Moore C (2004) Finding community structure in very large networks. Phys Rev E 70(066111)

  • Clauset A, Shalizi C, Newman M (2007) Power-law distributions in empirical data. axriv, 706

  • Du N, Wu B, Pei X, Wang B, Xu L (2007) Community detection in large-scale social networks. In: Proceedings of WebKDD. ACM, pp 16–25

  • DuMouchel W, Pregibon D (2001) Empirical bayes screening for multi-item associations. In: Proceedings of KDD, pp 67–76

  • Fonseca B, Golgher P, Pôssas B, Ribeiro-Neto B, Ziviani N (2005) Concept-based interactive query expansion. In: Proceedings of CIKM. ACM, p 703

  • Gouda K, Zaki M (2001) Efficiently mining maximal frequent itemsets. In: Proceedings of ICDM. IEEE Computer Society, pp 163–170

  • Han J, Pei J (2000) Mining frequent patterns by pattern-growth: methodology and implications. ACM SIGKDD Explor Newslett 2(2):14–20

    Article  Google Scholar 

  • Hao M, Dayal U, Hsu M, Sprenger T, Gross M (2001) Visualization of directed associations in e-commerce transaction data. In: Proceedings of VisSym, vol 1, pp 185–192

  • Hipp J, Güntzer U, Nakhaeizadeh G (2000) Algorithms for association rule mininga general survey and comparison. ACM SIGKDD Explor Newslett 2(1):58–64

    Article  Google Scholar 

  • Kleinberg J, Lawrence S (2001) The structure of the web. Science 294:1849–1850

    Article  Google Scholar 

  • Klemettinen M, Mannila H, Ronkainen P, Toivonen H, Verkamo A (1994) Finding interesting rules from large sets of discovered association rules. In: Proceedings of CIKM, pp 401–407

  • Massen C, Doye J (2005) Identifying communities within energy landscapes. Phys Rev E 71(4):46101

    Article  MathSciNet  Google Scholar 

  • Mauri C (2003) Card loyalty. A new emerging issue in grocery retailing. Journal of Retailing and Consumer Serv 10(1):13–25

    Article  Google Scholar 

  • McGarry K (2005) A survey of interestingness measures for knowledge discovery. Knowl Eng Rev 20(01):39–61

    Article  Google Scholar 

  • Newman M (2004) Detecting community structure in networks. Eur Phys J B Condens Matter Complex Syst 38(2):321–330

    Article  Google Scholar 

  • Newman M (2006) Finding community structure in networks using the eigenvectors of matrices. Phys Rev E 74(3):36104

    Article  Google Scholar 

  • Newman M, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69(2):26113

    Article  Google Scholar 

  • Palmer C, Faloutsos C (2003) Electricity based external similarity of categorical attributes. Lecture notes in computer science, pp 486–500

  • Pandey G, Chawla S, Poon S, Arunasalam B, Davis J (2009) Association rules network: definition and applications. Statistical analysis and data mining 1(4)

  • Steinhaeuser K, Chawla N (2008) Community detection in a large-scale real world social network. In: LNCS. Springer, Berlin

  • Tan P, Kumar V, Srivastava J (2004) Selecting the right objective measure for association analysis. Inf Syst 29(4):293–313

    Article  Google Scholar 

  • Tong H, Faloutsos C (2006) Center-piece subgraphs: problem definition and fast solutions. In: Proceedings of KDD. ACM New York, pp 404–413

  • Tong H, Faloutsos C, Pan J (2006) Fast random walk with restart and its applications. In: Proceedings of ICDM, pp 613–622

  • Wong P, Whitney P, Thomas J (1999) Visualizing association rules for text mining. In: 1999 IEEE Symposium on Information Visualization, 1999 (Info Vis’ 99) Proceedings, pp 120–123

  • Xiong H, Tan P, Kumar V (2006) Hyperclique pattern discovery. Data Mining Knowl Discov 13(2):219–242

    Article  MathSciNet  Google Scholar 

  • Zaki M (2000) Generating non-redundant association rules. In: Proceedings of KDD. ACM New York, pp 34–43

  • Zaki M, Hsiao C (2002) CHARM: An efficient algorithm for closed itemset mining. In: 2nd SIAM International Conference on Data Mining, pp 457–473

  • Zaki M, Parthasarathy S, Ogihara M, Li W et al (1997) New algorithms for fast discovery of association rules. In: Proceedings of KDD, vol 20

  • Zaki MJ (1999) Parallel and distributed association mining: a survey. IEEE Concurr 7(4):14–25

    Article  Google Scholar 

  • Zheng Z, Kohavi R, Mason L (2001) Real world performance of association rule algorithms. In: Proceedings of KDD. ACM, New York, pp 401–406

Download references

Acknowledgments

This work partially supported by the National Science Foundation under grant NSF 0826958, the NET Institute, and the Arthur J. Schmitt Foundation.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Troy Raeder.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Raeder, T., Chawla, N.V. Market basket analysis with networks. Soc. Netw. Anal. Min. 1, 97–113 (2011). https://doi.org/10.1007/s13278-010-0003-7

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s13278-010-0003-7

Keywords

Navigation