Social Network Analysis and Mining

, Volume 1, Issue 3, pp 187–218

A systematic approach to the one-mode projection of bipartite graphs

Original Article

Abstract

Bipartite graphs are common in many complex systems as they describe a relationship between two different kinds of actors, e.g., genes and proteins, metabolites and enzymes, authors and articles, or products and consumers. A common approach to analyze them is to build a graph between the nodes on one side depending on their relationships with nodes on the other side; this so-called one-mode projection is a crucial step for all further analysis but a systematic approach to it was lacking so far. Here, we present a systematic approach that evaluates the significance of the co-occurrence for each pair of nodes vw, i.e., the number of common neighbors of v and w. It turns out that this can be seen as a special case of evaluating the interestingness of an association rule in data mining. Based on this connection we show that classic interestingness measures in data mining cannot be applied to evaluate most real-world product-consumer relationship data. We thus introduce generalized interestingness measures for both, one-mode projections of bipartite graphs and data mining and show their robustness and stability by example. We also provide theoretical results that show that the old method cannot even be used as an approximative method. In a last step we show that the new interestingness measures show stable and significant results that result in attractive one-mode projections of bipartite graphs.

Keywords

Bipartite graphs One-mode projection Association rules Interestingness measures 

References

  1. Abdi H (2007) The Kendall rank correlation coefficient. In: Encyclopedia of measurement and statistics. Sage, Thousand OaksGoogle Scholar
  2. Admiraal R, Handcock MS (2008) Networksis: a package to simulate bipartite graphs with fixed marginals through sequential importance sampling. J Stat Softw 24(8):1–21Google Scholar
  3. Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: Proceedings of the ACM SIGMOD international conference on management of data 1993, pp 207–216Google Scholar
  4. Alon U (2006) An introduction to systems biology: design principles of biological circuits. Chapman & Hall/CRCGoogle Scholar
  5. Artzy-Randrup Y, Fleishman SJ, Ben-Tal N, Stone L (2004) Comment on “Network motifs: simple building blocks of complex networks” and “Superfamilies of evolved and designed networks”. Science 305:1107cGoogle Scholar
  6. Barvinok A (2008) Enumerateing contingency tables via random permanents. Combin Probab Comput 17(1):1–19MATHCrossRefMathSciNetGoogle Scholar
  7. Bollobás B (2001) Random graphs, 2nd edn. In: Cambridge studies in advanced mathematics, vol 73. Cambridge University Press, LondonGoogle Scholar
  8. Brin S, Motwani R, Ullman JD, Tsur S (1997) Dynamic itemset counting and implication rules for market basket data. In: Proceedings ACM SIGMOD international conference on management of data 1997, pp 255–264Google Scholar
  9. Brualdi RA (1980) Matrices of zeros and ones with fixed row and column vectors. Linear Algebra Appl 33:159–231MATHCrossRefMathSciNetGoogle Scholar
  10. Brualdi RA (2006) Algorithms for constructing (0,1)-matrices with prescribed row and column sum vectors. Discrete Math 306:3054–3062MATHCrossRefMathSciNetGoogle Scholar
  11. Chen Y, Diaconis P, Holmes SP, Liu JS (2005) Sequential monte carlo methods for statistical analysis of tables. J Am Stat Assoc 100(469):109–120MATHCrossRefMathSciNetGoogle Scholar
  12. Cobb GW, Chen YP (2003) An application of Markov chain Monte Carlo to community ecology. Am Math Mon 110:265–288MATHCrossRefMathSciNetGoogle Scholar
  13. Dorogovtsev SN, Mendes JF (2003) Evolution of networks. Oxford University PressGoogle Scholar
  14. Gale D (1957) A theorem on flows in networks. Pac J Math 7:1073–1082MATHMathSciNetGoogle Scholar
  15. Gionis A, Mannila H, Mielikäinen T, Tsaparas P (2007) Assessing data mining results via swap randomization. ACM Trans Knowl Discov Data 1(3):article no. 14Google Scholar
  16. Girvan M, Newman ME (2002) Community structure in social and biological networks. Proc Natl Acad Sci 99:7821–7826MATHCrossRefMathSciNetGoogle Scholar
  17. Goh KI, Cusick ME, Valle D, Childs B, Vidal M, Barabási AL (2007) The human disease network. Proc Natl Acad Sci 104:8685–8690CrossRefGoogle Scholar
  18. Greenhill C, McKay BD (2008) Asymptotic enumeration of sparse nonnegative integer matrices with specified row and column sums. Adv Appl Math 41(4):459–481MATHCrossRefMathSciNetGoogle Scholar
  19. Hipp J, Güntzer U, Nakhaeizadeh G (2000) Algorithms for association rule mining—a general survey and comparison. SIGKDD Explor 2(2):1–58CrossRefGoogle Scholar
  20. Holmes RB, Jones LK (1986) On uniform generation of two-way tables with fixed margins and the conditional volume test of Diaconis and Efron. Ann Stat 24(1):64–68MathSciNetGoogle Scholar
  21. Kendall M (1938) A new measure of rank correlation. Biometrika 30:81–89MATHMathSciNetGoogle Scholar
  22. Li M, Fan Y, Chen J, Gao L, Di Z, Wu J (2005) Weighted networks of scientific communication: the measurement and topological role of weight. Phys A 350:643–656CrossRefGoogle Scholar
  23. Ford LR, Fulkerson DR (1962) Flows in networks. Princeton University PressGoogle Scholar
  24. Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U (2002) Network motifs: Simple building blocks of complex networks. Science 298:824–827CrossRefGoogle Scholar
  25. Newman ME (2001a) Scientific collaboration networks I. Phys Rev E 64:016,131Google Scholar
  26. Newman ME (2001b) Scientific collaboration networks II: shortest paths, weighted networks, and centrality. Phys Rev E 64:016,132Google Scholar
  27. Newson R (2006) Efficient calculation of jackknife confidence intervals for rank statistics. J Stat Softw 15(1):1–10Google Scholar
  28. Newman ME, Barabási AL, Watts DJ (eds) (2006) The structure and dynamics of networks. Princeton University Press, PrincetonGoogle Scholar
  29. Palla G, Derényi I, Farkas I, Vicsek T (2005) Uncovering the overlapping community structure of complex networks in nature and society. Nature 435:814–818CrossRefGoogle Scholar
  30. Piatetsky-Shapiro G (1991) Knowledge discovery in databases. Discovery, analysis, and presentation of strong rules. AAAI/MIT Press, pp 229–248Google Scholar
  31. Ravasz E, Somera A, Mongru D, Oltvai Z, Barabási AL (2002) Hierarchical organization of modularity in metabolic networks. Science 297:1551–1553CrossRefGoogle Scholar
  32. Raeder T, Chawla NV (2011) Market basket analysis with networks. Soc Netw Anal Min 1Google Scholar
  33. Ryser H (1963) Combinatorial mathematics. In: Carus mathematical monograph, vol 14. Mathematical Association of America, WashingtonGoogle Scholar
  34. Vázquez A, Flammini A, Maritan A, Vespignani A (2002) Modeling of protein interaction networks. ComPlexUs 1:38–44Google Scholar
  35. Ward JH (1963) Hierarchical grouping to optimize an objective function. J Am Stat Assoc 58:236–244CrossRefGoogle Scholar
  36. Wasserman S, Faust K (1999) Social network analysis—methods and applications, revised, reprinted edn. Cambridge University Press, CambridgeGoogle Scholar
  37. Watts DJ, Strogatz SH (1998) Collective dynamics of ‘small-world’ networks. Nature 393:440–442CrossRefGoogle Scholar
  38. Zhou T, Ren J, Medo M, Zhang YC (2007) Bipartite network projection and personal recommendation. Phys Rev E 76:046,115Google Scholar
  39. Zweig KA (2010) How to forget the second side of the story: a new method for the one-mode projection of bipartite graphs. In: Proceedings of the 2010 international conference on advances in social networks analysis and mining ASONAM 2010, pp 200–207Google Scholar

Copyright information

© Springer-Verlag 2011

Authors and Affiliations

  1. 1.Interdisciplinary Center for Scientific Computing, University of HeidelbergHeidelbergGermany
  2. 2.Wilhelm-Schickard-Institute, University of TübingenTübingenGermany

Personalised recommendations