Pattern Extraction from Graphs and Beyond

Chapter

Abstract

We review recent studies on pattern extraction from large-scale graphs. Patterns can be represented explicitly or implicitly. Explicit patterns are concrete subgraphs defined in graph theory, e.g., cliques and trees. For the explicit model, we introduce notable fast algorithms for finding all frequent patterns, and we confirm that these problems are closely related to traditional problems in data mining. Implicit patterns, on the other hand, are defined by statistical factors, e.g., modularity, betweenness, and flow, which determine optimal hidden subgraphs. For both models, we give an introductory survey focusing on notable pattern extraction algorithms.
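To make the distinction between the two models concrete, the following is a minimal Python sketch, not taken from the chapter. It assumes a toy undirected graph (two triangles joined by one edge). For the explicit side it enumerates maximal cliques with the basic Bron–Kerbosch recursion, chosen here only for brevity and far slower than the fast enumeration algorithms the abstract refers to. For the implicit side it scores a vertex partition by modularity, Q = Σ_c [ L_c/m − (d_c/2m)² ], where L_c is the number of edges inside community c, d_c is the sum of degrees in c, and m is the total number of edges. The names `build_adjacency`, `maximal_cliques`, and `modularity` are hypothetical helpers introduced for illustration.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Return an undirected adjacency map {vertex: set of neighbours}."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def maximal_cliques(adj):
    """Explicit pattern: list all maximal cliques (basic Bron-Kerbosch, no pivoting)."""
    found = []

    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            found.append(frozenset(clique))
            return
        for v in list(candidates):
            expand(clique | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return found


def modularity(edges, communities):
    """Implicit pattern: modularity Q of a partition of the vertices."""
    m = len(edges)
    label = {v: i for i, group in enumerate(communities) for v in group}
    inside = defaultdict(int)   # L_c: edges with both endpoints in community c
    degree = defaultdict(int)   # d_c: sum of vertex degrees in community c
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            inside[label[u]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Toy graph (an assumption for illustration): two triangles joined by edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = build_adjacency(edges)
cliques = maximal_cliques(adj)
q = modularity(edges, [{0, 1, 2}, {3, 4, 5}])
```

On this graph the explicit model reports the two triangles and the bridging edge as maximal cliques, while the implicit model assigns the two-triangle partition a modularity of 5/14; the partition itself is never written down as a subgraph pattern, only scored.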

Copyright information

© Springer International Publishing Switzerland 2013

Authors and Affiliations

  1. Kyushu Institute of Technology, Iizuka-shi, Japan
  2. Gakushuin University, Tokyo, Japan
  3. PRESTO JST, Kawaguchi, Japan
