Nearly Exact Mining of Frequent Trees in Large Networks

  • Ashraf M. Kibriya
  • Jan Ramon
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7523)

Abstract

Mining frequent patterns in a single network (graph) poses a number of challenges. Even matching a single path pattern to a network (up to subgraph isomorphism) is NP-complete. Existing matching algorithms become intractable even for reasonably small patterns on networks that are large or have a high average degree. Based on recent advances in parameterized complexity theory, we propose a novel miner for rooted trees in networks. For a fixed parameter k (the maximal pattern size), the miner can mine all rooted trees with delay linear in the size of the network and only mildly exponential in k (2^k). This allows us to tractably mine rooted trees in large networks such as the WWW or social networks. We establish the practical applicability of our miner by presenting an experimental evaluation on both synthetic and real-world data.
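
To give a feel for a running time that is linear in the network and exponential only in the pattern size k, the following minimal sketch tests whether a graph contains a simple path on k vertices using color-coding, a well-known technique from parameterized algorithms. It is an illustration only and is not the miner described in the abstract; the function names, the adjacency-dictionary representation, and the `trials` and `seed` parameters are assumptions made for this example.

```python
import random


def has_colorful_path(adj, colors, k):
    """Return True if some path on k vertices uses k distinct colours.

    adj:    dict mapping each vertex to an iterable of its neighbours.
    colors: dict mapping each vertex to a colour in range(k).
    A path whose vertices all have different colours is necessarily simple.
    """
    # masks[v] holds the colour sets (as bitmasks) of colourful paths ending at v.
    masks = {v: {1 << colors[v]} for v in adj}
    for _ in range(k - 1):
        nxt = {v: set() for v in adj}
        for u in adj:
            for v in adj[u]:
                bit = 1 << colors[v]
                for m in masks[u]:
                    if not m & bit:  # extend only with an unused colour
                        nxt[v].add(m | bit)
        masks = nxt
    return any(masks[v] for v in adj)


def has_k_path(adj, k, trials=200, seed=0):
    """Randomised test for a simple path on k vertices.

    One-sided error: True is always correct; False may be wrong with a
    probability that shrinks as `trials` grows.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        colors = {v: rng.randrange(k) for v in adj}
        if has_colorful_path(adj, colors, k):
            return True
    return False


# A path graph 0-1-2-3 and a star with centre 0 (hypothetical toy inputs).
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
```

Each of the k - 1 rounds scans every edge once and handles at most 2^k colour sets per endpoint, so one colouring costs time proportional to the number of edges times a factor that depends only on k.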

Keywords

Rooted Tree, Frequent Pattern, Pattern Mining, Frequent Tree, Graph Isomorphism
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Ashraf M. Kibriya (1)
  • Jan Ramon (1)
  1. Department of Computer Science, Katholieke Universiteit Leuven, Leuven, Belgium