LGM: Mining Frequent Subgraphs from Linear Graphs

  • Yasuo Tabei
  • Daisuke Okanohara
  • Shuichi Hirose
  • Koji Tsuda
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6635)

Abstract

A linear graph is a graph whose vertices are totally ordered. Biological and linguistic sequences with interactions among symbols are naturally represented as linear graphs. Examples include protein contact maps, RNA secondary structures and predicate-argument structures. Our algorithm, linear graph miner (LGM), leverages the vertex order for efficient enumeration of frequent subgraphs. Following the reverse search principle, LGM traverses the pattern space systematically without expensive duplicate checking. Disconnected subgraph patterns are particularly important in linear graphs due to their sequential nature. Unlike conventional graph mining algorithms, which detect only connected patterns, LGM detects disconnected patterns as well. The utility and efficiency of LGM are demonstrated in experiments on protein contact maps.
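
To make the notions of a linear graph and a disconnected pattern concrete, the following is a minimal sketch. It is not the LGM algorithm: it uses a brute-force containment test rather than reverse search. The class name LinearGraph, the functions contains and support, and the reading of "subgraph" as an order-preserving, label-matching embedding are all assumptions made for illustration, and the paper's formal definition may differ.

```python
from itertools import combinations


class LinearGraph:
    """Hypothetical container: vertices are positions 0..n-1 in a fixed order."""

    def __init__(self, labels, edges):
        self.labels = list(labels)
        # Store each undirected edge as (smaller index, larger index).
        self.edges = {(min(i, j), max(i, j)) for i, j in edges}


def contains(graph, pattern):
    """Brute-force test (assumed semantics): does some order-preserving
    choice of graph vertices match the pattern's labels and edges?"""
    n, k = len(graph.labels), len(pattern.labels)
    # combinations() yields strictly increasing index tuples, so the
    # vertex order of the pattern is preserved by construction.
    for positions in combinations(range(n), k):
        if any(graph.labels[p] != pattern.labels[q]
               for q, p in enumerate(positions)):
            continue
        if all((positions[i], positions[j]) in graph.edges
               for i, j in pattern.edges):
            return True
    return False


def support(database, pattern):
    """Number of graphs in the database that contain the pattern."""
    return sum(contains(g, pattern) for g in database)


# A disconnected pattern: edge A-B followed later by edge C-D.
pattern = LinearGraph("ABCD", [(0, 1), (2, 3)])

g1 = LinearGraph("AXBCD", [(0, 2), (3, 4)])  # both edges present, in order
g2 = LinearGraph("ABCD", [(0, 3), (1, 2)])   # same labels, different edges
g3 = LinearGraph("CDAB", [(0, 1), (2, 3)])   # same edges, wrong vertex order

database = [g1, g2, g3]
print(support(database, pattern))
```

In this toy database only g1 contains the pattern. The graph g3 has the same two edges but in the opposite order along the sequence, which shows how the total order on vertices constrains matching. The brute-force loop is exponential in the pattern size and is meant only to illustrate the definitions.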

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Yasuo Tabei (1)
  • Daisuke Okanohara (2)
  • Shuichi Hirose (3)
  • Koji Tsuda (1, 3)
  1. ERATO Minato Project, Japan Science and Technology Agency, Sapporo, Japan
  2. Preferred Infrastructure, Inc., Tokyo, Japan
  3. Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
