Knowledge and Information Systems, Volume 28, Issue 2, pp 423–447

An efficient graph-mining method for complicated and noisy data with real-world applications

  • Yi Jia
  • Jintao Zhang
  • Jun Huan
Regular Paper


In this paper, we present a novel graph database-mining method called APGM (APproximate Graph Mining) to mine useful patterns from noisy graph databases. We design a general framework for modeling noise distributions using a probability matrix and devise an efficient algorithm to identify approximately matched frequent subgraphs. We apply APGM to both a synthetic data set and real-world data sets for protein structure pattern identification and structure classification. Our experimental study demonstrates the efficiency and efficacy of the proposed method.
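The idea of scoring approximate matches with a probability matrix can be illustrated with a minimal sketch. This is not the APGM algorithm itself: it is a brute-force illustration under stated assumptions. The names `approx_match_score`, `approx_support`, `compat`, and `tau` are hypothetical, the matrix entry `compat[a][b]` is assumed to be the probability that a true label `a` is observed as `b`, and a match is assumed to be accepted when the product of these probabilities over the mapped nodes reaches the threshold `tau`.

```python
from itertools import permutations

def approx_match_score(pattern_labels, pattern_edges, graph_labels, graph_edges, compat):
    """Best label-compatibility score over all edge-preserving injective
    mappings of pattern nodes into graph nodes; 0.0 if no mapping exists."""
    # Undirected edges are stored as unordered pairs.
    gedges = {frozenset(e) for e in graph_edges}
    p_nodes = list(pattern_labels)
    best = 0.0
    # Brute force: try every injective assignment of pattern nodes to graph nodes.
    for image in permutations(graph_labels, len(p_nodes)):
        mapping = dict(zip(p_nodes, image))
        # Every pattern edge must be present between the mapped nodes.
        if not all(frozenset((mapping[u], mapping[v])) in gedges
                   for u, v in pattern_edges):
            continue
        # Assumed scoring rule: product of label-substitution probabilities.
        score = 1.0
        for u in p_nodes:
            score *= compat[pattern_labels[u]][graph_labels[mapping[u]]]
        best = max(best, score)
    return best

def approx_support(pattern, database, compat, tau):
    """Fraction of database graphs containing an approximate match scoring >= tau."""
    labels, edges = pattern
    hits = sum(
        1 for g_labels, g_edges in database
        if approx_match_score(labels, edges, g_labels, g_edges, compat) >= tau
    )
    return hits / len(database)

# Hypothetical toy data: two labels, with made-up substitution probabilities.
compat = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.2, "B": 0.8}}
pattern = ({0: "A", 1: "B"}, [(0, 1)])
database = [
    ({0: "A", 1: "B", 2: "A"}, [(0, 1), (1, 2)]),  # contains an exact match
    ({0: "A", 1: "A"}, [(0, 1)]),                  # matches only if B may appear as A
    ({0: "B", 1: "B"}, []),                        # no edge, so no match
]
support = approx_support(pattern, database, compat, tau=0.15)
```

With a low threshold the second graph counts toward the support even though its labels differ from the pattern; raising `tau` recovers exact-match behavior. The exhaustive enumeration is exponential in the pattern size and is meant only to make the definitions concrete.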


Graph mining; Approximate subgraph isomorphism





Copyright information

© Springer-Verlag London Limited 2011

Authors and Affiliations

  1. Department of Electrical Engineering & Computer Science, University of Kansas, Lawrence, USA
  2. Center for Bioinformatics, Department of Molecular Biosciences, The University of Kansas, Lawrence, USA
