An efficient graph-mining method for complicated and noisy data with real-world applications

Jia, Yi; Zhang, Jintao; Huan, Jun

doi:10.1007/s10115-010-0376-y

An efficient graph-mining method for complicated and noisy data with real-world applications

Regular Paper
Published: 02 February 2011

Volume 28, pages 423–447, (2011)
Cite this article

Knowledge and Information Systems Aims and scope Submit manuscript

Yi Jia¹,
Jintao Zhang² &
Jun Huan¹

500 Accesses
45 Citations
Explore all metrics

Abstract

In this paper, we present a novel graph database-mining method called APGM (APproximate Graph Mining) to mine useful patterns from noisy graph database. In our method, we designed a general framework for modeling noisy distribution using a probability matrix and devised an efficient algorithm to identify approximate matched frequent subgraphs. We have used APGM to both synthetic data set and real-world data sets on protein structure pattern identification and structure classification. Our experimental study demonstrates the efficiency and efficacy of the proposed method.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Mining Graph Patterns

Graph Clustering via Inexact Patterns

AGraP: an algorithm for mining frequent patterns in a single graph using inexact matching

Article 06 May 2014

Marisol Flores-Garrido, Jesús-Ariel Carrasco-Ochoa & José Fco. Martínez-Trinidad

References

Aggarwal CC (2009) Managing and mining uncertain data. Springer, Berlin
Book MATH Google Scholar
Aggarwal CC, Li Y, Wang J, Wang J (2009) Frequent pattern mining with uncertain data. In: Proceedings of the 2009 ACM SIGKDD international conference on Knowledge discovery and data mining (SIGKDD’09), pp 29–37
Bandyopadhyay D, Snoeyink J (2004) Almost-Delaunay simplices: nearest neighbor relations for imprecise points. In: ACM-SIAM symposium on distributed algorithms, pp 403–412
Chan J, Bailey J, Leckie C (2008) Discovering correlated spatio-temporal changes in evolving graphs. Knowl Inf Syst 16(1): 53–96
Article Google Scholar
Chen C, Yan X, Zhu F, Han J (2007) Gapprox: mining frequent approximate patterns from a massive network. In: Proceedings of the 2007 international conference on data mining (ICDM’07)
Eddy SR (2004) Where did the blosum62 alignment score matrix come from. Nat Biotechnol 22: 1035–1036
Article Google Scholar
Han J, Cheng H, Xin D, Yan X (2007) Frequent pattern mining: current status and future directions. Data Min Knowl Discov 14
Holder LB, Cook DJ, Djoko S (1994) Substructures discovery in the subdue system. In: Proceedings of AAAI’94 workshop knowledge discovery in databases, pp 169–180
Hu H, Yan X, Huang Y, Han J, Zhou XJ (2005) Mining coherent dense subgraphs across massive biological networks for functional discovery. In: Proceedings of the 2005 international conference on intelligent systems for molecular biology (ISMB’05)
Huan J, Wang W, Prins J (2003) Efficient mining of frequent subgraph in the presence of isomorphism. In: Proceedings of the 2003 IEEE international conference on data mining (ICDM’03), pp 549–552
Huan J, Wang W, Prins J (2003) Efficient mining of frequent subgraphs in the presence of isomorphism. In: Proceedings of the 2003 international conference on data mining (ICDM’03)
Huan J, Bandyopadhyay D, Snoeyink J, Prins J, Tropsha A, Wang W (2006) Distance-based identification of spatial motifs in proteins using constrained frequent subgraph mining. In: Proceedings of the IEEE computational systems bioinformatics
Huan J, Prins J, Wang W, Carter C, Dokholyan NV (2006) Coordinated evolution of protein sequences and structures with structure entropy. In: Computer Science Department Technical Report
Huan J, Wang W, Bandyopadhyay D, Snoeyink J, Prins J, Tropsha A (2004) Mining family specific residue packing patterns from protein structure graphs. In: Proceedings of the 8th annual international conference on research in computational molecular biology (RECOMB), pp 308–315
Huan J, Wang W, Prins J, Yang J (2004) Spin: mining maximal frequent subgraphs from graph databases. In: Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining, pp 581–586
Inokuchi A, Washio T, Motoda H (2000) An apriori-based algorithm for mining frequent substructures from graph data. In: Proceeding of 2000 practice of knowledge discovery in databases conference (PKDD’00), pp 13–23
Judson KA, Lubinski JM, Jiang M, Chang Y, Eisenberg RJ, Cohen GH, Friedman HM (2003) Blocking immune evasion as a novel approach for prevention and treatment of herpes simplex virus infection. J Virol 77: 12639–12645
Article Google Scholar
Kuramochi M, Karypis G (2001) Frequent subgraph discovery. In: Proceedings of the 2001 international conference on data mining (ICDM’01), pp 313–320
Lahiri M, Berger-Wolf TY (2009) Periodic subgraph mining in dynamic networks. Knowl Inf Syst (online first 09/2009)
Lahiri M, Berger-Wolf TY (2007) Structure prediction in temporal networks using frequent subgraphs. Computat Intell Data Min, pp. 35–42
Nijssen S, Kok JN (2004) A quickstart in frequent structure mining can make a difference. In: Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining, pp 647–652
Orengo CA, Michie AD, Jones S, Jones DT, Swindells MB, Thornton JM (1997) CATH—a hierarchic classification of protein domain structures. Structure 5(8): 1093–1108
Article Google Scholar
Pei J, Jiang D, Zhang A (2005) Mining cross-graph quasi-cliques in gene expression and protein interaction data. ICDE, pp 353–354
De Raedt L, Kramer S (2001) The levelwise version space algorithm and its application to molecular fragment finding. In: IJCAI’01: seventeenth international joint conference on artificial intelligence, vol 2, pp 853–859
Wang G, Dunbrack RL Jr (2003) PISCES: a protein sequence culling server. Bioinformatics 19: 1589–1591
Article Google Scholar
Weng C-H, Chen Y-L (2010) Mining fuzzy association rules from uncertain data. Knowl Inf Syst 23(2): 129–152
Article Google Scholar
Yada K, Motoda H, Washio T, Miyawaki A (2004) Consumer behavior analysis by graph mining technique. Lecture Notes in Computer Science, pp 800–806
Yan X, Han J (2002) gspan: graph-based substructure pattern mining. In: Procceeding of international conference on data mining (ICDM’02), pp 721–724
Yan X, Zhu F, Yu PS, Han J (2006) Feature-based substructure similarity search. ACM Trans Database Syst 31(4): 1418–1453
Article Google Scholar
Zhang S, Yang J (2008) Ram: randomized approximate graph mining export. Scientific and Statistical Database Management
Zhang S, Yang J, Cheedella V (2007) Monkey: approximate graph mining based on spanning trees. In: Proceeding of IEEE 23rd international conference data engineering (ICDE’07), pp 1247–1249
Zou Z, Li J, Gao H, Zhang S (2009) Frequent subgraph pattern mining on uncertain graph data. In: Proceedings of the 2009 conference on information and knowledge management (CIKM’09), pp 583–592

Download references

Author information

Authors and Affiliations

Department of Electrical Engineering & Computer Science, University of Kansas, Lawrence, KS, 66045, USA
Yi Jia & Jun Huan
Center for Bioinformatics, Department of Molecular Biosciences, The University of Kansas, Lawrence, KS, 66046, USA
Jintao Zhang

Authors

Yi Jia
View author publications
You can also search for this author in PubMed Google Scholar
Jintao Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Jun Huan
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Jun Huan.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Jia, Y., Zhang, J. & Huan, J. An efficient graph-mining method for complicated and noisy data with real-world applications. Knowl Inf Syst 28, 423–447 (2011). https://doi.org/10.1007/s10115-010-0376-y

Download citation

Received: 29 January 2010
Revised: 10 November 2010
Accepted: 20 November 2010
Published: 02 February 2011
Issue Date: August 2011
DOI: https://doi.org/10.1007/s10115-010-0376-y

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

An efficient graph-mining method for complicated and noisy data with real-world applications

Abstract

Access this article

Similar content being viewed by others

Mining Graph Patterns

Graph Clustering via Inexact Patterns

AGraP: an algorithm for mining frequent patterns in a single graph using inexact matching

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

An efficient graph-mining method for complicated and noisy data with real-world applications

Abstract

Access this article

Similar content being viewed by others

Mining Graph Patterns

Graph Clustering via Inexact Patterns

AGraP: an algorithm for mining frequent patterns in a single graph using inexact matching

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation