Mining Discriminative Subgraph Patterns from Structural Data

  • Ning Jin
  • Wei Wang
Chapter
Part of the Studies in Big Data book series (SBD, volume 1)

Abstract

Many scientific applications search for patterns in complex structural information. When this information is represented as graphs, efficient mining of discriminative subgraphs is a powerful tool. For example, the structures of chemical compounds can be stored as graphs, and discriminative subgraphs help chemists predict which compounds are potentially toxic; 3D protein structures can be stored as graphs, and discriminative subgraphs help pharmacologists predict which proteins are able to bind certain ligands and which are not; program flow information can be represented as graphs, and discriminative subgraphs help computer scientists identify program bugs and predict which program flows are successful and which are not. Many research studies have been devoted to developing efficient discriminative subgraph pattern mining algorithms. Higher efficiency allows users to process larger graph datasets, and higher effectiveness enables users to achieve better results in applications. In this chapter, we introduce several existing discriminative subgraph pattern mining algorithms, including LEAP, CORK, graphSig, COM, GAIA and LTS. We evaluate the algorithms with real protein and chemical structure data.

Keywords

Discrimination Power, Candidate List, Discrimination Score, Optimal Pattern, Candidate Pattern
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Bandyopadhyay, D., Huan, J., Liu, J., Prins, J., Snoeyink, J., Wang, W., Tropsha, A.: Structure-based function inference using protein family-specific fingerprints. Protein Science 15, 1537–1543 (2006)
  2. Bandyopadhyay, D., Huan, J., Prins, J., Snoeyink, J., Wang, W., Tropsha, A.: Identification of family-specific residue packing motifs and their use for structure-based protein function prediction: I. Method development. J. Comput. Aided Mol. Des. (2009)
  3. Bandyopadhyay, D., Huan, J., Prins, J., Snoeyink, J., Wang, W., Tropsha, A.: Identification of family-specific residue packing motifs and their use for structure-based protein function prediction: II. Case studies and applications. J. Comput. Aided Mol. Des. (2009)
  4. Chen, B.Y., et al.: Geometric sieving: Automated distributed optimization of 3D motifs for protein function prediction. In: Apostolico, A., Guerra, C., Istrail, S., Pevzner, P.A., Waterman, M. (eds.) RECOMB 2006. LNCS (LNBI), vol. 3909, pp. 500–515. Springer, Heidelberg (2006)
  5. Chen, W.-Y., Zhang, D., Chang, E.: Combinational Collaborative Filtering for Personalized Community Recommendation. In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 115–123 (2008)
  6. Fei, H., Huan, J.: Structure Feature Selection For Graph Classification. In: ACM 17th International Conference of Knowledge Management 2008 (CIKM 2008), Napa Valley, California (2008)
  7. Fei, H., Huan, J.: Boosting with Structure Information in the Functional Space: an Application to Graph Classification. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, SIGKDD (2010)
  8. Fröhlich, H., Wegner, J.K., Sieker, F., Zell, A.: Optimal Assignment Kernels for Attributed Molecular Graphs. In: Proceedings of the 22nd International Conference on Machine Learning (ICML), pp. 225–232 (2005)
  9. Helma, C., Cramer, T., Kramer, S., Raedt, L.D.: Data mining and machine learning techniques for the identification of mutagenicity inducing substructures and structure activity relationships of noncongeneric compounds. J. Chem. Inf. Comput. Sci. 44, 1402–1411 (2004)
  10. Hsu, H., Jones, J.A., Orso, A.: RAPID: Identifying bug signatures to support debugging activities. In: ASE (Automated Software Engineering) (2008)
  11. Huan, J., Wang, W., Prins, J.: Efficient mining of frequent subgraph in the presence of isomorphism. In: Proceedings of the 3rd IEEE International Conference on Data Mining (ICDM), pp. 549–552 (2003)
  12. Huan, J., Wang, W., Bandyopadhyay, D., Snoeyink, J., Prins, J., Tropsha, A.: Mining spatial motifs from protein structure graphs. In: RECOMB, pp. 308–315 (2004)
  13. Huan, J., Bandyopadhyay, D., Prins, J., Snoeyink, J., Tropsha, A., Wang, W.: Distance-based identification of spatial motifs in proteins using constrained frequent subgraph mining. In: Proceedings of the LSS Computational Systems Bioinformatics Conference (CSB), pp. 227–238 (2006)
  14. Jin, N., Young, C., Wang, W.: Graph Classification Based on Pattern Co-occurrence. In: Proceedings of the ACM 18th Conference on Information and Knowledge Management (CIKM), pp. 573–582 (2009)
  15. Jin, N., Young, C., Wang, W.: GAIA: graph classification using evolutionary computation. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 879–890 (2010)
  16. Jin, N., Wang, W.: LTS: Discriminative subgraph mining by learning from search history. In: ICDE 2011, pp. 207–218 (2011)
  17. Khan, A., Yan, X., Wu, K.-L.: Towards Proximity Pattern Mining in Large Graphs. In: SIGMOD 2010 (Proc. 2010 Int. Conf. on Management of Data) (June 2010)
  18. Ranu, S., Singh, A.K.: GraphSig: A Scalable Approach to Mining Significant Subgraphs in Large Graph Databases. In: Proceedings of the 25th International Conference on Data Engineering (ICDE), pp. 844–855 (2009)
  19. Smalter, A., Huan, J., Lushington, G.: A Graph Pattern Diffusion Kernel for Chemical Compound Classification. In: Proceedings of the 8th IEEE International Conference on Bioinformatics and BioEngineering, BIBE 2008 (2008)
  20. Smalter, A., Huan, J., Lushington, G.: Graph Wavelet Alignment Kernels for Drug Virtual Screening. Journal of Bioinformatics and Computational Biology 7(3), 473–497 (2009)
  21. Saigo, H., Kraemer, N., Tsuda, K.: Partial Least Squares Regression for Graph Mining. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2008), pp. 578–586 (2008)
  22. Thoma, M., Cheng, H., Gretton, A., Han, J., Kriegel, H., Smola, A., Song, L., Yu, P., Yan, X., Borgwardt, K.: Near-optimal supervised feature selection among frequent subgraphs. In: SDM 2009, Sparks, Nevada, USA (2009)
  23. Yan, X., Han, J.: gSpan: graph-based substructure pattern mining. In: Proceedings of the 2002 IEEE International Conference on Data Mining, pp. 721–724 (2002)
  24. Yan, X., Cheng, H., Han, J., Yu, P.S.: Mining significant graph patterns by leap search. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 433–444 (2008)
  25. Yao, H., Kristensen, D.M., Mihalek, I., Sowa, M.E., Shaw, C., Kimmel, M., Kavraki, L., Lichtarge, O.: An accurate, sensitive, and scalable method to identify functional sites in protein structures. J. Mol. Biol. 326, 255–261 (2003)
  26. Zhang, X., Wang, W., Huan, J.: On demand Phenotype Ranking through Subspace Clustering. In: Proceedings of SIAM International Conference on Data Mining, SDM (2007)
  27. Zhang, S., Yang, J.: RAM: Randomized Approximate Graph Mining. In: Proceedings of the 20th International Conference on Scientific and Statistical Database Management, pp. 187–203 (2008)

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  1. Catalog Quality Department, Amazon, Seattle, USA
  2. Computer Science Department, University of California, Los Angeles, USA
