Thresholding of Semantic Similarity Networks Using a Spectral Graph-Based Technique

  • Conference paper
New Frontiers in Mining Complex Patterns (NFMCP 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8399)

Abstract

The functional similarity among terms of an ontology is evaluated using Semantic Similarity Measures (SSMs). In computational biology, biological entities such as genes or proteins are usually annotated with terms drawn from the Gene Ontology (GO), and the most common application is to quantify the similarity or dissimilarity between two entities by applying SSMs to their annotations. More recently, the extensive application of SSMs has given rise to Semantic Similarity Networks (SSNs). SSNs are edge-weighted graphs in which the nodes are concepts (e.g. proteins) and each edge carries a weight that represents the semantic similarity between the pair of nodes it connects. Community detection algorithms applied to SSNs, for tasks such as protein complex prediction or motif extraction, may reveal clusters of functionally associated proteins. Because SSNs contain a large number of low-weight arcs, which can be likened to noise, classical clustering algorithms perform poorly on raw networks. One way to improve the performance of such algorithms is to simplify the structure of SSNs through a preprocessing step that deletes the noise-like arcs. We therefore propose a novel preprocessing strategy that simplifies SSNs using a hybrid global-local thresholding approach based on spectral graph theory. As a proof of concept, we demonstrate that community detection algorithms applied to filtered (thresholded) networks yield more biologically relevant results than the same algorithms applied to raw, unfiltered networks.
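The abstract does not spell out the thresholding rule, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes an SSN stored as a symmetric NumPy weight matrix with a zero diagonal. The pruning rule (keep an edge if its weight reaches the mean plus `k` standard deviations of the incident weights of either endpoint) and the names `local_threshold`, `laplacian_spectrum` and `n_components` are illustrative assumptions. The one spectral fact it relies on is standard: the number of (near-)zero eigenvalues of the graph Laplacian equals the number of connected components.

```python
import numpy as np

def local_threshold(W, k=0.0):
    """Prune low-weight edges of a symmetric weight matrix W.

    Assumed rule (not taken from the paper): edge (i, j) is kept if its
    weight is >= mean + k * std of the incident weights of node i or node j.
    k acts as a single global parameter applied to per-node statistics.
    """
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    off_diag = ~np.eye(n, dtype=bool)
    # Incident weights of each node, excluding the diagonal.
    rows = W[off_diag].reshape(n, n - 1)
    cut = rows.mean(axis=1) + k * rows.std(axis=1)
    keep = (W >= cut[:, None]) | (W >= cut[None, :])
    keep &= off_diag
    return np.where(keep, W, 0.0)

def laplacian_spectrum(W):
    """Eigenvalues (ascending) of the combinatorial Laplacian L = D - W."""
    W = np.asarray(W, dtype=float)
    L = np.diag(W.sum(axis=1)) - W
    return np.linalg.eigvalsh(L)

def n_components(W, tol=1e-8):
    """Count near-zero Laplacian eigenvalues, i.e. connected components."""
    return int((laplacian_spectrum(W) < tol).sum())

# Toy SSN: two groups of three nodes, strong similarity (0.9) inside a
# group and weak, noise-like similarity (0.1) between groups.
raw = np.full((6, 6), 0.1)
raw[:3, :3] = 0.9
raw[3:, 3:] = 0.9
np.fill_diagonal(raw, 0.0)

pruned = local_threshold(raw, k=0.0)
print(n_components(raw))     # 1: the raw network is one dense component
print(n_components(pruned))  # 2: the two groups separate after pruning
```

On this toy network the weak inter-group arcs fall below every node's cut-off, so the pruned graph splits into the two groups. On a real SSN the value of `k` would need tuning, for instance by inspecting how the Laplacian spectrum changes as `k` grows.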

Notes

  1. http://wodaklab.org/cyc2008/

  2. http://wodaklab.org/cyc2008/

  3. http://fastsemsim.sourceforge.net

Author information

Corresponding author

Correspondence to Pietro Hiram Guzzi.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Guzzi, P.H., Veltri, P., Cannataro, M. (2014). Thresholding of Semantic Similarity Networks Using a Spectral Graph-Based Technique. In: Appice, A., Ceci, M., Loglisci, C., Manco, G., Masciari, E., Ras, Z. (eds) New Frontiers in Mining Complex Patterns. NFMCP 2013. Lecture Notes in Computer Science, vol 8399. Springer, Cham. https://doi.org/10.1007/978-3-319-08407-7_13

  • DOI: https://doi.org/10.1007/978-3-319-08407-7_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-08406-0

  • Online ISBN: 978-3-319-08407-7

  • eBook Packages: Computer Science, Computer Science (R0)
