Abstract
Semantic Similarity Measures (SSMs) evaluate the functional similarity among terms of an ontology. In computational biology, biological entities such as genes or proteins are usually annotated with terms drawn from the Gene Ontology (GO), and the most common application is to quantify the similarity or dissimilarity between two entities by applying SSMs to their annotations. More recently, the extensive application of SSMs has led to Semantic Similarity Networks (SSNs). SSNs are edge-weighted graphs in which the nodes are concepts (e.g. proteins) and each edge carries a weight that represents the semantic similarity between the pair of nodes it connects. Community detection algorithms applied to SSNs, for tasks such as protein complex prediction or motif extraction, may reveal clusters of functionally associated proteins. Because SSNs contain a large number of low-weight edges, which act as noise, classical clustering algorithms perform poorly on raw networks. One way to improve the performance of such algorithms is to simplify the structure of SSNs through a preprocessing step that deletes the noisy edges. We therefore propose a novel preprocessing strategy that simplifies SSNs using a hybrid global-local thresholding approach based on spectral graph theory. As a proof of concept, we demonstrate that community detection algorithms applied to filtered (thresholded) networks produce more biologically relevant results than the same algorithms applied to raw, unfiltered networks.
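The abstract does not spell out the thresholding rule, so the following is only a minimal sketch of what a hybrid global-local threshold with a spectral check could look like, not the authors' algorithm. It assumes a symmetric similarity matrix `W`, a hypothetical per-node threshold of the form mean plus `k` times the standard deviation of that node's weights (with `k` as the single global parameter), and it uses the standard fact that the number of zero eigenvalues of the graph Laplacian equals the number of connected components. All function names and the toy data are illustrative.

```python
import numpy as np

def local_thresholds(W, k):
    """Per-node threshold: mean + k * std of the node's off-diagonal weights.
    This rule is an assumption made for illustration; k is the global knob."""
    n = W.shape[0]
    off_diag = ~np.eye(n, dtype=bool)
    t = np.empty(n)
    for i in range(n):
        w = W[i, off_diag[i]]
        t[i] = w.mean() + k * w.std()
    return t

def threshold_ssn(W, k=0.0):
    """Keep edge (i, j) if its weight reaches the local threshold of
    either endpoint; all other edges are set to zero."""
    t = local_thresholds(W, k)
    keep = (W >= t[:, None]) | (W >= t[None, :])
    np.fill_diagonal(keep, False)
    return np.where(keep, W, 0.0)

def count_components(W, tol=1e-9):
    """Count connected components as the number of (near-)zero
    eigenvalues of the Laplacian L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    eigvals = np.linalg.eigvalsh(L)
    return int(np.sum(eigvals < tol))

def toy_ssn():
    """Toy SSN: two groups of three nodes, similarity 0.9 inside a group
    and 0.1 (noise-like) across groups. Values are made up."""
    W = np.full((6, 6), 0.1)
    W[:3, :3] = 0.9
    W[3:, 3:] = 0.9
    np.fill_diagonal(W, 0.0)
    return W

W = toy_ssn()
F = threshold_ssn(W, k=0.0)
print(count_components(W), count_components(F))
```

On this toy matrix the raw network is a single connected component, while the filtered network separates into the two groups, which is the kind of structure a community detection algorithm can then recover more easily.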
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Guzzi, P.H., Veltri, P., Cannataro, M. (2014). Thresholding of Semantic Similarity Networks Using a Spectral Graph-Based Technique. In: Appice, A., Ceci, M., Loglisci, C., Manco, G., Masciari, E., Ras, Z. (eds) New Frontiers in Mining Complex Patterns. NFMCP 2013. Lecture Notes in Computer Science, vol 8399. Springer, Cham. https://doi.org/10.1007/978-3-319-08407-7_13
DOI: https://doi.org/10.1007/978-3-319-08407-7_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08406-0
Online ISBN: 978-3-319-08407-7
eBook Packages: Computer Science, Computer Science (R0)