Analysis of Informative Features for Negative Selection in Protein Function Prediction

  • Marco Frasca
  • Fabio Lipreri
  • Dario Malchiodi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10209)


Negative examples in automated protein function prediction (AFP), that is, proteins known not to possess a given protein function, are usually not directly stored in public proteome and genome databases, such as the Gene Ontology database. Nevertheless, most computational methods need negative examples to infer new predictions. A variety of algorithms have been proposed in AFP for negative selection, ranging from network- and feature-based heuristics to hierarchy-based and hierarchy-less strategies. Moreover, several bio-molecular information sources about proteins, such as gene co-expression and genetic and protein-protein interaction data, are naturally encoded in protein networks, where nodes are proteins and edges connect proteins sharing common characteristics. Selecting negatives in biological networks is thus a central and challenging problem in computational biology, yet detecting the characteristics a protein should have to be considered negative remains difficult. In this work, we show that a few protein features extracted from the network help in detecting reliable negatives. We tested these features in two real-world experiments: predicting unreliable negatives with an SVM classifier through temporal holdout on model organisms for AFP, and selecting reliable negatives with a state-of-the-art clustering-based negative selection procedure.
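As a rough illustration of the setting described in the abstract, the sketch below computes a few simple per-protein features from a weighted protein network and derives temporal-holdout labels. The abstract does not list the features actually used, so the three features here (weighted degree, total edge weight towards known positives, and their ratio) are assumptions chosen only for illustration. The function names, the toy network and the protein identifiers are likewise hypothetical. An "unreliable negative" is taken to mean a protein that is not annotated with the function in an older annotation release but is annotated in a later one, which is one plausible reading of temporal holdout.

```python
from typing import Dict, Set

# Symmetric weighted adjacency: protein -> {neighbour: edge weight}.
Network = Dict[str, Dict[str, float]]


def node_features(net: Network, positives: Set[str]) -> Dict[str, Dict[str, float]]:
    """Compute illustrative network features for every non-positive protein.

    These features are assumptions for the sake of the example,
    not the feature set used in the paper.
    """
    features = {}
    for protein, neighbours in net.items():
        if protein in positives:
            continue  # only candidate negatives are described
        weighted_degree = sum(neighbours.values())
        # Total edge weight connecting this protein to known positives.
        positive_weight = sum(w for n, w in neighbours.items() if n in positives)
        share = positive_weight / weighted_degree if weighted_degree > 0 else 0.0
        features[protein] = {
            "weighted_degree": weighted_degree,
            "positive_weight": positive_weight,
            "positive_share": share,
        }
    return features


def unreliable_negatives(old_positives: Set[str], new_positives: Set[str]) -> Set[str]:
    """Proteins that were not positive in the old release but are in the new one."""
    return new_positives - old_positives


# Toy network with hypothetical protein identifiers.
net = {
    "P1": {"P2": 0.9, "P3": 0.4},
    "P2": {"P1": 0.9, "P4": 0.2},
    "P3": {"P1": 0.4},
    "P4": {"P2": 0.2},
}
old_pos = {"P1"}        # annotations in the older release
new_pos = {"P1", "P2"}  # annotations in the later release

feats = node_features(net, old_pos)
flipped = unreliable_negatives(old_pos, new_pos)
# Label each candidate negative: True if it later became positive.
labels = {p: p in flipped for p in feats}
```

The resulting feature vectors and labels could then be passed to any binary classifier, such as an SVM, to predict which apparent negatives are likely to become positives in a later release.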


Negative example selection; Protein function prediction; Biological networks; Fuzzy clustering; Protein features



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Dipartimento di Informatica, Università degli Studi di Milano, Milano, Italy
