Committee-Based Active Learning to Select Negative Examples for Predicting Protein Functions

Part of the Lecture Notes in Computer Science book series (LNBI, volume 11925)


The Automated Functional Prediction (AFP) of proteins has become a challenging problem in bioinformatics and biomedicine, aiming at handling and interpreting the extremely large proteomes of several eukaryotic organisms. A central issue in AFP is that public repositories of protein functions, e.g. the Gene Ontology (GO), lack well-defined sets of negative examples from which to learn accurate classifiers. In this paper we investigate the Query by Committee paradigm of active learning to select the negatives that are most informative for the classifier and for the protein function to be inferred. We validated our approach by predicting the Gene Ontology functions of S. cerevisiae proteins.
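As a rough illustration of the Query by Committee idea mentioned above, the following minimal sketch ranks unannotated proteins by committee disagreement so that the most contested ones can be chosen as candidate negatives. It is not the authors' implementation: the nearest-centroid committee members, the bootstrap resampling, the vote-entropy disagreement measure, and all function names (`vote_entropy`, `train_centroid`, `predict_centroid`, `select_negatives`) are assumptions made for illustration only.

```python
import math
import random


def vote_entropy(votes):
    """Entropy (in bits) of the committee's binary votes for one example."""
    p = sum(votes) / len(votes)
    if p in (0.0, 1.0):
        return 0.0  # unanimous committee: no disagreement
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def train_centroid(X, y):
    """Toy committee member (an assumption): stores the mean of each class."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    pos = mean([x for x, t in zip(X, y) if t == 1])
    neg = mean([x for x, t in zip(X, y) if t == 0])
    return pos, neg


def predict_centroid(model, x):
    """Predict 1 if x is closer to the positive centroid, else 0."""
    pos, neg = model
    d_pos = sum((a - b) ** 2 for a, b in zip(x, pos))
    d_neg = sum((a - b) ** 2 for a, b in zip(x, neg))
    return 1 if d_pos < d_neg else 0


def select_negatives(X_seed, y_seed, X_pool, n_members=5, n_select=2, seed=0):
    """Rank pool examples by committee disagreement (highest first).

    Assumes the seed set contains both labels; each member is trained on a
    bootstrap sample of the seed set that also contains both labels.
    """
    rng = random.Random(seed)
    indices = range(len(X_seed))
    committee = []
    while len(committee) < n_members:
        sample = [rng.choice(indices) for _ in indices]
        ys = [y_seed[i] for i in sample]
        if 0 < sum(ys) < len(ys):  # skip samples missing one of the classes
            committee.append(train_centroid([X_seed[i] for i in sample], ys))
    scores = [
        vote_entropy([predict_centroid(m, x) for m in committee])
        for x in X_pool
    ]
    ranked = sorted(range(len(X_pool)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_select], scores


# Hypothetical two-feature data: three annotated and three non-annotated seeds.
X_seed = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.2], [0.0, 0.0], [0.2, -0.1], [-0.1, 0.2]]
y_seed = [1, 1, 1, 0, 0, 0]
X_pool = [[0.55, 0.5], [-5.0, -5.0], [5.0, 5.0]]
selected, scores = select_negatives(X_seed, y_seed, X_pool)
```

In this toy setting the two pool points far from the decision boundary receive zero disagreement, so any point the committee splits on is ranked ahead of them. A real pipeline would replace the centroid members with proper classifiers and retrain after each batch of selected negatives.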


  • Query By Committee
  • Active learning
  • Protein function prediction


  1. A balanced seed training set counterbalances the predominance of 0 labels.



This work was supported by the grant titled "Machine learning algorithms to handle label imbalance in biomedical taxonomies", code PSR2017_DIP_010_MFRAS, Università degli Studi di Milano.

Corresponding author

Correspondence to Giorgio Valentini.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Frasca, M., Sepehri, M., Petrini, A., Grossi, G., Valentini, G. (2020). Committee-Based Active Learning to Select Negative Examples for Predicting Protein Functions. In: Raposo, M., Ribeiro, P., Sério, S., Staiano, A., Ciaramella, A. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2018. Lecture Notes in Computer Science, vol 11925. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-34584-6

  • Online ISBN: 978-3-030-34585-3

  • eBook Packages: Computer Science, Computer Science (R0)