Fast Target Set Reduction for Large-Scale Protein Function Prediction: A Multi-class Multi-label Machine Learning Approach

  • Thomas Lingner
  • Peter Meinicke
Conference paper

DOI: 10.1007/978-3-540-87361-7_17

Part of the Lecture Notes in Computer Science book series (LNCS, volume 5251)
Cite this paper as:
Lingner T., Meinicke P. (2008) Fast Target Set Reduction for Large-Scale Protein Function Prediction: A Multi-class Multi-label Machine Learning Approach. In: Crandall K.A., Lagergren J. (eds) Algorithms in Bioinformatics. WABI 2008. Lecture Notes in Computer Science, vol 5251. Springer, Berlin, Heidelberg

Abstract

Large-scale sequencing projects have produced a vast number of protein sequences, which must be assigned to functional categories. Currently, profile hidden Markov models and kernel-based machine learning methods provide the most accurate results for protein classification. However, classifying new sequences with these approaches is computationally expensive. We present an approach for fast scoring of protein sequences by means of a feature-based protein sequence representation and multi-class multi-label machine learning techniques. Using the Pfam database, we show that our method provides high computational efficiency and that the approach is well suited for pre-filtering of large sequence sets.
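
The abstract does not specify the features or the classifier, so the following is only a minimal sketch of the general idea of feature-based scoring for target set reduction, not the authors' method. It assumes a normalized k-mer count representation and one linear scoring vector per family; the names `kmer_index`, `featurize`, `reduce_targets`, the family labels, and the threshold are all hypothetical. Families scoring above the threshold form the reduced target set that a slower, more accurate method could then examine.

```python
from itertools import product

import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_index(k=2):
    """Map every possible amino-acid k-mer to a feature position."""
    return {"".join(p): i for i, p in enumerate(product(AMINO_ACIDS, repeat=k))}


def featurize(seq, index, k=2):
    """Return the L2-normalized k-mer count vector of a sequence (assumed feature map)."""
    x = np.zeros(len(index))
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip k-mers containing non-standard residues
            x[j] += 1.0
    norm = np.linalg.norm(x)
    return x / norm if norm > 0 else x


def reduce_targets(seq, W, families, index, k=2, threshold=0.0):
    """Score a sequence against all families with one matrix-vector product
    and keep every family above the threshold (multi-label output)."""
    scores = W @ featurize(seq, index, k)
    return [(f, float(s)) for f, s in zip(families, scores) if s > threshold]


# Toy usage: the weight rows are feature vectors of made-up prototype
# sequences, so each score is a cosine similarity. A trained model would
# supply W instead.
index = kmer_index(2)
families = ["famA", "famB"]  # hypothetical family labels
W = np.stack([featurize("ACACACAC", index), featurize("WYWYWYWY", index)])
hits = reduce_targets("ACACAC", W, families, index, threshold=0.5)
print(hits)
```

Because scoring is a single matrix-vector product over a fixed-length feature vector, its cost does not depend on aligning the query against each family model, which is the kind of saving a pre-filter relies on.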

Keywords

protein classification, large-scale, multi-class, multi-label, Pfam, homology search, metagenomics, target set reduction, protein function prediction, machine learning

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Thomas Lingner (1)
  • Peter Meinicke (1)
  1. Department of Bioinformatics, Institute for Microbiology and Genetics, University of Göttingen, Göttingen, Germany
