Fast Target Set Reduction for Large-Scale Protein Function Prediction: A Multi-class Multi-label Machine Learning Approach
- Cite this paper as:
- Lingner T., Meinicke P. (2008) Fast Target Set Reduction for Large-Scale Protein Function Prediction: A Multi-class Multi-label Machine Learning Approach. In: Crandall K.A., Lagergren J. (eds) Algorithms in Bioinformatics. WABI 2008. Lecture Notes in Computer Science, vol 5251. Springer, Berlin, Heidelberg
Large-scale sequencing projects have produced vast numbers of protein sequences that must be assigned to functional categories. Currently, profile hidden Markov models and kernel-based machine learning methods provide the most accurate results for protein classification. However, classifying new sequences with these approaches is computationally expensive. We present an approach for fast scoring of protein sequences by means of a feature-based protein sequence representation and multi-class multi-label machine learning techniques. Using the Pfam database, we show that our method is computationally efficient and well suited to pre-filtering large sequence sets.
Keywords: protein classification, large-scale, multi-class, multi-label, Pfam, homology search, metagenomics, target set reduction, protein function prediction, machine learning
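
The abstract describes scoring sequences with a feature-based representation and a multi-class multi-label classifier so that only a reduced set of candidate families needs to be passed to more expensive methods. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes normalized k-mer frequencies as the features and one linear weight vector per family; the abstract specifies neither. The function names, the family names, and the toy weights are all hypothetical.

```python
from collections import Counter


def kmer_features(seq, k=2):
    """Return normalized k-mer frequencies of a protein sequence.

    Assumption: k-mer frequencies stand in for the paper's
    feature-based representation, which the abstract does not specify.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def score_families(features, weights):
    """Compute one linear score per family (sparse dot product)."""
    return {
        family: sum(w.get(kmer, 0.0) * value for kmer, value in features.items())
        for family, w in weights.items()
    }


def reduce_targets(seq, weights, threshold=0.0, k=2):
    """Keep every family scoring above the threshold, best first.

    More than one family may be kept, which is the multi-label part:
    a sequence is not forced into exactly one class.
    """
    scores = score_families(kmer_features(seq, k), weights)
    kept = [family for family, score in scores.items() if score > threshold]
    return sorted(kept, key=lambda family: scores[family], reverse=True)


# Hypothetical toy weights; real weights would be learned from
# labeled training sequences such as Pfam family members.
TOY_WEIGHTS = {
    "family_A": {"AK": 1.0, "KL": 0.5},
    "family_B": {"GG": 1.0},
    "family_C": {"AK": 0.2, "GG": -1.0},
}

candidates = reduce_targets("AKLAKL", TOY_WEIGHTS)
print(candidates)
```

In a pre-filtering setting, only the families returned by `reduce_targets` would then be checked with a slower, more accurate method such as a profile hidden Markov model search. The threshold controls the trade-off between the size of the reduced target set and the risk of discarding the true family.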