Chapter

Intelligent Data Engineering and Automated Learning - IDEAL 2007

Volume 4881 of the series Lecture Notes in Computer Science pp 890-897

Discriminating Microbial Species Using Protein Sequence Properties and Machine Learning

  • Ali Al-Shahib (Biomedical Informatics Signals and Systems Research Laboratory, Department of Electronic, Electrical and Computer Engineering, The University of Birmingham, Birmingham; Bioinformatics Research Centre, Department of Computing Science, University of Glasgow, Glasgow)
  • David Gilbert (Bioinformatics Research Centre, Department of Computing Science, University of Glasgow, Glasgow)
  • Rainer Breitling (Groningen Bioinformatics Centre, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Haren)

Abstract

Much work has been done to identify species-specific proteins in sequenced genomes and hence to determine their function. We assume that such proteins have specific physico-chemical properties that discriminate them from proteins of other species. In this paper, we test this assumption by comparing proteins and their properties across different bacterial species using Support Vector Machines (SVMs). We show that, when trained on selected protein sequence properties, SVMs can successfully discriminate between proteins of different species. This finding takes us a step closer to inferring the functional characteristics of these proteins.
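
The general idea of training a classifier on sequence-derived properties can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not say which properties or which SVM implementation were used, so the two features here (mean Kyte-Doolittle hydropathy and fraction of charged residues), the toy sequences, the function names, and the simple subgradient-descent linear SVM are all hypothetical stand-ins.

```python
# Illustrative sketch only: the features, toy sequences, and training loop
# are assumptions, not the properties, data, or SVM used in the paper.

# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
CHARGED = set("DEKR")


def sequence_features(seq):
    """Return [mean hydropathy, fraction of charged residues] for a sequence."""
    residues = [r for r in seq.upper() if r in KD]  # skip non-standard symbols
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    n = len(residues)
    mean_hydropathy = sum(KD[r] for r in residues) / n
    charged_fraction = sum(r in CHARGED for r in residues) / n
    return [mean_hydropathy, charged_fraction]


def score(w, b, x):
    """Linear decision function w.x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lr=0.1, lam=0.01, epochs=200):
    """Fit a linear SVM by subgradient descent on the L2-regularised hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * score(w, b, xi) < 1:
                # Sample is inside the margin: hinge-loss and regulariser step.
                w = [wj + lr * (yi * xj - lam * wj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Sample is outside the margin: regulariser step only.
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if score(w, b, x) >= 0 else -1


# Hypothetical toy data: label +1 for "species A", -1 for "species B".
species_a = ["AILVAILV", "MLLVFAIG", "VIALVSAL"]
species_b = ["KDEKRDEK", "DEKSDEKR", "EKDGEKDR"]
X = [sequence_features(s) for s in species_a + species_b]
y = [1] * len(species_a) + [-1] * len(species_b)

w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]
```

The toy sequences are deliberately easy to separate, so the result says nothing about real proteomes. A realistic experiment would use many more properties, a kernel SVM from an established library, and held-out data for evaluation.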