Detecting thermophilic proteins through selecting amino acid and dipeptide composition features
Detecting thermophilic proteins is an important step in engineering proteins that remain stable at the temperatures of interest. In this work, we develop a simple but efficient method to distinguish thermophilic proteins from mesophilic ones using amino acid and dipeptide compositions. Because most of the amino acid and dipeptide compositions are redundant, we propose a new forward floating selection technique that keeps only a useful subset of these compositions as features for support vector machine-based classification. We test the proposed method on a benchmark data set of 915 thermophilic and 793 mesophilic proteins. The results show that our method, using 28 amino acid and dipeptide compositions, achieves an accuracy of 93.3% in the jackknife cross-validation test, which is higher than the accuracy of existing methods and also higher than that obtained using all amino acid and dipeptide compositions.