DomSVR: domain boundary prediction with support vector regression from sequence information alone
Protein domains are the fundamental structural and functional units of proteins. Knowledge of protein domain boundaries helps in understanding the evolution, structure and function of proteins, and also plays an important role in protein classification. In this paper, we propose a support vector regression-based method for protein domain boundary identification that uses novel input profiles extracted from the AAindex database. Our method achieves an average sensitivity of ∼36.5% and an average specificity of ∼81% for multi-domain protein chains, which is better overall than the performance of published approaches to domain boundary identification. Because it uses sequence information alone, our method is also simpler and faster.
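To make the idea of a sequence-only input profile concrete, here is a minimal sketch, not the procedure of the paper. It assumes a single property scale (the Kyte-Doolittle hydropathy index, AAindex entry KYTJ820101) and a sliding window around each residue; the function name `window_profile`, the window half-width and the zero padding are all illustrative assumptions. The principal component analysis step and the support vector regression model itself are not reproduced here.

```python
# Kyte-Doolittle hydropathy scale (AAindex entry KYTJ820101).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_profile(sequence, half_width=2, pad_value=0.0):
    """Return one feature vector per residue (hypothetical helper).

    Each vector holds the property values of the residue and its
    `half_width` neighbours on either side. Positions past the chain
    ends, and residues missing from the scale, take `pad_value`.
    """
    values = [HYDROPATHY.get(aa, pad_value) for aa in sequence.upper()]
    padding = [pad_value] * half_width
    padded = padding + values + padding
    size = 2 * half_width + 1
    return [padded[i:i + size] for i in range(len(values))]


# One vector of length 2 * half_width + 1 for each residue in the chain.
profiles = window_profile("MKVLA", half_width=1)
```

Vectors of this kind, one per residue, are the sort of input a regression model could map to a per-residue boundary score.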
Keywords: Domain boundary prediction, Support vector regression, AAindex, Principal component analysis
This work was supported in part by grant 2 G12 RR003048 from the RCMI program, Division of Research Infrastructure, National Center for Research Resources, NIH and the Mordecai Wyatt Johnson program of Howard University. This work was also supported in part by the Singapore MOE ARC Tier-2 funding grant T208B2203 and the National Science Foundation of China (No. 60803107). CL’s work was supported by NSF (CCF-0845888).