Amino Acids, Volume 39, Issue 3, pp 713–726

DomSVR: domain boundary prediction with support vector regression from sequence information alone

  • Peng Chen
  • Chunmei Liu
  • Legand Burge
  • Jinyan Li
  • Mahmood Mohammad
  • William Southerland
  • Clay Gloster
  • Bing Wang
Original Article


Protein domains are structural and fundamental functional units of proteins. Knowledge of protein domain boundaries helps in understanding the evolution, structures and functions of proteins, and also plays an important role in protein classification. In this paper, we propose a support vector regression-based method for protein domain boundary identification that uses novel input profiles extracted from the AAindex database. Our method achieves an average sensitivity of ∼36.5% and an average specificity of ∼81% for multi-domain protein chains, which is overall better than the performance of published approaches to domain boundary identification. Because it uses sequence information alone, our method is also simpler and faster.
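As a rough illustration of what a sequence-only input profile can look like, the sketch below maps each residue to a value from one amino acid index and collects those values over a sliding window. It is not the profile construction used in the paper, which the abstract does not describe: the choice of the Kyte-Doolittle hydropathy scale (AAindex entry KYTJ820101), the window size, the zero padding, and the function name `window_profile` are all assumptions made for this example.

```python
# Kyte-Doolittle hydropathy values (AAindex entry KYTJ820101).
# Assumption: this scale is only a stand-in; the indices used by the
# paper are not listed in the abstract.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_profile(sequence, scale, half_width=2, pad=0.0):
    """Return one feature vector per residue (hypothetical helper).

    Each vector holds the scale values of the residues in a window of
    2 * half_width + 1 positions centred on that residue. Positions
    beyond the sequence ends, and unknown residues, get the pad value.
    """
    values = [scale.get(aa, pad) for aa in sequence.upper()]
    padded = [pad] * half_width + values + [pad] * half_width
    size = 2 * half_width + 1
    return [padded[i:i + size] for i in range(len(values))]


# Usage: a five-residue toy sequence with a window of three positions.
profile = window_profile("MKVLA", HYDROPATHY, half_width=1)
for residue, features in zip("MKVLA", profile):
    print(residue, features)
```

Per-residue vectors of this kind could then be paired with a numeric target for each position and passed to a regression model such as a support vector regressor; how the targets are defined is again left open by the abstract.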


Keywords: Domain boundary prediction, Support vector regression, AAindex, Principal component analysis



This work was supported in part by grant 2 G12 RR003048 from the RCMI program, Division of Research Infrastructure, National Center for Research Resources, NIH and the Mordecai Wyatt Johnson program of Howard University. This work was also supported in part by the Singapore MOE ARC Tier-2 funding grant T208B2203 and the National Science Foundation of China (No. 60803107). CL’s work was supported by NSF (CCF-0845888).



Copyright information

© Springer-Verlag 2010

Authors and Affiliations

  • Peng Chen (1, 2), corresponding author
  • Chunmei Liu (1)
  • Legand Burge (1)
  • Jinyan Li (2)
  • Mahmood Mohammad (3)
  • William Southerland (4)
  • Clay Gloster (5)
  • Bing Wang (6)

  1. Department of Systems and Computer Science, Howard University, Washington, USA
  2. Bioinformatics Research Center, School of Computer Engineering, Nanyang Technological University, Singapore, Singapore
  3. Department of Mathematics, Howard University, Washington, USA
  4. Department of Biochemistry, Howard University, Washington, USA
  5. Department of Electrical and Computer Engineering, Howard University, Washington, USA
  6. School of Electrical Engineering and Information, Anhui University of Technology, Ma’anshan, People’s Republic of China
