
Prediction of Protein Domains from Sequence Information Using Support Vector Machines

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3973)

Abstract

Predicting the boundaries of structural domains has been an important and challenging problem in experimental and computational structural biology. Previous predictions have been based on intuition, biochemical properties, statistics, sequence homology, and other aspects of predicted protein structure. This paper presents a promising method for detecting the domain structure of a protein from sequence information alone. The method analyzes multiple sequence alignments derived from a database search. Several measures are defined to quantify the domain information content of each position along the sequence, and these measures are combined into a single predictor using support vector machines. On a dataset of single protein chains, the overall accuracy of the method is about 85%. The results demonstrate that the method can help not only in predicting the complete 3D structure of a protein but also in studying proteins’ building blocks and in functional analysis.
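The abstract does not name the per-position measures or the SVM configuration, so the following is only a minimal sketch of the general idea under stated assumptions. It computes three hypothetical measures for each alignment column (residue entropy, gap fraction, and the fraction of aligned sequences that start or end at that column) and combines them with a linear soft-margin SVM trained by the Pegasos stochastic sub-gradient method. The gap symbol "-", all function names, the toy alignment and the toy labels are illustrative assumptions, not the authors' features, kernel or data.

```python
import math
import random
from collections import Counter

GAP = "-"  # assumed gap symbol in the alignment


def column_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column, ignoring gaps."""
    residues = [c for c in column if c != GAP]
    if not residues:
        return 0.0
    n = len(residues)
    # sum of p * log2(1/p): non-negative, and 0.0 for a fully conserved column
    return sum((k / n) * math.log2(n / k) for k in Counter(residues).values())


def gap_fraction(column):
    """Fraction of aligned sequences that have a gap in this column."""
    return column.count(GAP) / len(column)


def sequence_ends(alignment):
    """(first, last) residue column of each aligned sequence; None for all-gap rows."""
    ends = []
    for row in alignment:
        idx = [i for i, c in enumerate(row) if c != GAP]
        ends.append((idx[0], idx[-1]) if idx else None)
    return ends


def position_features(alignment):
    """One vector per column: [entropy, gap fraction, fraction of sequences starting/ending here]."""
    ends = sequence_ends(alignment)
    features = []
    for pos in range(len(alignment[0])):
        column = [row[pos] for row in alignment]
        termination = sum(1 for e in ends if e is not None and pos in e) / len(alignment)
        features.append([column_entropy(column), gap_fraction(column), termination])
    return features


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear soft-margin SVM trained with Pegasos (stochastic sub-gradient on the hinge loss).

    Labels must be +1 / -1. A constant 1.0 is appended to each vector so the
    bias is learned as an ordinary (regularized) weight.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # L2 shrink step, then the hinge-loss sub-gradient step if the margin is violated
            w = [(1.0 - eta * lam) * wi for wi in w]
            if margin < 1.0:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 (boundary) or -1 (non-boundary) for one feature vector."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Hypothetical toy data: three aligned sequences and made-up boundary labels per column.
    toy_alignment = ["AC--", "ACGT", "--GT"]
    toy_labels = [1, -1, -1, 1]
    X = position_features(toy_alignment)
    w = train_linear_svm(X, toy_labels)
    print([predict(w, x) for x in X])
```

The linear kernel and hand-written trainer keep the sketch free of dependencies; a library SVM with a nonlinear kernel (for example scikit-learn's `SVC`) would be a more usual choice for real alignments, but that is a substitution, not something the abstract specifies.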

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zou, S., Huang, Y., Wang, Y., Zhou, C. (2006). Prediction of Protein Domains from Sequence Information Using Support Vector Machines. In: Wang, J., Yi, Z., Zurada, J.M., Lu, BL., Yin, H. (eds) Advances in Neural Networks - ISNN 2006. ISNN 2006. Lecture Notes in Computer Science, vol 3973. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11760191_99

  • DOI: https://doi.org/10.1007/11760191_99

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-34482-7

  • Online ISBN: 978-3-540-34483-4

  • eBook Packages: Computer Science, Computer Science (R0)
