Prediction of Protein Domains from Sequence Information Using Support Vector Machines

Zou, Shuxue; Huang, Yanxin; Wang, Yan; Zhou, Chunguang

doi:10.1007/11760191_99

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3973)

Abstract

Predicting the boundaries of structural domains has long been an important and challenging problem in experimental and computational structural biology. Earlier predictions were based on intuition, biochemical properties, statistics, sequence homology, and other aspects of predicted protein structure. This paper presents a promising method for detecting the domain structure of a protein from sequence information alone. The method analyzes multiple sequence alignments derived from a database search. Several measures are defined to quantify the domain information content of each position along the sequence, and these are combined into a single predictor using support vector machines. The overall accuracy of the method on a dataset of single protein chains is about 85%. The results suggest that the method can help not only in predicting the complete 3D structure of a protein but also in studying proteins' building blocks and in functional analysis.
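As a rough illustration of what "measures defined for each position" of a multiple sequence alignment can look like, the sketch below computes two per-column quantities and stacks them into one feature vector per position. The abstract does not name the measures the authors used, so column entropy and gap fraction are assumptions chosen only for illustration, and the function names and toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the non-gap residues in one alignment column."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    # sum of p * log2(1/p); equals 0.0 for a fully conserved column
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def gap_fraction(column):
    """Fraction of sequences that have a gap ('-') in this column."""
    return column.count("-") / len(column)


def position_features(alignment):
    """Return one [entropy, gap_fraction] vector per alignment column.

    `alignment` is a list of equal-length aligned sequences (hypothetical input
    format; any real pipeline would parse these from alignment files).
    """
    columns = zip(*alignment)  # transpose rows (sequences) into columns (positions)
    return [[column_entropy(col), gap_fraction(col)] for col in columns]


# Toy alignment, invented for this example only.
toy_alignment = [
    "MKV-LA",
    "MKI-LA",
    "MRVGLA",
    "MKV-IA",
]

features = position_features(toy_alignment)
for position, vector in enumerate(features, start=1):
    print(position, [round(v, 3) for v in vector])
```

Vectors of this shape, one per sequence position and labeled as boundary or non-boundary, are the kind of input a support vector machine classifier would be trained on; the training step itself is omitted here because the abstract gives no details about kernels or parameters.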

Author information

Authors and Affiliations

College of Computer Science and Technology, Jilin University, 130012, China
Shuxue Zou, Yanxin Huang, Yan Wang & Chunguang Zhou

Editor information

Editors and Affiliations

Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong, China
Jun Wang
Computational Intelligence Laboratory, School of Computer Science and Engineering, University of Electronic Science and Technology of China, 610054, Chengdu, P.R. China
Zhang Yi
Department of Electrical Engineering, University of Louisville, 40292, Louisville, KY, U.S.A
Jacek M. Zurada
Laboratory for Computational Biology, Shanghai Center for Systems Biomedicine, 800 Dong Chuan Rd, 200240, Shanghai, China
Bao-Liang Lu
School of Electrical and Electronic Engineering, University of Manchester, UK
Hujun Yin

About this paper

Cite this paper

Zou, S., Huang, Y., Wang, Y., Zhou, C. (2006). Prediction of Protein Domains from Sequence Information Using Support Vector Machines. In: Wang, J., Yi, Z., Zurada, J.M., Lu, BL., Yin, H. (eds) Advances in Neural Networks - ISNN 2006. ISNN 2006. Lecture Notes in Computer Science, vol 3973. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11760191_99

DOI: https://doi.org/10.1007/11760191_99
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34482-7
Online ISBN: 978-3-540-34483-4
eBook Packages: Computer Science (R0)
