Protein Solvent Accessibility Prediction Using Support Vector Machines and Sequence Conservations

Oğul, Hasan; Mumcuoğlu, Erkan Ü.

doi:10.1007/11803089_17

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3949)

Abstract

A two-stage method is developed for predicting protein solvent accessibility from a single amino acid sequence alone. The first stage classifies each residue in a protein sequence as exposed or buried using a support vector machine (SVM). The features used in the SVM are physico-chemical properties of the amino acid to be predicted, together with information from its neighboring residues. In the second stage, the SVM-based predictions are refined using pairwise conserved patterns called maximal unique matches (MUMs). The MUMs are identified using an efficient data structure called a suffix tree. The baseline predictions, SVM-based predictions and MUM-based refinements are tested on a nonredundant protein data set, and approximately 73% prediction accuracy is achieved for a solvent accessibility threshold that gives an even distribution between the buried and exposed classes. The results demonstrate that the new method achieves slightly better accuracy than recent single-sequence prediction methods.
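To make the notion of a maximal unique match concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the usual definition of a MUM: a common substring of two sequences that occurs exactly once in each and cannot be extended in either direction. The abstract states that the MUMs are found with a suffix tree; this sketch deliberately swaps in a brute-force search that is only practical for short sequences. The function names and the `min_len` parameter are hypothetical and chosen for illustration.

```python
def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def maximal_unique_matches(a, b, min_len=2):
    """Return (pos_in_a, pos_in_b, length) triples for the MUMs of a and b.

    Brute-force illustration only; a suffix tree (as named in the
    abstract) would find the same matches far more efficiently.
    min_len is a hypothetical cutoff, not a value from the paper.
    """
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Skip starts that could be extended to the left:
            # they are covered by the match that begins earlier.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right as far as the sequences agree.
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k < min_len:
                continue
            match = a[i:i + k]
            # Keep the match only if it is unique in both sequences.
            if count_occurrences(a, match) == 1 and count_occurrences(b, match) == 1:
                mums.append((i, j, k))
    return mums


if __name__ == "__main__":
    # Two toy amino acid strings sharing the segments "ACDE" and "GHIK".
    print(maximal_unique_matches("ACDEFGHIK", "WACDEYGHIKW"))
```

Each returned triple gives the start position in the first sequence, the start position in the second sequence, and the match length. How such matches are then used to refine the per-residue SVM labels is described in the paper itself and is not reproduced here.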

Author information

Authors and Affiliations

Department of Computer Engineering, Başkent University, 06530, Ankara, Turkey
Hasan Oğul
Information Systems and Health Informatics, Informatics Institute, Middle East Technical University, 06531, Ankara, Turkey
Erkan Ü. Mumcuoğlu

Editor information

Editors and Affiliations

Izmir Institute of Technology, Electrical Electronics Engineering, Gülbahçe, Urla, Izmir, Turkey
F. Acar Savacı

About this paper

Cite this paper

Oğul, H., Mumcuoğlu, E.Ü. (2006). Protein Solvent Accessibility Prediction Using Support Vector Machines and Sequence Conservations. In: Savacı, F.A. (ed.) Artificial Intelligence and Neural Networks. TAINN 2005. Lecture Notes in Computer Science, vol 3949. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11803089_17

DOI: https://doi.org/10.1007/11803089_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-36713-0
Online ISBN: 978-3-540-36861-8
eBook Packages: Computer Science, Computer Science (R0)
