Protein Solvent Accessibility Prediction Using Support Vector Machines and Sequence Conservations

  • Hasan Oğul
  • Erkan Ü. Mumcuoğlu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3949)


A two-stage method is developed for predicting protein solvent accessibility from the amino acid sequence alone (single-sequence prediction). The first stage classifies each residue in a protein sequence as exposed or buried using a support vector machine (SVM). The features used in the SVM are physico-chemical properties of the residue to be predicted together with information from its neighboring residues. In the second stage, the SVM-based predictions are refined using pairwise conserved patterns called maximal unique matches (MUMs), which are identified efficiently with a suffix tree. The baseline predictions, SVM-based predictions, and MUM-based refinements are tested on a nonredundant protein data set; a prediction accuracy of approximately 73% is achieved at a solvent accessibility threshold that gives an even distribution between the buried and exposed classes. The results show that the new method achieves slightly better accuracy than recent single-sequence prediction methods.
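To make the notion of a maximal unique match concrete, the following minimal Python sketch finds the MUMs between two sequences: substrings that occur exactly once in each sequence and cannot be extended to the left or right. It is an illustration only, not the authors' implementation. It uses a brute-force quadratic scan in place of the suffix tree named in the abstract. The function names, the `min_len` parameter, and the label-transfer rule in `refine_with_mums` are assumptions made for this example; the abstract does not specify how MUMs are used to refine the SVM output.

```python
def count_occurrences(text, pattern):
    """Count (possibly overlapping) occurrences of pattern in text."""
    count, pos = 0, text.find(pattern)
    while pos != -1:
        count += 1
        pos = text.find(pattern, pos + 1)
    return count


def maximal_unique_matches(a, b, min_len=3):
    """Return (start_in_a, start_in_b, length) for every MUM of a and b.

    Brute-force stand-in for a suffix-tree search (assumption of this sketch).
    """
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Skip matches that could be extended to the left.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right as far as the residues agree.
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k < min_len:
                continue
            word = a[i:i + k]
            # Keep the match only if it is unique in both sequences.
            if count_occurrences(a, word) == 1 and count_occurrences(b, word) == 1:
                mums.append((i, j, k))
    return mums


def refine_with_mums(query, predicted, reference, reference_labels, min_len=3):
    """Hypothetical refinement rule: inside each MUM, copy the known
    exposed/buried labels ('e'/'b') of the reference onto the query."""
    refined = list(predicted)
    for qi, ri, length in maximal_unique_matches(query, reference, min_len):
        refined[qi:qi + length] = reference_labels[ri:ri + length]
    return "".join(refined)
```

For example, `maximal_unique_matches("ACDEFGHIK", "WWDEFGYYK")` reports the single shared segment `DEFG` as `(2, 2, 4)`. A suffix tree would find the same matches in time linear in the sequence lengths, which matters when a query is compared against many proteins.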


Support Vector Machine; Solvent Accessibility; Suffix Tree; Efficient Data Structure; Remote Homology Detection
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Hasan Oğul (1)
  • Erkan Ü. Mumcuoğlu (2)
  1. Department of Computer Engineering, Başkent University, Ankara, Turkey
  2. Information Systems and Health Informatics, Informatics Institute, Middle East Technical University, Ankara, Turkey
