Journal of Biomolecular NMR, Volume 62, Issue 3, pp 387–401

Accessible surface area from NMR chemical shifts

  • Noor E. Hafsa
  • David Arndt
  • David S. Wishart


Accessible surface area (ASA) is the surface area of an atom, amino acid or biomolecule that is exposed to solvent. The calculation of a molecule’s ASA requires three-dimensional coordinate data and the use of a “rolling ball” algorithm to both define and calculate the ASA. For polymers such as proteins, the ASA of individual amino acids is closely related to the hydrophobicity of the amino acid as well as its local secondary and tertiary structure. For proteins, ASA is a structural descriptor that can often be as informative as secondary structure. Consequently, there has been considerable effort over the past two decades to predict ASA from protein sequence data and to use ASA information (derived from chemical modification studies) as a structural constraint. Recently, it has become evident that protein chemical shifts are also sensitive to ASA. Given the potential utility of ASA estimates as structural constraints for NMR, we decided to explore this relationship further. Using machine learning techniques (specifically a boosted tree regression model), we developed an algorithm called “ShiftASA” that combines chemical shift-derived and sequence-derived features to accurately estimate per-residue fractional ASA values of water-soluble proteins. This method showed a correlation coefficient between predicted and experimental values of 0.79 when evaluated on a set of 65 independent test proteins, an 8.2% improvement over the next best performing (sequence-only) method. On a separate test set of 92 proteins, ShiftASA achieved a mean correlation coefficient of 0.82, which was 12.3% better than the next best performing method. ShiftASA is available as a web server for submitting input queries for fractional ASA calculation.
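The two quantities the abstract relies on, per-residue fractional ASA and the correlation coefficient used for evaluation, can be illustrated with a minimal Python sketch. It assumes the common convention that fractional ASA is a residue's absolute ASA divided by a reference (maximum) ASA for that residue type, and that the reported correlation is a Pearson coefficient; neither assumption is confirmed by the text above. The function names and the reference values in `REFERENCE_ASA` are hypothetical placeholders, not values or code from ShiftASA.

```python
from math import sqrt

# Hypothetical reference (maximum) ASA values in square angstroms.
# Placeholders for illustration only; not taken from the paper.
REFERENCE_ASA = {"A": 110.0, "G": 85.0, "L": 180.0}


def fractional_asa(residue, absolute_asa):
    """Absolute ASA divided by the reference ASA for the residue type, capped at 1.0."""
    if absolute_asa < 0:
        raise ValueError("ASA cannot be negative")
    return min(absolute_asa / REFERENCE_ASA[residue], 1.0)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)
```

In use, a predictor's per-residue fractional ASA values for one protein would be passed as `xs` and the structure-derived values as `ys`; averaging `pearson_r` over several proteins would give a mean correlation of the kind quoted for the 92-protein test set.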


Nuclear magnetic resonance · Chemical shifts · Machine learning · Accessible surface area · Protein



The authors would like to thank Dr. Mark Berjanskii for his helpful suggestions in preparing the ShiftASA program. Financial support from the Natural Sciences and Engineering Research Council (NSERC), the Alberta Prion Research Institute (APRI) and PrioNet is gratefully acknowledged.

Supplementary material

Supplementary material 1 (PDF 2064 kb)



Copyright information

© Springer Science+Business Media Dordrecht 2015

Authors and Affiliations

  • Noor E. Hafsa (1)
  • David Arndt (1)
  • David S. Wishart (1, 2)

  1. Department of Computing Science, University of Alberta, Edmonton, Canada
  2. Department of Biological Sciences, University of Alberta, Edmonton, Canada
