CSI 2.0: a significantly improved version of the Chemical Shift Index

Hafsa, Noor E.; Wishart, David S.

doi:10.1007/s10858-014-9863-x

Published: 02 October 2014

Journal of Biomolecular NMR, Volume 60, pages 131–146 (2014)

Noor E. Hafsa¹ & David S. Wishart¹,²

Abstract

Protein chemical shifts have long been used by NMR spectroscopists to assist with secondary structure assignment and to provide useful distance and torsion angle constraint data for structure determination. One of the most widely used methods for secondary structure identification is the Chemical Shift Index (CSI). The CSI method uses a simple digital chemical shift filter to locate secondary structures along the protein chain using backbone ¹³C and ¹H chemical shifts. While the CSI method is simple to use and easy to implement, it is only about 75–80 % accurate. Here we describe a significantly improved version of the CSI (CSI 2.0) that uses machine-learning techniques to combine all six backbone chemical shifts (¹³Cα, ¹³Cβ, ¹³C, ¹⁵N, ¹HN, ¹Hα) with sequence-derived features to perform far more accurate secondary structure identification. Our tests indicate that CSI 2.0 achieved an average identification accuracy (Q3) of 90.56 % for a training set of 181 proteins in a repeated tenfold cross-validation and 89.35 % for a test set of 59 proteins. This represents a significant improvement over other state-of-the-art chemical shift-based methods. In particular, the level of performance of CSI 2.0 is equal to that of standard methods, such as DSSP and STRIDE, used to identify secondary structures via 3D coordinate data. This suggests that CSI 2.0 could be used both to provide accurate NMR constraint data in the early stages of protein structure determination and to define secondary structure locations in the final protein model(s). A CSI 2.0 web server (http://csi.wishartlab.com) is available for submitting queries for secondary structure identification.
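
The Q3 accuracy quoted in the abstract is conventionally the percentage of residues whose three-state label (helix, strand or coil) agrees with a reference assignment. The Python sketch below illustrates that calculation only. The function name q3_score, the one-letter labels H/E/C and the two example strings are assumptions made for illustration; they are not taken from the paper or from the CSI 2.0 server.

```python
def q3_score(predicted: str, reference: str) -> float:
    """Return the percentage of residues whose three-state label matches.

    Both arguments are strings of per-residue labels, assumed here to use
    H (helix), E (beta strand) and C (coil); this encoding is illustrative.
    """
    if len(predicted) != len(reference):
        raise ValueError("label strings must have equal length")
    if not reference:
        raise ValueError("label strings must be non-empty")
    # Count the positions where the predicted and reference labels agree.
    matches = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * matches / len(reference)


# Hypothetical 17-residue example; the two strings differ at one position.
reference = "CCHHHHHHCCEEEEECC"
predicted = "CCHHHHHCCCEEEEECC"
print(f"Q3 = {q3_score(predicted, reference):.2f} %")
```

An average Q3 over a protein set, as reported above, would then be the mean of such per-protein scores, assuming each protein is weighted equally.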

Acknowledgments

The authors would like to thank Yongjie Liang for his help in preparing the CSI 2.0 web server. Financial support from the Natural Sciences and Engineering Research Council (NSERC), the Alberta Prion Research Institute (APRI) and PrioNet is gratefully acknowledged.

Author information

Authors and Affiliations

Department of Computing Science, University of Alberta, Edmonton, Canada
Noor E. Hafsa & David S. Wishart
Department of Biological Sciences, University of Alberta, Edmonton, Canada
David S. Wishart

Corresponding author

Correspondence to David S. Wishart.

About this article

Cite this article

Hafsa, N.E., Wishart, D.S. CSI 2.0: a significantly improved version of the Chemical Shift Index. J Biomol NMR 60, 131–146 (2014). https://doi.org/10.1007/s10858-014-9863-x

Received: 27 June 2014
Accepted: 17 September 2014
Published: 02 October 2014
Issue Date: November 2014
DOI: https://doi.org/10.1007/s10858-014-9863-x
