Application of Dirichlet process mixture model to the identification of spin systems in protein NMR spectra


Analysis of the structure, function and interactions of proteins by NMR spectroscopy usually requires the assignment of resonances to the corresponding nuclei in the protein. This task, although automated by methods such as FLYA or PINE, is still frequently performed manually. To facilitate the manual sequence-specific chemical shift assignment of complex proteins, we propose a method based on a Dirichlet process mixture model (DPMM) that performs automated matching of groups of signals observed in NMR spectra to the corresponding nuclei in the protein sequence. The model has been extensively tested on 80 proteins retrieved from the BMRB database and has shown superior performance to the reference method.
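To give a concrete sense of what a Dirichlet process mixture involves, the following is a minimal sketch of the truncated stick-breaking construction of mixture weights, followed by a random allocation of peaks to mixture components. It is not the implementation used in this work: the function names, the concentration parameter `alpha`, the truncation level and the number of peaks are all illustrative assumptions, and the sketch covers only the prior over component weights, not the likelihood over chemical shifts or the inference procedure.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Draw truncated stick-breaking weights for a Dirichlet process.

    alpha      -- concentration parameter (hypothetical value chosen by the caller)
    truncation -- maximum number of mixture components kept
    rng        -- random.Random instance, passed in for reproducibility
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if truncation < 1:
        raise ValueError("truncation must be at least 1")

    weights = []
    remaining = 1.0  # length of the stick that has not been broken off yet
    for _ in range(truncation - 1):
        fraction = rng.betavariate(1.0, alpha)  # share of the remaining stick
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    # The last component takes whatever is left, so the weights sum to one.
    weights.append(remaining)
    return weights


def allocate_peaks(weights, n_peaks, rng):
    """Assign each of n_peaks to a component index, drawn according to weights."""
    return rng.choices(range(len(weights)), weights=weights, k=n_peaks)


if __name__ == "__main__":
    rng = random.Random(0)
    weights = stick_breaking_weights(alpha=2.0, truncation=20, rng=rng)
    labels = allocate_peaks(weights, n_peaks=50, rng=rng)
    print("components actually used:", len(set(labels)))
```

In this sketch a small `alpha` concentrates most of the weight on a few components, while a larger `alpha` spreads it over more of them. This is why such a prior suits problems where the number of groups, here spin systems, is not fixed in advance.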



Data availability

The proposed model is implemented as part of the publicly available Dumpling software. The DPMM model generates recommendations for a user who performs manual sequence-specific resonance assignment in a Sparky-like graphical interface (Figs. S1–S4).




The research has been co-financed by the Ministry of Science and Higher Education, Republic of Poland: Adam Gonczarek, Grant No. 0402/0082/17.

Author information




PK designed the model with the support of AG and MA; PK and MA implemented the model and the experiments; PK and MA designed the experiments; PK, MA, AG and MJW discussed the results and wrote the manuscript; MZ prepared the Dumpling components to make the model publicly available.

Corresponding author

Correspondence to Piotr Klukowski.

Additional information

Piotr Klukowski and Michał Augoff would like to be considered joint first authors.

Electronic supplementary material


Supplementary material 1 (PDF 5.51 MB)


About this article


Cite this article

Klukowski, P., Augoff, M., Zamorski, M. et al. Application of Dirichlet process mixture model to the identification of spin systems in protein NMR spectra. J Biomol NMR 71, 11–18 (2018).



  • Chemical shift assignment
  • Mixture models
  • Spin system identification