Journal of Biomolecular NMR, Volume 35, Issue 3, pp 187–208

Inferential backbone assignment for sparse data

  • Olga Vitek
  • Chris Bailey-Kellogg
  • Bruce Craig
  • Jan Vitek


This paper develops an approach to protein backbone NMR assignment that effectively assigns large proteins while using limited sets of triple-resonance experiments. Our approach handles proteins with large fractions of missing data and many ambiguous pairs of pseudoresidues, and provides a statistical assessment of confidence in global and position-specific assignments. The approach is tested on an extensive set of experimental and synthetic data of up to 723 residues, with match tolerances of up to 0.5 ppm for \(\hbox{C}^{\upalpha}\) and \(\hbox{C}^{\upbeta}\) resonance types. The tests show that the approach is particularly helpful when data contain experimental noise and require large match tolerances. The keys to the approach are an empirical Bayesian probability model that rigorously accounts for uncertainty in the data at all stages in the analysis, and a hybrid stochastic tree-based search algorithm that effectively explores the large space of possible assignments.
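To make the role of the match tolerance concrete, the following is a minimal Python sketch and not the authors' algorithm or probability model. It assumes each pseudoresidue carries intra-residue and sequential \(\hbox{C}^{\upalpha}\)/\(\hbox{C}^{\upbeta}\) shifts, any of which may be missing, and it lists which pseudoresidues could follow a given one when every shift observed on both sides agrees within the tolerance. The class `Pseudoresidue`, the function names, the rule for missing values, and the shift values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Pseudoresidue:
    """Hypothetical container for the shifts (ppm) grouped into one pseudoresidue."""
    name: str
    ca: Optional[float]       # intra-residue C-alpha shift
    cb: Optional[float]       # intra-residue C-beta shift
    ca_prev: Optional[float]  # C-alpha shift of the preceding residue
    cb_prev: Optional[float]  # C-beta shift of the preceding residue


def within(x: Optional[float], y: Optional[float], tol: float) -> Optional[bool]:
    """Compare two shifts; return None when either one is missing."""
    if x is None or y is None:
        return None
    return abs(x - y) <= tol


def candidate_successors(residues: List[Pseudoresidue], tol: float) -> Dict[str, List[str]]:
    """For each pseudoresidue p, list the pseudoresidues q that could follow it.

    Assumed rule: q may follow p if at least one shift type is observed in
    both, and every shift type observed in both agrees within `tol`.
    """
    result: Dict[str, List[str]] = {}
    for p in residues:
        matches = []
        for q in residues:
            if q is p:
                continue
            checks = [within(p.ca, q.ca_prev, tol), within(p.cb, q.cb_prev, tol)]
            observed = [c for c in checks if c is not None]
            if observed and all(observed):
                matches.append(q.name)
        result[p.name] = matches
    return result


# Made-up shifts; C lacks its C-beta values to mimic sparse data.
residues = [
    Pseudoresidue("A", 55.0, 30.0, None, None),
    Pseudoresidue("B", 58.2, 33.1, 55.1, 30.1),
    Pseudoresidue("C", 60.0, None, 55.4, None),
    Pseudoresidue("D", 52.0, 40.0, 58.3, 33.0),
]

loose = candidate_successors(residues, tol=0.5)
tight = candidate_successors(residues, tol=0.2)
print(loose["A"], tight["A"])
```

With these made-up values, the 0.5 ppm tolerance leaves two candidate successors for A, while 0.2 ppm leaves one. A larger tolerance absorbs more experimental noise but creates more ambiguous pairs, which is the regime the abstract targets with a probability model and a stochastic search rather than a hard filter like this one.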


Bayesian modeling, NMR assignment, sparse data, statistical inference, stochastic search algorithm, structural genomics





The authors would like to thank Dr. Post, Purdue University, for providing the experimental data sets; Drs. Jung and Zweckstetter, Max Planck Institute for Biophysical Chemistry, for sharing their simulated data; and Drs. Moseley and Montelione, Rutgers University, for providing access to the AutoAssign data. This work was supported in part by a Purdue Dissertation Fellowship and by NSF grants EIA-9802068 and IIS-0502801.



Copyright information

© Springer Science+Business Media, Inc. 2006

Authors and Affiliations

  • Olga Vitek (1)
  • Chris Bailey-Kellogg (2)
  • Bruce Craig (3)
  • Jan Vitek (4)

  1. Institute for Systems Biology, Seattle, USA
  2. Department of Computer Science, Dartmouth College, Hanover, USA
  3. Department of Statistics, Purdue University, West Lafayette, USA
  4. Department of Computer Sciences, Purdue University, West Lafayette, USA
