CONNJUR R: an annotation strategy for fostering reproducibility in bio-NMR—protein spectral assignment
Reproducibility is a cornerstone of the scientific method, essential for validation of results by independent laboratories and the sine qua non of scientific progress. A key step toward reproducibility of biomolecular NMR studies was the establishment of public data repositories (PDB and BMRB). Nevertheless, bio-NMR studies routinely fall short of a basic requirement for reproducibility: that all the data needed to reproduce the results be published. A key limitation is that considerable metadata goes unpublished, notably the manual interventions that are typically applied during the assignment of multidimensional NMR spectra. A general solution to this problem has been elusive, in part because of the wide range of approaches and software packages employed in the analysis of protein NMR spectra. Here we describe an approach for capturing missing metadata during the assignment of protein NMR spectra that can be generalized to arbitrary workflows, different software packages, other biomolecules, or other stages of data analysis in bio-NMR. We also present extensions to the NMR-STAR data dictionary that enable machine archival and retrieval of the “missing” metadata.
Keywords: CONNJUR, Data model, Reproducibility, Analysis, NMR-STAR
This research was funded by United States National Institutes of Health Grant GM-083072. The authors would like to thank Dr. Mark Maciejewski for kindly providing time-domain data of the Samp3 protein and Dr. Woonghee Lee for adding the reproducibility extensions to the NMRFam release of Sparky.
Conflict of interest
The authors declare that they have no conflict of interest.