The Journal of Supercomputing, Volume 62, Issue 1, pp 150–173

Protein simulation data in the relational model

  • Andrew M. Simms
  • Valerie Daggett


High-performance computing is producing unprecedented volumes of data. Relational databases offer a robust and scalable model for storing and analyzing scientific data. However, these benefits come at a cost: significant design effort is required to build a functional and efficient repository. Modeling protein simulation data in a relational database presents several challenges: the data captured from individual simulations are large and multidimensional, and they must integrate with both simulation software and external data sites. Here, we present the dimensional design and relational implementation of a comprehensive data warehouse for storing and analyzing molecular dynamics simulations using SQL Server.


Keywords: Data warehouse, Relational database





Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. Biomedical and Health Informatics Program, University of Washington, Seattle, USA
  2. Bioengineering, University of Washington, Seattle, USA
