Trajectory NG: portable, compressed, general molecular dynamics trajectories

  • Original Paper
  • Journal of Molecular Modeling

Abstract

We present general algorithms for the compression of molecular dynamics (MD) trajectories. The standard ways to store MD trajectories, as text or as raw binary floating point numbers, result in very large files when efficient simulation programs are used on supercomputers. Our algorithms are based on the observation that differences in atomic coordinates/velocities, in either time or space, are generally smaller than the absolute values of the coordinates/velocities. Also, it is often possible to store values at a lower precision. We apply several compression schemes to compress the resulting differences further. The most efficient algorithms developed here use a block sorting algorithm in combination with Huffman coding. Depending on the frequency of storage of frames in the trajectory, either space, time, or combinations of space and time differences are usually the most efficient. We compare the efficiency of our algorithms with each other and with other algorithms present in the literature for various systems: liquid argon, water, a virus capsid solvated in 15 mM aqueous NaCl, and solid magnesium oxide. We perform tests to determine how much precision is necessary to obtain accurate structural and dynamic properties, and we benchmark a parallelized implementation of the algorithms. We obtain compression ratios (compared to single precision floating point) of 1:3.3–1:35 depending on the frequency of storage of frames and the system studied.
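The central observation of the abstract, that reduced-precision values differenced in time are smaller than the absolute values, can be illustrated with a minimal sketch. The function names, the flat list layout, and the precision value below are assumptions made for illustration only; they are not taken from the TrajNG library.

```python
def quantize(values, precision):
    """Convert floats to fixed-point integers at the given absolute precision.

    Hypothetical helper: 'precision' is the size of one integer step,
    in the same unit as the input values.
    """
    return [round(v / precision) for v in values]


def time_deltas(frames):
    """Keep the first frame as-is; replace each later frame by its
    element-wise difference from the previous frame."""
    out = [list(frames[0])]
    for prev, cur in zip(frames, frames[1:]):
        out.append([c - p for p, c in zip(prev, cur)])
    return out


def undo_time_deltas(deltas):
    """Invert time_deltas by accumulating the differences frame by frame."""
    frames = [list(deltas[0])]
    for d in deltas[1:]:
        frames.append([p + x for p, x in zip(frames[-1], d)])
    return frames


# Two frames of two coordinates each (illustrative numbers only).
raw = [[1.234, 5.678], [1.236, 5.675]]
fixed = [quantize(frame, 0.001) for frame in raw]
deltas = time_deltas(fixed)
```

Here the second frame is reduced to small integers near zero, which a later entropy-coding stage can represent in fewer bits than the absolute values. Differences in space (between neighbouring atoms within one frame) could be formed in the same way.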

Acknowledgements

The computations were performed on resources provided by the Uppsala Multidisciplinary Center for Advanced Computational Science (UPPMAX), and resources provided by SNIC through the National Supercomputer Centre (NSC).

Author information

Corresponding author

Correspondence to Daniel Spångberg.

Appendix

A. Automatic selection of optimal compression algorithms

The optimal compression algorithm to use depends on the system simulated and the frequency with which frames are written to the trajectory file. The first time a block of frames is to be compressed and written to disk, we run a test of all compression algorithms and choose the one that gives the smallest compressed size. This test is performed only once, so all subsequent blocks are compressed using the same compression algorithm as initially determined. The selection of algorithms to include in the test is controlled by a parameter to the library routines.
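The selection procedure described above can be sketched as follows. This is only an illustration under stated assumptions: the general-purpose compressors from the Python standard library (zlib, bz2, lzma) stand in for the trajectory-specific algorithms of the paper, and the class name `BlockCompressor` and its interface are hypothetical.

```python
import bz2
import lzma
import zlib

# Stand-in candidates; the paper's own algorithms are not reproduced here.
CANDIDATES = {
    "zlib": zlib.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}


class BlockCompressor:
    """Test all candidates on the first block, then reuse the winner."""

    def __init__(self, candidates=None):
        # The caller may restrict which algorithms take part in the test,
        # mirroring the selection parameter mentioned in the text.
        self.candidates = dict(candidates or CANDIDATES)
        self.chosen = None

    def compress(self, block: bytes):
        if self.chosen is None:
            # One-off test: compress the first block with every candidate
            # and remember the one giving the smallest output.
            sizes = {name: len(fn(block)) for name, fn in self.candidates.items()}
            self.chosen = min(sizes, key=sizes.get)
        return self.chosen, self.candidates[self.chosen](block)


compressor = BlockCompressor()
first_name, first_out = compressor.compress(b"abc" * 1000)
second_name, second_out = compressor.compress(bytes(range(256)) * 10)
```

Because the test runs only once, its cost is paid for the first block alone; the trade-off is that a later change in the character of the data does not trigger a new selection.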

B. Portable storage

Our implementation writes all integers with the least significant byte first, making the file format essentially little endian. However, the file format is completely portable, since all external (I/O) references in our implementation are done using individual bytes only. This means that any system endianness (big, little, or mixed) is handled portably. Also, we never store floating point values, only properly scaled fixed point numbers (integers). All text stored in the file is written as ASCII (automatic conversion to/from the source encoding is performed).
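A minimal sketch of byte-wise, least-significant-byte-first integer storage combined with fixed-point scaling is given below. The 32-bit width, the restriction to unsigned values, and all function names are assumptions chosen for illustration; the actual on-disk layout of the format is not specified in this section.

```python
def write_uint32_le(value: int) -> bytes:
    """Encode an unsigned 32-bit integer one byte at a time,
    least significant byte first, independent of host endianness."""
    if not 0 <= value < 2**32:
        raise ValueError("value does not fit in 32 bits")
    return bytes((value >> (8 * i)) & 0xFF for i in range(4))


def read_uint32_le(data: bytes) -> int:
    """Decode four bytes written by write_uint32_le."""
    if len(data) < 4:
        raise ValueError("need at least 4 bytes")
    return sum(byte << (8 * i) for i, byte in enumerate(data[:4]))


def to_fixed(x: float, precision: float) -> int:
    """Scale a float to an integer number of 'precision' steps."""
    return round(x / precision)


def from_fixed(n: int, precision: float) -> float:
    """Recover the approximate float from its fixed-point integer."""
    return n * precision


# A non-negative coordinate stored at an assumed precision of 0.001.
stored = write_uint32_le(to_fixed(1.5, 0.001))
restored = from_fixed(read_uint32_le(stored), 0.001)
```

Since only shifts and masks on integer values are used, the same bytes are produced on any host, and no floating point representation ever reaches the file.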

C. File sizes

Tables 3, 4, 5, 6, 7 and 8 show the raw file sizes from the simulation trajectories compressed with the different algorithms.

Table 3 Trajectory file sizes in bytes from the liquid argon simulation trajectory. The results for different compression algorithms are shown. For comparison, the uncompressed trajectories where values are stored as 32-bit floats are also shown
Table 4 Trajectory file sizes in bytes from the liquid water simulation trajectory. The results for different compression algorithms are shown. For comparison, the uncompressed trajectories where values are stored as 32-bit floats are shown
Table 5 Trajectory file sizes in bytes from the liquid water simulation trajectory stored with high accuracy. The results for different compression algorithms are shown. For comparison, the uncompressed trajectories where values are stored as 32-bit floats are shown
Table 6 Trajectory file sizes in bytes from the solid magnesium oxide simulation trajectory. The results for different compression algorithms are shown. For comparison, the uncompressed trajectories where values are stored as 32-bit floats are shown
Table 7 Trajectory file sizes in bytes from the virus-in-water simulation trajectory. The results for different compression algorithms are shown. For comparison, the uncompressed trajectories where values are stored as 32-bit floats are shown
Table 8 Trajectory file sizes in bytes from the virus-in-water simulation trajectory. The results for different compression algorithms are shown. For comparison, the uncompressed trajectories where values are stored as 32-bit floats are shown

About this article

Cite this article

Spångberg, D., Larsson, D.S.D. & van der Spoel, D. Trajectory NG: portable, compressed, general molecular dynamics trajectories. J Mol Model 17, 2669–2685 (2011). https://doi.org/10.1007/s00894-010-0948-5

  • DOI: https://doi.org/10.1007/s00894-010-0948-5
