Abstract
We present general algorithms for the compression of molecular dynamics trajectories. The standard ways to store MD trajectories as text or as raw binary floating point numbers result in very large files when efficient simulation programs are used on supercomputers. Our algorithms are based on the observation that differences in atomic coordinates/velocities, in either time or space, are generally smaller than the absolute values of the coordinates/velocities. Also, it is often possible to store values at a lower precision. We apply several compression schemes to compress the resulting differences further. The most efficient algorithms developed here use a block sorting algorithm in combination with Huffman coding. Depending on the frequency of storage of frames in the trajectory, either space, time, or combinations of space and time differences are usually the most efficient. We compare the efficiency of our algorithms with each other and with other algorithms present in the literature for various systems: liquid argon, water, a virus capsid solvated in 15 mM aqueous NaCl, and solid magnesium oxide. We perform tests to determine how much precision is necessary to obtain accurate structural and dynamic properties, and we benchmark a parallelized implementation of the algorithms. We obtain compression ratios (compared to single precision floating point) of 1:3.3–1:35 depending on the frequency of storage of frames and the system studied.
Acknowledgements
The computations were performed on resources provided by the Uppsala Multidisciplinary Center for Advanced Computational Science (UPPMAX), and resources provided by SNIC through the National Supercomputer Centre (NSC).
Appendix
A. Automatic selection of optimal compression algorithms
The optimal compression algorithm to use depends on the system simulated and the frequency with which frames are written to the trajectory file. The first time a block of frames is to be compressed and written to disk, we run a test of all compression algorithms and choose the one that gives the smallest compressed size. This test is performed only once, so all subsequent blocks are compressed using the same compression algorithm as initially determined. The selection of algorithms to include in the test is controlled by a parameter to the library routines.
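The select-once strategy described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `BlockCompressor`, its `enabled` argument, and the use of the standard-library codecs `zlib`, `bz2`, and `lzma` are hypothetical stand-ins, not the library's actual routines or the compression algorithms of this work.

```python
import bz2
import lzma
import zlib

# Stand-in codecs from the Python standard library. These are NOT the
# algorithms of the paper, only placeholders that let the selection logic run.
CANDIDATES = {
    "zlib": zlib.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}


class BlockCompressor:
    """Pick the best codec on the first block, then reuse it (hypothetical API)."""

    def __init__(self, enabled=None):
        # `enabled` mimics the parameter that restricts which algorithms are tested.
        names = list(CANDIDATES) if enabled is None else list(enabled)
        self.codecs = {name: CANDIDATES[name] for name in names}
        self.chosen = None  # decided when the first block is compressed

    def compress(self, block: bytes) -> bytes:
        if self.chosen is None:
            # One-time test: compress with every enabled codec, keep the smallest.
            results = {name: fn(block) for name, fn in self.codecs.items()}
            self.chosen = min(results, key=lambda name: len(results[name]))
            return results[self.chosen]
        # All later blocks reuse the codec selected on the first block.
        return self.codecs[self.chosen](block)
```

Because the test runs only on the first block, its cost is paid once per trajectory; the trade-off is that a codec chosen on early frames is kept even if later frames would compress better with another one.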
B. Portable storage
Our implementation writes all integers with the least significant byte first, making the file format essentially little-endian. However, the file format is completely portable, since all external (I/O) references in our implementation are done using individual bytes only. This means that any system endianness (big, little, or mixed) is handled portably. Also, we never store floating point values, only properly scaled fixed point numbers (integers). All text stored in the file is written as ASCII (automatic conversion to/from the source encoding is performed).
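As a rough illustration of byte-wise, endianness-independent storage of fixed point values, consider the following Python sketch. The function names, the 32-bit width, and the two's-complement treatment of negative values are assumptions made for the example; the actual file format may differ in all of these.

```python
def to_fixed(value: float, precision: float) -> int:
    """Scale a float to an integer count of `precision` units."""
    return round(value / precision)


def from_fixed(count: int, precision: float) -> float:
    """Recover the (quantized) float from its integer representation."""
    return count * precision


def pack_i32_le(count: int) -> bytes:
    """Emit a signed 32-bit integer as four bytes, least significant first.

    Only shifts and masks on the integer value are used, so the result does
    not depend on the byte order of the machine running the code.
    """
    if not -(1 << 31) <= count < (1 << 31):
        raise ValueError("value does not fit in 32 bits")
    u = count & 0xFFFFFFFF  # two's-complement bit pattern (an assumption)
    return bytes((u >> shift) & 0xFF for shift in (0, 8, 16, 24))


def unpack_i32_le(data: bytes) -> int:
    """Rebuild the signed integer from four bytes, least significant first."""
    u = data[0] | (data[1] << 8) | (data[2] << 16) | (data[3] << 24)
    return u - (1 << 32) if u & 0x80000000 else u
```

For instance, a coordinate of -1.25 stored at a precision of 0.01 becomes the integer -125, which is written as four individual bytes and read back the same way on any host.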
C. File sizes
Tables 3, 4, 5, 6, 7 and 8 show the raw file sizes from the simulation trajectories compressed with the different algorithms.
Cite this article
Spångberg, D., Larsson, D.S.D. & van der Spoel, D. Trajectory NG: portable, compressed, general molecular dynamics trajectories. J Mol Model 17, 2669–2685 (2011). https://doi.org/10.1007/s00894-010-0948-5
DOI: https://doi.org/10.1007/s00894-010-0948-5