Just One Bit in a Million: On the Effects of Data Corruption in Files

  • Volker Heydegger
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5714)


So far, little attention has been paid to file format robustness, i.e., a file format's capability to keep its information as safe as possible in spite of data corruption. This paper reports on the first comprehensive research on this topic. The work is based on a study of the status quo of file format robustness for various file formats from the image domain. A controlled test corpus was built comprising files with different format characteristics. These files are the basis for the data corruption experiments that are reported on and discussed.
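To make the idea of a data corruption experiment concrete, the following is a minimal sketch of flipping bits in a file's byte content at a chosen error rate. It is an illustration only, not the procedure used in the paper: the function names `flip_random_bits` and `count_differing_bits`, the `error_rate` and `seed` parameters, and the uniform choice of bit positions are all assumptions made for this example.

```python
import random


def flip_random_bits(data: bytes, error_rate: float, seed: int = 0) -> bytes:
    """Return a copy of `data` with a fraction `error_rate` of its bits flipped.

    Hypothetical helper: bit positions are drawn uniformly without
    replacement, which is an assumption, not the paper's stated method.
    """
    rng = random.Random(seed)  # seeded so a run can be repeated
    total_bits = len(data) * 8
    n_flips = round(total_bits * error_rate)
    corrupted = bytearray(data)
    for pos in rng.sample(range(total_bits), n_flips):
        # Flip the selected bit inside its byte.
        corrupted[pos // 8] ^= 1 << (pos % 8)
    return bytes(corrupted)


def count_differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))
```

With a buffer of 125,000 bytes (one million bits) and `error_rate=1e-6`, this sketch flips exactly one bit. The corrupted bytes could then be written to disk and opened with a format-specific decoder to observe how much of the image survives.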


digital preservation, file format, file format robustness, data integrity, data corruption, bit error, error resilience





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Volker Heydegger, Historisch-Kulturwissenschaftliche Informationsverarbeitung (HKI), Universität zu Köln, Albertus-Magnus-Platz, Köln, Germany
