Digital Preservation Metadata Practice for Disk Image Access

  • Alexandra Chassanoff
  • Kam Woods
  • Christopher A. Lee
Chapter

Abstract

Many libraries, archives, and museums are now regularly acquiring, processing, and analyzing born-digital materials. Materials exist on a variety of source media, including flash drives, hard drives, floppy disks, and optical media. Extracting disk images (i.e., sector-by-sector copies of digital media) is an increasingly common practice. It can be essential to ensuring provenance, original order, and chain of custody. Disk images allow users to explore and interact with the original data without risk of permanent alteration. These replicas help institutions safeguard against modifications to underlying data that can occur when a file system contained on a storage medium is mounted, or a bootable medium is powered up.

Retention of disk images can substantially reduce preservation risks. Digital storage media become progressively more difficult (or impossible) to read over time, due to “bit rot,” obsolescence of media, and reduced availability of devices to read them. Simply copying the allocated files off a disk and discarding the storage carrier, however, can be problematic. The ability to access and render the content of files can depend upon the presence of other data that resided on the disk. These dependencies are often not obvious upon first inspection and may only be discovered after the original medium is no longer readable or available.

Disk images also enable a wide range of potential access approaches, including dynamic browsing of disk images (Misra S, Lee CA, Woods K (2014) A Web Service for File-Level Access to Disk Images. Code4Lib Journal, 25 [3]) and emulation of earlier computing platforms. Disk images often contain residual data, which may consist of previously hidden or deleted files (Redwine G, et al. in Born digital: guidance for donors, dealers, and archival repositories. Council on Library and Information Resources, Washington, 2013 [4]). Residual data can be valuable for scholars interested in learning about the context of creation. Traces of activities undertaken in the original environment (for example, identifying removable media connected to a host machine or finding contents of browser caches) can provide additional sources of information for researchers and facilitate the preservation of materials (Woods K, et al. in Proceedings of the 11th annual international ACM/IEEE joint conference on digital libraries, pp. 57–66, 2011 [5]).

Digital forensic tools can be used to create disk images in a wide range of formats. These include raw files (such as those produced by the Unix tool dd).

Quantifying successes and failures for many tools can require judgment calls by qualified digital curation professionals. Verifying a checksum for a file is a simple case; the checksums either match or are different. In the events described in the previous sections, however, the conditions for success are fuzzier. For example, fiwalk will often “successfully” complete whether or not it is able to extract a meaningful record of the contents of file system(s) on a disk image. Likewise, bulk_extractor will simply report items of interest it has discovered. Knowing whether this output is useful (and whether it has changed between separate executions of a given tool) depends on comparing the output of the two runs, and the result of that comparison is not currently recorded in the PREMIS document. In the BitCurator implementation, events are often recorded as having completed, rather than as having succeeded, to avoid ambiguity. Future iterations of the implementation may include more nuanced descriptions of event outcomes.
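
The contrast between a clear-cut checksum comparison and a fuzzier "completed" outcome can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names, the dictionary keys, and the outcome strings are hypothetical. They are not PREMIS semantic units and do not reproduce the BitCurator implementation.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large disk images need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_event(path, recorded_digest):
    """Build a hypothetical event record for a checksum comparison.

    The outcome is unambiguous: the digests either match or they do not.
    The keys are illustrative only, not PREMIS element names.
    """
    actual = sha256_of_file(path)
    matches = actual == recorded_digest.lower()
    return {
        "event_type": "fixity check",
        "object": str(path),
        "outcome": "pass" if matches else "fail",
        "detail": "sha256=" + actual,
    }


def tool_run_event(tool_name, exit_code):
    """Build a hypothetical event record for a tool execution.

    A zero exit code only shows that the tool finished, so the outcome
    is recorded as "completed" rather than as a success.
    """
    return {
        "event_type": "tool execution",
        "tool": tool_name,
        "outcome": "completed" if exit_code == 0 else "not completed",
    }


def outputs_identical(first_output, second_output):
    """Report whether two output files from separate runs are byte-identical."""
    return sha256_of_file(first_output) == sha256_of_file(second_output)
```

A byte-level comparison such as `outputs_identical` is only a rough proxy. Output that embeds timestamps or other run-specific values would differ between runs even when the substantive findings are the same, which is part of why these outcomes call for professional judgment.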

References

  1. Dooley JM, Luce K (2010) Taking our pulse: the OCLC research survey of special collections and archives. OCLC Research, Dublin, Ohio
  2. Lee CA (2012) Archival application of digital forensics methods for authenticity, description and access provision. Comma 2(4):133–139
  3. Misra S, Lee CA, Woods K (2014) A web service for file-level access to disk images. Code4Lib J 25. http://journal.code4lib.org/articles/9773. Accessed 23 Oct 2016
  4. Redwine G, Barnard M, Donovan K, Farr E, Forstrom M, Hansen W, Leighton John J, Kuhl N, Shaw S, Thomas S (2013) Born digital: guidance for donors, dealers, and archival repositories. Council on Library and Information Resources, Washington, DC
  5. Woods K, Lee CA, Garfinkel S (2011) Extending digital repository architectures to support disk image preservation and access. In: Proceedings of the 11th annual international ACM/IEEE joint conference on digital libraries (pp 57–66). New York, NY. Association for Computing Machinery. doi:10.1145/1998076.1998088. Accessed 15 Jan 2016
  6. Encase image file format, http://www.forensicswiki.org/wiki/Encase_image_file_format. Accessed 12 Dec 2015
  7. Advanced forensics format, http://www.forensicswiki.org/wiki/AFF. Accessed 12 Dec 2015
  8. Wikipedia (2016) Apple Disk Image. https://en.wikipedia.org/w/index.php?title=Apple_Disk_Image&oldid=693887000. Accessed 15 Jan 2016
  9. Guymager, http://guymager.sourceforge.net/. Accessed 12 Dec 2015
  10. fiwalk, http://www.forensicswiki.org/wiki/Fiwalk. Accessed 12 Dec 2015
  11. Garfinkel S (2012) Digital forensics XML and the DFXML toolset. Digital Investig 8:161–174
  12. Lee CA, Woods K, Kirschenbaum M, Chassanoff A (2013) From bitstreams to heritage: putting digital forensics into practice in collecting institutions. http://www.bitcurator.net/docs/bitstreams-to-heritage.pdf. Accessed 15 Jan 2016
  13. AIMS Work Group (2012) AIMS born-digital collections: an inter-institutional model for stewardship. http://www2.lib.virginia.edu/aims/whitepaper/AIMS_final.pdf. Accessed 15 Jan 2016
  14. Gengenbach MJ (2012) ‘The way we do it here’: mapping digital forensics workflows in collecting institutions. A master’s paper for the M.S. in L.S degree. http://digitalcurationexchange.org/system/files/gengenbach-forensic-workflows-2012.pdf. Accessed 15 Jan 2016
  15.
  16.
  17.
  18. Lee CA (2012) Digital curation as communication mediation. In: Mehler A, Romary L, Gibbon D (eds) Handbook of technical communication. Mouton De Gruyter, Berlin, pp 507–530
  19. PREMIS Editorial Committee (2015) PREMIS data dictionary for preservation metadata, version 3.0. http://www.loc.gov/standards/premis/v3/premis-3-0-final.pdf. Accessed 15 Jan 2016
  20. PREMIS Editorial Committee (2015) PREMIS data dictionary for preservation metadata, version 2.0. http://www.loc.gov/standards/premis/v2/premis-dd-2-0.pdf. Accessed 15 Jan 2016
  21. Forensics Wiki (2015) Bulk extractor. http://www.forensicswiki.org/wiki/Bulk_extractor. Accessed 15 Jan 2016
  22. Consultative Committee for Space Data Systems (CCSDS) (2012) Reference model for an Open Archival Information System (OAIS). http://public.ccsds.org/publications/archive/650x0m2.pdf. Accessed 15 Jan 2016

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Alexandra Chassanoff (1)
  • Kam Woods (1)
  • Christopher A. Lee (1)

  1. University of North Carolina, Chapel Hill, USA
