Volume 11, Issue 3-4, pp 293-309
Open Access. This content is freely available online to anyone, anywhere at any time.

Archival quality and long-term preservation: a research framework for validating the usefulness of digital surrogates

Abstract

Digital archives accept and preserve digital content for long-term use. Increasingly, stakeholders are creating large-scale digital repositories to ingest surrogates of archival resources or digitized books whose intellectual value as surrogates may exceed that of the original sources themselves. Although digital repository developers have expended significant effort to establish the trustworthiness of repository procedures and infrastructures, relatively little attention has been paid to the quality and usefulness of the preserved content itself. In situations where digital content has been created by third-party firms, content quality (or its absence in the form of unacceptable error) may directly influence repository trustworthiness. This article establishes a conceptual foundation for the association of archival quality and information quality research. It outlines a research project that is designed to develop and test measures of quality for digital content preserved in HathiTrust, a large-scale preservation repository. The research establishes methods of measuring error in digitized books at the data, page, and volume levels and applies the measures to statistically valid samples of digitized books, adjusting for inter-coder inconsistencies and the effects of sampling strategies. The research findings are then validated with users who conform to one of four use-case scenarios: reading online, printing on demand, data mining, and print collection management. The paper concludes with comments on the implications of assessing archival quality within a digital preservation context.