Content Profiling for Preservation: Improving Scale, Depth and Quality

  • Artur Kulmukhametov
  • Christoph Becker
Conference paper

DOI: 10.1007/978-3-319-12823-8_1

Part of the Lecture Notes in Computer Science book series (LNCS, volume 8839)
Cite this paper as:
Kulmukhametov A., Becker C. (2014) Content Profiling for Preservation: Improving Scale, Depth and Quality. In: Tuamsuk K., Jatowt A., Rasmussen E. (eds) The Emergence of Digital Libraries – Research and Practices. ICADL 2014. Lecture Notes in Computer Science, vol 8839. Springer, Cham

Abstract

Content profiling is a crucial step in digital preservation that enables controlled management of content over time. However, large-scale profiling faces a set of challenges. As data grows and becomes more diverse, the only way to keep it under control is to combine the outputs of multiple characterization tools to cover the variety of formats and extract the features of interest. Combining tools in this way introduces conflicting measures and poses challenges for data quality. Sparsity and labeling conflicts make it difficult or impossible to partition, sample and analyze the large metadata sets of a content profile. Without these operations, however, it is virtually impossible to manage heterogeneous collections reliably over time.

In this paper, we present the content profiling tool C3PO, which includes rule-based techniques and heuristics designed for conflict reduction. We conduct a set of experiments in which we assess the effect of creating such a mechanism and rule set on the quality and effectiveness of content profiling. The results show the potential of simple conflict reduction rules to strongly improve the data quality of content profiling for analysis and decision support.

Keywords

Digital Preservation; Characterization; Content Profiling; Conflict Reduction

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Artur Kulmukhametov (1)
  • Christoph Becker (1, 2)
  1. Information and Software Engineering Group, Vienna University of Technology, Austria
  2. Faculty of Information, University of Toronto, Canada
