Archival description and linked data: a preliminary study of opportunities and implementation challenges

  • Original Paper
  • Archival Science

Abstract

This paper presents the results of a study to investigate how archives can connect their collections to related data sources through the use of Semantic Web technologies, specifically Linked Data. Questions explored included (a) What types of data currently available in archival surrogates such as Encoded Archival Description (EAD) finding aids and Machine-Readable Cataloging (MARC) records may be useful if converted to Linked Data? (b) For those potentially useful data points identified in archival surrogates, how might one align data structures found in those surrogates to the data structures of other relevant internal or external information sources? (c) What features of current standards and data structures present impediments or challenges that must be overcome in order to achieve interoperability among disparate data sources? To answer these questions, the researcher identified metadata elements of potential use as Linked Data in archival surrogates, as well as metadata element sets and vocabularies of data sets that could serve as pathways to relevant external data sources. Data sets chosen for the study included DBpedia and schema.org; metadata element sets examined included Friend of a Friend (FOAF), GeoNames, and Linking Open Description of Events (LODE). The researcher then aligned tags found in the EAD encoding standard to related classes and properties found in these Linked Data sources and metadata element sets. To investigate the third question about impediments to incorporating Linked Data in archival descriptions, the researcher analyzed the locations and frequencies at which controlled and uncontrolled access points (personal and family name, corporate name, geographic name, and genre/form entities) appeared in a sample of MARC and EAD archival descriptive records by using a combination of hand counts and the natural language processing (NLP) tool, OpenCalais. 
The results of the location and frequency analysis, combined with the results of the alignment process, helped the researcher identify several critical challenges currently impeding interoperability among archival information systems and relevant Linked Data sources, including differences in granularity between archival and other data source vocabularies, and inadequacies of current encoding standards to support semantic tagging of potential access points embedded in free text areas of archival surrogates.
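The location and frequency analysis and the alignment step described above can be illustrated with a minimal sketch. It assumes a simplified EAD fragment without an XML namespace, and the tag-to-class mapping shown is hypothetical: it is not reproduced from Tables 3 and 4 of the paper, and the sample record and the function name are invented for illustration. The sketch counts only names that are already tagged; untagged names in free text would require a separate entity-extraction step, which is not shown.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical alignment of EAD access-point tags to Linked Data classes.
# Illustrative assumption only; not the alignment published in the paper.
EAD_TO_CLASS = {
    "persname": "foaf:Person",
    "famname": "foaf:Group",
    "corpname": "foaf:Organization",
    "geogname": "schema:Place",
    "genreform": "skos:Concept",
}

# Invented sample finding aid (no namespace, heavily simplified).
SAMPLE_EAD = """
<ead>
  <archdesc>
    <did>
      <origination><persname>Doe, Jane</persname></origination>
    </did>
    <bioghist>
      <p>Jane Doe worked for <corpname>Example Company</corpname> in
         <geogname>Kent (Ohio)</geogname>.</p>
    </bioghist>
    <controlaccess>
      <persname>Doe, Jane</persname>
      <geogname>Kent (Ohio)</geogname>
      <genreform>Correspondence</genreform>
    </controlaccess>
  </archdesc>
</ead>
"""


def count_access_points(ead_xml):
    """Count mapped access-point tags, keyed by (section, tag).

    The section is the direct child of <archdesc> in which the tag
    appears, e.g. 'bioghist' (free text) or 'controlaccess'.
    """
    root = ET.fromstring(ead_xml.strip())
    archdesc = root.find("archdesc")
    counts = Counter()
    for section in archdesc:
        for elem in section.iter():
            if elem.tag in EAD_TO_CLASS:
                counts[(section.tag, elem.tag)] += 1
    return counts


if __name__ == "__main__":
    for (section, tag), n in sorted(count_access_points(SAMPLE_EAD).items()):
        print(f"{section:14} {tag:10} {n}  ->  {EAD_TO_CLASS[tag]}")
```

Keying the counts by section separates access points recorded in controlled areas such as controlaccess from those embedded in narrative areas such as bioghist, which is the distinction the location analysis depends on.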

Notes

  1. DBpedia, http://dbpedia.org/About.

  2. Simple Knowledge Organization System, http://www.w3.org/2004/02/skos/.

  3. OWL2 Web Ontology Language, http://www.w3.org/TR/owl2-overview/.

  4. SPARQL Query Language for RDF, http://www.w3.org/TR/rdf-sparql-query/.

  5. OhioLink Finding Aid Repository, http://ead.ohiolink.edu/xtf-ead/.

  6. Schema.org, http://schema.org.

  7. Civil War Data 150, http://www.civilwardata150.net.

  8. Linked Open Copac and Archives Hub (LOCAH) Project, http://archiveshub.ac.uk/locah/about/.

  9. Europeana, http://www.europeana.eu.

  10. Social Networks and Archival Context Project, http://socialarchive.iath.virginia.edu.

  11. EAC-CPF Ontology, http://archivi.ibc.regione.emilia-romagna.it/ontology/reference_document/referencedocument.html.

  12. FOAF Vocabulary Specification 0.98, http://xmlns.com/foaf/spec/; DCMI Metadata Terms, http://dublincore.org/documents/dcmi-terms/; BIO: A vocabulary for biographical information, http://vocab.org/bio/0.1/.html; The Virtual International Authority File, http://viaf.org.

  13. Linked Jazz, http://linkedjazz.org.

  14. GeoNames, http://www.geonames.org; Getty Thesaurus of Geographic Names, http://www.getty.edu/research/tools/vocabularies/tgn/; MARC List for Geographic Areas, http://www.loc.gov/marc/geoareas/; W3C Geospatial Vocabulary, http://www.w3.org/2003/01/geo/wgs84_pos#.

  15. CIDOC Conceptual Reference Model (CIDOC-CRM), http://www.cidoc-crm.org.

  16. Library of Congress Name Authority File, http://authorities.loc.gov.

  17. Linking Open Description of Events Ontology, http://linkedevents.org/ontology/.

  18. The Society of American Archivists’ Glossary of Archival and Records Terminology defines an artificial collection as “a collection of materials with different provenance assembled and organized to facilitate its management or use.”

  19. The OpenCalais web service may be found at http://viewer.opencalais.com.

  20. Because the 1XX fields are not repeatable, one may reliably assume that the median number of personal/family name headings is 1 for personal papers and 0 for corporate papers and government records. The use of 1XX in artificial collections varies: the cataloger may use a personal name, corporate body name, or collection title for the main entry. For the records examined in this study, a personal or family name was recorded in the 1XX field for 40 % of the records, with the other 60 % having a title as the main entry.

  21. As noted in the section on personal/family names, the 1XX fields are not repeatable. Thus, we can reliably predict the median number of corporate body access points in the 1XX fields for each type of record: none for personal/family papers, 1 for corporate records, and 1 for government records. For artificial collections, no corporate body names were used in the 110 field (although that does not preclude their use).

Acknowledgments

The author wishes to acknowledge the Institute of Museum and Library Services National Leadership Grant program for their support of this research. This project also would not have been possible without the input and assistance of Dr. Marcia Lei Zeng, principal investigator for the Metadata Vocabulary Junction Project (http://lod-lam.slis.kent.edu), and research assistants Laurence Skirvin and Sammy Davidson. In addition, the author thanks Thomas Baker (Dublin Core Metadata Initiative), Jeff Young (OCLC) and Maja Žumer (University of Ljubljana) for sharing their wisdom about Linked Data, metadata, and ontologies.

Author information

Corresponding author

Correspondence to Karen F. Gracy.

Appendices

Appendix 1: Alignment of EAD tags with MARC fields and classes/properties of various Linked Data ontologies

Table 2 EAD tag elements crosswalked to MARC fields and subfields
Table 3 EAD data element tags aligned to DBpedia and schema.org classes and properties
Table 4 EAD data element tags aligned to FOAF, LODE, and GeoNames classes and properties

Appendix 2: Entity Counts in Sample of Archival Collection MARC Records

Table 5 Personal/family names found in MARC records of archival collections
Table 6 Corporate body names found in MARC records of archival collections
Table 7 Geographic names found in MARC records of archival collections
Table 8 Genre/form names found in MARC records of archival collections

Appendix 3: Entity Counts in Sample of Archival Collection EAD Records

Table 9 Personal/family names found in finding aids of archival collections
Table 10 Corporate body names found in finding aids of archival collections
Table 11 Geographic names found in finding aids of archival collections
Table 12 Genre/form terms found in finding aids of archival collections

About this article

Cite this article

Gracy, K.F. Archival description and linked data: a preliminary study of opportunities and implementation challenges. Arch Sci 15, 239–294 (2015). https://doi.org/10.1007/s10502-014-9216-2

  • DOI: https://doi.org/10.1007/s10502-014-9216-2
