Archival Science, Volume 15, Issue 3, pp 239–294

Archival description and linked data: a preliminary study of opportunities and implementation challenges

  • Karen F. Gracy
Original Paper


This paper presents the results of a study to investigate how archives can connect their collections to related data sources through the use of Semantic Web technologies, specifically Linked Data. Questions explored included (a) What types of data currently available in archival surrogates such as Encoded Archival Description (EAD) finding aids and Machine-Readable Cataloging (MARC) records may be useful if converted to Linked Data? (b) For those potentially useful data points identified in archival surrogates, how might one align data structures found in those surrogates to the data structures of other relevant internal or external information sources? (c) What features of current standards and data structures present impediments or challenges that must be overcome in order to achieve interoperability among disparate data sources? To answer these questions, the researcher identified metadata elements of potential use as Linked Data in archival surrogates, as well as metadata element sets and vocabularies of data sets that could serve as pathways to relevant external data sources. Data sets chosen for the study included DBpedia and; metadata element sets examined included Friend of a Friend (FOAF), GeoNames, and Linking Open Descriptions of Events (LODE). The researcher then aligned tags found in the EAD encoding standard to related classes and properties found in these Linked Data sources and metadata element sets. To investigate the third question about impediments to incorporating Linked Data in archival descriptions, the researcher analyzed the locations and frequencies at which controlled and uncontrolled access points (personal and family name, corporate name, geographic name, and genre/form entities) appeared in a sample of MARC and EAD archival descriptive records by using a combination of hand counts and the natural language processing (NLP) tool, OpenCalais.
The results of the location and frequency analysis, combined with the results of the alignment process, helped the researcher identify several critical challenges currently impeding interoperability among archival information systems and relevant Linked Data sources, including differences in granularity between archival and other data source vocabularies, and inadequacies of current encoding standards to support semantic tagging of potential access points embedded in free text areas of archival surrogates.
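As a rough illustration of the kind of location and frequency analysis described above, the following minimal sketch counts access-point tags in a small EAD fragment and separates those inside a controlled access section from those embedded in free text. It is not the procedure used in the study. The sample record, the function name `count_access_points`, and the `EAD_TO_LD` pairings of EAD tags with FOAF and GeoNames classes are all illustrative assumptions, and the fragment omits the XML namespace that schema-based EAD files may carry.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical alignment table: EAD access-point tags -> candidate Linked Data classes.
# These pairings are illustrative assumptions, not the alignments reported in the study.
EAD_TO_LD = {
    "persname": "foaf:Person",
    "famname": "foaf:Group",
    "corpname": "foaf:Organization",
    "geogname": "gn:Feature",
}

# Invented sample record (no namespace), for demonstration only.
SAMPLE_EAD = """<ead>
  <archdesc>
    <controlaccess>
      <persname>Doe, Jane</persname>
      <corpname>Example Historical Society</corpname>
      <geogname>Kent (Ohio)</geogname>
    </controlaccess>
    <bioghist>
      <p>Letters from <persname>Doe, John</persname> written in
      <geogname>Cleveland (Ohio)</geogname>.</p>
    </bioghist>
  </archdesc>
</ead>"""


def count_access_points(ead_xml):
    """Count access-point tags by location: inside <controlaccess> or in free text."""
    root = ET.fromstring(ead_xml)
    # Collect every element that sits inside any <controlaccess> block.
    controlled = {el for ca in root.iter("controlaccess") for el in ca.iter()}
    counts = Counter()
    for el in root.iter():
        if el.tag in EAD_TO_LD:
            location = "controlled" if el in controlled else "free_text"
            counts[(el.tag, location)] += 1
    return counts


if __name__ == "__main__":
    for (tag, location), n in sorted(count_access_points(SAMPLE_EAD).items()):
        print(f"{tag:10s} {location:10s} {n}  candidate class: {EAD_TO_LD[tag]}")
```

A tally of this kind only captures names that an archivist has already tagged. Names left as plain text in narrative sections would need a separate entity recognition step, which is the role the abstract assigns to hand counts and OpenCalais.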


Keywords: Archival description and access; Linked data; Semantic interoperability; Encoded Archival Description (EAD); Machine-readable cataloging (MARC)



The author wishes to acknowledge the Institute of Museum and Library Services National Leadership Grant program for its support of this research. This project also would not have been possible without the input and assistance of Dr. Marcia Lei Zeng, principal investigator for the Metadata Vocabulary Junction Project, and research assistants Laurence Skirvin and Sammy Davidson. In addition, the author thanks Thomas Baker (Dublin Core Metadata Initiative), Jeff Young (OCLC), and Maja Žumer (University of Ljubljana) for sharing their wisdom about Linked Data, metadata, and ontologies.



Copyright information

© Springer Science+Business Media Dordrecht 2014

Authors and Affiliations

  1. School of Library and Information Science, Kent State University, Kent, USA
