Abstract
This paper presents the results of a study to investigate how archives can connect their collections to related data sources through the use of Semantic Web technologies, specifically Linked Data. Questions explored included (a) What types of data currently available in archival surrogates such as Encoded Archival Description (EAD) finding aids and Machine-Readable Cataloging (MARC) records may be useful if converted to Linked Data? (b) For those potentially useful data points identified in archival surrogates, how might one align data structures found in those surrogates to the data structures of other relevant internal or external information sources? (c) What features of current standards and data structures present impediments or challenges that must be overcome in order to achieve interoperability among disparate data sources? To answer these questions, the researcher identified metadata elements of potential use as Linked Data in archival surrogates, as well as metadata element sets and vocabularies of data sets that could serve as pathways to relevant external data sources. Data sets chosen for the study included DBpedia and schema.org; metadata element sets examined included Friend of a Friend (FOAF), GeoNames, and Linking Open Descriptions of Events (LODE). The researcher then aligned tags found in the EAD encoding standard to related classes and properties found in these Linked Data sources and metadata element sets. To investigate the third question about impediments to incorporating Linked Data in archival descriptions, the researcher analyzed the locations and frequencies at which controlled and uncontrolled access points (personal and family name, corporate name, geographic name, and genre/form entities) appeared in a sample of MARC and EAD archival descriptive records, using a combination of hand counts and the natural language processing (NLP) tool OpenCalais.
The results of the location and frequency analysis, combined with the results of the alignment process, helped the researcher identify several critical challenges currently impeding interoperability among archival information systems and relevant Linked Data sources, including differences in granularity between archival and other data source vocabularies, and inadequacies of current encoding standards to support semantic tagging of potential access points embedded in free text areas of archival surrogates.
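The two analytical steps described above, counting where access points occur and aligning EAD tags to Linked Data classes, can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: the EAD fragment is hypothetical and un-namespaced, the names in it are invented, and the alignments in `ALIGNMENTS` are plausible candidates from FOAF, schema.org, and the GeoNames ontology rather than the alignments reported in Appendix 1. The study itself relied on hand counts and OpenCalais, not on this script.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Illustrative alignments only (assumed for this sketch, not taken from the study).
ALIGNMENTS = {
    "persname": ["foaf:Person", "schema:Person"],
    "famname": ["foaf:Group"],
    "corpname": ["foaf:Organization", "schema:Organization"],
    "geogname": ["schema:Place", "gn:Feature"],
    "genreform": [],  # no candidate class assumed here
}

# Hypothetical EAD fragment: names tagged in free text and in <controlaccess>.
SAMPLE_EAD = """
<archdesc level="collection">
  <bioghist>
    <p><persname>Jane Doe</persname> worked in <geogname>Kent, Ohio</geogname>.</p>
  </bioghist>
  <controlaccess>
    <persname>Doe, Jane</persname>
    <corpname>Example Historical Society</corpname>
    <geogname>Ohio</geogname>
    <genreform>Correspondence</genreform>
  </controlaccess>
</archdesc>
"""


def count_access_points(ead_xml):
    """Count access-point tags inside and outside <controlaccess>."""
    root = ET.fromstring(ead_xml)
    in_controlaccess = Counter()
    in_free_text = Counter()
    seen = set()  # ids of elements already counted under <controlaccess>

    for block in root.iter("controlaccess"):
        for element in block.iter():
            if element.tag in ALIGNMENTS and id(element) not in seen:
                seen.add(id(element))
                in_controlaccess[element.tag] += 1

    for element in root.iter():
        if element.tag in ALIGNMENTS and id(element) not in seen:
            in_free_text[element.tag] += 1

    return in_controlaccess, in_free_text


if __name__ == "__main__":
    controlled, free_text = count_access_points(SAMPLE_EAD)
    for tag, candidates in ALIGNMENTS.items():
        targets = ", ".join(candidates) or "(no candidate class)"
        print(f"{tag}: controlaccess={controlled[tag]}, "
              f"free text={free_text[tag]}, aligns to {targets}")
```

A lookup table keeps the alignment step separate from the counting step, so a different vocabulary can be substituted without touching the parser. Real finding aids usually declare an XML namespace and leave many names in free text untagged, which is the gap the study's use of an NLP tool was meant to probe.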
Notes
DBpedia, http://dbpedia.org/About.
Simple Knowledge Organization System, http://www.w3.org/2004/02/skos/.
OWL2 Web Ontology Language, http://www.w3.org/TR/owl2-overview/.
SPARQL Query Language for RDF, http://www.w3.org/TR/rdf-sparql-query/.
OhioLink Finding Aid Repository, http://ead.ohiolink.edu/xtf-ead/.
Schema.org, http://schema.org.
Civil War Data 150, http://www.civilwardata150.net.
Linked Open Copac and Archives Hub (LOCAH) Project, http://archiveshub.ac.uk/locah/about/.
Europeana, http://www.europeana.eu.
Social Networks and Archival Context Project, http://socialarchive.iath.virginia.edu.
FOAF Vocabulary Specification 0.98, http://xmlns.com/foaf/spec/; DCMI Metadata Terms, http://dublincore.org/documents/dcmi-terms/; BIO: A vocabulary for biographical information, http://vocab.org/bio/0.1/.html; The Virtual International Authority File, http://viaf.org.
Linked Jazz, http://linkedjazz.org.
GeoNames, http://www.geonames.org; Getty Thesaurus of Geographic Names, http://www.getty.edu/research/tools/vocabularies/tgn/; MARC List for Geographic Areas, http://www.loc.gov/marc/geoareas/; W3C Geospatial Vocabulary, http://www.w3.org/2003/01/geo/wgs84_pos#.
CIDOC Conceptual Reference Model (CIDOC-CRM), http://www.cidoc-crm.org.
Library of Congress Name Authority File, http://authorities.loc.gov.
Linking Open Descriptions of Events Ontology, http://linkedevents.org/ontology/.
The Society of American Archivists’ Glossary of Archival and Records Terminology defines an artificial collection as “a collection of materials with different provenance assembled and organized to facilitate its management or use.”
The OpenCalais web service may be found at http://viewer.opencalais.com.
Because the 1XX fields are not repeatable, one may reliably assume that the median number of personal/family name headings is 1 for personal papers and 0 for corporate papers and government records. The use of 1XX in artificial collections varies: the cataloger may use a personal name, corporate body name, or collection title for the main entry. For the records examined in this study, a personal or family name was recorded in the 1XX field for 40% of the records, with the other 60% having a title for the main entry.
As noted in the section on personal/family names, the 1XX fields are not repeatable. Thus, we can reliably predict the median number of corporate body access points in the 1XX fields for each type of record: 0 for personal/family papers, 1 for corporate records, and 1 for government records. For artificial collections, no corporate body names were used in the 110 field (although that does not preclude their use).
Acknowledgments
The author wishes to acknowledge the Institute of Museum and Library Services National Leadership Grant program for its support of this research. This project also would not have been possible without the input and assistance of Dr. Marcia Lei Zeng, principal investigator for the Metadata Vocabulary Junction Project (http://lod-lam.slis.kent.edu), and research assistants Laurence Skirvin and Sammy Davidson. In addition, the author thanks Thomas Baker (Dublin Core Metadata Initiative), Jeff Young (OCLC), and Maja Žumer (University of Ljubljana) for sharing their wisdom about Linked Data, metadata, and ontologies.
Appendices
Appendix 1: Alignment of EAD tags with MARC fields and classes/properties of various Linked Data ontologies
Appendix 2: Entity Counts in Sample of Archival Collection MARC Records
Appendix 3: Entity Counts in Sample of Archival Collection EAD Records
Cite this article
Gracy, K.F. Archival description and linked data: a preliminary study of opportunities and implementation challenges. Arch Sci 15, 239–294 (2015). https://doi.org/10.1007/s10502-014-9216-2
DOI: https://doi.org/10.1007/s10502-014-9216-2