A semantic architecture for preserving and interpreting the information contained in Irish historical vital records

Abstract

Irish Record Linkage 1864–1913 is a multi-disciplinary project, started in 2014, that aims to create a platform for analyzing events captured in historical birth, marriage, and death records. It applies semantic technologies to annotate, store, and infer information from the data contained in those records. This enables researchers to investigate, among other things, to what extent maternal and infant mortality rates were underreported. We report on the semantic architecture, motivate the adoption of RDF and Linked Data principles, and elaborate on the ontology construction process, which was shaped by the requirements of both the digital archivists and the historians. The digital archivists are concerned with preserving the archival record and following best practices in preservation, cataloguing, and data protection. The historians in this project wish to discover certain patterns in those vital records. An important aspect of the semantic architecture is a clear separation of concerns that reflects these distinct requirements: the transcription and archival authenticity of the register pages on the one hand, and the interpretation of the transcribed data on the other. This separation led to the creation of two distinct ontologies and knowledge bases. Its advantage is that the transcription of register pages resulted in a reusable data set fit for other research purposes. These transcriptions were enriched with metadata according to best practices in archiving for ingestion in suitable long-term digital preservation platforms.

Notes

  1. https://irishrecordlinkage.wordpress.com/.

  2. http://www.irish-genealogy-toolkit.com/Irish-marriage-records.html.

  3. The terms and conditions of our data sharing agreement do not permit us to make public any data that would identify any individual [7]. One can access the historic records of the GRO at its dedicated research room in Dublin, but it is restricted per diem and there is an associated charge.

  4. A MySQL database (https://www.mysql.com/).

  5. With phpMyAdmin (https://www.phpmyadmin.net/).

  6. http://jena.apache.org/.

  7. http://wifo5-03.informatik.uni-mannheim.de/pubby/.

  8. https://www.w3.org/Submission/SWRL/.

  9. Available via http://purl.org/net/irish-record-linkage/records.

  10. http://irl.dri.ie/.

  11. Friend-of-a-Friend: http://xmlns.com/foaf/spec/.

  12. http://wiki.eclipse.org/Persona_vocabulary.

  13. http://apps.who.int/classifications/icd10/browse/2010/en.

  14. Int. List of Causes of Death, Rev. 1 (1900), http://www.wolfbane.com/icd/icd1h.html.

  15. Int. List of Causes of Death, Rev. 2 (1909), http://www.wolfbane.com/icd/icd2h.html.

  16. Department of Commerce and Labor, Bureau of the Census. International Classification of Causes of Sickness and Death. Washington: Government Printing Office (1910).

  17. We used the classification systems that existed in the studied historical period rather than today’s most current ones, because current classification systems reflect a different understanding of disease than those of the 19th century. Diseases may be classified by etiology (cause), pathogenesis (the mechanism by which the disease is caused), or symptom(s). Nosology is the branch of medicine that deals with the classification of disease. The historical evolution of classification systems, such as ICD or ICSD, is closely related to the historical and intellectual conditions of the era. Early disease classification used by physicians was largely based philosophically on humoral theories of disease, with occasional suggestions that malign outside influences might cause illness or death. The first version of ICD included the principle of classifying diseases by etiology. In later years, the focus shifted first to symptoms and then to the mechanisms of disease. For example, in the historical records, we observed “Teething” as a cause of death. The International List of Causes of Death, Revision 1, provides a classification category for this, “82 Teething”, for infants. The latest versions of the same classification (ICD10 or ICD11) have no such category as a disease or cause of death. A second reason for adopting historical classification systems is that the number of categories has expanded dramatically as medical knowledge advanced, reflecting new insights into the causes, mechanisms, and symptoms of diseases. The International List of Causes of Death, Revision 1 (1900) had 191 items, whereas the current one has more than 14,400 different codes. Mapping the historical disease classification to current ones would require examining the historical definition of each category and mapping each of them to a possible current understanding of diseases. In such a mapping, a historian can explore how medical knowledge and social conditions affect the formation of nosologies, but it would not have served our purpose of classifying historical causes of disease in the 19th century.

  18. http://repository.dri.ie/.

References

  1. First Annual Report of the Registrar-General of Marriages, Births, and Deaths in Ireland. http://www.cso.ie/en/media/csoie/releasespublications/documents/birthsdm/archivedreports/P-VS,1864 (1869). Accessed Dec 2015

  2. Aloia, N., Papatheodorou, C., Gavrilis, D., Debole, F., Meghini, C.: Describing research data: A case study for archaeology. In: Meersman, R., Panetto, H., Dillon, T.S., Missikoff, T.S., Liu, L., Pastor, O., Cuzzocrea, A., Sellis, T.K. (Eds.) On the Move to Meaningful Internet Systems: OTM 2014 Conferences - Confederated International Conferences: CoopIS, and ODBASE 2014, Amantea, Italy, October 27-31, 2014, Proceedings, Lecture Notes in Computer Science, vol. 8841, pp. 768–775. Springer. doi:10.1007/978-3-662-45563-0_48 (2014)

  3. Arenas, M., Bertails, A., Prudhommeaux, E., Sequeda, J.: A direct mapping of relational data to RDF. W3C Recommendation, W3C. URL: https://www.w3.org/TR/rdb-direct-mapping/ (2012). Accessed Dec 2015

  4. Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.G.: DBpedia: a nucleus for a web of open data. In: Aberer, K., Choi, K., Noy, N.F., Allemang, D., Lee, K., Nixon, L.J.B., Golbeck, J., Mika, P., Maynard, D., Mizoguchi, R., Schreiber, G., Cudré-Mauroux, P. (Eds.) The Semantic Web, 6th International Semantic Web Conference, 2nd Asian Semantic Web Conference, ISWC 2007 + ASWC 2007, Busan, Korea, November 11–15, 2007, Lecture Notes in Computer Science, vol. 4825, pp. 722–735. Springer (2007)

  5. Berners-Lee, T.: Relational databases on the semantic web. http://www.w3.org/DesignIssues/RDB-RDF.html (last retrieved December 2015) (1998)

  6. Berners-Lee, T.: Linked data - design issues. Last accessed: June 7th, 2015. URL: http://www.w3.org/DesignIssues/LinkedData.html (2006). Accessed Dec 2015

  7. Beyan, O., Breathnach, C., Collins, S., Debruyne, C., Decker, S., Grant, D., Grant, R., Gurrin, B.: Towards linked vital registration data for reconstituting families and creating longitudinal health histories. In: KR4HC Workshop (in conjunction with KR 2014), pp. 181–187 (2014)

  8. Bischof, S., Decker, S., Krennwallner, T., Lopes, N., Polleres, A.: Mapping between RDF and XML with XSPARQL. J. Data Semantics 1(3), 147–185 (2012)

  9. Bizer, C.: D2R MAP - a database to RDF mapping language. In: King, I., Máray, T. (Eds.) Proceedings of the Twelfth International World Wide Web Conference-Posters, WWW 2003, Budapest, Hungary (2003)

  10. Bizer, C., Heath, T., Berners-Lee, T.: Linked data - the story so far. Int. J. Semantic Web Inf. Syst. 5(3), 1–22 (2009)

  11. Boonstra, O., Breure, L., Doorn, P.: Past, present and future of historical information science. Historical Social Research/Historische Sozialforschung pp. 4–132 (2004)

  12. Bustillo, M., Collins, S., Gallagher, D., Grant, R., Harrower, N., Kenny, S., Ní Cholla, R., O’Carroll, A., Redmond, S., Webb, S.: Dublin Core and the Digital Repository of Ireland (Grant, R. ed.). Tech. rep., Maynooth: Maynooth University; Dublin: Trinity College Dublin; Dublin: Royal Irish Academy; Galway: National University of Ireland, Galway (2014)

  13. Bustillo, M., Collins, S., Gallagher, D., Grant, R., Harrower, N., Kenny, S., Ní Cholla, R., O’Carroll, A., Redmond, S., Webb, S.: Qualified Dublin Core and the Digital Repository of Ireland (Grant, R. ed.). Tech. rep., Maynooth: Maynooth University; Dublin: Trinity College Dublin; Dublin: Royal Irish Academy; Galway: National University of Ireland, Galway (2015)

  14. Bustillo, M., Grant, R., Kenny, S., Martínez-García, A., McGoohan, C., Ní Cholla, R., O’Carroll, A., O’Neill, J., Redmond, S., Webb, S.: MODS and the Digital Repository of Ireland (Grant, R. ed.). Tech. rep., Maynooth: Maynooth University; Dublin: Trinity College Dublin; Dublin: Royal Irish Academy; Galway: National University of Ireland, Galway (2016)

  15. Coppens, S., Mannens, E., Van Deursen, D., Hochstenbach, P., Janssens, B., Van de Walle, R.: Publishing provenance information on the web using the memento datetime content negotiation. In: Bizer, C., Heath, T., Berners-Lee, T., Hausenblas, M. (Eds.) WWW2011 Workshop on Linked Data on the Web, Hyderabad, India, 2011, CEUR Workshop Proceedings, vol. 813. CEUR-WS.org (2011)

  16. Coppens, S., Verborgh, R., Peyrard, S., Ford, K., Creighton, T., Guenther, R., Mannens, E., Van de Walle, R.: PREMIS OWL - A semantic long-term preservation model. Int. J. on Digital Libraries 15(2-4), 87–101. doi:10.1007/s00799-014-0136-9 (2015)

  17. Das, S., Sundara, S., Cyganiak, R.: R2RML: RDB to RDF Mapping Language. W3C Recommendation, W3C. URL: http://www.w3.org/TR/r2rml/ (2012). Accessed Dec 2015

  18. Debruyne, C., Beyan, O.D., Grant, R., Collins, S., Decker, S.: On a linked data platform for Irish historical vital records. In: Kapidakis, S., Mazurek, C., Werla, M. (Eds.) Research and Advanced Technology for Digital Libraries - 19th International Conference on Theory and Practice of Digital Libraries, TPDL 2015, Poznań, Poland, September 14–18, 2015. Proceedings, Lecture Notes in Computer Science, vol. 9316, pp. 99–110. Springer. doi:10.1007/978-3-319-24592-8_8 (2015)

  19. Dell’Aglio, D., Polleres, A., Lopes, N., Bischof, S.: Querying the web of data with XSPARQL 1.1. In: Verborgh, R., Mannens, E. (Eds.) Proceedings of the ISWC Developers Workshop 2014, co-located with the 13th International Semantic Web Conference (ISWC 2014), Riva del Garda, Italy, October 19, 2014, CEUR Workshop Proceedings, vol. 1268, pp. 113–118. CEUR-WS.org (2014)

  20. Feeney, K.C., O’Sullivan, D., Tai, W., Brennan, R.: Improving curated web-data quality with structured harvesting and assessment. Int. J. Semantic Web Inf. Syst. 10(2), 35–62. doi:10.4018/ijswis.2014040103 (2014)

  21. Fox, M.S., Gruninger, M.: Enterprise modeling. AI Magazine 19(3), 109–121 (1998)

  22. Frontini, F., Brando, C., Ganascia, J.: Semantic web based named entity linking for digital humanities and heritage texts. In: Zucker et al. [35], pp. 77–88. URL: http://ceur-ws.org/Vol-1364/paper9

  23. Grant, D., Debruyne, C., Grant, R., Collins, S.: Creating and consuming metadata from transcribed historical vital records for ingestion in a long-term digital preservation platform. In: Ciuciu, I., Panetto, H., Debruyne, C., Aubry, A., Bollen, P., Valencia-García, A.R., Mishra, A., Fensel, A., Ferri, F. (Eds.) On the Move to Meaningful Internet Systems: OTM 2015 Workshops, Lecture Notes in Computer Science, vol. 9416, pp. 445–450. Springer. doi:10.1007/978-3-319-26138-6_47 (2015)

  24. Grüninger, M., Fox, M.S.: The role of competency questions in enterprise engineering. In: Benchmarking - Theory and Practice, pp. 22–31. Springer (1995)

  25. Lopes, N., Grant, R., Ó Raghallaigh, B., Ó Carragáin, E., Collins, S., Decker, S.: Linked Logainm: Enhancing Library Metadata Using Linked Data of Irish Place Names. In: Bolikowski, L., Casarosa, V., Goodale, P., Houssos, N., Manghi, P., Schirrwagen, J. (Eds.) Theory and practice of digital libraries - TPDL 2013 Selected Workshops - LCPD 2013, SUEDL 2013, DataCur 2013, Held in Valletta, Malta, September 22-26, 2013. Revised Selected Papers, Communications in Computer and Information Science, vol. 416, pp. 65–76. Springer (2014)

  26. McGuinness, D., Lebo, T., Sahoo, S.: PROV-O: The PROV ontology. W3C Recommendation, W3C. URL: http://www.w3.org/TR/2013/REC-prov-o-20130430/ (2013)

  27. Meroño-Peñuela, A., Ashkpour, A., van Erp, M., Mandemakers, K., Breure, L., Scharnhorst, A., Schlobach, S., van Harmelen, F.: Semantic technologies for historical research: a survey. Semantic Web 6(6), 539–564. doi:10.3233/SW-140158 (2015)

  28. Motik, B., Horrocks, I.: OWL Datatypes: Design and Implementation. In: Sheth, A.P., Staab, S., Dean, M., Paolucci, M., Maynard, D., Finin, T.W., Thirunarayan, K. (Eds.) The Semantic Web - ISWC 2008, 7th International Semantic Web Conference, ISWC 2008, Karlsruhe, Germany, October 26-30, 2008. Proceedings, Lecture Notes in Computer Science, vol. 5318, pp. 307–322. Springer (2008)

  29. Newcombe, H.B., Kennedy, J.M.: Record linkage: Making maximum use of the discriminating power of identifying information. Commun. ACM 5(11), 563–566. doi:10.1145/368996.369026 (1962)

  30. Orgel, T., Höffernig, M., Bailer, W., Russegger, S.: A metadata model and mapping approach for facilitating access to heterogeneous cultural heritage assets. Int. J. on Digital Libraries 15(2-4), 189–207. doi:10.1007/s00799-015-0138-2 (2015)

  31. Rademaker, A., Oliveira, D., de Paiva, V., Higuchi, S., Sá, A., Alvim, M.: A linked open data architecture for the historical archives of the Getulio Vargas Foundation. Int. J. on Digital Libraries 15(2-4), 153–167. doi:10.1007/s00799-015-0147-1 (2015)

  32. Tounsi, M., Faron-Zucker, C., Zucker, A., Villata, S., Cabrio, E.: Studying the history of pre-modern zoology with linked data and vocabularies. In: Zucker et al. [35], pp. 7–14. URL: http://ceur-ws.org/Vol-1364/paper1

  33. Unbehauen, J., Hellmann, S., Auer, S., Stadler, C.: Knowledge extraction from structured sources. In: Ceri, S., Brambilla, M. (Eds.) Search computing - broadening web search, Lecture Notes in Computer Science, vol. 7538, pp. 34–52. Springer. doi:10.1007/978-3-642-34213-4_3. URL: http://dx.doi.org/10.1007/978-3-642-34213-4_3 (2012)

  34. Woodbury, C.: Automatic extraction from and reasoning about genealogical records: A prototype. Master’s thesis, Brigham Young University, Provo, Utah, USA (2010)

  35. Zucker, A., Draelants, I., Faron-Zucker, C., Monnin, A. (Eds.): Proceedings of the First International Workshop Semantic Web for Scientific Heritage at the 12th ESWC 2015 Conference, Portorož, Slovenia, June 1st, 2015, CEUR Workshop Proceedings, vol. 1364. CEUR-WS.org. URL: http://ceur-ws.org/Vol-1364 (2015)

Author information

Corresponding author

Correspondence to Christophe Debruyne.

Additional information

We are grateful to the Registrar General of Ireland for permitting us to use the rich digital content contained in the vital records for the purposes of this research project. This publication has emanated from research conducted within the Irish Record Linkage, 1864–1913 project supported by the RPG2013-3 Irish Research Council Interdisciplinary Research Project Grant. The Digital Repository of Ireland (formerly NAVR) gratefully acknowledges funding from the Irish HEA PRTLI programme. We also would like to thank Prof. Declan O’Sullivan from the ADAPT Centre at Trinity College Dublin and the anonymous reviewers for their valuable feedback. Christophe Debruyne is currently supported by the Science Foundation Ireland (Grant 13/RC/2106) as part of the ADAPT Centre for Digital Content Technology Platform Research at Trinity College Dublin.

About this article

Cite this article

Debruyne, C., Beyan, O.D., Grant, R. et al. A semantic architecture for preserving and interpreting the information contained in Irish historical vital records. Int J Digit Libr 17, 159–174 (2016). https://doi.org/10.1007/s00799-016-0180-8

Keywords

  • Historical vital records
  • Cultural heritage
  • Linked data
  • Ontology engineering
  • Preservation