NotaryPedia: A Knowledge Graph of Historical Notarial Manuscripts
Abstract
The Notarial Archives in Valletta, the capital city of Malta, house a rich and valuable collection of around twenty thousand notarial manuscripts dating back to the 15th century. The Archives aim to make the contents of this collection easily accessible and searchable to researchers and the general public. Knowledge Graphs have been used successfully to represent similar historical content. Nevertheless, building a Knowledge Graph for the Archives is challenging: the documents are written in medieval Latin, and information extraction tools that recognise this language are currently lacking. The problem is compounded by a lack of medieval Latin corpora for training and evaluating machine learning algorithms, and by the absence of an ontological representation for the contents of notarial manuscripts. In this paper, we present NotaryPedia, a Knowledge Graph for the Notarial Archives. We extend our previous work on entity and keyphrase extraction with relation extraction to populate the Knowledge Graph using an ontological vocabulary for notarial deeds. Furthermore, we perform Knowledge Graph completion using link prediction and inference. We evaluated our approach with different translational distance and semantic matching models, which were used to predict relations amongst literals by promoting them to entities and to infer new knowledge from existing entities. TransE achieved a relation prediction accuracy of 49%.
Keywords
Knowledge Graph, Medieval Latin text, Notarial ontology, Relation extraction, Link prediction