Abstract
This paper presents an approach for automatically retrieving family relationships from a real-world collection of Dutch historical notary acts. We aim to retrieve relationships such as husband-wife, parent-child, and widow-of. Our approach consists of person name extraction, reference disambiguation, candidate generation, and family relationship prediction. Since only a limited amount of training data is available, we evaluate different feature configurations based on n-gram analysis. The best results were obtained with a combination of word bi-grams and tri-grams together with the distance in words between the two names. We evaluate our results for each relationship type in terms of precision, recall, and F-score.
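To make the best-performing feature configuration concrete, the sketch below builds word bi-gram and tri-gram features plus the word distance for one candidate name pair. It also computes per-relationship precision, recall, and F-score. It rests on several assumptions that the abstract does not confirm. Each name is assumed to have already been collapsed into a single placeholder token (e.g. `PER1`) by the name extraction step. The n-grams are assumed to come only from the words between the two names. The helper names `word_ngrams`, `pair_features` and `prf` are hypothetical. The classifier that consumes these features is not shown.

```python
from collections import Counter


def word_ngrams(tokens, n):
    """Return all contiguous word n-grams of length n, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def pair_features(tokens, i, j):
    """Sparse features for the name pair at token positions i < j.

    Assumptions (hypothetical, not from the paper): each name is a single
    placeholder token, and n-grams are drawn only from the words strictly
    between the two names.
    """
    between = [t.lower() for t in tokens[i + 1:j]]
    feats = Counter()
    # Word bi-grams and tri-grams of the text separating the two names.
    for n in (2, 3):
        for gram in word_ngrams(between, n):
            feats[f"{n}g={gram}"] += 1
    # Distance in words between the two name mentions.
    feats["distance"] = j - i - 1
    return dict(feats)


def prf(gold, pred, label):
    """Precision, recall and F-score for one relationship type."""
    tp = sum(1 for g, p in zip(gold, pred) if g == p == label)
    fp = sum(1 for g, p in zip(gold, pred) if p == label and g != label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score


if __name__ == "__main__":
    # Invented snippet: "PER1 and his wife PER2".
    tokens = "PER1 en zijn huisvrouw PER2".split()
    print(pair_features(tokens, 0, 4))
```

On the invented snippet, the pair yields the bi-grams "en zijn" and "zijn huisvrouw", the tri-gram "en zijn huisvrouw", and a distance of 3. Evaluating with `prf` once per relationship label gives the per-type scores the abstract refers to.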
Notes
- 3. ‘Kerk van Erp’ in Dutch means ‘church of Erp’.
Acknowledgements
This work was carried out within the Mining Social Structures from Genealogical Data project (project no. 640.005.003), part of the CATCH program funded by the Netherlands Organization for Scientific Research (NWO).
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Efremova, J., García, A.M., Iriondo, A.B., Calders, T. (2016). Who Are My Ancestors? Retrieving Family Relationships from Historical Texts. In: Braslavski, P., et al. Information Retrieval. RuSSIR 2015. Communications in Computer and Information Science, vol 573. Springer, Cham. https://doi.org/10.1007/978-3-319-41718-9_6
DOI: https://doi.org/10.1007/978-3-319-41718-9_6
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41717-2
Online ISBN: 978-3-319-41718-9
eBook Packages: Computer Science, Computer Science (R0)