Abstract
This article examines the main ways in which natural language processing technologies can be used in historical research. The authors present a taxonomy of biographical facts and use text mining technologies (extraction of information from texts) to obtain biographical information from Russian-language texts in accordance with the proposed types of biographical facts. Biographical research requires reviewing and studying large amounts of textual information. A considerable part of the sources is now available in electronic form, which allows researchers to apply modern information extraction methods to them. The article gives an overview of the main areas in which automatic natural language processing is applied in the humanities and describes an approach to creating and implementing a tool for extracting and systematizing biographical facts. This tool will be useful both to historians and to other users interested in biographical research.
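To make the idea of extracting typed biographical facts and then systematizing them more concrete, here is a minimal rule-based sketch in Python. It is not the authors' tool: the three fact types (birth, death, education), the regular expressions, and the function names `extract_facts` and `systematize` are hypothetical, chosen only to illustrate the two steps of tagging each extracted fact with a type and grouping the results.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical fact types and patterns for Russian text; this does NOT
# reproduce the taxonomy proposed in the article.
PATTERNS = {
    # "родился/родилась ... 1799" -> year of birth
    "birth": re.compile(r"\bродил(?:ся|ась)\b[^.]*?\b(\d{4})\b", re.IGNORECASE),
    # "умер/умерла/скончался/скончалась ... 1837" -> year of death
    "death": re.compile(
        r"\b(?:умер|умерла|скончал(?:ся|ась))\b[^.]*?\b(\d{4})\b", re.IGNORECASE
    ),
    # "окончил/окончила/учился/училась <institution>" -> education
    "education": re.compile(
        r"\b(?:окончил|окончила|учился|училась)\s+([^.,]+)", re.IGNORECASE
    ),
}

# Naive sentence splitter: break after ., ! or ? followed by whitespace.
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")


@dataclass(frozen=True)
class Fact:
    fact_type: str  # one of the keys of PATTERNS
    value: str      # the captured span (a year or an institution name)
    sentence: str   # the source sentence, kept for verification by a historian


def extract_facts(text: str) -> list[Fact]:
    """Apply every pattern to every sentence and collect typed facts."""
    facts: list[Fact] = []
    for sentence in SENTENCE_SPLIT.split(text.strip()):
        for fact_type, pattern in PATTERNS.items():
            for match in pattern.finditer(sentence):
                facts.append(Fact(fact_type, match.group(1).strip(), sentence))
    return facts


def systematize(facts: list[Fact]) -> dict[str, list[str]]:
    """Group extracted values by fact type, preserving their order in the text."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for fact in facts:
        grouped[fact.fact_type].append(fact.value)
    return dict(grouped)


if __name__ == "__main__":
    sample = (
        "Александр Пушкин родился в 1799 году в Москве. "
        "Он окончил Царскосельский лицей. "
        "Поэт скончался в 1837 году."
    )
    print(systematize(extract_facts(sample)))
```

Hand-written patterns like these only cover the listed word forms and miss paraphrases, which is why larger-scale extraction usually relies on annotated corpora and trained models; the sketch is meant only to show how a typed fact keeps a link to its source sentence so that a researcher can check it.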
Acknowledgements
The authors wish to thank all those who participated in the field exercise and helped to make it a successful endeavor.
The reported study was funded by the RFBR (Russian Foundation for Basic Research) under research project No. 18-37-00272.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Glazkova, A., Kruzhinov, V., Sokova, Z. (2021). Automatic Text Processing for Historical Research. In: Sukhomlin, V., Zubareva, E. (eds) Modern Information Technology and IT Education. SITITO 2017. Communications in Computer and Information Science, vol 1204. Springer, Cham. https://doi.org/10.1007/978-3-030-78273-3_14
DOI: https://doi.org/10.1007/978-3-030-78273-3_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-78272-6
Online ISBN: 978-3-030-78273-3
eBook Packages: Computer Science, Computer Science (R0)