Automatic Text Processing for Historical Research

Glazkova, Anna; Kruzhinov, Valery; Sokova, Zinaida

doi:10.1007/978-3-030-78273-3_14

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1204)

Included in the following conference series:

International Conference on Modern Information Technology and IT Education

Abstract

This article examines the main ways in which natural language processing technologies can be applied to historical research. The authors present a taxonomy of biographical facts and use text mining technologies (the extraction of information from texts) to obtain biographical information from Russian-language texts according to the proposed types of biographical facts. Biographical research requires reviewing and studying large volumes of text. A considerable share of the sources is now available in electronic form, which allows researchers to apply modern information extraction methods to them. The article surveys the main areas in which automatic natural language processing is applied in the humanities and describes an approach to creating and implementing a tool for extracting and systematizing biographical facts. This tool will be useful both to historians and to other users interested in biographical research.
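The extraction step described above can be illustrated with a small rule-based sketch. The chapter's taxonomy and extraction rules are not given here, so the fact types (`birth_year`, `death_year`, `education`), the regular expressions, the function names and the sample person are all hypothetical, chosen only to show how typed biographical facts might be pulled out of Russian sentences; the authors' tool may work quite differently.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# HYPOTHETICAL taxonomy: three illustrative fact types with hand-written
# Russian patterns. These are assumptions for this sketch, not the
# fact types or rules defined in the chapter.
FACT_PATTERNS: dict[str, list[re.Pattern[str]]] = {
    # "родился/родилась ... <year> года/году" -> year of birth
    "birth_year": [re.compile(r"\bродил(?:ся|ась)\b[^.]*?\b(\d{4})\s+год", re.IGNORECASE)],
    # "умер/умерла ... <year> году" -> year of death
    "death_year": [re.compile(r"\bумер(?:ла)?\b[^.]*?\b(\d{4})\s+год", re.IGNORECASE)],
    # "окончил(а) <Capitalized institution name>" -> education
    "education": [re.compile(r"\b[Оо]кончил(?:а)?\s+([А-ЯЁ][^.,;]*)")],
}


@dataclass(frozen=True)
class Fact:
    """One extracted biographical fact and the sentence it came from."""
    fact_type: str
    value: str
    sentence: str


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_facts(text: str) -> list[Fact]:
    """Apply every pattern of every fact type to every sentence, in order."""
    facts: list[Fact] = []
    for sentence in split_sentences(text):
        for fact_type, patterns in FACT_PATTERNS.items():
            for pattern in patterns:
                for match in pattern.finditer(sentence):
                    facts.append(Fact(fact_type, match.group(1).strip(), sentence))
    return facts


if __name__ == "__main__":
    # Invented sample text about a fictitious person, for illustration only.
    sample = (
        "Иван Петров родился 12 мая 1850 года в Тобольске. "
        "В 1872 году окончил Казанский университет. "
        "Умер в 1910 году в Москве."
    )
    for fact in extract_facts(sample):
        print(fact.fact_type, "->", fact.value)
```

Run on the invented sample, the script prints `birth_year -> 1850`, `education -> Казанский университет` and `death_year -> 1910`. Keeping the patterns in a table keyed by fact type separates the taxonomy from the matching code, so new fact types can be added as data; real historical sources would need far more robust sentence splitting and morphological analysis than this sketch provides.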

Acknowledgements

The authors wish to thank all those who participated in the field exercise and helped to make it a successful endeavor.

The reported study was funded by the RFBR (Russian Foundation for Basic Research) under research project No. 18-37-00272.

Author information

Authors and Affiliations

University of Tyumen, Volodarskogo Str. 6, 625003, Tyumen, Russia
Anna Glazkova, Valery Kruzhinov & Zinaida Sokova

Corresponding author

Correspondence to Anna Glazkova.

Editor information

Editors and Affiliations

Moscow State University, Moscow, Russia
Vladimir Sukhomlin
Moscow State University, Moscow, Russia
Elena Zubareva

About this paper

Cite this paper

Glazkova, A., Kruzhinov, V., Sokova, Z. (2021). Automatic Text Processing for Historical Research. In: Sukhomlin, V., Zubareva, E. (eds) Modern Information Technology and IT Education. SITITO 2017. Communications in Computer and Information Science, vol 1204. Springer, Cham. https://doi.org/10.1007/978-3-030-78273-3_14

DOI: https://doi.org/10.1007/978-3-030-78273-3_14
Published: 09 June 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-78272-6
Online ISBN: 978-3-030-78273-3
eBook Packages: Computer Science, Computer Science (R0)