Automatic Text Processing for Historical Research

  • Conference paper
Modern Information Technology and IT Education (SITITO 2017)

Abstract

This article examines how natural language processing technologies can be used in historical research. The authors present a taxonomy of biographical facts and use text mining (the extraction of information from texts) to obtain biographical information from Russian-language texts in accordance with the proposed fact types. Biographical research requires reading and studying large amounts of text. A considerable part of the sources is now available in electronic form, which allows researchers to apply modern information extraction methods to them. The article gives an overview of the main applications of automatic natural language text processing in the humanities and describes an approach to building and implementing a tool for extracting and systematizing biographical facts. This tool will be useful both to historians and to other users interested in biographical research.

Acknowledgements

The authors wish to thank all those who participated in the field exercise and helped to make it a successful endeavor.

The reported study was funded by RFBR according to the research project 18-37-00272.

Author information

Corresponding author

Correspondence to Anna Glazkova.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Glazkova, A., Kruzhinov, V., Sokova, Z. (2021). Automatic Text Processing for Historical Research. In: Sukhomlin, V., Zubareva, E. (eds) Modern Information Technology and IT Education. SITITO 2017. Communications in Computer and Information Science, vol 1204. Springer, Cham. https://doi.org/10.1007/978-3-030-78273-3_14

  • DOI: https://doi.org/10.1007/978-3-030-78273-3_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-78272-6

  • Online ISBN: 978-3-030-78273-3

  • eBook Packages: Computer Science (R0)
