Abstract
This paper addresses the problem of recognizing information about spatial relations between geographical objects in natural language texts. The proposed technology for visualizing spatial relations extracts geographical information from unstructured texts and presents it in a structured form suitable for modern geographic information systems (GIS). The extracted information can be used to automatically populate and update a GIS database and to generate cartograms that support visual analysis of the spatial connectivity of arbitrary geographical objects. To obtain the best results, we combine neural network methods for recognizing named entities (geographical objects), a rule-based approach for identifying potential spatial relationships, and domain-specific lexical patterns. Because the recognized geographical entities and spatial relationships are stored in a standardized structured form, the final stage of the technology can use standard geoservices for visualization without additional preparation of geodata. The result is a geographical image (cartogram) showing the spatial relations of the geographical objects recognized in the analyzed texts.
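As a rough illustration of the kind of pipeline the abstract describes, the following Python sketch pairs entity lookup with pattern-based relation extraction and emits a GeoJSON structure that standard geoservices can typically read. It is not the authors' implementation: the gazetteer (`GAZETTEER`) is a hypothetical stand-in for the neural named entity recognizer, the regular expressions in `RELATION_PATTERNS` are invented examples of lexical patterns, the coordinates are approximate, and the function names are assumptions made for this example.

```python
import re

# Hypothetical gazetteer standing in for the neural NER step.
# Coordinates are approximate (longitude, latitude) pairs.
GAZETTEER = {
    "Murmansk": (33.08, 68.97),
    "Apatity": (33.40, 67.57),
    "Kirovsk": (33.67, 67.61),
}

# Hypothetical lexical patterns: relation label -> connecting phrase.
RELATION_PATTERNS = {
    "north_of": r"(?:is|lies)\s+(?:to\s+the\s+)?north\s+of",
    "near": r"(?:is|lies)\s+(?:near|close\s+to)",
    "connected_to": r"is\s+connected\s+(?:to|with)",
}


def extract_relations(text):
    """Return (subject, relation, object) triples found in the text."""
    names = "|".join(re.escape(name) for name in GAZETTEER)
    relations = []
    for label, phrase in RELATION_PATTERNS.items():
        # Rule: known place name, connecting phrase, known place name.
        pattern = re.compile(rf"\b({names})\s+{phrase}\s+({names})\b")
        for match in pattern.finditer(text):
            relations.append((match.group(1), label, match.group(2)))
    return relations


def to_geojson(relations):
    """Encode each relation as a GeoJSON LineString between two objects."""
    features = []
    for subject, label, obj in relations:
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "LineString",
                "coordinates": [list(GAZETTEER[subject]), list(GAZETTEER[obj])],
            },
            "properties": {"subject": subject, "relation": label, "object": obj},
        })
    return {"type": "FeatureCollection", "features": features}


sample_text = "Murmansk lies to the north of Apatity. Kirovsk is near Apatity."
sample_relations = extract_relations(sample_text)
sample_geojson = to_geojson(sample_relations)
```

Storing each relation as a feature with `subject`, `relation`, and `object` properties keeps the output in a standardized form, so a map client could draw the connections without further preparation of the data.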
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Vicentiy, A.V., Shishaev, M.G. (2021). The Technology of Spatial Relations Visualization Based on the Analysis of Natural Language Texts. In: Silhavy, R., Silhavy, P., Prokopova, Z. (eds) Software Engineering Application in Informatics. CoMeSySo 2021. Lecture Notes in Networks and Systems, vol 232. Springer, Cham. https://doi.org/10.1007/978-3-030-90318-3_78
DOI: https://doi.org/10.1007/978-3-030-90318-3_78
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-90317-6
Online ISBN: 978-3-030-90318-3
eBook Packages: Intelligent Technologies and Robotics (R0)