Multi-level Modeling of Structural Elements of Natural Language Texts and Its Applications

Anikin, Anton; Sychev, Oleg; Gurtovoy, Vladislav

doi:10.1007/978-3-319-99316-4_1

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 848)

Included in the following conference series:

Biologically Inspired Cognitive Architectures Meeting

Abstract

Methods for extracting knowledge from large volumes of natural language text are relevant to many problems in the analysis and generation of textual information: text analysis for extracting data, facts and semantics; presenting extracted information in a form convenient for machine processing (for example, an ontology); classification and clustering of texts, including topic modeling; information retrieval (including thematic search, search based on a user model, ontology-based models, and search based on a document sample); abstracting and annotating texts; developing intelligent question-answering systems; generating texts of different types (fiction, marketing, weather forecasts, etc.); and rewriting texts while preserving the meaning of the original in order to present it to different target audiences. For such methods to work, it is necessary to construct and use models that adequately describe the structural elements of a text at different levels (individual words, sentences, thematic text fragments), their characteristics and semantics, and the relations between them that allow higher-level structures to be formed. Such models should also take into account general characteristics of textual data: genre, purpose, target audience, scientific field and others. In this paper, the authors review three main approaches to text modeling (structural, statistical and hybrid), their characteristics, pros and cons, and their applicability at different stages (knowledge extraction, storage and text generation) of solving problems in the analysis and generation of textual information.
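As a purely illustrative sketch (not the authors' model), the levels named in the abstract (words, sentences, thematic fragments) and the document-level characteristics (genre, target audience and so on) could be held in a small nested data structure. All class and function names below are hypothetical, and the regex-based splitting is a deliberately naive stand-in for real sentence and fragment segmentation.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Word:
    text: str


@dataclass
class Sentence:
    words: List[Word]


@dataclass
class Fragment:
    # A thematic fragment; here simply a blank-line-separated block (assumption).
    sentences: List[Sentence]


@dataclass
class TextModel:
    fragments: List[Fragment]
    # Document-level characteristics such as genre or target audience.
    attributes: Dict[str, str] = field(default_factory=dict)

    def words(self) -> List[Word]:
        """Flatten the hierarchy down to the word level."""
        return [w for f in self.fragments for s in f.sentences for w in s.words]


def build_text_model(text: str, **attributes: str) -> TextModel:
    """Naively split text into fragments, sentences and words."""
    fragments = []
    for block in re.split(r"\n\s*\n", text.strip()):
        sentences = []
        # Split after sentence-final punctuation followed by whitespace.
        for raw in re.split(r"(?<=[.!?])\s+", block.strip()):
            tokens = re.findall(r"\w+", raw)
            if tokens:
                sentences.append(Sentence([Word(t) for t in tokens]))
        if sentences:
            fragments.append(Fragment(sentences))
    return TextModel(fragments, dict(attributes))


sample = "Rain is expected today. Take an umbrella.\n\nTomorrow will be sunny."
model = build_text_model(sample, genre="weather forecast", audience="general")
```

Relations between elements and their semantics, which the abstract also calls for, are left out here; a fuller model would attach them to the same hierarchy.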

This paper presents the results of research carried out under the RFBR grant 18-07-00032 “Intelligent support of decision making of knowledge management for learning and scientific research based on the collaborative creation and reuse of the domain information space and ontology knowledge representation model”.

Author information

Authors and Affiliations

Volgograd State Technical University, Volgograd, Russia
Anton Anikin, Oleg Sychev & Vladislav Gurtovoy

Corresponding author

Correspondence to Anton Anikin.

Editor information

Editors and Affiliations

Department of Cybernetics, National Research Nuclear University “MEPhI”, Moscow, Russia
Alexei V. Samsonovich

About this paper

Cite this paper

Anikin, A., Sychev, O., Gurtovoy, V. (2019). Multi-level Modeling of Structural Elements of Natural Language Texts and Its Applications. In: Samsonovich, A. (eds) Biologically Inspired Cognitive Architectures 2018. BICA 2018. Advances in Intelligent Systems and Computing, vol 848. Springer, Cham. https://doi.org/10.1007/978-3-319-99316-4_1

DOI: https://doi.org/10.1007/978-3-319-99316-4_1
Published: 24 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99315-7
Online ISBN: 978-3-319-99316-4
eBook Packages: Intelligent Technologies and Robotics (R0)
