Abstract
Methods for extracting knowledge from large volumes of natural language text are relevant to a range of problems in the analysis and generation of textual information: analyzing text to extract data, facts, and semantics; presenting the extracted information in a form convenient for machine processing (for example, an ontology); classifying and clustering texts, including topic modeling; information retrieval (including thematic search, search based on a user model, ontology-based models, and search based on a sample document); abstracting and annotating texts; developing intelligent question-answering systems; generating texts of different types (fiction, marketing, weather forecasts, etc.); and rewriting texts for different target audiences while preserving the meaning of the original. For such methods to work, it is necessary to construct and use models that adequately describe the structural elements of a text at different levels (individual words, sentences, thematic text fragments), their characteristics and semantics, and the relations between them that allow higher-level structures to be formed. Such models should also take into account general characteristics of textual data: genre, purpose, target audience, scientific field, and others. In this paper, the authors review three main approaches to text modeling (structural, statistical, and hybrid), their characteristics, advantages and disadvantages, and their applicability at different stages (knowledge extraction, storage, and text generation) of solving problems in the analysis and generation of textual information.
This paper presents the results of research carried out under the RFBR grant 18-07-00032 “Intelligent support of decision making of knowledge management for learning and scientific research based on the collaborative creation and reuse of the domain information space and ontology knowledge representation model”.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Anikin, A., Sychev, O., Gurtovoy, V. (2019). Multi-level Modeling of Structural Elements of Natural Language Texts and Its Applications. In: Samsonovich, A. (eds) Biologically Inspired Cognitive Architectures 2018. BICA 2018. Advances in Intelligent Systems and Computing, vol 848. Springer, Cham. https://doi.org/10.1007/978-3-319-99316-4_1
DOI: https://doi.org/10.1007/978-3-319-99316-4_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99315-7
Online ISBN: 978-3-319-99316-4
eBook Packages: Intelligent Technologies and Robotics (R0)