Multi-level Modeling of Structural Elements of Natural Language Texts and Its Applications

  • Conference paper
Biologically Inspired Cognitive Architectures 2018 (BICA 2018)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 848)

Abstract

Methods for extracting knowledge from large volumes of natural language text are relevant to a range of problems in the analysis and generation of textual information: text analysis for extracting data, facts and semantics; presenting extracted information in a form convenient for machine processing (for example, an ontology); classifying and clustering texts, including topic modeling; information retrieval (including thematic search, search based on a user model, ontology-based models, and search based on a sample of documents); abstracting and annotating texts; developing intelligent question-answering systems; generating texts of different types (fiction, marketing, weather forecasts, etc.); and rewriting texts while preserving the original meaning in order to present it to different target audiences. For such methods to work, it is necessary to construct and use models that adequately describe the structural elements of a text at different levels (individual words, sentences, thematic text fragments), their characteristics and semantics, and the relations between them that allow higher-level structures to be formed. Such models should also take into account general characteristics of textual data: genre, purpose, target audience, scientific field and others. In this paper, the authors review three main approaches to text modeling (structural, statistical and hybrid), their characteristics, advantages and drawbacks, and their applicability at different stages (knowledge extraction, storage and text generation) of solving problems in the analysis and generation of textual information.

This paper presents the results of research carried out under the RFBR grant 18-07-00032 “Intelligent support of decision making of knowledge management for learning and scientific research based on the collaborative creation and reuse of the domain information space and ontology knowledge representation model”.

Author information

Corresponding author

Correspondence to Anton Anikin.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Anikin, A., Sychev, O., Gurtovoy, V. (2019). Multi-level Modeling of Structural Elements of Natural Language Texts and Its Applications. In: Samsonovich, A. (ed.) Biologically Inspired Cognitive Architectures 2018. BICA 2018. Advances in Intelligent Systems and Computing, vol 848. Springer, Cham. https://doi.org/10.1007/978-3-319-99316-4_1
