KEMMRL: Knowledge Extraction Model for Morphologically Rich Languages

Part of the Lecture Notes in Computer Science book series (LNCS, volume 13891)


Abstract

There is growing interest in automatic text processing and knowledge extraction from text repositories, which often requires building new language resources and technologies. We present KEMMRL, a model designed for Croatian, an under-resourced but morphologically rich language. The proposed model combines natural language processing techniques, state-of-the-art deep learning algorithms and a rule-based approach to generate knowledge representations. Combining the output of the newly developed HRtagger and HRparser methods with the KEMMRL model yields knowledge represented as an ordered recursive hypergraph. Since the performance of KEMMRL depends heavily on the applied deep learning methods, we evaluated them using the hr500k reference corpus in the training and testing phases and the manually designed, out-of-domain Semantic Hypergraph Corpus (SemCro). On standard evaluation metrics, HRtagger and HRparser achieved significantly better results than other state-of-the-art methods. These methods also performed best in measuring the structural similarity of hypergraphs, achieving the highest average similarity to the manually annotated semantic hypergraphs and the highest number of semantic hyperedges correctly annotated by the model. The semantic hypergraph proved to be an ideal structure for capturing and representing knowledge from more complex sentences without information loss. Researchers and developers working on other morphologically rich languages can customize and extend KEMMRL to suit their requirements. This article highlights the potential benefits of integrating the KEMMRL model into an Intelligent Tutoring System (ITS); future research may focus on developing and testing such implementations.
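
The abstract describes KEMMRL's output as an ordered recursive hypergraph. The Python sketch below is a minimal illustration of that kind of structure, assuming the semantic-hypergraph convention of Menezes and Roth: a hyperedge is an ordered tuple whose first element is the connector and whose other elements are atoms or nested hyperedges. It also shows one simple way a predicted hypergraph could be compared against a manual annotation at the hyperedge level. The simplified type suffixes (/P, /C, /M), the helper names and the `hyperedge_recall` score are hypothetical; they are not KEMMRL's actual representation, tagset or the evaluation measures reported in the paper.

```python
from collections import Counter
from typing import List, Tuple, Union

# Hypothetical types for illustration only (not KEMMRL's data model).
# An atom is a typed token such as "grad/C"; the suffix is a simplified
# semantic type (P = predicate, C = concept, M = modifier).
Atom = str
# A hyperedge is an ordered tuple whose elements are atoms or other
# hyperedges, which makes the structure recursive. The first element
# plays the role of the connector.
Edge = Union[Atom, Tuple["Edge", ...]]


def tokenize(text: str) -> List[str]:
    """Split bracketed notation such as "(a/P b/C)" into tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(text: str) -> Edge:
    """Parse bracketed notation into nested tuples of atoms."""
    tokens = tokenize(text)
    pos = 0

    def read() -> Edge:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == ")":
            raise ValueError("unexpected ')'")
        if tok != "(":
            return tok  # an atom
        items = []
        while pos < len(tokens) and tokens[pos] != ")":
            items.append(read())  # atoms or nested hyperedges, in order
        if pos >= len(tokens):
            raise ValueError("missing ')'")
        pos += 1  # consume the closing ")"
        if not items:
            raise ValueError("empty hyperedge")
        return tuple(items)

    edge = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the hyperedge")
    return edge


def subedges(edge: Edge) -> List[Edge]:
    """Return every non-atomic hyperedge inside `edge` (itself included), pre-order."""
    if isinstance(edge, str):
        return []
    found = [edge]
    for child in edge:
        found.extend(subedges(child))
    return found


def hyperedge_recall(predicted: Edge, gold: Edge) -> float:
    """Share of gold hyperedges that appear exactly (same order) in the prediction.

    A hypothetical, simplified score; not the evaluation measure from the paper.
    """
    gold_counts = Counter(subedges(gold))
    pred_counts = Counter(subedges(predicted))
    matched = sum((gold_counts & pred_counts).values())
    total = sum(gold_counts.values())
    return matched / total if total else 1.0


if __name__ == "__main__":
    # "Berlin je lijep grad." ("Berlin is a nice city.")
    gold = parse("(je/P berlin/C (lijep/M grad/C))")
    # Same atoms, but the arguments of the outer hyperedge are swapped.
    predicted = parse("(je/P (lijep/M grad/C) berlin/C)")
    print(gold[0])                            # connector: je/P
    print(hyperedge_recall(predicted, gold))  # 0.5
```

Because hyperedges are ordered, swapping two arguments changes the outer hyperedge while the nested modifier hyperedge still matches, so the example scores 0.5. A faithful evaluation would need the paper's own tagset and similarity measures.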


Keywords

  • Knowledge extraction
  • Natural language processing
  • Deep learning techniques
  • Morphologically rich languages
  • Knowledge representation
  • Semantic hypergraph

Acknowledgements

This paper is part of work supported by the Office of Naval Research under Grant No. N00014-20-1-2066.

Author information

Corresponding author

Correspondence to Daniel Vasić.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Vasić, D., Žitko, B., Grubišić, A., Gašpar, A. (2023). KEMMRL: Knowledge Extraction Model for Morphologically Rich Languages. In: Frasson, C., Mylonas, P., Troussas, C. (eds) Augmented Intelligence and Intelligent Tutoring Systems. ITS 2023. Lecture Notes in Computer Science, vol 13891. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-32882-4

  • Online ISBN: 978-3-031-32883-1

  • eBook Packages: Computer Science, Computer Science (R0)