Recognizing Verb-Based Croatian Idiomatic MWUs

  • Conference paper
Automatic Processing of Natural-Language Electronic Texts with NooJ (NooJ 2015)

Abstract

This paper tackles the computational problems posed by Croatian verbal idioms. The Croatian language has a very rich phraseme structure, as described in Matešić (1982), Menac (2007) and Menac-Mihalić (2007), among many others. This work is one of the few attempts at computational analysis of idioms in Croatian as multi-word units (MWUs). We used a rule-based approach and NooJ syntactic grammars to recognize any verb-based idiom (of the ~1500 analyzed) in any syntactic position. The Croatian Dictionary of Idioms (Menac et al. 2003) was used for the initial list, which was supplemented with new additions during the training phase. The grammars were tested on corpora constructed specifically for this work, and the results were used to calculate recall, precision and f-measure. With final results of recall < 98 %, precision < 96 % and f-measure < 97 %, we consider this a successful attempt at recognizing verb-based idioms in Croatian.
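
The three evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It assumes the standard definitions (precision and recall from true-positive, false-positive and false-negative counts) and the balanced F1 form of the f-measure; the abstract does not state which variant was used. The function name and the counts are hypothetical and are not the paper's data.

```python
def precision_recall_f(true_pos, false_pos, false_neg):
    """Return (precision, recall, f_measure) from raw match counts.

    Assumes the balanced F1 score (harmonic mean of precision and recall).
    """
    predicted = true_pos + false_pos   # idiom candidates the grammar flagged
    actual = true_pos + false_neg      # idioms actually present in the corpus
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Hypothetical counts chosen only for illustration (not taken from the paper).
p, r, f = precision_recall_f(true_pos=90, false_pos=10, false_neg=30)
print(f"precision={p:.2f} recall={r:.2f} f-measure={f:.2f}")
```

With these illustrative counts, 90 of 100 flagged candidates are correct (precision 0.90) and 90 of 120 real idioms are found (recall 0.75).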

Notes

  1. An on-line version of the Croatian corpus of idioms is prepared and maintained by Rittgasser and Fink-Arsovski at http://www.lingua-hr.de/phraseologie/stichwort.html.

  2. In the two published papers (Kocijan and Librenjak, 2016a, 2016b) we used a special category, NW, to describe the MWUs. Since that feature is no longer supported in NooJ 5, we decided to replace the NW notation with FXC. The rest of our dictionaries and grammars remain the same.

References

  • Agić, Ž., Ljubešić, N.: The SETimes.HR linguistically annotated corpus of Croatian. In: Proceedings of the Ninth International Conference on Language Resources and Evaluation, pp. 1724–1727, Reykjavik (2014)

  • Arsovski, F.Ž., Kovačević, B., Hrnjak, A.: Bibliografija hrvatske frazeologije i popis frazema analiziranih u znanstvenim i stručnim radovima. Knjigra, Zagreb (2010)

  • Chatzitheodorou, K.: Paraphrasing of Italian support verb constructions based on lexical and grammatical resources. In: Proceedings of the Workshop on Lexical and Grammatical Resources for Language Processing, Coling 2014, Dublin, Ireland, pp. 1–7 (2014)

  • Fink, Ž., Menac, A.: Hrvatska frazeologija – staro i novo. In: Mokienko, W., Walter, H. (eds.) Frazeologia. Komparacja współczesnych języków słowiańskich, 3. Opole: Universität Greifswald – Institut für Slawistik, Uniwersytet Opolski – Instytut Filologii Polskiej, pp. 88–100 (2008)

  • Gavriilidou, Z., Papadopoulou, E., Chadjipapa, E.: Processing Greek frozen expressions with NooJ. In: Formalising Natural Languages with NooJ: Selected Papers from the NooJ 2011 International Conference, Dubrovnik, Croatia, pp. 63–74. Cambridge Scholars Publishing, Newcastle (2012)

  • Granger, S., Paquot, M., Rayson, P.: Extraction of multiword units from EFL and native English corpora. The phraseology of the verb ‘make’. In: Buhofer, A.H., Burger, H. (eds.) Phraseology in Motion I, pp. 57–68. Schneider, Baltmannsweiler (2006)

  • Kocijan, K., Librenjak, S.: The quest for Croatian idioms as multi word units. In: Monti, J., Mitkov, R., Corpas Pastor, G., Seretan, V. (eds.) Multiword Units in Machine Translation and Translation Technology. John Benjamins Publishing, Amsterdam (2016a)

  • Kocijan, K., Librenjak, S.: Comparative idioms in Croatian: MWU approach. In: Corpas Pastor, G. (ed.) Computerised and Corpus-based Approaches to Phraseology: Monolingual and Multilingual Perspectives, pp. 523–532. Editions Tradulex, Geneva (2016b)

  • Ljubešić, N., Dobrovoljc, K., Krek, S., Antonić, M.P., Fišer, D.: hrMWELex – a MWE lexicon of Croatian extracted from a parsed gigacorpus. In: Language Technologies: Proceedings of the 17th International Multiconference Information Society, IS2014, Ljubljana, Slovenia (2014)

  • Machonis, P.A.: English phrasal verbs: from lexicon-grammar to natural language processing. South. J. Linguist. 34(1), 21–48 (2010)

  • Machonis, P.A.: Sorting NooJ out to take multiword expressions into account. In: Vučković, K., Bekavac, B., Silberztein, M. (eds.) Formalising Natural Languages with NooJ: Selected Papers from the NooJ 2011 International Conference, Dubrovnik, Croatia, pp. 152–165. Cambridge Scholars Publishing, Newcastle (2012)

  • Matešić, J.: Frazeološki rječnik hrvatskoga ili srpskog jezika. Školska knjiga, Zagreb (1982)

  • Menac, A., Arsovski, F.Ž., Venturin, R.: Hrvatski frazeološki rječnik. Naklada Ljevak, Zagreb (2003)

  • Menac, A.: Hrvatska frazeologija. Knjigra, Zagreb (2007)

  • Menac-Mihalić, M.: Hrvatski dijalektni frazemi s antroponimom kao sastavnicom. In: Folia Onomastica Croatica, no. 12/13, pp. 361–385 (2007)

  • Sag, I.A., Baldwin, T., Bond, F., Copestake, A., Flickinger, D.: Multiword expressions: a pain in the neck for NLP. In: Gelbukh, A. (ed.) CICLing 2002. LNCS, vol. 2276, pp. 1–15. Springer, Heidelberg (2002)

  • Sakaeva, L.R., Nurullina, A.G.: Comparative analysis of verbal, adjectival, adverbial and modal phraseological units with a lexeme “devil” in English and Russian languages. Middle-East J. Sci. Res. 18(1), 50–54 (2013). doi:10.5829/idosi.mejsr.2013.18.1.12354

  • Silberztein, M.: NooJ Manual (2003). www.nooj4nlp.net

  • Todorova, M.: Morpho-syntactic properties of Bulgarian verbal idiomatic expressions. In: Blanco, X., Silberztein, M. (eds.) Proceedings of the 2007 International NooJ Conference, pp. 273–279. Cambridge Scholars Publishing, Newcastle (2008)

  • Wehrli, E.: Translating idioms. In: Proceedings of the 36th Annual Meeting of the ACL and 17th International Conference on Computational Linguistics: COLING/ACL-98, Montreal, Canada, pp. 1388–1392 (1998)

Author information

Corresponding author

Correspondence to Kristina Kocijan.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Kocijan, K., Librenjak, S. (2016). Recognizing Verb-Based Croatian Idiomatic MWUs. In: Okrut, T., Hetsevich, Y., Silberztein, M., Stanislavenka, H. (eds) Automatic Processing of Natural-Language Electronic Texts with NooJ. NooJ 2015. Communications in Computer and Information Science, vol 607. Springer, Cham. https://doi.org/10.1007/978-3-319-42471-2_9

  • DOI: https://doi.org/10.1007/978-3-319-42471-2_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-42470-5

  • Online ISBN: 978-3-319-42471-2

  • eBook Packages: Computer Science (R0)
