Abstract
This paper tackles the computational problems posed by Croatian verbal idioms. Croatian has a very rich phraseme structure, as described by Matešić (1982), Menac (2007), Menac-Mihalić (2007) and many others. This work is one of the few attempts at a computational analysis of Croatian idioms as multi-word units. We used a rule-based approach with NooJ syntactic grammars to recognize any verb-based idiom (of the ~1500 analyzed) in any syntactic position. The Croatian Dictionary of Idioms (Menac et al. 2003) provided the initial list, which was supplemented with new entries during the training phase. The grammars were tested on corpora constructed specifically for this work, and the results were used to calculate the recall, precision and F-measure of our grammars. With final results of recall < 98 %, precision < 96 % and F-measure < 97 %, we consider this a successful attempt at recognizing verb-based idioms in Croatian.
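The authors' grammars are written in NooJ; the sketch below is neither NooJ nor their method. It is a minimal Python illustration, under simplifying assumptions, of two ideas from the abstract: recognizing a verb-based idiom whose verb may appear before or after its fixed complement, and scoring those matches against a hand-annotated gold set with precision P, recall R and the balanced F-measure F = 2PR / (P + R). The `VerbIdiom` class, the `find_idioms` and `precision_recall_f` functions, the `max_gap` window and the toy pre-lemmatized sentences are all hypothetical. The example idiom *baciti koplje u trnje* ("to give up") also shows why literal readings produce false positives and lower precision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerbIdiom:
    """Hypothetical lexicon entry: a verb lemma plus a fixed complement."""
    verb: str
    complement: tuple[str, ...]


def find_idioms(lemmas: list[str], idioms: list[VerbIdiom], max_gap: int = 3) -> set[VerbIdiom]:
    """Return the idioms whose verb and contiguous complement co-occur.

    The verb may precede or follow the complement, with at most `max_gap`
    tokens in between (an assumed window, not a value from the paper).
    """
    found = set()
    for idiom in idioms:
        k = len(idiom.complement)
        for start in range(len(lemmas) - k + 1):
            if tuple(lemmas[start:start + k]) != idiom.complement:
                continue
            end = start + k  # first index after the complement
            for i, lemma in enumerate(lemmas):
                if lemma != idiom.verb:
                    continue
                if i < start:
                    gap = start - i - 1  # verb before the complement
                elif i >= end:
                    gap = i - end        # verb after the complement
                else:
                    continue             # verb inside the complement span
                if gap <= max_gap:
                    found.add(idiom)
    return found


def precision_recall_f(predicted: set, gold: set) -> tuple[float, float, float]:
    """Precision, recall and balanced F-measure (harmonic mean of P and R)."""
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


BACITI_KOPLJE = VerbIdiom("baciti", ("koplje", "u", "trnje"))  # "to give up"

# Toy, hand-lemmatized sentences (a real pipeline would run a Croatian lemmatizer).
sentences = {
    1: "on biti baciti koplje u trnje".split(),        # On je bacio koplje u trnje.
    2: "koplje u trnje nikad biti baciti".split(),     # Koplje u trnje nikad nije bacio.
    3: "lovac biti baciti koplje u trnje".split(),     # literal: a hunter threw a spear
}
gold = {(1, BACITI_KOPLJE), (2, BACITI_KOPLJE)}       # sentence 3 is not idiomatic

predicted = {(sid, idiom) for sid, lemmas in sentences.items()
             for idiom in find_idioms(lemmas, [BACITI_KOPLJE])}
p, r, f = precision_recall_f(predicted, gold)
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")  # P=0.67 R=1.00 F=0.80
```

As a sanity check on the same formula, the reported precision (96 %) and recall (98 %) give 2 · 0.96 · 0.98 / 1.94 ≈ 0.970, consistent with the reported F-measure of 97 %.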
Notes
1. An online version of the Croatian corpus of idioms has been prepared and is maintained by Rittgasser and Fink-Arsovski at http://www.lingua-hr.de/phraseologie/stichwort.html.
2.
References
Agić, Ž., Ljubešić, N.: The SETimes.HR linguistically annotated corpus of Croatian. In: Proceedings of the Ninth International Conference on Language Resources and Evaluation, pp. 1724–1727, Reykjavik (2014)
Arsovski, F.Ž., Kovačević, B., Hrnjak, A.: Bibliografija hrvatske frazeologije i popis frazema analiziranih u znanstvenim i stručnim radovima. Knjigra, Zagreb (2010)
Chatzitheodorou, K.: Paraphrasing of Italian support verb constructions based on lexical and grammatical resources. In: Proceedings of the Workshop on Lexical and Grammatical Resources for Language Processing, Coling 2014, Dublin, Ireland, pp. 1–7 (2014)
Fink, Ž., Menac, A.: Hrvatska frazeologija – staro i novo. In: Mokienko, W., Walter, H. (eds.) Frazeologia. Komparacja współczesnych języków słowiańskich, 3, pp. 88–100. Universität Greifswald – Institut für Slawistik, Uniwersytet Opolski – Instytut Filologii Polskiej, Opole (2008)
Gavriilidou, Z., Papadopoulou, E., Chadjipapa, E.: Processing Greek frozen expressions with NooJ. In: Vučković, K., Bekavac, B., Silberztein, M. (eds.) Formalising Natural Languages with NooJ: Selected Papers from the NooJ 2011 International Conference, Dubrovnik, Croatia, pp. 63–74. Cambridge Scholars Publishing, Newcastle (2012)
Granger, S., Paquot, M., Rayson, P.: Extraction of multiword units from EFL and native English corpora. The phraseology of the verb ‘make’. In: Buhofer, A.H., Burger, H. (eds.) Phraseology in Motion I, pp. 57–68. Schneider, Baltmannsweiler (2006)
Kocijan, K., Librenjak, S.: The quest for Croatian idioms as multi word units. In: Monti, J., Mitkov, R., Corpas Pastor, G., Seretan, V. (eds.) Multiword Units in Machine Translation and Translation Technology. John Benjamins Publishing, Amsterdam (2016a)
Kocijan, K., Librenjak, S.: Comparative idioms in Croatian: MWU approach. In: Corpas Pastor, G. (ed.) Computerised and Corpus-based Approaches to Phraseology: Monolingual and Multilingual Perspectives, pp. 523–532. Editions Tradulex, Geneva (2016b)
Ljubešić, N., Dobrovoljc, K., Krek, S., Antonić, M.P., Fišer, D.: hrMWELex – a MWE lexicon of Croatian extracted from a parsed gigacorpus. In: Language Technologies: Proceedings of the 17th International Multiconference Information Society, IS2014, Ljubljana, Slovenia (2014)
Machonis, P.A.: English phrasal verbs: from lexicon-grammar to natural language processing. South. J. Linguist. 34(1), 21–48 (2010)
Machonis, P.A.: Sorting NooJ out to take multiword expressions into account. In: Vučković, K., Bekavac, B., Silberztein, M. (eds.) Formalising Natural Languages with NooJ: Selected Papers from the NooJ 2011 International Conference, Dubrovnik, Croatia, pp. 152–165. Cambridge Scholars Publishing, Newcastle (2012)
Matešić, J.: Frazeološki rječnik hrvatskoga ili srpskog jezika. Školska knjiga, Zagreb (1982)
Menac, A., Arsovski, F.Ž., Venturin, R.: Hrvatski frazeološki rječnik. Naklada Ljevak, Zagreb (2003)
Menac, A.: Hrvatska frazeologija. Knjigra, Zagreb (2007)
Menac-Mihalić, M.: Hrvatski dijalektni frazemi s antroponimom kao sastavnicom. Folia Onomastica Croatica 12/13, 361–385 (2007)
Sag, I.A., Baldwin, T., Bond, F., Copestake, A., Flickinger, D.: Multiword expressions: a pain in the neck for NLP. In: Gelbukh, A. (ed.) CICLing 2002. LNCS, vol. 2276, pp. 1–15. Springer, Heidelberg (2002)
Sakaeva, L.R., Nurullina, A.G.: Comparative analysis of verbal, adjectival, adverbial and modal phraseological units with a lexeme “devil” in English and Russian languages. Middle-East J. Sci. Res. 18(1), 50–54 (2013). doi:10.5829/idosi.mejsr.2013.18.1.12354
Silberztein, M.: NooJ Manual (2003). www.nooj4nlp.net
Todorova, M.: Morpho-syntactic properties of Bulgarian verbal idiomatic expressions. In: Blanco, X., Silberztein, M. (eds.) Proceedings of the 2007 International NooJ Conference, pp. 273–279. Cambridge Scholars Publishing, Newcastle (2008)
Wehrli, E.: Translating idioms. In: Proceedings of the 36th Annual Meeting of the ACL and 17th International Conference on Computational Linguistics: COLING/ACL-98, Montreal, Canada, pp. 1388–1392 (1998)
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Kocijan, K., Librenjak, S. (2016). Recognizing Verb-Based Croatian Idiomatic MWUs. In: Okrut, T., Hetsevich, Y., Silberztein, M., Stanislavenka, H. (eds) Automatic Processing of Natural-Language Electronic Texts with NooJ. NooJ 2015. Communications in Computer and Information Science, vol 607. Springer, Cham. https://doi.org/10.1007/978-3-319-42471-2_9
DOI: https://doi.org/10.1007/978-3-319-42471-2_9
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42470-5
Online ISBN: 978-3-319-42471-2
eBook Packages: Computer Science, Computer Science (R0)