Abstract
Croatian and other Slavic languages have a rich inflectional system, which English and the other languages that have traditionally dominated research in computational linguistics lack. In this paper we present the results of experiments conducted on the corpus of the Croatian online spellchecker Hascheck. The results point to the use of non-nominative cases for discovering collocations between two nouns, specifically the first name and the family name of a person. We analyzed the frequencies and conditional probabilities of the morphemes corresponding to Croatian cases and quantified the level of attraction between two words using the normalized pointwise mutual information (NPMI) measure. The two components of a personal name are more likely to co-occur in any of the non-nominative cases than in the nominative. Furthermore, given one component of a personal name, the conditional probability that it is accompanied by the other component is higher for the genitive/accusative and instrumental cases than for the nominative.
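The abstract relies on two statistics: the conditional probability P(y | x) that one name component x is accompanied by the other component y, and NPMI, i.e. pointwise mutual information log(p(x, y) / (p(x) p(y))) divided by -log p(x, y), which bounds the score to [-1, 1]. The Python sketch below computes both from raw counts. It assumes maximum-likelihood estimates with a single corpus size N for both unigram and bigram probabilities, and it does not reproduce how the authors identified case morphemes in the Hascheck corpus. The name forms and all counts in the example are invented for illustration and are not the paper's data.

```python
import math


def npmi(count_xy: int, count_x: int, count_y: int, total: int) -> float:
    """Normalized pointwise mutual information of the bigram (x, y).

    Assumption for this sketch: probabilities are maximum-likelihood
    estimates (count / total) with one shared corpus size `total`.
    Result lies in [-1, 1]: -1 = never co-occur, 0 = independent,
    1 = x and y only ever occur together.
    """
    if count_xy == 0:
        return -1.0  # limiting value as p(x, y) -> 0
    if count_xy == total:
        return 1.0  # -log p(x, y) would be 0; use the limiting value
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / -math.log(p_xy)


def conditional_probability(count_xy: int, count_x: int) -> float:
    """P(y | x): share of occurrences of x that are followed by y."""
    return count_xy / count_x


# Hypothetical toy counts (NOT from the paper), one entry per case form:
# (bigram count, first-name count, surname count).
TOTAL = 1_000_000
toy_counts = {
    "nominative (Marko Horvat)": (40, 200, 300),
    "genitive (Marka Horvata)": (30, 60, 90),
    "instrumental (Markom Horvatom)": (12, 20, 30),
}

for case, (c_xy, c_x, c_y) in toy_counts.items():
    print(f"{case}: P(y|x) = {conditional_probability(c_xy, c_x):.2f}, "
          f"NPMI = {npmi(c_xy, c_x, c_y, TOTAL):.3f}")
```

Because the nominative, genitive and instrumental forms of the same name are different surface strings in Croatian, each case yields its own bigram and its own pair of scores, which is what makes a per-case comparison like the one described above possible.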
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jurić, D., Banek, M., Dembitz, Š. (2012). Informativeness of Inflective Noun Bigrams in Croatian. In: Jezic, G., Kusek, M., Nguyen, NT., Howlett, R.J., Jain, L.C. (eds) Agent and Multi-Agent Systems. Technologies and Applications. KES-AMSTA 2012. Lecture Notes in Computer Science, vol 7327. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30947-2_15
DOI: https://doi.org/10.1007/978-3-642-30947-2_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30946-5
Online ISBN: 978-3-642-30947-2
eBook Packages: Computer Science, Computer Science (R0)