Informativeness of Inflective Noun Bigrams in Croatian

Jurić, Damir; Banek, Marko; Dembitz, Šandor

doi:10.1007/978-3-642-30947-2_15

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7327)

Included in the following conference series:

KES International Symposium on Agent and Multi-Agent Systems: Technologies and Applications

Abstract

Croatian and other Slavic languages have a rich inflection system, a feature absent from English and the other languages that have traditionally dominated computational linguistics. In this paper we present the results of experiments conducted on the corpus of the Croatian online spellchecker Hascheck, which point to using non-nominative cases for discovering collocations between two nouns, specifically the first name and the family name of a person. We analyzed the frequencies and conditional probabilities of the morphemes corresponding to Croatian cases and quantified the level of attraction between two words using the normalized pointwise mutual information measure. The two components of a personal name are more likely to co-occur in any of the non-nominative cases than in the nominative. Furthermore, given one component of a personal name, the conditional probability that it is accompanied by the other component is higher for the genitive/accusative and instrumental cases than for the nominative.
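The two quantities named in the abstract can be illustrated with a minimal sketch. It assumes the standard definition of normalized pointwise mutual information, npmi(x, y) = pmi(x, y) / -log p(x, y), with probabilities estimated as relative frequencies from raw counts. The function names, the parameter names, and the handling of edge cases are assumptions made for illustration; they are not taken from the paper, and no corpus figures are reproduced here.

```python
import math


def npmi(count_xy, count_x, count_y, total):
    """Normalized pointwise mutual information of a bigram (x, y).

    count_xy: occurrences of the bigram "x y"
    count_x, count_y: occurrences of x and of y on their own
    total: number of observations used to turn counts into probabilities
    The result lies in [-1, 1]: 1 for words that only occur together,
    0 for independent words, -1 for words that never co-occur.
    """
    if count_xy == 0:
        return -1.0  # limit value when the pair is never observed
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    if p_xy == 1.0:
        return 1.0  # degenerate case; treated as 1.0 here by assumption
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / -math.log(p_xy)


def conditional_probability(count_xy, count_x):
    """Estimate P(y | x): how often x is accompanied by y."""
    return count_xy / count_x


# Hypothetical counts for one first-name/family-name pair in a single case
# form; the numbers are invented purely to show the call pattern.
score = npmi(count_xy=40, count_x=50, count_y=60, total=1_000_000)
p_family_given_first = conditional_probability(count_xy=40, count_x=50)
```

Comparing such scores per case would mean computing them separately from the counts of each inflected form, for example once for the nominative forms and once for the genitive/accusative forms of the same name.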

Author information

Authors and Affiliations

Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, HR-10000, Zagreb, Croatia
Damir Jurić, Marko Banek & Šandor Dembitz

Editor information

Editors and Affiliations

Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, 10000, Zagreb, Croatia
Gordan Jezic & Mario Kusek
Institute of Informatics (I-32), Division of Knowledge Management Systems, Wroclaw University of Technology, Str. Wyb. Wyspianskiego 27, 50-370, Wroclaw, Poland
Ngoc-Thanh Nguyen
KES International, Shoreham-by-sea, P.O. Box 2115, BN43 9AF, UK
Robert J. Howlett
School of Electrical and Information Engineering, University of South Australia, Mawson Lakes Campus, 5095, Adelaide, SA, Australia
Lakhmi C. Jain

Cite this paper

Jurić, D., Banek, M., Dembitz, Š. (2012). Informativeness of Inflective Noun Bigrams in Croatian. In: Jezic, G., Kusek, M., Nguyen, NT., Howlett, R.J., Jain, L.C. (eds) Agent and Multi-Agent Systems. Technologies and Applications. KES-AMSTA 2012. Lecture Notes in Computer Science, vol 7327. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30947-2_15

DOI: https://doi.org/10.1007/978-3-642-30947-2_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30946-5
Online ISBN: 978-3-642-30947-2
eBook Packages: Computer Science, Computer Science (R0)
