Informativeness of Inflective Noun Bigrams in Croatian

  • Conference paper
Agent and Multi-Agent Systems. Technologies and Applications (KES-AMSTA 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7327)

Abstract

Croatian and other Slavic languages have a rich inflection system, a feature absent from English and other languages that have traditionally dominated computational linguistics research. In this paper we present the results of experiments conducted on the corpus of the Croatian online spellchecker Hascheck, which point to the usefulness of non-nominative cases for discovering collocations between two nouns, specifically the first name and the family name of a person. We analyzed the frequencies and conditional probabilities of the morphemes corresponding to Croatian cases and quantified the level of attraction between two words using the normalized pointwise mutual information measure. The two components of a personal name are more likely to co-occur in any of the non-nominative cases than in the nominative. Furthermore, given one component of a personal name, the conditional probability that it is accompanied by the other component is higher for the genitive/accusative and instrumental cases than for the nominative.
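The two quantities named in the abstract can be illustrated with a minimal Python sketch. It assumes the standard definition of normalized pointwise mutual information, npmi(x, y) = ln(p(x, y) / (p(x) p(y))) / (-ln p(x, y)), and estimates probabilities as plain relative frequencies. The function names and the counts in the usage lines are hypothetical and are not taken from the paper or from the Hascheck corpus.

```python
import math


def npmi(count_xy, count_x, count_y, total):
    """Normalized pointwise mutual information of a bigram (x, y).

    Probabilities are relative frequencies (an assumption of this sketch).
    The value lies in [-1, 1]: -1 when x and y never co-occur, 0 when they
    are independent, and 1 when they only ever occur together.
    """
    if count_xy == 0:
        return -1.0  # limiting value for a pair that never co-occurs
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    if p_xy == 1.0:
        return 1.0  # degenerate corpus; avoids dividing by -log(1) = 0
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / -math.log(p_xy)


def conditional_probability(count_xy, count_x):
    """Estimate P(y | x) as count(x, y) / count(x)."""
    return count_xy / count_x if count_x else 0.0


# Hypothetical counts for a first name x and a family name y in one case form.
score = npmi(count_xy=120, count_x=200, count_y=150, total=1_000_000)
p_family_given_first = conditional_probability(count_xy=120, count_x=200)
```

Computing both values separately for each case form of the same name pair would allow the kind of comparison between nominative and non-nominative forms that the abstract describes.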

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jurić, D., Banek, M., Dembitz, Š. (2012). Informativeness of Inflective Noun Bigrams in Croatian. In: Jezic, G., Kusek, M., Nguyen, NT., Howlett, R.J., Jain, L.C. (eds) Agent and Multi-Agent Systems. Technologies and Applications. KES-AMSTA 2012. Lecture Notes in Computer Science, vol 7327. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30947-2_15

  • DOI: https://doi.org/10.1007/978-3-642-30947-2_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-30946-5

  • Online ISBN: 978-3-642-30947-2

  • eBook Packages: Computer Science, Computer Science (R0)
