Informativeness of Inflective Noun Bigrams in Croatian

  • Damir Jurić
  • Marko Banek
  • Šandor Dembitz
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7327)

Abstract

A feature of Croatian and other Slavic languages is a rich inflection system, which does not exist in English and other languages that traditionally dominate the scientific focus of computational linguistics. In this paper we present the results of experiments conducted on the corpus of the Croatian online spellchecker Hascheck, which point to the usefulness of non-nominative cases for discovering collocations between two nouns, specifically the first name and the family name of a person. We analyzed the frequencies and conditional probabilities of the morphemes corresponding to Croatian cases and quantified the level of attraction between two words using the normalized pointwise mutual information measure. The two components of a personal name are more likely to co-occur in any of the non-nominative cases than in the nominative. Furthermore, given one component of a personal name, the conditional probability that it is accompanied by the other component of the name is higher for the genitive/accusative and instrumental cases than for the nominative.
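
The two quantities named in the abstract can be illustrated with a minimal Python sketch. It assumes the usual definition of normalized pointwise mutual information, npmi(x, y) = pmi(x, y) / (-log p(x, y)), with probabilities estimated as relative frequencies. The function names, the example word forms, and all counts are hypothetical and chosen only for illustration; they are not taken from the Hascheck corpus or from the paper's results.

```python
import math


def npmi(count_xy: int, count_x: int, count_y: int, total: int) -> float:
    """Normalized PMI of a bigram (x, y); values lie in [-1, 1].

    Probabilities are estimated as relative frequencies over `total`
    bigram positions (an assumption of this sketch).
    """
    if count_xy == 0:
        return -1.0  # limiting value: x and y never co-occur
    p_xy = count_xy / total
    if p_xy == 1.0:
        return 1.0  # degenerate corpus consisting of this bigram only
    p_x = count_x / total
    p_y = count_y / total
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / -math.log(p_xy)


def conditional_probability(count_xy: int, count_x: int) -> float:
    """Estimate P(y | x): how often x is accompanied by y."""
    return count_xy / count_x if count_x else 0.0


# Hypothetical counts for one personal name in two case forms.
# Each tuple is (bigram count, first-name count, family-name count).
TOTAL = 1_000_000
examples = {
    "nominative (Ivan Horvat)": (40, 900, 300),
    "genitive (Ivana Horvata)": (35, 400, 120),
}

for label, (c_xy, c_x, c_y) in examples.items():
    score = npmi(c_xy, c_x, c_y, TOTAL)
    cond = conditional_probability(c_xy, c_x)
    print(f"{label}: npmi={score:.3f}, P(family | first)={cond:.3f}")
```

A score of 1 means the two words occur only together, 0 means they co-occur as often as independence would predict, and -1 means they never co-occur, which makes scores for different case forms directly comparable.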

Keywords

collocations, declension, named entity recognition, semantics, language technologies

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Damir Jurić
    • 1
  • Marko Banek
    • 1
  • Šandor Dembitz
    • 1
  1. Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia
