Morphosyntactic Tagging of Old Icelandic Texts and Its Use in Studying Syntactic Variation and Change

Conference paper
Part of the Theory and Applications of Natural Language Processing book series (NLP)

Abstract

We describe experiments with morphosyntactic tagging of Old Icelandic (Old Norse) narrative texts, using different tagging models for the TnT tagger [3] and a tagset of almost 700 tags originally developed for Modern Icelandic. We show that a model trained on both Old and Modern Icelandic texts reaches 92.7% tagging accuracy, considerably better than the 90.4% that has been reported for Modern Icelandic. Although our tagging is morphological in nature, the tags carry a substantial amount of syntactic information, and the tagging is detailed enough for the syntactic function of a word to be more or less deduced from its morphology and the adjacent words. We show that the morphosyntactic tags can be very useful in locating certain syntactic constructions and features in a large corpus of Old Icelandic narrative texts, and we demonstrate this by searching for, and finding, previously undiscovered examples of a number of syntactic constructions in the corpus. We conclude that for a highly inflectional language, a morphologically tagged corpus can be an important tool in studying syntactic variation and change when no fully parsed corpus is available, although a fully parsed corpus would of course offer more possibilities.
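
The idea of locating a syntactic construction through tags alone can be illustrated with a minimal sketch. It is not the search setup used in the paper: the function name `find_tag_sequences`, the simplified tags (`PRON-NOM`, `V-FIN`, `N-ACC`, `V-PTCP`) and the example sentence are all invented for illustration, whereas the actual tagset has almost 700 tags. The sketch assumes a sentence is available as (word, tag) pairs and looks for adjacent tokens whose tags match a sequence of regular expressions, here an accusative noun directly followed by a non-finite verb (object-verb order).

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]  # (word form, morphosyntactic tag)


def find_tag_sequences(sentence: List[Token], patterns: List[str]) -> List[List[Token]]:
    """Return every run of adjacent tokens whose tags match `patterns` in order."""
    if not patterns:
        return []
    compiled = [re.compile(p) for p in patterns]
    hits = []
    # Slide a window of len(patterns) tokens over the sentence.
    for start in range(len(sentence) - len(compiled) + 1):
        window = sentence[start:start + len(compiled)]
        # Each tag must match its pattern completely, not just as a prefix.
        if all(rx.fullmatch(tag) for rx, (_, tag) in zip(compiled, window)):
            hits.append(window)
    return hits


# Invented example sentence with hypothetical, simplified tags.
sentence = [
    ("hann", "PRON-NOM"),
    ("hafði", "V-FIN"),
    ("bókina", "N-ACC"),
    ("lesið", "V-PTCP"),
]

# Accusative noun immediately followed by a participle (object-verb order).
ov_hits = find_tag_sequences(sentence, [r"N-ACC", r"V-PTCP"])
print(ov_hits)
```

Because the patterns are regular expressions over tags, a query can be as specific or as general as the tagset allows; for instance `r"N-.*"` would match a noun in any case under the hypothetical tag scheme above.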

Keywords

morphosyntactic tagging, bootstrap training, Old Icelandic, syntactic variation

Notes

Acknowledgements

This project was partly supported by grants from the University of Iceland Research Fund to the projects “The syntactic use of Old Icelandic POS tagged texts” and “Icelandic Diachronic Treebank” and by a grant from the Icelandic Research Fund to the project “Viable Language Technology Beyond English”. Thanks to the Text Laboratory at the University of Oslo for giving us access to the Glossa system and especially to Anders Nøklestad for valuable assistance. Thanks are also due to three anonymous reviewers who made many valuable comments on a previous version of this paper.

References

  1. Benediktsson, H.: The Old Norse passive: Some observations. In: E. Hovdhaugen (ed.) The Nordic Languages and Modern Linguistics 4, pp. 108–119. Universitetsforlaget, Oslo (1980)
  2. Bjarnadóttir, K.: The Icelandic mu-tbl experiment: preparing the corpus (2002). Paper presented at NLP1 final session, January 9. GSLT, Växjö
  3. Brants, T.: TnT—a statistical part-of-speech tagger. In: Proceedings of the 6th Applied NLP Conference, ANLP-2000, pp. 224–231. Seattle (2000)
  4. Christ, O.: A modular and flexible architecture for an integrated corpus query system. In: Proceedings of COMPLEX’94. 3rd Conference on Computational Lexicography and Text Research, pp. 23–34. Budapest (1994)
  5. Degnbol, H.: Hvad en ordbog behøver—og andre ønsker [what a dictionary needs—and others wish for]. In: The Sixth International Saga Conference. Workshop Papers I, pp. 235–254. Det Arnamagnæanske Institut, University of Copenhagen, Copenhagen (1995)
  6. Dyvik, H.: Har gammelnorsk passiv? [does Old Norse have the passive?]. In: E. Hovdhaugen (ed.) The Nordic Languages and Modern Linguistics 4, pp. 82–107. Universitetsforlaget, Oslo (1980)
  7. Faarlund, J.T.: Syntactic Change. Toward a Theory of Historical Syntax. Mouton, Berlin (1990)
  8. Faarlund, J.T.: The Syntax of Old Norse. Oxford University Press, Oxford (2004)
  9. Halldórsson, B., Torfason, J., Tómasson, S., Thorsson, Ö. (eds.): Íslendinga sögur [The Icelandic Family Sagas]. Svart á hvítu (1985–86)
  10. Haugan, J.: Old Norse word order and information structure. Ph.D. thesis, NTNU, Trondheim (2001)
  11. Helgadóttir, S.: Testing data-driven learning algorithms for POS tagging of Icelandic. In: H. Holmboe (ed.) Nordisk Sprogteknologi. Årbog 2004, pp. 257–265. Museum Tusculanums Forlag, Copenhagen (2005)
  12. Helgadóttir, S.: Mörkun íslensks texta [tagging Icelandic text]. Orð og tunga 9, 75–107 (2007)
  13. Holmberg, A.: Word order and syntactic features in the Scandinavian languages and English. Ph.D. thesis, University of Stockholm, Stockholm (1986)
  14. Holmberg, A., Platzack, C.: The Role of Inflection in the Syntax of Scandinavian Languages. Oxford University Press, Oxford (1995)
  15. Hwa, R., Resnik, P., Weinberg, A., Cabezas, C., Kolak, O.: Bootstrapping parsers via syntactic projection across parallel texts. Natural Language Engineering 11(3), 311–325 (2005)
  16. Kristjánsdóttir, B., Halldórsson, B., Sigurðsson, G., Grímsdóttir, G.Á., Ingólfsdóttir, G., Torfason, J., Tómasson, S., Thorsson, Ö. (eds.): Sturlunga saga [The Sturlunga Collection]. Svart á hvítu, Reykjavík (1988)
  17. Kristjánsdóttir, B., Halldórsson, B., Torfason, J., Thorsson, Ö. (eds.): Heimskringla [The Sagas of the Kings of Norway]. Mál og menning, Reykjavík (1991)
  18. Kroch, A., Santorini, B., Delfs, L.: Penn–Helsinki parsed corpus of Early Modern English. http://www.ling.upenn.edu/hist-corpora/PPCEME-RELEASE-1/ (2004)
  19. Kroch, A., Taylor, A.: Penn-Helsinki parsed corpus of Middle English. http://www.ling.upenn.edu/hist-corpora/PPCME2-RELEASE-2/ (2000). Second edition
  20. Kroch, A., Taylor, A.: Verb-object order in Early Middle English. In: S. Pintzuk, G. Tsoulas, A. Warner (eds.) Diachronic Syntax: Models and Mechanisms, pp. 132–163. Oxford University Press, Oxford (2001)
  21. Kroch, A., Taylor, A., Ringe, D.: The Middle English verb-second constraint: a case study in language contact and language change. In: S.C. Herring, P. van Reenen, L. Schøsler (eds.) Textual Parameters in Older Languages, pp. 353–391. John Benjamins, Philadelphia (2000)
  22. Loftsson, H., Kramarczyk, I., Helgadóttir, S., Rögnvaldsson, E.: Improving the POS tagging accuracy of Icelandic text. In: K. Jokinen, E. Bick (eds.) Proceedings of the 17th Nordic Conference of Computational Linguistics (NODALIDA-2009), pp. 103–110. Odense (2009)
  23. Marcus, M.P., Santorini, B., Marcinkiewicz, M.A.: Building a large annotated corpus of English: The Penn treebank. Computational Linguistics 19(2), 313–330 (1993)
  24. Mason, L.: Object shift in Old Norse. Master’s thesis, University of York, York (1999)
  25. Megyesi, B.: Data-driven syntactic analysis—methods and applications for Swedish. Ph.D. thesis, Department of Speech, Music and Hearing. KTH, Stockholm (2002)
  26. Nygaard, L., Priestley, J., Nøklestad, A., Johannessen, J.B.: Glossa: A multilingual, multimodal, configurable user interface. In: Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC’08), pp. 617–621. European Language Resources Association (ELRA), Paris (2008)
  27. Pind, J., Magnússon, F., Briem, S. (eds.): Íslensk orðtíðnibók [Icelandic Frequency Dictionary, IFD]. Orðabók Háskólans, Reykjavík (1991)
  28. Rögnvaldsson, E.: Orðstöðulykill Íslendinga sagna [the concordance to the Icelandic family sagas]. Skáldskaparmál 1, 54–61 (1990)
  29. Rögnvaldsson, E.: Old Icelandic: A non-configurational language? NOWELE 26, 3–29 (1995)
  30. Rögnvaldsson, E.: Word order variation in the VP in Old Icelandic. Working Papers in Scandinavian Syntax 58, 55–86 (1996)
  31. Rögnvaldsson, E.: Setningafræðilegar breytingar í íslensku [syntactic changes in Icelandic]. In: H. Thráinsson (ed.) Setningar. Handbók um setningafræði [Sentences: A Handbook on Syntax], Íslensk tunga III, pp. 602–635. Almenna bókafélagið, Reykjavík (2005)
  32. Rögnvaldsson, E.: The corpus of spoken Icelandic and its morphosyntactic annotation. In: P.J. Henrichsen, P.R. Skadhauge (eds.) Treebanking for Discourse and Speech, Proceedings of the NODALIDA 2005 Special Session on Treebanks for Spoken Language and Discourse, Copenhagen Studies in Language 32, pp. 133–145. Samfundslitteratur, Copenhagen (2006)
  33. Rögnvaldsson, E., Ingason, A.K., Sigurðsson, E.F.: Coping with variation in the Icelandic diachronic treebank. Oslo Studies in Language (2011). Forthcoming
  34. Sundquist, J.D.: Object shift and Holmberg’s generalization. In: D. Lightfoot (ed.) Syntactic Effects of Morphological Change, pp. 326–347. Oxford University Press, Oxford (2002)
  35. Thráinsson, H.: The Syntax of Icelandic. Cambridge University Press, Cambridge (2007)
  36. Wallenberg, J., Ingason, A.K., Sigurðsson, E.F., Rögnvaldsson, E.: Icelandic parsed historical corpus (IcePaHC). http://www.linguist.is/icelandic_treebank (2010). Version 0.2

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  1. University of Iceland, Reykjavík, Iceland
  2. Árni Magnússon Institute, Reykjavík, Iceland
