Morphosyntactic Tagging of Old Icelandic Texts and Its Use in Studying Syntactic Variation and Change

  • Conference paper
Language Technology for Cultural Heritage

Abstract

We describe experiments with morphosyntactic tagging of Old Icelandic (Old Norse) narrative texts using different tagging models for the TnT tagger [3] and a tagset of almost 700 tags, originally developed for Modern Icelandic. We show that a model trained on both Old and Modern Icelandic texts achieves 92.7% tagging accuracy, which is considerably better than the 90.4% that has been reported for Modern Icelandic. Although our tagging is morphological in nature, the tags carry a substantial amount of syntactic information, and the tagging is detailed enough for the syntactic function of words to be more or less deduced from their morphology and the adjacent words. We show that the morphosyntactic tags can be very useful in locating certain syntactic constructions and features in a large corpus of Old Icelandic narrative texts. We demonstrate this by searching for, and finding, previously undiscovered examples of a number of syntactic constructions in the corpus. We conclude that in a highly inflectional language, a morphologically tagged corpus can be an important tool in studying syntactic variation and change in the absence of a fully parsed corpus, which would of course offer more possibilities.
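The idea of locating a syntactic construction through tag sequences can be illustrated with a minimal sketch. It is not the authors' search procedure or the query system they used: the function name `find_tag_sequences`, the placeholder tokens, and the simplified tags (`NOUN.ACC`, `VERB.PTCP`, and so on) are all assumptions made for illustration and do not reproduce the paper's tagset of almost 700 tags.

```python
from typing import Callable, List, Sequence, Tuple

Token = Tuple[str, str]            # (word form, morphosyntactic tag)
TagTest = Callable[[str], bool]    # predicate applied to a single tag


def find_tag_sequences(sentence: Sequence[Token],
                       pattern: Sequence[TagTest]) -> List[int]:
    """Return start indices where adjacent tokens satisfy the tag pattern."""
    hits = []
    width = len(pattern)
    for start in range(len(sentence) - width + 1):
        window = sentence[start:start + width]
        # Every predicate must accept the tag at its position in the window.
        if all(test(tag) for test, (_, tag) in zip(pattern, window)):
            hits.append(start)
    return hits


# Hypothetical tagged sentence; tokens and tags are placeholders only.
sentence = [
    ("w1", "PRON.NOM"),
    ("w2", "VERB.FIN"),
    ("w3", "NOUN.ACC"),
    ("w4", "VERB.PTCP"),
]

# Hypothetical pattern: an accusative noun immediately before a participle,
# i.e. one possible surface cue for object-verb order.
ov_pattern = [
    lambda tag: tag.startswith("NOUN") and tag.endswith("ACC"),
    lambda tag: tag == "VERB.PTCP",
]

hits = find_tag_sequences(sentence, ov_pattern)
print(hits)
```

Because the match relies only on adjacent tags, a search of this kind returns candidates that still need manual inspection; a fully parsed corpus would allow structural queries instead.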

References

  1. Benediktsson, H.: The Old Norse passive: Some observations. In: E. Hovdhaugen (ed.) The Nordic Languages and Modern Linguistics 4, pp. 108–119. Universitetsforlaget, Oslo (1980)

  2. Bjarnadóttir, K.: The Icelandic mu-tbl experiment: preparing the corpus (2002). Paper presented at NLP1 final session, January 9. GSLT, Växjö

  3. Brants, T.: TnT—a statistical part-of-speech tagger. In: Proceedings of the 6th Applied NLP Conference, ANLP-2000, pp. 224–231. Seattle (2000)

  4. Christ, O.: A modular and flexible architecture for an integrated corpus query system. In: Proceedings of COMPLEX’94. 3rd Conference on Computational Lexicography and Text Research, pp. 23–34. Budapest (1994)

  5. Degnbol, H.: Hvad en ordbog behøver—og andre ønsker [what a dictionary needs—and others wish for]. In: The Sixth International Saga Conference. Workshop Papers I, pp. 235–254. Det Arnamagnæanske Institut, University of Copenhagen, Copenhagen (1995)

  6. Dyvik, H.: Har gammelnorsk passiv? [does Old Norse have the passive?]. In: E. Hovdhaugen (ed.) The Nordic Languages and Modern Linguistics 4, pp. 82–107. Universitetsforlaget, Oslo (1980)

  7. Faarlund, J.T.: Syntactic Change. Toward a Theory of Historical Syntax. Mouton, Berlin (1990)

  8. Faarlund, J.T.: The Syntax of Old Norse. Oxford University Press, Oxford (2004)

  9. Halldórsson, B., Torfason, J., Tómasson, S., Thorsson, Ö. (eds.): Íslendinga sögur [The Icelandic Family Sagas]. Svart á hvítu (1985–86)

  10. Haugan, J.: Old Norse word order and information structure. Ph.D. thesis, NTNU, Trondheim (2001)

  11. Helgadóttir, S.: Testing data-driven learning algorithms for pos tagging of Icelandic. In: H. Holmboe (ed.) Nordisk Sprogteknologi. Årbog 2004, pp. 257–265. Museum Tusculanums Forlag, Copenhagen (2005)

  12. Helgadóttir, S.: Mörkun íslensks texta [tagging Icelandic text]. Orð og tunga 9, 75–107 (2007)

  13. Holmberg, A.: Word order and syntactic features in the Scandinavian languages and English. Ph.D. thesis, University of Stockholm, Stockholm (1986)

  14. Holmberg, A., Platzack, C.: The Role of Inflection in the Syntax of Scandinavian Languages. Oxford University Press, Oxford (1995)

  15. Hwa, R., Resnik, P., Weinberg, A., Cabezas, C., Kolak, O.: Bootstrapping parsers via syntactic projection across parallel texts. Natural Language Engineering 11(3), 311–325 (2005)

  16. Kristjánsdóttir, B., Halldórsson, B., Sigurðsson, G., Grímsdóttir, G.Á., Ingólfsdóttir, G., Torfason, J., Tómasson, S., Thorsson, Ö. (eds.): Sturlunga saga [The Sturlunga Collection]. Svart á hvítu, Reykjavík (1988)

  17. Kristjánsdóttir, B., Halldórsson, B., Torfason, J., Thorsson, Ö. (eds.): Heimskringla [The Sagas of the Kings of Norway]. Mál og menning, Reykjavík (1991)

  18. Kroch, A., Santorini, B., Delfs, L.: Penn–Helsinki parsed corpus of Early Modern English. http://www.ling.upenn.edu/hist-corpora/PPCEME-RELEASE-1/ (2004)

  19. Kroch, A., Taylor, A.: Penn-Helsinki parsed corpus of Middle English. http://www.ling.upenn.edu/hist-corpora/PPCME2-RELEASE-2/ (2000). Second edition

  20. Kroch, A., Taylor, A.: Verb-object order in Early Middle English. In: S. Pintzuk, G. Tsoulas, A. Warner (eds.) Diachronic Syntax: Models and Mechanisms, pp. 132–163. Oxford University Press, Oxford (2001)

  21. Kroch, A., Taylor, A., Ringe, D.: The Middle English verb-second constraint: a case study in language contact and language change. In: S.C. Herring, P. van Reenen, L. Schøsler (eds.) Textual Parameters in Older Language, pp. 353–391. John Benjamins, Philadelphia (2000)

  22. Loftsson, H., Kramarczyk, I., Helgadóttir, S., Rögnvaldsson, E.: Improving the POS tagging accuracy of Icelandic text. In: K. Jokinen, E. Bick (eds.) Proceedings of the 17th Nordic Conference of Computational Linguistics (NODALIDA-2009), pp. 103–110. Odense (2009)

  23. Marcus, M.P., Santorini, B., Marcinkiewicz, M.A.: Building a large annotated corpus of English: The Penn treebank. Computational Linguistics 19(2), 313–330 (1993)

  24. Mason, L.: Object shift in Old Norse. Master’s thesis, University of York, York (1999)

  25. Megyesi, B.: Data-driven syntactic analysis—methods and applications for Swedish. Ph.D. thesis, Department of Speech, Music and Hearing. KTH, Stockholm (2002)

  26. Nygaard, L., Priestley, J., Nøklestad, A., Johannessen, J.B.: Glossa: A multilingual, multimodal, configurable user interface. In: Proceedings of the Sixth International Language Resources and Evaluation (LREC’08), pp. 617–621. European Language Resources Association (ELRA), Paris (2008)

  27. Pind, J., Magnússon, F., Briem, S. (eds.): Íslensk orðtíðnibók [Icelandic Frequency Dictionary, IFD]. Orðabók Háskólans, Reykjavík (1991)

  28. Rögnvaldsson, E.: Orðstöðulykill Íslendinga sagna [the concordance to the Icelandic family sagas]. Skáldskaparmál 1, 54–61 (1990)

  29. Rögnvaldsson, E.: Old Icelandic: A non-configurational language? NOWELE 26, 3–29 (1995)

  30. Rögnvaldsson, E.: Word order variation in the VP in Old Icelandic. Working Papers in Scandinavian Syntax 58, 55–86 (1996)

  31. Rögnvaldsson, E.: Setningafræðilegar breytingar í íslensku [syntactic changes in Icelandic]. In: H. Thráinsson (ed.) Setningar. Handbók um setningafræði [Sentences: A Handbook on Syntax], Íslensk tunga III, pp. 602–635. Almenna bókafélagið, Reykjavík (2005)

  32. Rögnvaldsson, E.: The corpus of spoken Icelandic and its morphosyntactic annotation. In: P.J. Henrichsen, P.R. Skadhauge (eds.) Treebanking for Discourse and Speech, Proceedings of the NODALIDA 2005 Special Session on Treebanks for Spoken Language and Discourse, Copenhagen Studies in Language 32, pp. 133–145. Samfundslitteratur, Copenhagen (2006)

  33. Rögnvaldsson, E., Ingason, A.K., Sigurðsson, E.F.: Coping with variation in the Icelandic diachronic treebank. Oslo Studies in Language (2011). Forthcoming.

  34. Sundquist, J.D.: Object shift and Holmberg’s generalization. In: D. Lightfoot (ed.) Syntactic Effects of Morphological Change, pp. 326–347. Oxford University Press, Oxford (2002)

  35. Thráinsson, H.: The Syntax of Icelandic. Cambridge University Press, Cambridge (2007)

  36. Wallenberg, J., Ingason, A.K., Sigurðsson, E.F., Rögnvaldsson, E.: Icelandic parsed historical corpus (IcePaHC). http://www.linguist.is/icelandic_treebank (2010). Version 0.2

Acknowledgements

This project was partly supported by grants from the University of Iceland Research Fund to the projects “The syntactic use of Old Icelandic POS tagged texts” and “Icelandic Diachronic Treebank” and by a grant from the Icelandic Research Fund to the project “Viable Language Technology Beyond English”. Thanks to the Text Laboratory at the University of Oslo for giving us access to the Glossa system and especially to Anders Nøklestad for valuable assistance. Thanks are also due to three anonymous reviewers who made many valuable comments on a previous version of this paper.

Author information

Corresponding author

Correspondence to Eiríkur Rögnvaldsson.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rögnvaldsson, E., Helgadóttir, S. (2011). Morphosyntactic Tagging of Old Icelandic Texts and Its Use in Studying Syntactic Variation and Change. In: Sporleder, C., van den Bosch, A., Zervanou, K. (eds) Language Technology for Cultural Heritage. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20227-8_4

  • DOI: https://doi.org/10.1007/978-3-642-20227-8_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20226-1

  • Online ISBN: 978-3-642-20227-8

  • eBook Packages: Computer Science, Computer Science (R0)
