Abstract
We describe experiments with morphosyntactic tagging of Old Icelandic (Old Norse) narrative texts using different tagging models for the TnT tagger [3] and a tagset of almost 700 tags, originally developed for Modern Icelandic. We show that a model trained on both Old and Modern Icelandic texts achieves 92.7% tagging accuracy, considerably better than the 90.4% that has been reported for Modern Icelandic. Although our tagging is morphological in nature, the tags carry a substantial amount of syntactic information, and the tagging is detailed enough for the syntactic function of words to be more or less deduced from their morphology and the adjacent words. We show that the morphosyntactic tags can be very useful in locating certain syntactic constructions and features in a large corpus of Old Icelandic narrative texts, and we demonstrate this by searching for, and finding, previously undiscovered examples of a number of syntactic constructions in the corpus. We conclude that for a highly inflectional language, a morphologically tagged corpus can be an important tool for studying syntactic variation and change when no fully parsed corpus is available, although a parsed corpus of course offers more possibilities.
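The idea of locating syntactic constructions through tag sequences can be illustrated with a minimal sketch. It is not the search setup used in the chapter: the function name `find_tag_sequences`, the toy sentence, and the tag labels (`PRON-NOM`, `VERB-FIN`, `NOUN-ACC`, `VERB-PTCP`) are all hypothetical simplifications invented for illustration and do not reproduce the actual tagset of almost 700 tags.

```python
import re
from typing import List, Sequence, Tuple

Token = Tuple[str, str]  # (word form, morphosyntactic tag)


def find_tag_sequences(tokens: Sequence[Token],
                       patterns: Sequence[str]) -> List[List[Token]]:
    """Return every run of adjacent tokens whose tags match `patterns` in order.

    Each pattern is a regular expression that must match a whole tag.
    """
    if not patterns:
        return []
    compiled = [re.compile(p) for p in patterns]
    width = len(compiled)
    hits = []
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        if all(rx.fullmatch(tag) for rx, (_, tag) in zip(compiled, window)):
            hits.append(list(window))
    return hits


# Hypothetical toy data: an invented sentence with invented, simplified tags.
TOY_TOKENS: List[Token] = [
    ("hann", "PRON-NOM"),
    ("hafði", "VERB-FIN"),
    ("bókina", "NOUN-ACC"),
    ("lesið", "VERB-PTCP"),
]

# Look for an accusative noun immediately followed by a participle,
# that is, a candidate for object-verb order inside the verb phrase.
for hit in find_tag_sequences(TOY_TOKENS, [r"NOUN-ACC", r"VERB-PTCP"]):
    print(" ".join(word for word, _ in hit))
```

Matching on tags rather than word forms is what lets such a query generalise over a highly inflected vocabulary. A sketch like this only finds adjacent tokens, so constructions with intervening material would need looser patterns or a dedicated corpus query tool.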
References
Benediktsson, H.: The Old Norse passive: Some observations. In: E. Hovdhaugen (ed.) The Nordic Languages and Modern Linguistics 4, pp. 108–119. Universitetsforlaget, Oslo (1980)
Bjarnadóttir, K.: The Icelandic mu-tbl experiment: preparing the corpus (2002). Paper presented at NLP1 final session, January 9. GSLT, Växjö
Brants, T.: TnT—a statistical part-of-speech tagger. In: Proceedings of the 6th Applied NLP Conference, ANLP-2000, pp. 224–231. Seattle (2000)
Christ, O.: A modular and flexible architecture for an integrated corpus query system. In: Proceedings of COMPLEX’94. 3rd Conference on Computational Lexicography and Text Research, pp. 23–34. Budapest (1994)
Degnbol, H.: Hvad en ordbog behøver—og andre ønsker [what a dictionary needs—and others wish for]. In: The Sixth International Saga Conference. Workshop Papers I, pp. 235–254. Det Arnamagnæanske Institut, University of Copenhagen, Copenhagen (1995)
Dyvik, H.: Har gammelnorsk passiv? [does Old Norse have the passive?]. In: E. Hovdhaugen (ed.) The Nordic Languages and Modern Linguistics 4, pp. 82–107. Universitetsforlaget, Oslo (1980)
Faarlund, J.T.: Syntactic Change. Toward a Theory of Historical Syntax. Mouton, Berlin (1990)
Faarlund, J.T.: The Syntax of Old Norse. Oxford University Press, Oxford (2004)
Halldórsson, B., Torfason, J., Tómasson, S., Thorsson, Ö. (eds.): Íslendinga sögur [The Icelandic Family Sagas]. Svart á hvítu (1985–86)
Haugan, J.: Old Norse word order and information structure. Ph.D. thesis, NTNU, Trondheim (2001)
Helgadóttir, S.: Testing data-driven learning algorithms for POS tagging of Icelandic. In: H. Holmboe (ed.) Nordisk Sprogteknologi. Årbog 2004, pp. 257–265. Museum Tusculanums Forlag, Copenhagen (2005)
Helgadóttir, S.: Mörkun íslensks texta [tagging Icelandic text]. Orð og tunga 9, 75–107 (2007)
Holmberg, A.: Word order and syntactic features in the Scandinavian languages and English. Ph.D. thesis, University of Stockholm, Stockholm (1986)
Holmberg, A., Platzack, C.: The Role of Inflection in the Syntax of Scandinavian Languages. Oxford University Press, Oxford (1995)
Hwa, R., Resnik, P., Weinberg, A., Cabezas, C., Kolak, O.: Bootstrapping parsers via syntactic projection across parallel texts. Natural Language Engineering 11(3), 311–325 (2005)
Kristjánsdóttir, B., Halldórsson, B., Sigurðsson, G., Grímsdóttir, G.Á., Ingólfsdóttir, G., Torfason, J., Tómasson, S., Thorsson, Ö. (eds.): Sturlunga saga [The Sturlunga Collection]. Svart á hvítu, Reykjavík (1988)
Kristjánsdóttir, B., Halldórsson, B., Torfason, J., Thorsson, Ö. (eds.): Heimskringla [The Sagas of the Kings of Norway]. Mál og menning, Reykjavík (1991)
Kroch, A., Santorini, B., Delfs, L.: Penn–Helsinki parsed corpus of Early Modern English. http://www.ling.upenn.edu/hist-corpora/PPCEME-RELEASE-1/ (2004)
Kroch, A., Taylor, A.: Penn-Helsinki parsed corpus of Middle English. http://www.ling.upenn.edu/hist-corpora/PPCME2-RELEASE-2/ (2000). Second edition
Kroch, A., Taylor, A.: Verb-object order in Early Middle English. In: S. Pintzuk, G. Tsoulas, A. Warner (eds.) Diachronic Syntax: Models and Mechanisms, pp. 132–163. Oxford University Press, Oxford (2001)
Kroch, A., Taylor, A., Ringe, D.: The Middle English verb-second constraint: a case study in language contact and language change. In: S.C. Herring, P. van Reenen, L. Schøsler (eds.) Textual Parameters in Older Language, pp. 353–391. John Benjamins, Philadelphia (2000)
Loftsson, H., Kramarczyk, I., Helgadóttir, S., Rögnvaldsson, E.: Improving the POS tagging accuracy of Icelandic text. In: K. Jokinen, E. Bick (eds.) Proceedings of the 17th Nordic Conference of Computational Linguistics (NODALIDA-2009), pp. 103–110. Odense (2009)
Marcus, M.P., Santorini, B., Marcinkiewicz, M.A.: Building a large annotated corpus of English: The Penn treebank. Computational Linguistics 19(2), 313–330 (1993)
Mason, L.: Object shift in Old Norse. Master’s thesis, University of York, York (1999)
Megyesi, B.: Data-driven syntactic analysis—methods and applications for Swedish. Ph.D. thesis, Department of Speech, Music and Hearing. KTH, Stockholm (2002)
Nygaard, L., Priestley, J., Nøklestad, A., Johannessen, J.B.: Glossa: A multilingual, multimodal, configurable user interface. In: Proceedings of the Sixth International Language Resources and Evaluation (LREC’08), pp. 617–621. European Language Resources Association (ELRA), Paris (2008)
Pind, J., Magnússon, F., Briem, S. (eds.): Íslensk orðtíðnibók [Icelandic Frequency Dictionary, IFD]. Orðabók Háskólans, Reykjavík (1991)
Rögnvaldsson, E.: Orðstöðulykill Íslendinga sagna [the concordance to the Icelandic family sagas]. Skáldskaparmál 1, 54–61 (1990)
Rögnvaldsson, E.: Old Icelandic: A non-configurational language? NOWELE 26, 3–29 (1995)
Rögnvaldsson, E.: Word order variation in the VP in Old Icelandic. Working Papers in Scandinavian Syntax 58, 55–86 (1996)
Rögnvaldsson, E.: Setningafræðilegar breytingar í íslensku [syntactic changes in Icelandic]. In: H. Thráinsson (ed.) Setningar. Handbók um setningafræði [Sentences: A Handbook on Syntax], Íslensk tunga III, pp. 602–635. Almenna bókafélagið, Reykjavík (2005)
Rögnvaldsson, E.: The corpus of spoken Icelandic and its morphosyntactic annotation. In: P.J. Henrichsen, P.R. Skadhauge (eds.) Treebanking for Discourse and Speech, Proceedings of the NODALIDA 2005 Special Session on Treebanks for Spoken Language and Discourse, Copenhagen Studies in Language 32, pp. 133–145. Samfundslitteratur, Copenhagen (2006)
Rögnvaldsson, E., Ingason, A.K., Sigurðsson, E.F.: Coping with variation in the Icelandic diachronic treebank. Oslo Studies in Language (2011). Forthcoming.
Sundquist, J.D.: Object shift and Holmberg’s generalization. In: D. Lightfoot (ed.) Syntactic Effects of Morphological Change, pp. 326–347. Oxford University Press, Oxford (2002)
Thráinsson, H.: The Syntax of Icelandic. Cambridge University Press, Cambridge (2007)
Wallenberg, J., Ingason, A.K., Sigurðsson, E.F., Rögnvaldsson, E.: Icelandic parsed historical corpus (IcePaHC). http://www.linguist.is/icelandic_treebank (2010). Version 0.2
Acknowledgements
This project was partly supported by grants from the University of Iceland Research Fund to the projects “The syntactic use of Old Icelandic POS tagged texts” and “Icelandic Diachronic Treebank” and by a grant from the Icelandic Research Fund to the project “Viable Language Technology Beyond English”. Thanks to the Text Laboratory at the University of Oslo for giving us access to the Glossa system and especially to Anders Nøklestad for valuable assistance. Thanks are also due to three anonymous reviewers who made many valuable comments on a previous version of this paper.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rögnvaldsson, E., Helgadóttir, S. (2011). Morphosyntactic Tagging of Old Icelandic Texts and Its Use in Studying Syntactic Variation and Change. In: Sporleder, C., van den Bosch, A., Zervanou, K. (eds) Language Technology for Cultural Heritage. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20227-8_4
DOI: https://doi.org/10.1007/978-3-642-20227-8_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20226-1
Online ISBN: 978-3-642-20227-8
eBook Packages: Computer Science, Computer Science (R0)