The Corpus of American Danish: a language resource of spoken immigrant Danish in North and South America

  • Karoline Kühl
  • Jan Heegård Petersen
  • Gert Foget Hansen
Project Notes


This paper describes the ‘Corpus of American Danish’ (CoAmDa), a newly established corpus of spoken immigrant Danish in North and South America. The CoAmDa comprises approx. 1.7 million tokens, making it one of the largest heritage language corpora at present. With regard to text type, the CoAmDa is a non-standard multilingual spoken language resource, as Danish is mixed with American English, Canadian English or Argentine Spanish, respectively, in every recording. The aim of this note is to document relevant aspects and specifications of the CoAmDa, viz. the audio data, the sociodemographic metadata of the speakers, the digitization of analog data, the transcription procedures, the format and tagging of the speech files and the internal validation procedures. In doing so, we wish to share our experience and best practices for building a high-quality spoken language resource with the interested public, in particular other researchers working on and with multilingual speech corpora.


Corpus documentation, Spoken language resource, Validation procedures, Heritage language, Danish, Multilingual spoken language, Language contact



This paper has been written within the framework of the research project ‘Danish Voices in the Americas’ (University of Copenhagen, 2014–2018), funded by the A.P. Møller og Hustru Chastine Mc-Kinney Møllers Fond til almene Formaal, the Carlsberg Foundation as a Semper Ardens project and the Faculty of Humanities at the University of Copenhagen. We wish to thank Professor emer. Inger Kjær for her generous donation of the recordings collected by Iver Kjær (1938–2002) and Mogens Baumann Larsen (1930–2001), Professor Christopher Hale (University of Alberta) for the donation of his recordings from New Denmark (Canada), Professor Tore Kristiansen for the contribution of his recordings from Solvang, California, and Anne Nesser, formerly Aarhus University, for the donation of the recordings from the DANA-project.



Copyright information

© Springer Nature B.V. 2019

Authors and Affiliations

  1. Department of Nordic Studies and Linguistics, University of Copenhagen, Copenhagen, Denmark
