A multimedia corpus of the Yiddish language

  • T. A. Arkhangel’skii
  • O. A. Sozinova


This paper presents a multimedia corpus of the Yiddish language created by the authors. The first version of the corpus contains 10 hours of audio and video material synchronized with transcripts. For the main corpus, we built a search platform and web interface based on a NoSQL database, the Django web framework, and a number of JavaScript modules. The web interface supports lexico-grammatical queries and displays the results as transcripts that are highlighted in sync with the multimedia playback. We describe how this corpus differs from similar multimedia corpora and the advantages of the query platform.
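The synchronized highlighting mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe the corpus's actual data schema or code, so the `Segment` fields and the `active_segment` function below are hypothetical names chosen for illustration; the sketch only assumes that each transcript line carries start and end times, sorted and non-overlapping.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """One transcript line aligned to the media timeline (hypothetical schema)."""
    start: float  # seconds from the start of the recording
    end: float
    text: str


def active_segment(segments: List[Segment], t: float) -> Optional[Segment]:
    """Return the segment to highlight at playback time t, or None.

    Assumes segments are sorted by start time and do not overlap.
    """
    starts = [s.start for s in segments]
    # Index of the last segment whose start time is <= t.
    i = bisect_right(starts, t) - 1
    if i >= 0 and t < segments[i].end:
        return segments[i]
    # t is before the first segment, after the last one, or in a gap.
    return None


# Illustrative data only, not taken from the corpus.
segments = [
    Segment(0.0, 2.5, "first utterance"),
    Segment(2.5, 6.0, "second utterance"),
    Segment(7.0, 9.0, "third utterance"),
]
```

In a browser, the same lookup would typically run in a JavaScript handler on the media element's time-update event; Python is used here only because the server side of the platform is described as Django-based.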


Keywords: Yiddish, multimedia corpus, documentation of languages





Copyright information

© Allerton Press, Inc. 2015

Authors and Affiliations

  1. Higher School of Economics, National Research University, Moscow, Russia
