A multimedia corpus of the Yiddish language

Abstract

This paper presents a multimedia corpus of the Yiddish language created by the authors. The first version of the corpus contains 10 hours of audio and video materials synchronized with a transcript. For the main corpus, we built a search platform and web interface using a NoSQL database, the Django web framework, and a number of JavaScript modules. The web interface supports lexicogrammatical queries and lets users browse the results as transcripts that are highlighted in sync with the multimedia playback. We describe how the multimedia corpus of the Yiddish language differs from similar multimedia corpora and outline the advantages of the query platform.

Author information

Corresponding author

Correspondence to T. A. Arkhangel’skii.

Additional information

Original Russian Text © T.A. Arkhangel’skii, O.A. Sozinova, 2015, published in Nauchno-Tekhnicheskaya Informatsiya, Seriya 2, 2015, No. 3, pp. 18–24.

About this article

Cite this article

Arkhangel’skii, T.A., Sozinova, O.A. A multimedia corpus of the Yiddish language. Autom. Doc. Math. Linguist. 49, 47–53 (2015). https://doi.org/10.3103/S0005105515020028

Keywords

  • Yiddish
  • multimedia corpus
  • documentation of languages