Abstract
This paper presents a multimedia corpus of the Yiddish language created by the authors. The first version of the corpus contains 10 hours of audio and video material synchronized with a transcript. For the main corpus, we built a search platform and web interface using a NoSQL database, the Django web framework, and a number of JavaScript modules. The web interface supports lexicogrammatical queries and displays the results as transcripts that are highlighted in sync with the multimedia playback. We describe how the multimedia corpus of the Yiddish language differs from similar multimedia corpora and outline the advantages of the query platform we developed.
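The two interface features named in the abstract, lexicogrammatical search and time-aligned transcript highlighting, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Token` and `Segment` classes, the `search` and `active_segment` functions, the tag names, and the sample data are all hypothetical, and plain in-memory lists stand in for the NoSQL database and Django layers.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Token:
    form: str        # word form as transcribed
    lemma: str       # dictionary form
    tags: frozenset  # grammatical tags (hypothetical tag set)


@dataclass
class Segment:
    start: float     # seconds from the beginning of the recording
    end: float
    tokens: list = field(default_factory=list)


def search(segments, lemma=None, tags=()):
    """Return segments containing a token that matches the lemma and carries all tags."""
    wanted = set(tags)
    return [
        seg for seg in segments
        if any((lemma is None or tok.lemma == lemma) and wanted <= tok.tags
               for tok in seg.tokens)
    ]


def active_segment(segments, t):
    """Return the segment playing at time t, or None. Segments must be sorted by start."""
    starts = [seg.start for seg in segments]
    i = bisect_right(starts, t) - 1
    if i >= 0 and t < segments[i].end:
        return segments[i]
    return None


# Hypothetical sample data in transliteration; not taken from the corpus.
SEGMENTS = [
    Segment(0.0, 2.5, [Token("a", "a", frozenset({"DET"})),
                       Token("bukh", "bukh", frozenset({"N", "sg"}))]),
    Segment(2.5, 5.0, [Token("bikher", "bukh", frozenset({"N", "pl"}))]),
    Segment(6.0, 8.0, [Token("hoyz", "hoyz", frozenset({"N", "sg"}))]),
]

if __name__ == "__main__":
    hits = search(SEGMENTS, lemma="bukh", tags=("pl",))
    print([(seg.start, seg.end) for seg in hits])
    print(active_segment(SEGMENTS, 3.0))
```

In a browser-based player, a lookup like `active_segment` would typically run on each playback time update so that the matching transcript line can be highlighted; here it is shown server-side only to keep the sketch in one language.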
Additional information
Original Russian Text © T.A. Arkhangel’skii, O.A. Sozinova, 2015, published in Nauchno-Tekhnicheskaya Informatsiya, Seriya 2, 2015, No. 3, pp. 18–24.
About this article
Cite this article
Arkhangel’skii, T.A., Sozinova, O.A. A multimedia corpus of the Yiddish language. Autom. Doc. Math. Linguist. 49, 47–53 (2015). https://doi.org/10.3103/S0005105515020028
DOI: https://doi.org/10.3103/S0005105515020028