Abstract
A review of the literature on prosody reveals a lack of corpora for prosodic studies in Catalan and Spanish. In this paper, we present a corpus intended to fill this gap. The corpus comprises two distinct datasets: a news subcorpus and a dialogue subcorpus, the latter containing either conversational or task-oriented speech. More than 25 hours of speech were recorded from twenty-eight speakers per language. Of these speakers, eight were professionals (four radio news broadcasters and four advertising actors). All the material presented here has been transcribed, aligned with the acoustic signal, and prosodically annotated. Two major objectives have guided the design of this project: (i) to offer wide coverage of representative real-life communicative situations that allow for the characterization of prosody in these two languages; and (ii) to enable research studies that contrast the speakers' different speaking styles and discursive practices. All material contained in the corpus is provided under a Creative Commons Attribution 3.0 Unported License.
Acknowledgments
This work has been partly supported by the National R&D&I Plan of the Spanish Government (FFI2008-04982-C03-01/FILO, FFI2008-04982-C03-02/FILO and FFI2008-04982-C03-03/FILO projects).
Cite this article
Garrido, J.M., Escudero, D., Aguilar, L. et al. Glissando: a corpus for multidisciplinary prosodic studies in Spanish and Catalan. Lang Resources & Evaluation 47, 945–971 (2013). https://doi.org/10.1007/s10579-012-9213-0
DOI: https://doi.org/10.1007/s10579-012-9213-0