Language Resources and Evaluation, Volume 47, Issue 4, pp 945–971

Glissando: a corpus for multidisciplinary prosodic studies in Spanish and Catalan

  • Juan María Garrido
  • David Escudero
  • Lourdes Aguilar
  • Valentín Cardeñoso
  • Emma Rodero
  • Carme de-la-Mota
  • César González
  • Carlos Vivaracho
  • Sílvia Rustullet
  • Olatz Larrea
  • Yesika Laplaza
  • Francisco Vizcaíno
  • Eva Estebas
  • Mercedes Cabrera
  • Antonio Bonafonte
Original Paper

Abstract

A review of the literature on prosody reveals a lack of corpora for prosodic studies in Catalan and Spanish. In this paper, we present a corpus intended to fill this gap. The corpus comprises two distinct datasets, a news subcorpus and a dialogue subcorpus, the latter containing either conversational or task-oriented speech. More than 25 h were recorded by twenty-eight speakers per language. Among these speakers, eight were professionals (four radio news broadcasters and four advertising actors). All the material presented here has been transcribed, aligned with the acoustic signal and prosodically annotated. Two major objectives have guided the design of this project: (i) to offer wide coverage of representative real-life communicative situations that allow for the characterization of prosody in these two languages; and (ii) to conduct research studies that enable us to contrast the speakers' different speaking styles and discursive practices. All material contained in the corpus is provided under a Creative Commons Attribution 3.0 Unported License.

Keywords

  • Prosodic corpus
  • Radio news corpus
  • Dialogue corpus
  • Spanish corpus
  • Catalan corpus

Copyright information

© Springer Science+Business Media Dordrecht 2013

Authors and Affiliations

  • Juan María Garrido (1)
  • David Escudero (2)
  • Lourdes Aguilar (3)
  • Valentín Cardeñoso (2)
  • Emma Rodero (4)
  • Carme de-la-Mota (3)
  • César González (2)
  • Carlos Vivaracho (2)
  • Sílvia Rustullet (1)
  • Olatz Larrea (4)
  • Yesika Laplaza (1)
  • Francisco Vizcaíno (5)
  • Eva Estebas (6)
  • Mercedes Cabrera (5)
  • Antonio Bonafonte (7)
  1. Computational Linguistics Group (GLiCom), Department of Translation and Language Sciences, Universitat Pompeu Fabra, Barcelona, Spain
  2. Department of Computer Sciences, Universidad de Valladolid, Valladolid, Spain
  3. Department of Spanish Philology, Universitat Autònoma de Barcelona, Barcelona, Spain
  4. Department of Communication, Universitat Pompeu Fabra, Barcelona, Spain
  5. Department of Modern Languages, Universidad de las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain
  6. Department of Modern Languages, Universidad Nacional de Educación a Distancia, Madrid, Spain
  7. Department of Signal Theory and Communications, Universitat Politècnica de Catalunya, Barcelona, Spain
