Language Resources and Evaluation, Volume 50, Issue 4, pp 767–792

A tool for automatic transcription of intonation: Eti_ToBI a ToBI transcriber for Spanish and Catalan

  • Wendy Elvira-García
  • Paolo Roseano
  • Ana María Fernández-Planas
  • Eugenio Martínez-Celdrán
Original Paper

Abstract

This article presents Eti_ToBI, a tool that automatically labels intonational events in Spanish and Catalan utterances according to the current Sp_ToBI and Cat_ToBI conventions. The system consists of a Praat script that assigns ToBI labels to pitch movements, basing the assignments on lexical data entered by the researcher and on acoustic data that it extracts from sound files. The first part of the article explains the methodological approach that made the automatisation possible and describes the algorithms the script uses to perform the analysis. The second part presents the reliability results for both the Catalan and the Spanish corpora, which show a level of agreement equal to that reported in the literature for agreement among human transcribers.
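
To make the notion of "level of agreement" concrete, the following is a minimal sketch of how agreement between a human transcriber and an automatic labeller could be quantified. It assumes Cohen's kappa as the statistic, which is a common choice for inter-transcriber reliability; the abstract does not say which measure the article uses. The function name `cohen_kappa` and the label sequences are hypothetical and are not taken from Eti_ToBI or its corpora.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of nominal labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: proportion of positions where both labels match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: computed from each annotator's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one and the same label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical pitch-accent labels for six accented syllables (invented data).
human = ["L+H*", "L*", "L+H*", "H*", "L*", "L+H*"]
script = ["L+H*", "L*", "H*", "H*", "L*", "L+H*"]

kappa = cohen_kappa(human, script)
print(round(kappa, 2))  # 0.75
```

In this toy example the two label sequences agree on five of six items (observed agreement 5/6), chance agreement is 1/3, and kappa is therefore 0.75. Kappa corrects raw percentage agreement for the agreement expected by chance given how often each label is used.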

Keywords

Intonation; Automatic intonation recognition; Sp_ToBI; Cat_ToBI

Notes

Acknowledgments

This work was funded by Spanish government grant FFI2012-35998 for the AMPER-CAT project and by predoctoral grant APIF-2012 from the University of Barcelona.

Copyright information

© Springer Science+Business Media Dordrecht 2015

Authors and Affiliations

  1. Laboratori of Phonetics, Universitat de Barcelona, Barcelona, Spain
  2. Department of General Linguistics, Universitat de Barcelona, Barcelona, Spain
  3. Department of Spanish, Universitat de Barcelona, Barcelona, Spain
