Transcription of Catalan Broadcast Conversation

  • Henrik Schulz
  • José A. R. Fonollosa
  • David Rybach
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5729)


The paper describes aspects, methods and results of the development of an automatic transcription system for Catalan broadcast conversation by means of speech recognition. Emphasis is given to the Catalan language, to acoustic and language modelling methods, and to recognition. Results are discussed in the context of phenomena and challenges in spontaneous speech, in particular regarding phoneme duration and feature space reduction.


Automatic Speech Recognition, Acoustic Model, Spontaneous Speech, Read Speech, Maximum Likelihood Linear Regression
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Henrik Schulz (1)
  • José A. R. Fonollosa (1)
  • David Rybach (2)
  1. TALP Research Center, Technical University of Catalunya (UPC), Barcelona, Spain
  2. Human Language Technology and Pattern Recognition, RWTH Aachen University, Aachen, Germany
