
Improving Speech Recognition through Automatic Selection of Age Group–Specific Acoustic Models

  • Annika Hämäläinen
  • Hugo Meinedo
  • Michael Tjalve
  • Thomas Pellegrini
  • Isabel Trancoso
  • Miguel Sales Dias
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8775)

Abstract

The acoustic models used by automatic speech recognisers are usually trained with speech collected from young to middle-aged adults. As the characteristics of speech change with age, such acoustic models tend to perform poorly on children's and elderly people's speech. In this study, we investigate whether automatic age group classification of speakers, combined with age group–specific acoustic models, could improve automatic speech recognition performance. We train an age group classifier with an accuracy of about 95% and show that using its output to select age group–specific acoustic models for children and the elderly leads to considerable gains in automatic speech recognition performance, compared with recognising their speech using acoustic models trained with young to middle-aged adults' speech.
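The selection step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the three age groups, the model identifiers (`am_children`, `am_adults`, `am_elderly`), the per-utterance classifier passed in as a function, and the majority vote used to assign a single age group to each speaker are all assumptions made for the example.

```python
from collections import Counter
from enum import Enum
from typing import Callable, Dict, Sequence


class AgeGroup(Enum):
    """Age groups assumed for this sketch."""
    CHILD = "child"
    ADULT = "adult"  # young to middle-aged adults
    ELDERLY = "elderly"


# Hypothetical identifiers standing in for real acoustic models.
ACOUSTIC_MODELS: Dict[AgeGroup, str] = {
    AgeGroup.CHILD: "am_children",
    AgeGroup.ADULT: "am_adults",
    AgeGroup.ELDERLY: "am_elderly",
}


def classify_speaker(
    utterances: Sequence[object],
    classify_utterance: Callable[[object], AgeGroup],
) -> AgeGroup:
    """Assign one age group to a speaker by majority vote over utterances.

    The majority vote is an assumption of this sketch. With no utterances,
    the adult group is returned; on a tie, Counter.most_common keeps the
    group that was seen first.
    """
    if not utterances:
        return AgeGroup.ADULT
    votes = Counter(classify_utterance(u) for u in utterances)
    return votes.most_common(1)[0][0]


def select_acoustic_model(
    group: AgeGroup, models: Dict[AgeGroup, str] = ACOUSTIC_MODELS
) -> str:
    """Return the age group-specific model, falling back to the adult model."""
    return models.get(group, models[AgeGroup.ADULT])


if __name__ == "__main__":
    # Fake per-utterance predictions standing in for a trained classifier.
    predictions = {"u1": AgeGroup.ELDERLY, "u2": AgeGroup.ELDERLY, "u3": AgeGroup.ADULT}
    group = classify_speaker(list(predictions), predictions.__getitem__)
    print(group.value, select_acoustic_model(group))  # elderly am_elderly
```

Falling back to the adult model reflects the baseline in the abstract: when no dedicated model is available for a group, recognition would proceed with the acoustic models trained on young to middle-aged adults' speech.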

Keywords

Age group classification, acoustic modelling, automatic speech recognition, children, elderly, paralinguistic information

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Annika Hämäläinen (1)
  • Hugo Meinedo (2)
  • Michael Tjalve (4)
  • Thomas Pellegrini (5)
  • Isabel Trancoso (2, 3)
  • Miguel Sales Dias (1)

  1. Microsoft Language Development Center & ISCTE-University Institute of Lisbon, Lisbon, Portugal
  2. INESC-ID Lisboa, Lisbon, Portugal
  3. Instituto Superior Técnico, Lisbon, Portugal
  4. Microsoft & University of Washington, Seattle, USA
  5. IRIT - Université Toulouse III - Paul Sabatier, Toulouse, France
