Speech Driven by Artificial Larynx: Potential Advancement Using Synthetic Pitch Contours

  • Hua-Li Jian
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9177)


Despite a long history of development, the speech quality achieved with artificial larynx devices remains limited. This paper surveys recent advances in prosodic speech processing and technology and assesses their potential for improving the quality of artificial-larynx speech, in particular conveying tone and intonation through pitch variation. Three approaches are discussed: manual pitch control, automatic pitch control, and re-synthesized speech.


Keywords: Artificial larynx · Fundamental frequency · Assistive technology
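The core idea named in the abstract, varying the fundamental frequency (F0) of an electrolarynx-like excitation source to restore intonation, can be illustrated with a minimal sketch. This is not code from the paper; it is a hypothetical example assuming a simple linear F0 declination (a crude stand-in for a falling sentence contour) driving a pulse-train source by phase accumulation:

```python
import math

SR = 16000  # sample rate in Hz (assumed)

def f0_contour(n_samples, f0_start=140.0, f0_end=100.0):
    """Linear F0 declination from f0_start to f0_end Hz,
    a crude stand-in for a falling sentence intonation contour."""
    return [f0_start + (f0_end - f0_start) * i / (n_samples - 1)
            for i in range(n_samples)]

def pulse_train(f0, sr=SR):
    """Phase-accumulation synthesis: emit one pulse each time
    the accumulated phase wraps past 1.0, so the instantaneous
    pulse rate tracks the F0 contour sample by sample."""
    out, phase = [], 0.0
    for f in f0:
        phase += f / sr
        if phase >= 1.0:
            phase -= 1.0
            out.append(1.0)
        else:
            out.append(0.0)
    return out

contour = f0_contour(SR)       # one second of F0 values
source = pulse_train(contour)  # excitation with falling pitch
```

Feeding such a time-varying source (instead of the constant buzz of a conventional electrolarynx) through the vocal tract is what allows pitch-based tone and intonation to be conveyed; the manual, automatic, and re-synthesis approaches discussed in the paper differ in how the contour itself is obtained.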



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Institute of Information Technology, Faculty of Technology, Art and Design, Oslo and Akershus University College of Applied Sciences, Oslo, Norway