Reconstruction of Dysphonic Speech by MELP

  • H. Irem Türkmen
  • M. Elif Karsligil
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5197)


Chronic dysphonia results from neural, structural, or pathological effects on the vocal cords or larynx, and it causes undesirable changes in speech quality. This paper presents a Mixed Excitation Linear Prediction (MELP) based system that reconstructs normally phonated speech from dysphonic speech while preserving the individuality of the patient. The proposed system can be used as a speech prosthesis for patients who have lost the ability to produce voice. To reconstruct normally phonated speech from dysphonic speech, three steps are performed for each phoneme: pitch generation based on the relationship between perceived pitch and formant frequencies, formant modification, and voicing modification. The principal novelty of this study is that it modifies the acoustic features of voiced phonemes while preserving unvoiced ones; voiced-unvoiced detection is therefore performed for each phoneme.

The proposed system has three main parts. First, in the analysis phase, the acoustic differences between normal and dysphonic speech are determined. Second, the acoustic parameters of the voiced phonemes in the dysphonic speech are modified so that the synthetic speech is closer to normal speech. Finally, the enhanced speech is synthesized by MELP.
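The idea of modifying only voiced segments can be illustrated with a minimal sketch. It is not the authors' detector: it swaps in a simple short-time energy and zero-crossing-rate rule as a stand-in for voiced-unvoiced detection, and it works on fixed-length frames rather than phonemes. The names `frame_features`, `is_voiced`, `process_frames`, and `boost`, as well as the frame length and both thresholds, are assumptions chosen for illustration only.

```python
import math
import random

SAMPLE_RATE = 8000  # assumed sampling rate in Hz
FRAME_LEN = 180     # assumed frame length in samples (22.5 ms at 8 kHz)


def frame_features(frame):
    """Return (short-time energy, zero-crossing rate) for one frame."""
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return energy, crossings / (len(frame) - 1)


def is_voiced(frame, energy_threshold=0.01, zcr_threshold=0.25):
    """Heuristic rule: voiced frames have high energy and few sign changes.

    Both thresholds are illustrative values, not taken from the paper.
    """
    energy, zcr = frame_features(frame)
    return energy > energy_threshold and zcr < zcr_threshold


def process_frames(frames, modify):
    """Apply `modify` to voiced frames and pass unvoiced frames through."""
    return [modify(f) if is_voiced(f) else f for f in frames]


# A 100 Hz tone stands in for a voiced frame, uniform noise for an unvoiced one.
tone = [
    0.5 * math.sin(2 * math.pi * 100 * n / SAMPLE_RATE)
    for n in range(FRAME_LEN)
]
rng = random.Random(0)
noise = [rng.uniform(-0.3, 0.3) for _ in range(FRAME_LEN)]


def boost(frame):
    """Hypothetical placeholder for the pitch, formant, and voicing changes."""
    return [2 * s for s in frame]


result = process_frames([tone, noise], boost)
```

In this sketch the tone frame is passed to `boost` while the noise frame is returned unchanged, which mirrors the stated goal of altering voiced material and preserving unvoiced material. A practical system would replace the threshold rule with a trained classifier and replace `boost` with the actual parameter modifications.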


Keywords: Dysphonic speech enhancement, MELP, Formant modification, Pitch and voicing generation



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • H. Irem Türkmen (1)
  • M. Elif Karsligil (1)
  1. Computer Engineering Department, Yildiz Technical University, Yildiz, Turkey
