Towards a New Image-Based Spectrogram Segmentation Speech Coder Optimised for Intelligibility

  • Keith A. Jellyman
  • Nicholas W. D. Evans
  • W. M. Liu
  • J. S. D. Mason
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5371)


Speech intelligibility is the very essence of communications. When heavy noise degrades a speech signal to the threshold of intelligibility, as in mobile and military applications, any further degradation introduced by a speech coder could prove critical. This paper investigates concepts towards a new speech coder that draws on the field of image processing in a new multimedia approach. The coder is based on a spectrogram segmentation image processing procedure. The design criterion is minimal intelligibility loss in high noise, rather than the conventional quality criterion, subject to a reasonable bit rate. First-phase intelligibility listening test results, assessing the new coder's potential alongside six standard coders, are reported. The results show the robustness of the LD-CELP coder and the potential of the new coder, which performs particularly well in car noise at SNRs below -4.0 dB.
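The abstract rests on two technical ingredients: noise added to speech at a fixed signal-to-noise ratio (such as the car-noise conditions below -4.0 dB), and a spectrogram treated as an image to be segmented. The minimal pure-Python sketch below illustrates both under stated assumptions. It is not the authors' coder. The segmentation is a simple hypothetical rule (a time-frequency bin is "on" if it lies within `floor_db` of the loudest bin), since the paper's segmentation procedure is not detailed in this abstract. The function names, frame length, hop size and threshold are all illustrative assumptions, and SNR is computed from plain mean-square power over the whole signal rather than from an active-speech-level measure.

```python
import cmath
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that 10*log10(P_speech / P_noise) == snr_db, then add it.

    Assumption: power is the plain mean square over the whole signal; an
    active-speech-level measure (e.g. ITU-T P.56) would give different scaling.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def magnitude_spectrogram(x, frame_len=64, hop=32):
    """Hann-windowed short-time DFT magnitudes: one list of frame_len//2+1 bins per frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = [x[start + n] * window[n] for n in range(frame_len)]
        bins = []
        for k in range(frame_len // 2 + 1):  # non-negative frequencies only
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            bins.append(abs(acc))
        frames.append(bins)
    return frames


def segment(spec, floor_db=-30.0):
    """Hypothetical segmentation: binary 'image', 1 where a bin is within floor_db of the peak."""
    peak = max(max(frame) for frame in spec)
    if peak == 0:
        return [[0] * len(frame) for frame in spec]
    threshold = peak * 10 ** (floor_db / 20)  # dB floor converted to a magnitude ratio
    return [[1 if m > threshold else 0 for m in frame] for frame in spec]


if __name__ == "__main__":
    fs = 8000  # sample rate in Hz (assumed)
    rng = random.Random(0)
    # A 1 kHz tone stands in for speech; white Gaussian noise stands in for car noise.
    tone = [math.cos(2 * math.pi * 1000 * n / fs) for n in range(1024)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(1024)]
    noisy = mix_at_snr(tone, noise, snr_db=10.0)
    mask = segment(magnitude_spectrogram(noisy), floor_db=-12.0)
    print(len(mask), "frames x", len(mask[0]), "bins")  # prints "31 frames x 33 bins"
```

Making the threshold relative to the peak keeps the mask independent of overall signal gain. A lower `floor_db` retains more low-energy detail but, in heavy noise, also admits more noise-only bins into the binary image.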


Keywords (added by machine, not by the authors): Speech Signal · Image Compression · Automatic Speech Recognition · Speech Enhancement · Speech Intelligibility





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Keith A. Jellyman (1)
  • Nicholas W. D. Evans (1, 2)
  • W. M. Liu (1)
  • J. S. D. Mason (1)

  1. School of Engineering, Swansea University, UK
  2. EURECOM, Sophia Antipolis, France
