Towards a New Image-Based Spectrogram Segmentation Speech Coder Optimised for Intelligibility

  • Conference paper
Advances in Multimedia Modeling (MMM 2009)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5371)

Abstract

Speech intelligibility is the very essence of communication. When high noise degrades a speech signal to the threshold of intelligibility, as can happen in mobile and military applications, any further degradation introduced by a speech coder could prove critical. This paper investigates concepts towards a new speech coder that draws upon the field of image processing in a new multimedia approach: the coder is based on a spectrogram segmentation image processing procedure. The design criterion is minimal intelligibility loss in high noise, rather than the conventional quality criterion, subject to a reasonable bit rate. Results from a first phase of intelligibility listening tests, assessing the new coder's potential alongside six standard coders, are reported. Experimental results show the robustness of the LD-CELP coder and the potential of the new coder, with particularly good results in car noise conditions below -4.0 dB.
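The abstract names the idea of treating the spectrogram as an image to be segmented but does not spell out the procedure. The following is therefore only a minimal sketch of that general idea, not the authors' coder. It assumes a periodic Hann window, 64-sample frames with a 32-sample hop, a single global threshold 30 dB below the spectrogram peak as the segmentation rule, and plain run-length coding as a crude stand-in for a bilevel image compressor. All of these choices, and the helper names `magnitude_spectrogram`, `segment` and `run_lengths`, are hypothetical and chosen for illustration.

```python
# Hypothetical illustration of "spectrogram as a binary image"; NOT the paper's coder.
import cmath
import math


def magnitude_spectrogram(signal, frame_len=64, hop=32):
    """Return a log-magnitude spectrogram in dB as rows[time][frequency_bin].

    Uses a periodic Hann window and a naive DFT (adequate for a toy example)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):  # non-negative frequency bins only
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            row.append(20 * math.log10(abs(acc) + 1e-12))  # offset avoids log10(0)
        rows.append(row)
    return rows


def segment(spec_db, floor_db=-30.0):
    """Assumed segmentation rule: 1 where a cell lies within floor_db of the global peak."""
    peak = max(max(row) for row in spec_db)
    return [[1 if v >= peak + floor_db else 0 for v in row] for row in spec_db]


def run_lengths(mask):
    """Run-length code the mask in time-major order as (bit, length) pairs.

    A crude stand-in for a real bilevel image coder."""
    flat = [bit for row in mask for bit in row]
    runs = []
    prev, count = flat[0], 0
    for bit in flat:
        if bit == prev:
            count += 1
        else:
            runs.append((prev, count))
            prev, count = bit, 1
    runs.append((prev, count))
    return runs


# Toy input: a tone centred on bin 8 for 512 samples, then 512 samples of silence.
signal = [math.sin(2 * math.pi * 8 * n / 64) if n < 512 else 0.0 for n in range(1024)]
mask = segment(magnitude_spectrogram(signal))
runs = run_lengths(mask)
print(f"{len(mask)} frames x {len(mask[0])} bins, "
      f"{sum(map(sum, mask))} active cells, {len(runs)} runs")
```

A single global threshold is the simplest possible segmentation. A coder aimed at intelligibility in high noise would presumably need a more robust rule, since at low SNR noise-dominated cells can also exceed a fixed threshold.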

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jellyman, K.A., Evans, N.W.D., Liu, W.M., Mason, J.S.D. (2009). Towards a New Image-Based Spectrogram Segmentation Speech Coder Optimised for Intelligibility. In: Huet, B., Smeaton, A., Mayer-Patel, K., Avrithis, Y. (eds) Advances in Multimedia Modeling. MMM 2009. Lecture Notes in Computer Science, vol 5371. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-92892-8_8

  • DOI: https://doi.org/10.1007/978-3-540-92892-8_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-92891-1

  • Online ISBN: 978-3-540-92892-8

  • eBook Packages: Computer Science, Computer Science (R0)
