Towards a New Image-Based Spectrogram Segmentation Speech Coder Optimised for Intelligibility

Jellyman, Keith A.; Evans, Nicholas W. D.; Liu, W. M.; Mason, J. S. D.

doi:10.1007/978-3-540-92892-8_8


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5371)

Included in the following conference series:

International Conference on Multimedia Modeling


Abstract

Speech intelligibility is the very essence of communications. When heavy noise degrades a speech signal to the threshold of intelligibility, as can happen in mobile and military applications, any further degradation introduced by a speech coder could prove critical. This paper investigates concepts towards a new speech coder that draws on the field of image processing in a new multimedia approach: the coder is based on an image-processing procedure that segments the speech spectrogram. The design criterion is minimal intelligibility loss in high noise, rather than the conventional quality criterion, while keeping the bit rate reasonable. Results of a first-phase intelligibility listening test, assessing the new coder's potential alongside six standard coders, are reported. Experimental results show the robustness of the LD-CELP coder and the potential of the new coder, with particularly good results in car noise conditions below -4.0 dB.
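The abstract does not describe the segmentation procedure itself, so the following is only a hypothetical illustration of the two ingredients it mentions: degrading a signal with additive noise at a chosen signal-to-noise ratio, and segmenting a spectrogram image into a binary time-frequency mask. It is a minimal Python/NumPy sketch under stated assumptions, not the authors' coder. The function names `mix_at_snr`, `log_spectrogram` and `segment`, the 256-sample frame with a 128-sample hop, and the fixed -30 dB threshold relative to the loudest cell are all illustrative choices. The SNR is computed from plain mean power rather than from an active speech level measurement.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the speech-to-noise power ratio is `snr_db`.

    Assumption: power is the plain mean square over the whole signal (no
    active-speech-level measurement). `noise` must be at least as long as `speech`.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Choose gain g so that p_speech / (g**2 * p_noise) == 10 ** (snr_db / 10).
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + gain * noise


def log_spectrogram(x, frame_len=256, hop=128):
    """Hann-windowed STFT magnitude in dB, shaped (frames, frequency bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return 20 * np.log10(magnitude + 1e-12)  # small offset avoids log(0)


def segment(spec_db, floor_db=-30.0):
    """Hypothetical segmentation: keep cells within `floor_db` of the loudest cell."""
    return spec_db >= spec_db.max() + floor_db


# Usage: a 440 Hz tone stands in for speech, white noise stands in for car noise.
fs = 8000
t = np.arange(fs) / fs
speech = np.sin(2 * np.pi * 440 * t)
noise = np.random.default_rng(0).standard_normal(fs)
noisy = mix_at_snr(speech, noise, snr_db=-4.0)
mask = segment(log_spectrogram(noisy))
print(mask.shape)  # (61, 129): 61 frames of 129 frequency bins
```

A single global threshold is the simplest possible segmentation; at low SNR it tends to admit many noise-dominated cells, which is precisely the regime where a coder optimised for intelligibility would need a more careful procedure.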



Author information

Authors and Affiliations

School of Engineering, Swansea University, UK
Keith A. Jellyman, Nicholas W. D. Evans, W. M. Liu & J. S. D. Mason
EURECOM, Sophia Antipolis, France
Nicholas W. D. Evans


Editor information

Editors and Affiliations

Eurécom, 2229, route des crêtes, 06904, Sophia-Antipolis, France
Benoit Huet
Dublin City University, Dublin, Ireland
Alan Smeaton
Department of Computer Science, University of North Carolina, Chapel Hill, NC, USA
Ketan Mayer-Patel
Image, Video and Multimedia Systems Laboratory, School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou Str., 157 80, Athens, Greece
Yannis Avrithis


About this paper

Cite this paper

Jellyman, K.A., Evans, N.W.D., Liu, W.M., Mason, J.S.D. (2009). Towards a New Image-Based Spectrogram Segmentation Speech Coder Optimised for Intelligibility. In: Huet, B., Smeaton, A., Mayer-Patel, K., Avrithis, Y. (eds) Advances in Multimedia Modeling. MMM 2009. Lecture Notes in Computer Science, vol 5371. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-92892-8_8


DOI: https://doi.org/10.1007/978-3-540-92892-8_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-92891-1
Online ISBN: 978-3-540-92892-8
eBook Packages: Computer Science, Computer Science (R0)
