Abstract
Speech intelligibility is the very essence of communications. When high levels of noise degrade a speech signal to the threshold of intelligibility, as can happen in mobile and military applications, any further degradation introduced by a speech coder could prove critical. This paper investigates concepts towards a new speech coder that draws on image processing in a novel multimedia approach. The coder is based on an image-processing procedure that segments the spectrogram. The design criterion is minimal intelligibility loss in high noise, rather than the conventional quality criterion, subject to a reasonable bit rate. Results from a first phase of intelligibility listening tests, assessing the new coder's potential alongside six standard coders, are reported. The experimental results show the robustness of the LD-CELP coder and the potential of the new coder, with particularly good results in car noise at SNRs below -4.0 dB.
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jellyman, K.A., Evans, N.W.D., Liu, W.M., Mason, J.S.D. (2009). Towards a New Image-Based Spectrogram Segmentation Speech Coder Optimised for Intelligibility. In: Huet, B., Smeaton, A., Mayer-Patel, K., Avrithis, Y. (eds) Advances in Multimedia Modeling. MMM 2009. Lecture Notes in Computer Science, vol 5371. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-92892-8_8
DOI: https://doi.org/10.1007/978-3-540-92892-8_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-92891-1
Online ISBN: 978-3-540-92892-8
eBook Packages: Computer Science, Computer Science (R0)