A speech quality measure is a valuable assessment tool for the development of speech coding and enhancement techniques. Two approaches, subjective and objective, are commonly used to measure speech quality. Subjective measures are based on perceptual ratings by a group of listeners, while objective metrics assess speech quality from physical parameters extracted from the signal. Objective metrics that correlate well with subjective ratings are attractive because they are less expensive to administer and give more consistent results. In this work, we investigate a novel non-intrusive speech quality metric based on adaptive neuro-fuzzy network techniques. In the proposed method, a first-order Sugeno-type fuzzy inference system (FIS) is applied to estimate speech quality objectively. The features required by the method are extracted from the perceptual spectral density distribution of the input speech using co-occurrence matrix analysis. The performance of the proposed method is demonstrated through comparisons with the state-of-the-art non-intrusive quality evaluation standard, ITU-T P.563.
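The two building blocks named in the abstract, co-occurrence matrix features and a first-order Sugeno-type FIS, can be illustrated with a minimal sketch. Everything specific below is an assumption made for illustration and is not taken from the chapter: the quantized input sequence, the two texture features (energy and contrast), the Gaussian membership functions, and all rule parameters are hypothetical, and the adaptive (ANFIS) training step that would tune those parameters is not shown.

```python
import math


def cooccurrence_matrix(levels, n_levels, offset=1):
    """Normalized co-occurrence matrix of a quantized sequence.

    Entry [i][j] is the relative frequency with which level i is
    followed by level j at the given offset.
    """
    counts = [[0.0] * n_levels for _ in range(n_levels)]
    for a, b in zip(levels, levels[offset:]):
        counts[a][b] += 1.0
    total = sum(sum(row) for row in counts)
    if total == 0:
        return counts
    return [[c / total for c in row] for row in counts]


def texture_features(p):
    """Two illustrative co-occurrence features (assumed, not the chapter's set)."""
    energy = sum(v * v for row in p for v in row)
    contrast = sum((i - j) ** 2 * v
                   for i, row in enumerate(p)
                   for j, v in enumerate(row))
    return [energy, contrast]


def gaussian_mf(x, center, sigma):
    """Gaussian membership function (an assumed shape for this sketch)."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def sugeno_first_order(x, rules):
    """Evaluate a first-order Sugeno FIS.

    Each rule is (centers, sigmas, coeffs, bias). Its firing strength is
    the product of the per-input memberships, its consequent is a linear
    function of the inputs, and the output is the firing-strength-weighted
    average of the consequents.
    """
    num = 0.0
    den = 0.0
    for centers, sigmas, coeffs, bias in rules:
        w = 1.0
        for xi, c, s in zip(x, centers, sigmas):
            w *= gaussian_mf(xi, c, s)
        f = bias + sum(p * xi for p, xi in zip(coeffs, x))
        num += w * f
        den += w
    return num / den if den > 0 else 0.0


# Hypothetical usage: quantized values -> features -> quality estimate.
levels = [0, 0, 1, 1, 2, 1, 0, 0]
features = texture_features(cooccurrence_matrix(levels, n_levels=3))
rules = [
    ([0.2, 0.5], [0.3, 0.5], [1.0, -0.5], 3.0),  # made-up parameters
    ([0.6, 1.5], [0.3, 0.5], [0.5, -1.0], 4.0),  # made-up parameters
]
estimate = sugeno_first_order(features, rules)
```

In an adaptive neuro-fuzzy setting, the membership parameters (`centers`, `sigmas`) and the consequent parameters (`coeffs`, `bias`) would be learned from speech samples labeled with subjective scores rather than set by hand as they are here.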
Keywords
- Fuzzy Inference System
- Speech Quality
- Clean Speech
- Consequent Parameter
- Sugeno Type Fuzzy Inference System
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Chen, G., Parsa, V. (2008). Objective Speech Quality Evaluation Using an Adaptive Neuro-Fuzzy Network. In: Prasad, B., Prasanna, S.R.M. (eds) Speech, Audio, Image and Biomedical Signal Processing using Neural Networks. Studies in Computational Intelligence, vol 83. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75398-8_5
DOI: https://doi.org/10.1007/978-3-540-75398-8_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75397-1
Online ISBN: 978-3-540-75398-8
eBook Packages: Engineering, Engineering (R0)