Abstract
Speech systems work reasonably well under homogeneous acoustic environmental conditions but become fragile in practical applications involving real-world environments (e.g., in-car, broadcast news, digital archives) where the audio stream contains multi-environment characteristics. To date, most approaches to dealing with environmental noise in speech systems are based on assumptions about the noise rather than on exploring and characterizing its nature. In this chapter, we present our recent advances in the formulation and development of an in-vehicle environmental sniffing framework previously presented in [1,2,3,4]. The system comprises several components that detect, classify, and track acoustic environmental conditions. The first goal of the framework is to seek out detailed information about the environmental characteristics instead of just detecting environmental change points. The second goal is to organize this knowledge in an effective manner so that intelligent decisions can direct subsequent speech processing systems. After presenting the proposed in-vehicle environmental sniffing framework, we consider future directions and discuss supervised versus unsupervised noise clustering and closed-set versus open-set noise classification.
This work was supported in part by DARPA through SPAWAR under Grant No. N66001-002-8906, from SPAWAR under Grant No. N66001-03-1-8905, and in part by NSF under Cooperative Agreement No. IIS-9817485.
References
M. Akbacak, J. H. L. Hansen, “Environmental Sniffing: Noise Knowledge Estimation for Robust Speech Systems,” IEEE ICASSP-2003: International Conference Acoustics Speech & Signal Processing, vol. 2, pp. 113–116, Hong Kong, April 2003.
M. Akbacak, J. H. L. Hansen, “Environmental Sniffing: Robust Digit Recognition for an In-Vehicle Environment”, Interspeech-Eurospeech-2003, pp.2177–2180, Geneva, Switzerland, September 2003.
M. Akbacak, J. H. L. Hansen, “Environmental Sniffing: Noise Knowledge Estimation for Robust Speech Systems”, IEEE Trans. Speech & Audio Proc, October 2005.
J. H. L. Hansen, X. Zhang, M. Akbacak, U. Yapanel, B. Pellom, W. Ward, Chapter 2, DSP in Mobile and Vehicle Systems, H. Abut, J.H.L. Hansen and K. Takeda (Editors) Springer, 2005.
R. Bakis, S. Chen, P. Gopalakrishnan, R. Gopinath, S. Maes, and L. Polymenakos, “Transcription of Broadcast News - System Robustness Issues and Adaptation Techniques”, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 711–714, April 1997.
U. Jain, M. A. Siegler, S. J. Doh, E. Gouvea, J. Huerta, P. J. Moreno, B. Raj, and R. M. Stern, “Recognition of Continuous Broadcast News with Multiple Unknown Speakers and Environments”, Proceedings of the ARPA Workshop on Speech Recognition Technology, pp. 61–66, February 1996.
R. Bakis, S. Chen, P. Gopalakrishnan, R. Gopinath, S. Maes, L. Polymenakos, and M. Franz, “Transcription of Broadcast News Shows with the IBM Large Vocabulary Speech Recognition System”, Proceedings of DARPA Speech Recognition Workshop, pp. 67–72, February 1997.
M. A. Siegler, U. Jain, B. Raj, and R. M. Stern, “Automatic Segmentation, Classification and Clustering of Broadcast News Audio”, Proceedings of DARPA Speech Recognition Workshop, pp. 97–99, February 1997.
J. S. Lim, “Speech Enhancement”, Prentice Hall, Englewood Cliffs, NJ, 1983.
J. H. L. Hansen, Speech Enhancement, Encyclopedia of Electrical and Electronics Engineering, John Wiley & Sons, vol. 20, pp. 159–175, 1999.
J. G. Fiscus, “A Post Processing System to yield reduced error rates: Recognizer Output Voting Error Reduction (ROVER)”, IEEE Workshop on Automatic Speech Recognition and Understanding, pp. 347–54, 1997.
S. Chen and P. S. Gopalakrishnan, “Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion”, Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop, pp. 127–132, February 1998.
B. Zhou and J. H. L. Hansen, “Unsupervised Audio Stream Segmentation and Clustering via the Bayesian Information Criterion”, Proc. of Inter. Conf. on Spoken Language Processing ICSLP-2000, vol. 3, pp. 714–717, October 2000.
G. Zhou, J. H. L. Hansen, and J. F. Kaiser, “Nonlinear Feature Based Classification of Speech under Stress”, IEEE Trans. on Speech & Audio Processing, vol. 9, no. 2, pp. 201–216, March 2001.
Y. Gong, “Speech Recognition in Noisy Environments: A Survey”, Speech Communication, vol. 16, pp. 261–91, 1995.
C. J. Leggetter and P. C. Woodland, “Maximum Likelihood Linear Regression for speaker adaptation of continuous density hidden Markov models”, Computer Speech and Language, vol. 9, no. 2, pp. 171–185, April, 1995.
M. Gales and S. Young, “Robust Continuous Speech Recognition using Parallel Model Combination”, IEEE Transactions on Speech and Audio Processing, vol. 4, pp. 352–359, September 1996.
H. Hermansky, “Perceptual linear predictive (PLP) analysis of speech”, Journal of the Acoustical Society of America, vol. 87, no. 4, pp. 1738–1752, 1990.
R. Sarikaya and J. H. L. Hansen, “High Resolution Speech Feature Parameterization for Monophone Based Stressed Speech Recognition”, IEEE Signal Processing Letters, vol. 7, no. 7, pp. 182–185, July 2000.
R. Sarikaya and J. H. L. Hansen, “Robust detection of Speech Activity in the Presence of Noise”, International Conference on Spoken Language Processing, vol. 4, pp. 1455–1458, December 1998.
P. Angkititrakul, J. H. L. Hansen, S. Baghaii, “Cluster-dependent Modeling and Confidence Measure Processing for In-Set/Out-of-Set Speaker Identification”, Interspeech-2004/ICSLP-2004: Inter. Conf. Spoken Language Processing, Jeju Island, South Korea, October 2004.
Copyright information
© 2007 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Akbacak, M., Hansen, J.H.L. (2007). Advances in Acoustic Noise Tracking for Robust In-Vehicle Speech Systems. In: Abut, H., Hansen, J.H.L., Takeda, K. (eds) Advances for In-Vehicle and Mobile Systems. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-45976-9_10
DOI: https://doi.org/10.1007/978-0-387-45976-9_10
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-33503-2
Online ISBN: 978-0-387-45976-9
eBook Packages: Engineering (R0)