Volume 54, Issue 12, pp 2481-2491
Date: 03 Dec 2011

A flexible framework for HMM based noise robust speech recognition using generalized parametric space polynomial regression


Abstract

Handling variable, non-stationary ambient noise is a challenging task for automatic speech recognition (ASR) systems. To address this issue, multi-style, noise condition independent (CI) model training on speech data collected in diverse noise environments, or uncertainty decoding techniques, can be used. An alternative approach is to explicitly approximate the continuous trajectory of Gaussian component mean and variance parameters against the varying noise level, for example using variable parameter hidden Markov models (VP-HMM). This paper investigates a more generalized form of variable parameter HMMs (GVP-HMM). In addition to Gaussian component means and variances, it also provides more compact trajectory modeling of tied linear transformations. An alternative noise condition dependent (CD) training algorithm is also proposed to handle the bias toward the training noise condition distribution. Consistent error rate gains were obtained over conventional VP-HMM mean and variance only trajectory modeling on a medium vocabulary Mandarin Chinese in-car navigation command recognition task.
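The parametric trajectory idea described in the abstract can be illustrated with a minimal sketch (Python/NumPy, not the authors' implementation): each dimension of a Gaussian component mean is expressed as a polynomial in an auxiliary noise variable such as the frame-level SNR, the coefficients are fitted by least squares from means estimated at a few discrete training noise conditions, and at decoding time the mean is synthesized for the observed noise level. The function names, SNR values and polynomial order below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def fit_mean_trajectory(snr_levels, means, order=2):
    """Fit per-dimension polynomial coefficients so that mu_d(v) = sum_k c_{d,k} v^k.

    snr_levels : (N,)  noise levels (e.g. SNR in dB) of the training conditions.
    means      : (N, D) Gaussian mean vectors estimated at those conditions.
    Returns    : (order+1, D) coefficient matrix, highest order first.
    """
    # np.polyfit fits each column of `means` independently by least squares.
    return np.polyfit(snr_levels, means, deg=order)

def synthesize_mean(coeffs, snr):
    """Evaluate the polynomial trajectory at the observed noise level (Horner's rule)."""
    mean = np.zeros(coeffs.shape[1])
    for c in coeffs:          # coefficients from highest to lowest order
        mean = mean * snr + c
    return mean

# Toy usage: means of one 2-dimensional Gaussian component estimated at three
# hypothetical training noise conditions.
snr_levels = np.array([5.0, 15.0, 25.0])        # training SNRs (dB)
means = np.array([[0.8, -1.2],
                  [1.1, -0.9],
                  [1.5, -0.4]])                 # (N conditions, D = 2 dims)

coeffs = fit_mean_trajectory(snr_levels, means, order=2)
print(synthesize_mean(coeffs, 10.0))            # mean predicted for a 10 dB utterance
```

The same regression applies unchanged to variance parameters or, as the paper's generalization suggests, to the entries of tied linear transforms, which is what makes the GVP-HMM representation more compact than storing separate model sets per noise condition.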

CHENG Ning was born in 1981. He received the Ph.D. degree in pattern recognition and intelligent systems from the Institute of Automation, Chinese Academy of Sciences, Beijing, China in 2009. Currently, he is a postdoctoral researcher at the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences. His research interests include robust speech recognition, speech enhancement and microphone arrays.
WANG Lan is a Professor at the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences. She received her M.S. degree from the Center of Information Science, Peking University, and her Ph.D. degree from the Machine Intelligence Laboratory of the Cambridge University Engineering Department in 2006, after which she worked as a research associate in CUED. Her research interests are large vocabulary continuous speech recognition, speech visualization and audio information indexing.
LIU XunYing was born in 1978. He received the Ph.D. degree in speech recognition in 2006 and the M.Phil. degree in computer speech and language processing in 2001, both from the University of Cambridge, following a bachelor's degree from Shanghai Jiao Tong University in 2000. He is currently a Senior Research Associate at the Machine Intelligence Laboratory of the Cambridge University Engineering Department. He is the lead researcher on the EPSRC funded Natural Speech Technology and the DARPA funded Broad Operational Language Translation programs at Cambridge. He was the recipient of the best paper award at ISCA Interspeech 2010. His current research interests include large vocabulary continuous speech recognition, language modelling and adaptation, weighted finite state transducers, factored acoustic modelling, noise robust speech recognition and statistical machine translation. Dr. Liu Xunying is a member of IEEE and ISCA.