Cognitive Neurodynamics, 5:253

Multi-stream LSTM-HMM decoding and histogram equalization for noise robust keyword spotting

  • Martin Wöllmer
  • Erik Marchi
  • Stefano Squartini
  • Björn Schuller
Research Article

DOI: 10.1007/s11571-011-9166-9

Cite this article as:
Wöllmer, M., Marchi, E., Squartini, S. et al. Cogn Neurodyn (2011) 5: 253. doi:10.1007/s11571-011-9166-9

Abstract

Highly spontaneous, conversational, and potentially emotional and noisy speech is known to be a challenge for today’s automatic speech recognition (ASR) systems, which highlights the need for advanced algorithms that improve speech features and models. Histogram equalization is an efficient method to reduce the mismatch between clean and noisy conditions by normalizing all moments of the probability distribution of the feature vector components. In this article, we propose to combine histogram equalization and multi-condition training for robust keyword detection in noisy speech. To better cope with conversational speaking styles, we show how contextual information can be effectively exploited in a multi-stream ASR framework that dynamically models context-sensitive phoneme estimates generated by a long short-term memory neural network. The proposed techniques are evaluated on the SEMAINE database, a corpus containing emotionally colored conversations with a cognitive system for “Sensitive Artificial Listening”.
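
The idea behind histogram equalization of feature vector components can be illustrated with a minimal sketch. The abstract does not specify the reference distribution or how the cumulative distribution is estimated, so the code below assumes a standard Gaussian reference and a rank-based empirical CDF computed per utterance. The function names `histogram_equalize` and `equalize_features` are hypothetical and are not taken from the article.

```python
from statistics import NormalDist


def histogram_equalize(values):
    """Map one feature component onto a standard normal reference.

    Assumption: Gaussian reference and rank-based empirical CDF;
    the article may use a different reference or estimator.
    """
    n = len(values)
    # Indices of the frames, ordered from smallest to largest value.
    order = sorted(range(n), key=lambda i: values[i])
    reference = NormalDist(mu=0.0, sigma=1.0)
    equalized = [0.0] * n
    for rank, idx in enumerate(order):
        # Empirical CDF, offset by 0.5 so it stays strictly inside (0, 1).
        cdf = (rank + 0.5) / n
        # Invert the reference CDF to obtain the equalized value.
        equalized[idx] = reference.inv_cdf(cdf)
    return equalized


def equalize_features(frames):
    """Equalize every component of a sequence of feature vectors independently."""
    columns = [list(col) for col in zip(*frames)]
    equalized_columns = [histogram_equalize(col) for col in columns]
    return [list(row) for row in zip(*equalized_columns)]


# Toy "clean" features and a monotonically distorted "noisy" version.
clean = [[0.1, 5.0], [0.4, 7.0], [0.2, 6.0]]
noisy = [[2.0 * a + 3.0, 0.5 * b - 1.0] for a, b in clean]

clean_eq = equalize_features(clean)
noisy_eq = equalize_features(noisy)
```

Because the mapping depends only on the rank of each value within the utterance, any monotonically increasing distortion of a component leaves the equalized features unchanged in this sketch, which is the sense in which the mismatch between conditions is reduced. Real noise is not a purely monotone distortion, so this is an idealized illustration.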

Keywords

Long short-term memory; Neural networks; Histogram equalization; Keyword spotting; Cognitive agents

Copyright information

© Springer Science+Business Media B.V. 2011

Authors and Affiliations

  • Martin Wöllmer (1)
  • Erik Marchi (2)
  • Stefano Squartini (2)
  • Björn Schuller (1)

  1. Institute for Human-Machine Communication, Technische Universität München, München, Germany
  2. 3MediaLabs, A3LAB, DIBET, Dipartimento di Ingegneria Biomedica, Elettronica e Telecomunicazioni, Università Politecnica delle Marche, Ancona, Italy