International Journal of Speech Technology, Volume 14, Issue 3, pp 147–155

Robust features for multilingual acoustic modeling

DOI: 10.1007/s10772-011-9092-6

Cite this article as:
Santhosh Kumar, C. & Mohandas, V.P. Int J Speech Technol (2011) 14: 147. doi:10.1007/s10772-011-9092-6

Abstract

In this paper, we propose a technique to derive robust features for multilingual acoustic modeling using hidden Markov model–Gaussian mixture model (HMM-GMM) systems. We achieve this by discriminatively combining the phonetic contexts of the target languages (the languages in the multilingual system). Phonetic context is captured using a wide temporal context of the features, and the dimensionality of the resulting feature set is reduced to suit the HMM-GMM implementation using a neural network with a bottleneck in one of its hidden layers. The output of the bottleneck layer, taken before the non-linearity, is the new feature. Since the features are optimized for the target languages in the multilingual recognizer, they are referred to as Target Languages Oriented Features (TLOF).
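The feature extraction path described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the context width, layer sizes, sigmoid hidden units, number of output classes, and the random (untrained) weights are all assumptions made for illustration, and the helper names `splice_frames`, `init_layers`, and `bottleneck_features` are hypothetical. In practice the network would first be trained discriminatively on phone targets from the target languages; only the forward pass up to the bottleneck is shown here.

```python
import numpy as np


def splice_frames(feats, context):
    """Stack each frame with `context` neighbours on both sides (edges padded by repetition)."""
    num_frames, _ = feats.shape
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    # Column block k holds the frame at offset (k - context) from the centre frame.
    return np.hstack([padded[k:k + num_frames] for k in range(2 * context + 1)])


def init_layers(sizes, rng):
    """Random, untrained weights for a fully connected network (placeholder only)."""
    return [(rng.standard_normal((n_in, n_out)) * 0.1, np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def bottleneck_features(spliced, layers, bottleneck_index):
    """Run the forward pass up to the bottleneck layer and return its pre-activation output."""
    h = spliced
    for i, (w, b) in enumerate(layers):
        z = h @ w + b
        if i == bottleneck_index:
            return z  # taken before the non-linearity
        h = 1.0 / (1.0 + np.exp(-z))  # sigmoid hidden units (an assumption)
    raise ValueError("bottleneck_index is beyond the last layer")


rng = np.random.default_rng(0)
mfcc = rng.standard_normal((200, 39))               # 200 frames of 39-dim features (illustrative data)
spliced = splice_frames(mfcc, context=4)            # 9 stacked frames per row
layers = init_layers([351, 500, 39, 500, 80], rng)  # layer sizes are made up for this sketch
tlof_like = bottleneck_features(spliced, layers, bottleneck_index=1)
print(spliced.shape, tlof_like.shape)
```

The bottleneck width is set equal to the input feature dimension here only so that the reduced features could replace MFCC in an HMM-GMM system without other changes; the abstract does not state the dimension actually used.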

We perform our experiments on two of the most widely spoken Indian languages, Hindi and Tamil. TLOF offers significant performance improvements over both monolingual and multilingual phone recognizers using Mel frequency cepstral coefficients (MFCC). This indicates that TLOF can help share data across languages.

TLOF also enhances the performance of monolingual acoustic models, compared to systems using MFCC.

Keywords

Hidden Markov model (HMM), Neural networks (NN), Gaussian mixture models (GMM), Multilingual, Acoustic modeling, Robust features, Phone recognition, Speech recognition

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. ECE Department, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Ettimadai, Coimbatore, India