Automatic Speech Recognition on Mobile Devices and over Communication Networks

Part of the series Advances in Pattern Recognition pp 107-130

Speech Feature Extraction and Reconstruction

  • Ben Milner, School of Computing Sciences, University of East Anglia


This chapter covers feature extraction and back-end speech reconstruction, with particular focus on distributed speech recognition (DSR) and the work carried out by the ETSI Aurora group. Feature extraction is examined first, beginning with a basic implementation of mel-frequency cepstral coefficients (MFCCs). Additional processing, in the form of noise and channel compensation, is then explained; its aim is to increase speech recognition accuracy in real-world environments. Source and channel coding issues relevant to DSR are also briefly discussed. Back-end speech reconstruction using a sinusoidal model is explained, and it is shown that this is made possible by transmitting additional source information (voicing and fundamental frequency) from the terminal device. An alternative method of back-end speech reconstruction is then explained, in which the voicing and fundamental frequency are predicted from the received MFCC vectors. This enables speech to be reconstructed solely from the MFCC vector stream, with no explicit transmission of voicing or fundamental frequency.
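
As a rough illustration of what a basic MFCC implementation involves, the sketch below follows the usual textbook pipeline: pre-emphasis, framing with a Hamming window, magnitude spectrum, triangular mel filterbank, logarithm, and a discrete cosine transform. It is not taken from the chapter or from the ETSI Aurora specification. The function names (`mfcc`, `mel_filterbank`) and all parameter defaults (8 kHz sampling, 25 ms frames, 10 ms shift, 23 filters, 13 coefficients, 0.97 pre-emphasis, filterbank starting at 0 Hz) are assumptions chosen for illustration only.

```python
import numpy as np


def hz_to_mel(f):
    """Convert frequency in Hz to the mel scale (common 2595*log10 form)."""
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centres spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.arange(n_fft // 2 + 1) * sample_rate / n_fft
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, centre, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (centre - lo)
        falling = (hi - fft_freqs) / (hi - centre)
        # Triangle: minimum of the two slopes, clipped at zero.
        fbank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank


def mfcc(signal, sample_rate=8000, frame_len=200, frame_shift=80,
         n_fft=256, n_filters=23, n_ceps=13):
    """Return an (n_frames, n_ceps) array of MFCC vectors.

    All defaults are illustrative assumptions, not values from the chapter.
    """
    signal = np.asarray(signal, dtype=float)
    # Pre-emphasis: first-order high-pass filter.
    emphasised = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(emphasised) - frame_len) // frame_shift
    idx = (np.arange(frame_len)[None, :]
           + frame_shift * np.arange(n_frames)[:, None])
    frames = emphasised[idx] * np.hamming(frame_len)
    # Magnitude spectrum of each frame (zero-padded to n_fft).
    spectrum = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    # Mel filterbank energies, floored to avoid log(0).
    fbank = mel_filterbank(n_filters, n_fft, sample_rate)
    log_energies = np.log(np.maximum(spectrum @ fbank.T, 1e-10))
    # Unnormalised DCT-II across the filterbank channels.
    n = np.arange(n_filters)
    k = np.arange(n_ceps)[:, None]
    dct_basis = np.cos(np.pi * k * (n + 0.5) / n_filters)
    return log_energies @ dct_basis.T


if __name__ == "__main__":
    t = np.arange(8000) / 8000.0
    tone = np.sin(2.0 * np.pi * 440.0 * t)  # one second of a 440 Hz tone
    print(mfcc(tone).shape)
```

In a DSR setting, vectors like these would be computed on the terminal device, compressed, and sent to the server. The noise compensation, channel compensation, and coding stages described in the abstract are not shown here.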