
A Study of Phoneme and Grapheme Based Context-Dependent ASR Systems

  • John Dines
  • Mathew Magimai Doss
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4892)

Abstract

In this paper we present a study of automatic speech recognition systems that use context-dependent phonemes and graphemes as sub-word units, based on both the conventional HMM/GMM framework and the tandem approach. Experimental studies conducted on three different continuous speech recognition tasks show that systems using only context-dependent graphemes can achieve performance competitive with a context-dependent phoneme-based automatic speech recognition system on small to medium vocabulary tasks. In particular, we demonstrate the utility of tandem features, derived from an MLP trained to estimate phoneme posterior probabilities, in improving grapheme-based recognition performance: they implicitly incorporate phonemic knowledge into the system without requiring a phonetically transcribed lexicon.
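
To make the tandem pipeline concrete, the following sketch (not from the paper; the dimensions, weights, and function names are hypothetical stand-ins) illustrates how per-frame phoneme posterior probabilities from an MLP could be log-transformed and decorrelated with a PCA/KLT before being used as observation features for a conventional HMM/GMM recogniser whose sub-word units may be context-dependent graphemes.

# Illustrative sketch of tandem feature extraction (assumptions: 39-dim
# acoustic input such as PLP plus deltas, one hidden layer, 46 phoneme
# classes; the random weights stand in for a trained network).
import numpy as np

rng = np.random.default_rng(0)

def mlp_posteriors(frames, w1, b1, w2, b2):
    """Forward pass of a single-hidden-layer MLP; each output row is a
    vector of phoneme posterior probabilities (softmax over classes)."""
    hidden = np.tanh(frames @ w1 + b1)
    logits = hidden @ w2 + b2
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

def tandem_features(posteriors, n_components=25, eps=1e-10):
    """Log posteriors followed by PCA/KLT decorrelation, in the spirit of
    the tandem approach of Hermansky et al. (2000)."""
    logp = np.log(posteriors + eps)
    logp -= logp.mean(axis=0, keepdims=True)
    cov = np.cov(logp, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    return logp @ eigvecs[:, order]

frames = rng.standard_normal((300, 39))            # 300 hypothetical frames
w1, b1 = 0.1 * rng.standard_normal((39, 500)), np.zeros(500)
w2, b2 = 0.1 * rng.standard_normal((500, 46)), np.zeros(46)

post = mlp_posteriors(frames, w1, b1, w2, b2)
feats = tandem_features(post)
print(feats.shape)   # (300, 25): per-frame tandem features for the HMM/GMM

In practice the MLP would be trained on phonetically labelled data and the decorrelating transform estimated on the training set; the grapheme-based HMM/GMM system then models these features exactly as it would standard cepstral features.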

Keywords

Speech Recognition · Automatic Speech Recognition · Automatic Speech Recognition System · Continuous Speech Recognition · Spoken Language Processing

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • John Dines (1)
  • Mathew Magimai Doss (1, 2)
  1. IDIAP Research Institute, Martigny, Switzerland
  2. International Computer Science Institute, Berkeley, USA
