A Configurable Logic Based Architecture for Real-Time Continuous Speech Recognition Using Hidden Markov Models

Field-Programmable Custom Computing Technology: Architectures, Tools, and Applications

Abstract

An architecture is presented for real-time continuous speech recognition based on a modified hidden Markov model. The algorithm is adapted to the needs of continuous speech recognition by efficient encoding of the state space and by logarithmic encoding of the weights, so that products can be computed as sums. The paper presents the algorithm and its application-related modifications, the mapping of the algorithm to a special-purpose architecture, and the detailed design of this architecture using configurable logic. Emphasis is placed on how the attributes of the algorithm are exploited in a configurable-logic-based design. A concrete design example is presented: a coprocessor engine with one large FPGA, 64 Mbytes of synchronous DRAM (SDRAM), a small FPGA serving as an SDRAM controller, and 2 Mbytes of SRAM. Operating at 66 MHz, this engine performs roughly nine times as fast as a high-end personal computer running a fully optimized version of the same algorithm.
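
The logarithmic encoding mentioned in the abstract can be illustrated with a generic Viterbi search in the log domain. The sketch below is only an illustration under stated assumptions: it is a textbook Viterbi pass, not the chapter's modified algorithm or its state-space encoding, and the function name `viterbi_log`, the helper `to_logs`, and the two-state toy model are hypothetical. It uses floating-point logarithms for simplicity. The point it shows is that once weights are stored as logarithms, every product of probabilities becomes an addition, so the inner loop needs only adders and comparators.

```python
import math

def to_logs(rows):
    """Convert a table of nonzero probabilities to log-weights (hypothetical helper)."""
    return [[math.log(p) for p in row] for row in rows]

def viterbi_log(log_init, log_trans, log_emit, observations):
    """Return (best_log_score, best_state_path) using only additions and max."""
    n_states = len(log_init)
    # Score of each state after the first observation.
    scores = [log_init[s] + log_emit[s][observations[0]] for s in range(n_states)]
    backptrs = []
    for obs in observations[1:]:
        new_scores, ptrs = [], []
        for j in range(n_states):
            # A product of probabilities becomes a sum of log-weights.
            best_i = max(range(n_states), key=lambda i: scores[i] + log_trans[i][j])
            new_scores.append(scores[best_i] + log_trans[best_i][j] + log_emit[j][obs])
            ptrs.append(best_i)
        scores = new_scores
        backptrs.append(ptrs)
    # Trace the best path backwards from the best final state.
    last = max(range(n_states), key=lambda s: scores[s])
    path = [last]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    path.reverse()
    return scores[last], path

# Toy two-state model; all numbers are made up for illustration.
log_init = [math.log(0.6), math.log(0.4)]
log_trans = to_logs([[0.7, 0.3], [0.4, 0.6]])
log_emit = to_logs([[0.9, 0.1], [0.2, 0.8]])

score, path = viterbi_log(log_init, log_trans, log_emit, [0, 1, 1])
print(path, math.exp(score))
```

For the toy model, the best path found in the log domain matches the one a direct product-of-probabilities search would give, because the logarithm is monotonic and so preserves the ordering of the candidates.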

Copyright information

© 2000 Springer Science+Business Media New York

About this chapter

Cite this chapter

Stogiannos, P., Dollas, A., Digalakis, V. (2000). A Configurable Logic Based Architecture for Real-Time Continuous Speech Recognition Using Hidden Markov Models. In: Arnold, J., Luk, W., Pocek, K. (eds) Field-Programmable Custom Computing Technology: Architectures, Tools, and Applications. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-4417-3_7

  • DOI: https://doi.org/10.1007/978-1-4615-4417-3_7

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-6988-2

  • Online ISBN: 978-1-4615-4417-3

  • eBook Packages: Springer Book Archive
