Spoken Language Interfaces for Embedded Applications

  • Dragos Burileanu

Abstract

Speech-enabled interfaces are increasingly appearing in small devices such as cellular phones, PDAs, car kits, and various other consumer electronics products, resulting in what is now called “embedded speech.” The new generation of small-scale computing devices has severe resource constraints, notably limited CPU power and small memory. This makes the design and efficient implementation of speech interfaces for these devices a challenging task. This chapter first discusses the evolution of spoken language interfaces and evaluates their potential benefits for embedded applications. It then investigates the basic requirements for such interfaces and the inherent restrictions imposed by low-resource systems, and analyzes current theoretical and practical solutions for adapting speech recognition and synthesis technologies to portable electronic devices. As a concrete example, it describes implementation issues in developing an optimized embedded version of a complete text-to-speech synthesis system.

Keywords

spoken language interfaces, embedded speech, portable devices, automatic speech recognition, text-to-speech synthesis, multimodality



Copyright information

© Springer Science + Business Media, LLC 2008

Authors and Affiliations

  • Dragos Burileanu, Speech Technology and Human-Computer Dialogue Laboratory, “Politehnica” University of Bucharest, Bucharest, Romania
