Part of the book series: Advances in Pattern Recognition (ACVPR)

We examine architectures for mobile speech applications. These applications use speech engines to synthesize audio output and to recognize audio input; a key architectural decision is whether to embed these engines on the mobile device or to locate them in the network. While both approaches have advantages, our focus here is on networked speech application architectures. Because the user experience with speech is greatly improved when the speech modality is coupled with a visual modality, mobile speech applications will increasingly tend to be multimodal, and speech architectures must therefore support multimodal user interaction. Good architectures must reflect commercial reality and be economical, efficient, robust, reliable, and scalable. They must leverage existing commercial ecosystems where possible, and we contend that speech and multimodal applications must build on both the web model of application development and deployment and the large ecosystem that has grown up around the W3C’s web speech standards.

Copyright information

© 2008 Springer-Verlag London Limited

About this chapter

Cite this chapter

Ferrans, J.C., Engelsma, J. (2008). Software Architectures for Networked Mobile Speech Applications. In: Automatic Speech Recognition on Mobile Devices and over Communication Networks. Advances in Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-84800-143-5_13

  • DOI: https://doi.org/10.1007/978-1-84800-143-5_13

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84800-142-8

  • Online ISBN: 978-1-84800-143-5

  • eBook Packages: Computer Science, Computer Science (R0)