Software Architectures for Networked Mobile Speech Applications

Ferrans, James C.; Engelsma, Jonathan

doi:10.1007/978-1-84800-143-5_13

Part of the book series: Advances in Pattern Recognition (ACVPR)

We examine architectures for mobile speech applications. These applications use speech engines to synthesize audio output and to recognize audio input, and a key architectural decision is whether to embed these engines on the mobile device or to locate them in the network. Both approaches have advantages, but our focus here is on networked speech application architectures. Because the user experience with speech improves greatly when the speech modality is coupled with a visual modality, mobile speech applications will increasingly tend to be multimodal, and speech architectures must therefore support multimodal user interaction. Good architectures must reflect commercial reality and be economical, efficient, robust, reliable, and scalable. They must also leverage existing commercial ecosystems where possible. We contend that speech and multimodal applications must build on both the web model of application development and deployment and the large ecosystem that has grown up around the W3C’s web speech standards.

Author information

Authors and Affiliations

Motorola Labs, USA
James C. Ferrans & Jonathan Engelsma

About this chapter

Cite this chapter

Ferrans, J.C., Engelsma, J. (2008). Software Architectures for Networked Mobile Speech Applications. In: Automatic Speech Recognition on Mobile Devices and over Communication Networks. Advances in Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-84800-143-5_13

DOI: https://doi.org/10.1007/978-1-84800-143-5_13
Publisher Name: Springer, London
Print ISBN: 978-1-84800-142-8
Online ISBN: 978-1-84800-143-5
eBook Packages: Computer Science, Computer Science (R0)
