International Journal of Speech Technology, Volume 10, Issue 1, pp 17–30

Examining modality usage in a conversational multimodal application for mobile e-mail access

  • Jennifer Lai
  • Stella Mitchell
  • Christopher Pavlovski


As Third Generation (3G) networks emerge, they provide not only higher data transmission rates but also the ability to transmit both voice and low-latency data within the same session. This paper describes the architecture and implementation of a multimodal application (voice and text) that combines natural language understanding with a WAP browser to access email messages on a cell phone. We present results from a laboratory trial in which participants' modality usage was evaluated, and in which the multimodal system was compared with a text-only system representative of current products on the market. We discuss the observed modality issues and highlight implementation problems and usability concerns encountered during the trial. Findings indicate that participants used speech the majority of the time for both input and navigation, even though most had little or no prior experience with speech systems (but did have prior experience with text-only access to applications on their phones). To our knowledge, this represents the first implementation and evaluation of its kind using this combination of technologies on an unmodified cell phone. Design implications drawn from the study findings and the usability issues encountered are presented to inform the design of future conversational multimodal mobile applications.


Keywords: Multimodal interfaces · Modality usage · Natural language understanding · Mobile phones · Speech technologies





Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Jennifer Lai (1)
  • Stella Mitchell (1)
  • Christopher Pavlovski (2)

  1. IBM T.J. Watson Research Center, Hawthorne, USA
  2. IBM Corporation, Brisbane, Australia
