User Expectations from Dictation on Mobile Devices

  • Santosh Basapur
  • Shuang Xu
  • Mark Ahlenius
  • Young Seok Lee
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4551)


Mobile phones, with their increasing processing power and memory, are enabling an ever wider range of tasks. The traditional keypad-based text entry method falls short in numerous ways. Proposed solutions include QWERTY keypads on the phone, external keypads, virtual keypads projected onto table tops (Siemens at CeBIT ’05), and, last but not least, automatic speech recognition (ASR), which enables dictation, i.e., text input by voice. Despite this progress, ASR systems still do not perform satisfactorily in mobile environments, mainly because of the difficulty of recognizing a large vocabulary spoken by diverse speakers under varying acoustic conditions. Dictation therefore has clear advantages but also brings its own usability problems. The objective of this research is to uncover the uses and benefits of dictation on a mobile phone. This study focused on users’ needs, expectations, and concerns regarding the new input medium. Focus groups were conducted to discuss current data entry methods, the potential use and usefulness of a dictation feature, users’ reactions to ASR errors during dictation, and possible error correction methods. Our findings indicate a strong demand for dictation: all participants perceived it as very useful, provided it is easily accessible and usable. Potential applications for dictation fell into two distinct areas, communication and personal use.
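The recognition errors discussed above are conventionally quantified as word error rate (WER): the word-level edit distance between what the user said and what the recognizer produced, divided by the length of the reference. A minimal sketch (illustrative only; the example phrases and function name are not from the paper):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dp[i - 1][j] + 1
            insertion = dp[i][j - 1] + 1
            dp[i][j] = min(substitution, deletion, insertion)
    return dp[len(ref)][len(hyp)] / len(ref)

# One deleted word ("a") and one substitution ("to" -> "too") out of
# five reference words gives a WER of 2/5 = 0.4.
print(wer("send a text to mark", "send text too mark"))
```

A dictation error rate of 0.1 to 0.2 in mobile acoustic conditions means roughly one in every five to ten words must be corrected by the user, which is why the error correction methods explored in the focus groups matter so much.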


Keywords: Mobile Phone · Mobile Device · Speech Recognition · Automatic Speech Recognition · Text Input





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Santosh Basapur (1)
  • Shuang Xu (1)
  • Mark Ahlenius (1)
  • Young Seok Lee (2)

  1. Human Interaction Research Center of Excellence, Motorola Labs, Schaumburg, IL 60196, USA
  2. The Grado Department of Industrial and Systems Engineering, Virginia Tech, Blacksburg, Virginia 24061, USA
