Tlk or txt? Using voice input for SMS composition

  • Original Article
  • Personal and Ubiquitous Computing

Abstract

This paper reports a series of investigations that test the appropriateness of voice recognition as an interaction method for mobile phone use. First, a keystroke-level model (KLM) was used to compare the speed of voice recognition against multi-tap and predictive text (the two most common methods of text entry) for interacting with the phone menus and composing a text message. The model predicted that speech is faster than the other two methods and that a combination of input methods provides the quickest task completion times. The first experiment used a controlled message creation task to validate the KLM predictions. It also confirmed that the result was not due to a speed/accuracy trade-off and that participants preferred a combination of input methods to a single method for menu interaction and text composition. The second experiment investigated the effect of limited visual feedback on interaction (for example, when walking down the road or driving a car), providing further evidence in support of speech as a useful input method. Together, these experiments indicate not only that voice is useful for SMS input but also that users could be satisfied with voice input in hands-busy, eyes-busy situations.
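
To make the KLM comparison concrete, the following is a minimal sketch of how a keystroke-level estimate for multi-tap entry might be computed. It is not the model used in the article: the operator times (`K_SECONDS`, `M_SECONDS`), the same-key timeout (`TIMEOUT_SECONDS`), the placement of the space character on the `0` key, and all function names are illustrative assumptions.

```python
# Letter layout of a standard 12-key phone keypad.
# Putting the space character on "0" is an assumption; handsets differ.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz", "0": " ",
}

# Map each character to (key, number of presses needed in multi-tap).
CHAR_TO_KEY = {
    ch: (key, position + 1)
    for key, letters in KEYPAD.items()
    for position, ch in enumerate(letters)
}

# Illustrative operator times in seconds; placeholders, not values from the article.
K_SECONDS = 0.28        # one key press
M_SECONDS = 1.35        # one mental preparation
TIMEOUT_SECONDS = 1.0   # wait needed between two letters on the same key


def multitap_presses(text):
    """Return (total key presses, number of consecutive same-key letter pairs)."""
    presses = 0
    same_key_pairs = 0
    previous_key = None
    for ch in text.lower():
        key, count = CHAR_TO_KEY[ch]  # raises KeyError for unsupported characters
        presses += count
        if key == previous_key:
            same_key_pairs += 1
        previous_key = key
    return presses, same_key_pairs


def klm_multitap_seconds(text, mental_ops=1):
    """Estimate entry time as the sum of M, K, and same-key timeout operators."""
    presses, waits = multitap_presses(text)
    return mental_ops * M_SECONDS + presses * K_SECONDS + waits * TIMEOUT_SECONDS


if __name__ == "__main__":
    for word in ("hello", "on my way"):
        print(word, multitap_presses(word), round(klm_multitap_seconds(word), 2))
```

The same structure could be reused for other input methods by swapping the press-counting function, which is what allows a KLM to compare methods before any user testing.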

Notes

  1. Motorola also provides a similar predictive technology, ‘iTap’.

Author information

Corresponding author

Correspondence to Paul A. Cairns.

About this article

Cite this article

Cox, A.L., Cairns, P.A., Walton, A. et al. Tlk or txt? Using voice input for SMS composition. Pers Ubiquit Comput 12, 567–588 (2008). https://doi.org/10.1007/s00779-007-0178-8

  • DOI: https://doi.org/10.1007/s00779-007-0178-8
