Virtual Reality, Volume 17, Issue 4, pp 293–305

A usability study of multimodal input in an augmented reality environment

  • Minkyung Lee
  • Mark Billinghurst
  • Woonhyuk Baek
  • Richard Green
  • Woontack Woo
Original Article


In this paper, we describe a user study evaluating the usability of an augmented reality (AR) multimodal interface (MMI). We have developed an AR MMI that combines free-hand gesture and speech input in a natural way using a multimodal fusion architecture. We describe the system architecture and present a study comparing the usability of the AR MMI with speech-only and 3D-hand-gesture-only interaction conditions. The interface was used in an AR application for selecting 3D virtual objects and changing their shape and color. For each interface condition, we measured task completion time, the number of user and system errors, and user satisfaction. We found that the MMI was more usable than the gesture-only interface, and that users found the MMI more satisfying to use than the speech-only interface; however, the MMI was neither more effective nor more efficient than the speech-only interface. We discuss the implications of this research for designing AR MMIs and outline directions for future work. The findings could also help in developing MMIs for a wider range of AR applications, such as AR navigation tasks, mobile AR interfaces, and AR games.


Keywords: Multimodal interface, Augmented reality, Usability, Efficiency, Effectiveness, Satisfaction



This work was supported by the DigiLog Miniature Augmented Reality Research Program funded by the KAIST Research Foundation, and by the Global Frontier R&D Program on "Human-centered Interaction for Coexistence" through a National Research Foundation of Korea (NRF) grant funded by the Korean Government (MSIP) (NRF-2010-0029751).



Copyright information

© Springer-Verlag London 2013

Authors and Affiliations

  • Minkyung Lee (1)
  • Mark Billinghurst (2)
  • Woonhyuk Baek (3)
  • Richard Green (4)
  • Woontack Woo (5)

  1. Technology Strategy Office, R&D Center, KT, Seoul, Korea
  2. The HIT Lab NZ, University of Canterbury, Christchurch, New Zealand
  3. Multimedia Research Team, Daum Communications, Jeju-do, Korea
  4. Computer Science and Software Engineering, University of Canterbury, Christchurch, New Zealand
  5. GSCT UVR Lab, KAIST, Daejeon, Korea
