
Using hand gestures to control mobile spoken dialogue systems

  • Long paper
  • Published in Universal Access in the Information Society

Abstract

Speech and hand gestures offer the most natural modalities for everyday human-to-human interaction. The availability of diverse spoken dialogue applications and the proliferation of accelerometers in consumer electronics allow the introduction of new interaction paradigms based on speech and gestures. Little attention has been paid, however, to controlling spoken dialogue systems (SDS) through gestures. Situation-induced disabilities, as well as actual disabilities, are key factors motivating this type of interaction. In this paper, six concise and intuitively meaningful gestures are proposed that can be used to trigger commands in any SDS. Using different machine learning techniques, a classification error below 5 % is achieved for the gesture patterns, and the proposed gesture set is compared with gestures proposed by users. An examination of the social acceptability of this interaction scheme reveals high levels of acceptance for public use. An experiment comparing a button-enabled and a gesture-enabled interface showed that the latter imposes little additional mental and physical effort. Finally, results are reported from three additional participants: a male subject with spastic cerebral palsy, a blind female user, and an elderly woman.
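As a rough illustration of the recognition step described in the abstract, the sketch below shows how windowed 3-axis accelerometer samples could be mapped to simple statistical features and classified with a k-nearest-neighbour model (Python with NumPy and scikit-learn). This is a minimal sketch under assumed data shapes and hypothetical gesture labels, not the classifiers or feature set actually used in the paper.

# Minimal sketch (assumed data format, not the paper's implementation):
# classify windowed 3-axis accelerometer recordings into one of six
# gesture classes with a k-nearest-neighbour model.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

GESTURE_LABELS = ["up", "down", "left", "right", "circle", "shake"]  # hypothetical names

def extract_features(window):
    """Map an (N, 3) block of x/y/z accelerometer samples to a fixed-size
    feature vector: per-axis mean, standard deviation and mean energy."""
    window = np.asarray(window, dtype=float)
    mean = window.mean(axis=0)
    std = window.std(axis=0)
    energy = (window ** 2).mean(axis=0)
    return np.concatenate([mean, std, energy])

def train_gesture_classifier(windows, labels):
    """Fit a k-NN classifier on labelled gesture windows and print its
    cross-validated accuracy as a rough quality check."""
    X = np.vstack([extract_features(w) for w in windows])
    y = np.asarray(labels)
    clf = KNeighborsClassifier(n_neighbors=3)
    print("cross-validated accuracy: %.2f" % cross_val_score(clf, X, y, cv=5).mean())
    return clf.fit(X, y)

At run time, the same extract_features step would be applied to each incoming window before calling the classifier's predict method, and the predicted label mapped to the corresponding SDS command.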






Author information

Corresponding author: Nikos Tsourakis

About this article

Cite this article

Tsourakis, N. Using hand gestures to control mobile spoken dialogue systems. Univ Access Inf Soc 13, 257–275 (2014). https://doi.org/10.1007/s10209-013-0317-0
