GEMINI: A Generic Multi-Modal Natural Interface Framework for Videogames

  • Luís Filipe Teófilo (Email author)
  • Pedro Alves Nogueira
  • Pedro Brandão Silva
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 206)


In recent years, videogame companies have recognized player engagement as a major factor in user experience and enjoyment. This has encouraged greater investment in new types of game controllers such as the Wiimote™, Rock Band™ instruments and the Kinect™. However, the native software of these controllers was not originally designed to be used in other game applications. This work addresses this issue by building a middleware framework that maps body poses or voice commands to actions in any game. This not only provides a more natural and customized user experience but also defines an interoperable virtual controller. In this version of the framework, body poses and voice commands are recognized through the Kinect’s built-in cameras and microphones, respectively. The acquired data is then translated into the game’s native interaction scheme in real time using a lightweight method based on spatial restrictions. The system is also prepared to use Nintendo’s Wiimote™ as an auxiliary and unobtrusive gamepad for commands that are impractical to perform physically or verbally. The system was validated by analyzing performance on a set of tasks and by examining user reports. Both confirmed this approach as a practical and appealing alternative to the game’s native interaction scheme. In sum, the framework provides a fully customizable and flexible game-controlling tool, thus expanding the market of game consumers.
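The abstract does not spell out the recognition method, so the following is only a minimal sketch of what a pose matcher "based on spatial restrictions" could look like. It assumes a pose is a set of constraints on the relative positions of tracked skeleton joints. All names here (`Restriction`, `Pose`, `actions_for_frame`, the joint labels, the thresholds and the key names) are hypothetical and are not taken from GEMINI or from any Kinect SDK.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]   # (x, y, z) in metres, y pointing up (assumed)
Skeleton = Dict[str, Vec3]          # hypothetical joint name -> tracked position


@dataclass(frozen=True)
class Restriction:
    """One spatial restriction between two joints along a single axis."""
    joint_a: str
    joint_b: str
    axis: int           # 0 = x, 1 = y, 2 = z
    min_offset: float   # joint_a[axis] - joint_b[axis] must be at least this

    def holds(self, skeleton: Skeleton) -> bool:
        offset = skeleton[self.joint_a][self.axis] - skeleton[self.joint_b][self.axis]
        return offset >= self.min_offset


@dataclass(frozen=True)
class Pose:
    """A named pose, recognized when every one of its restrictions holds."""
    name: str
    restrictions: Tuple[Restriction, ...]
    action: str         # hypothetical native input to emit, e.g. a key name

    def matches(self, skeleton: Skeleton) -> bool:
        return all(r.holds(skeleton) for r in self.restrictions)


def actions_for_frame(skeleton: Skeleton, poses: Sequence[Pose]) -> List[str]:
    """Return the game actions triggered by one skeleton frame."""
    return [pose.action for pose in poses if pose.matches(skeleton)]


# Illustrative pose table (thresholds are made up for the example).
POSES = (
    # Right hand at least 10 cm above the head -> jump.
    Pose("raise_right_hand", (Restriction("hand_right", "head", 1, 0.10),), "SPACE"),
    # Head at least 15 cm to the left of the hips -> strafe left.
    Pose("lean_left", (Restriction("hip_center", "head", 0, 0.15),), "A"),
)

frame: Skeleton = {
    "head": (0.0, 1.7, 2.0),
    "hand_right": (0.3, 1.9, 2.0),
    "hip_center": (0.0, 1.0, 2.0),
}
triggered = actions_for_frame(frame, POSES)
```

Each frame costs only a few subtractions and comparisons per pose, which is one way such a method could stay lightweight enough for real-time use. In a full system the returned action names would be forwarded to the game as simulated native input.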


Multi-modal, natural interfaces, videogames





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Luís Filipe Teófilo (1, 2), Email author
  • Pedro Alves Nogueira (1, 2)
  • Pedro Brandão Silva (1, 3)

  1. DEIFEUP – Faculty of Engineering, University of Porto, Porto, Portugal
  2. LIACC – Artificial Intelligence and Computer Science Lab., University of Porto, Porto, Portugal
  3. INESC PORTO – Instituto de Engenharia de Sistemas e Computadores, Porto, Portugal
