A Video System for Recognizing Gestures by Artificial Neural Networks for Expressive Musical Control

  • Paul Modler
  • Tony Myatt
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2915)


In this paper we describe a system that recognizes gestures for controlling musical processes. We applied a Time Delay Neural Network (TDNN) to match gestures represented as variations of luminance information in video streams. The system reached recognition rates of about 90% for 3 different types of hand gestures, and it is presented here as a prototype for a gesture recognition system that is tolerant to ambient conditions and environments. The neural network can be trained to recognize gestures that are difficult to describe by postures or sign language. This can be used to adapt to the unique gestures of a performer or to video sequences of arbitrary moving objects. We discuss the outcome of extending the system to successfully learn a set of 17 hand gestures. The application was implemented in jMax to achieve real-time operation and easy integration into a musical environment. We describe the design and learning procedure of the network using the Stuttgart Neural Network Simulator (SNNS). The system is intended to integrate into an environment that enables expressive control of musical parameters (KANSEI).
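
As a rough illustration of the idea of feeding luminance variation into a time-delay input, the sketch below computes per-region luminance change between two grayscale frames and keeps a sliding window of recent feature vectors. It is not the authors' implementation: the abstract does not specify the feature extraction or the network topology, so the grid layout, the window length, and the names `luminance_change` and `DelayLine` are assumptions made only for this example.

```python
from collections import deque
from typing import Deque, List, Sequence

# A grayscale frame: rows of luminance values, assumed to lie in [0, 1].
Frame = Sequence[Sequence[float]]


def luminance_change(prev: Frame, curr: Frame, grid: int = 2) -> List[float]:
    """Mean absolute luminance difference per cell of a grid x grid layout.

    The grid layout is a hypothetical choice; the paper may use another one.
    """
    rows, cols = len(curr), len(curr[0])
    if rows < grid or cols < grid:
        raise ValueError("frame is smaller than the requested grid")
    features: List[float] = []
    for gy in range(grid):
        for gx in range(grid):
            r0, r1 = gy * rows // grid, (gy + 1) * rows // grid
            c0, c1 = gx * cols // grid, (gx + 1) * cols // grid
            total = sum(
                abs(curr[r][c] - prev[r][c])
                for r in range(r0, r1)
                for c in range(c0, c1)
            )
            features.append(total / ((r1 - r0) * (c1 - c0)))
    return features


class DelayLine:
    """Holds the last `delays` feature vectors, like a time-delay input layer."""

    def __init__(self, delays: int) -> None:
        self.buffer: Deque[List[float]] = deque(maxlen=delays)

    def push(self, features: List[float]) -> None:
        # Appending to a full deque silently drops the oldest entry.
        self.buffer.append(features)

    def ready(self) -> bool:
        # True once enough frames have been seen to fill the window.
        return len(self.buffer) == self.buffer.maxlen

    def window(self) -> List[float]:
        # Flatten oldest-to-newest into one input vector for a classifier.
        return [value for step in self.buffer for value in step]
```

In use, each new video frame would be compared with its predecessor, the resulting feature vector pushed into the delay line, and the flattened window passed to a trained network once `ready()` returns true.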


Recognition Rate, Video Stream, Gesture Recognition, Hand Gesture, Gesture Type
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Paul Modler (1)
  • Tony Myatt (2)
  1. Hochschule für Gestaltung, Karlsruhe, Germany
  2. Music Department, University of York, York, UK
