Capturing Expressive and Indicative Qualities of Conducting Gesture: An Application of Temporal Expectancy Models

  • Dilip Swaminathan
  • Harvey Thornburg
  • Todd Ingalls
  • Stjepan Rajko
  • Jodi James
  • Ellen Campana
  • Kathleya Afanador
  • Randal Leistikow
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4969)


Many event sequences in everyday human movement exhibit temporal structure: for instance, footsteps in walking, the striking of balls in a tennis match, the movements of a dancer set to rhythmic music, and the gestures of an orchestra conductor. These events generate prior expectancies regarding the occurrence of future events. Moreover, these expectancies play a critical role in conveying expressive qualities and communicative intent through the movement; thus they are of considerable interest in musical control contexts. To this end, we introduce a novel Bayesian framework, which we call the temporal expectancy model, and use it to develop an analysis tool for capturing expressive and indicative qualities of the conducting gesture based on temporal expectancies. The temporal expectancy model is a general dynamic Bayesian network (DBN) that can be used to encode prior knowledge regarding temporal structure to improve event segmentation. From conducting gesture, the analysis tool infers beat and tempo, which are indicative; articulation, which is expressive; and temporal expectancies regarding beat (ictus and preparation instances). Experimental results using our analysis framework reveal a very strong correlation between how significantly the preparation expectancy builds up and whether the articulation is staccato or legato. This bolsters the case for temporal expectancy as a cognitive model for event anticipation and as a key factor in the communication of expressive qualities of conducting gesture. Our system operates on data obtained from a marker-based motion capture system, but can be easily adapted for more affordable technologies like video camera arrays.
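To make the idea of a temporal expectancy concrete, the following is a minimal sketch, not the authors' model. It assumes that inter-beat intervals are normally distributed and quantifies expectancy as the hazard rate, i.e. the instantaneous event rate given that no event has occurred since the last beat. The abstract states neither assumption; the function names and the 0.5 s mean and 0.05 s spread are hypothetical.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def normal_survival(x, mean, std):
    """P(X > x) for a normal distribution, computed with erfc for accuracy."""
    return 0.5 * math.erfc((x - mean) / (std * math.sqrt(2.0)))


def hazard_rate(elapsed, mean_interval, std_interval):
    """Illustrative expectancy measure (an assumption of this sketch).

    Returns the event rate at `elapsed` seconds after the last beat,
    conditioned on no beat having occurred in the meantime.
    """
    survival = normal_survival(elapsed, mean_interval, std_interval)
    if survival <= 0.0:
        return float("inf")  # event is numerically certain to be overdue
    return normal_pdf(elapsed, mean_interval, std_interval) / survival


if __name__ == "__main__":
    # Hypothetical tempo: one beat every 0.5 s with 0.05 s spread.
    for t in (0.2, 0.4, 0.5, 0.6):
        print(f"elapsed={t:.1f}s  expectancy={hazard_rate(t, 0.5, 0.05):.4g}")
```

Under these assumptions the expectancy is near zero shortly after a beat and rises steeply as the elapsed time approaches and passes the typical interval, which is the kind of build-up the abstract refers to.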


Keywords: Temporal Structure; Magnitude Velocity; Dynamic Bayesian Network; Communicative Intent; Temporal Expectancy
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Dilip Swaminathan (1)
  • Harvey Thornburg (1)
  • Todd Ingalls (1)
  • Stjepan Rajko (1)
  • Jodi James (1)
  • Ellen Campana (1)
  • Kathleya Afanador (1)
  • Randal Leistikow (2)

  1. Arts, Media and Engineering, Arizona State University, USA
  2. Zenph Studios Inc, Raleigh, USA
