Systems for Interactive Control of Computer Generated Music Performance

  • Marco Fabiani
  • Anders Friberg
  • Roberto Bresin


This chapter is a literature survey of systems for real-time interactive control of automatic expressive music performance. A classification is proposed based on two initial design choices: the music material to interact with (i.e., MIDI or audio recordings) and the type of control (i.e., direct control of the low-level parameters such as tempo, intensity, and instrument balance or mapping from high-level parameters, such as emotions, to low-level parameters). Their pros and cons are briefly discussed. Then, a generic approach to interactive control is presented, comprising four steps: control data collection and analysis, mapping from control data to performance parameters, modification of the music material, and audiovisual feedback synthesis. Several systems are then described, focusing on different technical and expressive aspects. For many of the surveyed systems, a formal evaluation is missing. Possible methods for the evaluation of such systems are finally discussed.
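The four-step approach described above (control data collection and analysis, mapping to performance parameters, modification of the music material, feedback synthesis) can be sketched in code. The sketch below is purely illustrative and not taken from any of the surveyed systems; all type and function names are hypothetical, and the linear gesture-to-tempo mapping is just one simple example of a "direct control" design. It assumes MIDI-like material, where modification reduces to scaling note timing and velocity (audio material would instead require time-stretching and similar signal processing).

```python
from dataclasses import dataclass

@dataclass
class ControlData:
    """Step 1: analyzed input, e.g. from a gesture sensor (hypothetical fields)."""
    gesture_speed: float  # normalized 0..1
    gesture_size: float   # normalized 0..1

@dataclass
class PerformanceParams:
    """Low-level performance parameters of the kind named in the chapter."""
    tempo_scale: float      # 1.0 = nominal tempo
    intensity_scale: float  # 1.0 = nominal dynamics

def map_control(ctrl: ControlData) -> PerformanceParams:
    """Step 2: a direct, linear mapping (one of many possible choices).
    Faster gestures -> faster tempo; larger gestures -> louder playing."""
    return PerformanceParams(
        tempo_scale=0.5 + ctrl.gesture_speed,     # 0.5x .. 1.5x
        intensity_scale=0.5 + ctrl.gesture_size,  # 0.5x .. 1.5x
    )

def modify_midi_note(onset_s: float, duration_s: float, velocity: int,
                     params: PerformanceParams):
    """Step 3, for MIDI material: scale note timing and velocity.
    Step 4 (feedback synthesis) would render the modified notes."""
    stretch = 1.0 / params.tempo_scale
    vel = max(1, min(127, round(velocity * params.intensity_scale)))
    return onset_s * stretch, duration_s * stretch, vel

# Example: a fast, medium-sized gesture speeds up the tempo by 1.5x
# while leaving the dynamics at their nominal level.
params = map_control(ControlData(gesture_speed=1.0, gesture_size=0.5))
print(modify_midi_note(2.0, 0.5, 64, params))
```

A high-level ("semantic") control design would insert an extra stage before `map_control`, first mapping an emotion description (e.g. a position in an activity-valence space) to a coordinated set of tempo, intensity, and articulation values.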


Keywords: Emotional Expression · Gesture Recognition · Audio Recording · Music Therapy · Performance Rule



Copyright information

© Springer-Verlag London 2013

Authors and Affiliations

  • Marco Fabiani (1)
  • Anders Friberg (1)
  • Roberto Bresin (1)

  1. Department of Speech, Music & Hearing, KTH Royal Institute of Technology, Stockholm, Sweden
