Machine Learning, Volume 65, Issue 2–3, pp 361–387

Modeling, analyzing, and synthesizing expressive piano performance with graphical models

  • Graham Grindlay
  • David Helmbold


Trained musicians intuitively produce expressive variations that add to their audience’s enjoyment. However, there is little quantitative information about the kinds of strategies used in different musical contexts. Since the literal synthesis of notes from a score is bland and unappealing, there is an opportunity for learning systems that can automatically produce compelling expressive variations. The ESP (Expressive Synthetic Performance) system generates expressive renditions using hierarchical hidden Markov models trained on the stylistic variations employed by human performers. Furthermore, the generative models learned by the ESP system provide insight into a number of musicological issues related to expressive performance.
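The abstract's core idea — hidden states capturing expressive context, with emissions that perturb the literal score — can be illustrated with a toy generative model. This is a minimal sketch, not the authors' ESP system: the two states, their names, and all parameter values are illustrative assumptions, and a real hierarchical HMM would add nested sub-models and learned parameters.

```python
import random

random.seed(0)

# Toy sketch (not the ESP implementation): a two-state HMM whose hidden
# states represent hypothetical expressive contexts ("pushing" vs.
# "relaxing") and whose emissions are tempo multipliers applied to the
# score's nominal note durations. All parameters are illustrative.
TRANSITIONS = {                      # P(next state | current state)
    "pushing":  {"pushing": 0.8, "relaxing": 0.2},
    "relaxing": {"pushing": 0.3, "relaxing": 0.7},
}
EMISSION_MEAN = {"pushing": 0.95, "relaxing": 1.08}  # <1 speeds up, >1 slows down
EMISSION_STD = 0.02

def sample_performance(nominal_durations, start_state="pushing"):
    """Sample one expressive rendition: scale each nominal duration by a
    tempo multiplier emitted from the current hidden state."""
    state = start_state
    performed = []
    for dur in nominal_durations:
        mult = random.gauss(EMISSION_MEAN[state], EMISSION_STD)
        performed.append(dur * mult)
        # Advance the hidden state by sampling the transition distribution.
        r, acc = random.random(), 0.0
        for nxt, p in TRANSITIONS[state].items():
            acc += p
            if r < acc:
                state = nxt
                break
    return performed

score = [0.5, 0.5, 1.0, 0.5, 0.5, 2.0]   # nominal note durations in beats
rendition = sample_performance(score)
print([round(d, 3) for d in rendition])
```

Sampling from the model yields small, state-correlated timing deviations rather than a flat, literal rendering; training such a model on human performance data (as ESP does, with a hierarchical variant) is what ties the deviations to musical context.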


Keywords: Graphical models · Hierarchical hidden Markov models · Music performance · Musical information retrieval



Copyright information

© Springer Science + Business Media, LLC 2006

Authors and Affiliations

  1. Media Laboratory, Massachusetts Institute of Technology, Cambridge
  2. Computer Science Department, University of California, Santa Cruz
