Music Learning: Automatic Music Composition and Singing Voice Assessment

  • Lorenzo J. Tardón (corresponding author)
  • Isabel Barbancho
  • Carles Roig
  • Emilio Molina
  • Ana M. Barbancho
Part of the Springer Handbooks book series (SHB)


Traditionally, singing skills are learned and improved through the supervised rehearsal of a set of selected exercises. A music teacher evaluates the user's performance and recommends new exercises according to the user's progress.

This chapter describes a virtual environment that partially reproduces the traditional music-learning process and the music teacher's role, allowing for a complete interactive self-learning process.

An overview of the complete chain of an interactive singing-learning system, including tools and concrete techniques, will be presented. In brief, the system should first provide a set of training exercises. Then, it should assess the user's performance. Finally, it should provide the user with new exercises, selected or created according to the results of the evaluation.
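The generate–assess–adapt loop just outlined can be sketched in a few lines; the difficulty-step policy, the score threshold, and the function names below are illustrative assumptions, not the chapter's actual method.

```python
# Illustrative sketch of the exercise loop: propose an exercise at the current
# difficulty level, score the user's take, and adapt the level accordingly.
# The threshold and step policy are hypothetical, not taken from the chapter.

def next_level(level, score, threshold=0.8, max_level=10):
    """Raise the difficulty after a good take, lower it after a poor one."""
    if score >= threshold:
        return min(level + 1, max_level)
    return max(level - 1, 1)

def run_session(scores, level=1):
    """Feed a sequence of assessment scores (0..1) through the adaptation
    policy and return the difficulty level reached."""
    for score in scores:
        level = next_level(level, score)
    return level
```

In a real system, the score fed into the policy would come from the singing-assessment module, and the new level would parameterize the exercise generator.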

Following this scheme, methods for the creation of user-adapted exercises and for the automatic evaluation of singing skills will be presented. A technique for the dynamic generation of musically meaningful singing exercises, adapted to the user's level, will be shown. It is based on the proper repetition of musical structures, while ensuring the correctness of harmony and rhythm. Additionally, a module for the assessment of the user's singing performance, in terms of intonation and rhythm, will be described.
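As a concrete illustration of intonation assessment, a per-frame deviation in cents between the sung fundamental-frequency contour and a target contour can be averaged into a single error figure. This is a minimal sketch under the assumption of frame-aligned contours; the function names are hypothetical and this is not the chapter's exact scoring scheme.

```python
import math

def cents(f, ref):
    """Interval between two frequencies in cents (100 cents = 1 semitone)."""
    return 1200.0 * math.log2(f / ref)

def intonation_error(sung_f0, target_f0):
    """Mean absolute per-frame deviation in cents between a sung
    fundamental-frequency (f0) contour and the target contour.
    Assumes both contours are already time-aligned, frame by frame."""
    devs = [abs(cents(s, t)) for s, t in zip(sung_f0, target_f0)]
    return sum(devs) / len(devs)
```

A contour sung exactly one semitone sharp, for instance, yields an error of about 100 cents; unvoiced frames would need to be excluded before such a comparison in practice.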


Abbreviations

  • DTW: dynamic time warping
  • EMI: experiments in musical intelligence
  • IOI: interonset interval
  • MIDI: musical instrument digital interface
  • RMS: root mean square
  • RSSM: rhythm self-similarity matrix
  • SMO: sequential minimal optimization
  • TIE: total intonation error
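Dynamic time warping, the first technique listed above, aligns two sequences (e.g., pitch contours) that evolve at different speeds. The following is a textbook sketch of the DTW cost computation under the standard match/insert/delete step pattern, not the chapter's implementation.

```python
# Minimal dynamic-time-warping (DTW) cost between two 1-D sequences under
# the absolute-difference local cost. Textbook sketch, O(n*m) time and space.

def dtw_cost(a, b):
    """Total alignment cost between sequences a and b: fill an accumulated-
    cost matrix D where each cell extends the cheapest of the three
    predecessor cells (diagonal, vertical, horizontal)."""
    inf = float("inf")
    n, m = len(a), len(b)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            D[i][j] = d + min(D[i - 1][j],      # a[i-1] inserted
                              D[i][j - 1],      # b[j-1] inserted
                              D[i - 1][j - 1])  # matched pair
    return D[n][m]
```

Note that a sequence with a repeated (time-stretched) sample still aligns at zero cost against its unstretched version, which is exactly the invariance that makes DTW useful for comparing performances at different tempi.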



This work was funded by the Ministerio de Economía y Competitividad of the Spanish Government under Project No. TIN2016-75866-C3-2-R. It was carried out at Universidad de Málaga, Campus de Excelencia Internacional Andalucía Tech.



Copyright information

© Springer-Verlag Berlin Heidelberg 2018

Authors and Affiliations

  • Lorenzo J. Tardón (1, corresponding author)
  • Isabel Barbancho (2)
  • Carles Roig (3)
  • Emilio Molina (1)
  • Ana M. Barbancho (4)

  1. Departamento de Ingeniería de Comunicaciones, ETSI Telecomunicación, Universidad de Málaga, Málaga, Spain
  2. ATIC Research Group, Dep. Ingeniería de Comunicaciones, ETSI Telecomunicación, Universidad de Málaga, Málaga, Spain
  3. ATIC Research Group, Dep. Ingeniería de Comunicaciones, ETSI Telecomunicación, Universidad de Málaga, Málaga, Spain
  4. ATIC Research Group, Dep. Ingeniería de Comunicaciones, ETSI Telecomunicación, Universidad de Málaga, Málaga, Spain
