Journal of Intelligent Information Systems, Volume 41, Issue 3, pp 407–434

Automatic music transcription: challenges and future directions

  • Emmanouil Benetos
  • Simon Dixon
  • Dimitrios Giannoulis
  • Holger Kirchhoff
  • Anssi Klapuri

Abstract

Automatic music transcription is considered by many to be a key enabling technology in music signal processing. However, the performance of transcription systems is still significantly below that of a human expert, and accuracies reported in recent years seem to have reached a limit, although the field is still very active. In this paper we analyse the limitations of current methods and identify promising directions for future research. Current transcription methods use general-purpose models which are unable to capture the rich diversity found in music signals. One way to overcome the limited performance of transcription systems is to tailor algorithms to specific use cases. Semi-automatic approaches are another way of achieving a more reliable transcription. Also, the wealth of musical scores and corresponding audio data now available is a rich potential source of training data, via forced alignment of audio to scores, but large-scale utilisation of such data has yet to be attempted. Other promising approaches include the integration of information from multiple algorithms and from different musical aspects.
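
As an illustration of the forced-alignment idea above, the following is a minimal sketch (not the authors' implementation) of aligning a score to an audio recording via dynamic time warping on chroma features, which is how score/audio pairs can be turned into training labels. It assumes the librosa and pretty_midi Python packages; the file names and frame parameters are placeholders.

    import librosa
    import pretty_midi

    HOP, SR = 512, 22050
    FRAME_RATE = SR / HOP  # chroma frames per second

    # Chroma features of the audio recording.
    y, sr = librosa.load("performance.wav", sr=SR)
    audio_chroma = librosa.feature.chroma_cqt(y=y, sr=sr, hop_length=HOP)

    # Chroma features of the score, rendered from MIDI at the same frame rate.
    midi = pretty_midi.PrettyMIDI("score.mid")
    score_chroma = librosa.util.normalize(midi.get_chroma(fs=FRAME_RATE), axis=0)

    # DTW links each score frame to an audio frame; the warping path maps
    # note times in the score onto the recording, yielding frame-level
    # ground-truth annotations for training transcription models.
    _, path = librosa.sequence.dtw(X=score_chroma, Y=audio_chroma, metric="cosine")
    alignment = [(s / FRAME_RATE, a / FRAME_RATE) for s, a in path[::-1]]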

Keywords

Music signal analysis · Music information retrieval · Automatic music transcription


Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Emmanouil Benetos (1)
  • Simon Dixon (2)
  • Dimitrios Giannoulis (2)
  • Holger Kirchhoff (2)
  • Anssi Klapuri (3, 4)

  1. Department of Computer Science, City University London, London, UK
  2. Centre for Digital Music, Queen Mary University of London, London, UK
  3. Ovelin Ltd., Helsinki, Finland
  4. Tampere University of Technology, Tampere, Finland
