Automatic Music Transcription: From Monophonic to Polyphonic

  • Chapter
Musical Robots and Interactive Multimodal Systems

Part of the book series: Springer Tracts in Advanced Robotics ((STAR,volume 74))

Abstract

Music understanding from an audio track and performance is a key problem and a challenge for many applications, including automated music transcoding, music education, and interactive performance. The transcoding of polyphonic music is one of the most complex of these tasks and remains an open problem; it must be solved before transcoding can become a common tool for the applications mentioned above. Techniques suitable for monophonic transcoding have proven largely unsuitable for polyphonic cases. Recently, a range of polyphonic transcoding algorithms and models have been proposed and compared against widely accepted test cases, such as those adopted in the MIREX competition. These approaches are based on techniques such as pitch trajectory analysis, harmonic clustering, bispectral analysis, event tracking, nonnegative matrix factorization, and hidden Markov models. This chapter analyzes the evolution of music understanding algorithms and models from monophonic to polyphonic, presenting and comparing the solutions and analyzing them against commonly accepted assessment methods and formal metrics.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Argenti, F., Nesi, P., Pantaleo, G. (2011). Automatic Music Transcription: From Monophonic to Polyphonic. In: Solis, J., Ng, K. (eds) Musical Robots and Interactive Multimodal Systems. Springer Tracts in Advanced Robotics, vol 74. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22291-7_3

  • DOI: https://doi.org/10.1007/978-3-642-22291-7_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22290-0

  • Online ISBN: 978-3-642-22291-7

  • eBook Packages: Engineering, Engineering (R0)
