Abstract
Music understanding from an audio track and performance is a key problem and a challenge for many applications, including automated music transcoding, music education, and interactive performance. The transcoding of polyphonic music is one of the most complex tasks in this area and remains an open problem; it must be solved before transcoding can become a common tool for these applications. Techniques suitable for monophonic transcoding have been shown to be largely unsuitable for polyphonic cases. Recently, a range of polyphonic transcoding algorithms and models have been proposed and compared against widely accepted test cases, such as those adopted in the MIREX competition. These approaches are based on techniques such as pitch trajectory analysis, harmonic clustering, bispectral analysis, event tracking, nonnegative matrix factorization, and hidden Markov models. This chapter analyzes the evolution of music understanding algorithms and models from monophonic to polyphonic, presenting and comparing the solutions and assessing them against commonly accepted evaluation methods and formal metrics.
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Argenti, F., Nesi, P., Pantaleo, G. (2011). Automatic Music Transcription: From Monophonic to Polyphonic. In: Solis, J., Ng, K. (eds) Musical Robots and Interactive Multimodal Systems. Springer Tracts in Advanced Robotics, vol 74. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22291-7_3
DOI: https://doi.org/10.1007/978-3-642-22291-7_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22290-0
Online ISBN: 978-3-642-22291-7
eBook Packages: Engineering, Engineering (R0)