Automatic Music Transcription: From Monophonic to Polyphonic

Argenti, Fabrizio; Nesi, Paolo; Pantaleo, Gianni

doi:10.1007/978-3-642-22291-7_3

Part of the book series: Springer Tracts in Advanced Robotics (STAR, volume 74)

Abstract

Music understanding from an audio track and performance is a key problem and a challenge for many applications, including automated music transcoding, music education, and interactive performance. The transcoding of polyphonic music is one of the most complex tasks and remains an open problem; it must be solved before it can become a common tool for these applications. Techniques suitable for monophonic transcoding have been shown to be largely unsuitable for polyphonic cases. Recently, a range of polyphonic transcoding algorithms and models have been proposed and compared against widely accepted test cases such as those adopted in the MIREX competition. These approaches are based on techniques such as pitch trajectory analysis, harmonic clustering, bispectral analysis, event tracking, nonnegative matrix factorization, and hidden Markov models. This chapter analyzes the evolution of music understanding algorithms and models from monophonic to polyphonic, presenting and comparing the solutions and analyzing them against commonly accepted assessment methods and formal metrics.

Author information

Authors and Affiliations

Department of Systems and Informatics, University of Florence, Via S. Marta 3, Florence, 50139, Italy
Fabrizio Argenti, Paolo Nesi & Gianni Pantaleo

Editor information

Editors and Affiliations

Humanoid Robotics Institute Faculty of Science and Engineering, Waseda University, 3-4-1 Ookubo, 168-8555, Shinjuku-ku, Tokyo, Japan
Jorge Solis
Interdisciplinary Centre for Scientific Research in Music (ICSRiM), University of Leeds School of Computing & School of Music, LS2 9JT, Leeds, UK
Kia Ng

About this chapter

Cite this chapter

Argenti, F., Nesi, P., Pantaleo, G. (2011). Automatic Music Transcription: From Monophonic to Polyphonic. In: Solis, J., Ng, K. (eds) Musical Robots and Interactive Multimodal Systems. Springer Tracts in Advanced Robotics, vol 74. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22291-7_3

DOI: https://doi.org/10.1007/978-3-642-22291-7_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22290-0
Online ISBN: 978-3-642-22291-7
eBook Packages: Engineering, Engineering (R0)
