Written musical notation describes music in a symbolic form suitable for performing a piece on the available musical instruments. Traditionally, musical notation indicates the pitch, target instrument, timing, and duration of each sound to be played. The aim of music transcription, whether by humans or by machines, is to infer these musical parameters given only an acoustic recording of a performance.
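To make the inference task concrete, the sketch below estimates one of the parameters mentioned above, the pitch (fundamental frequency) of a single monophonic frame, using simple time-domain autocorrelation. This is a minimal illustration, not any specific method from the chapter; the function name, the search range of 50–1000 Hz, and the frame length are all assumptions chosen for the example.

```python
import numpy as np

def estimate_f0(frame, sample_rate, fmin=50.0, fmax=1000.0):
    """Estimate the fundamental frequency of a monophonic frame.

    The lag of the strongest autocorrelation peak within the allowed
    period range gives the period; its inverse is the pitch in Hz.
    (Illustrative sketch only; real transcription systems are far
    more elaborate.)
    """
    frame = frame - np.mean(frame)              # remove DC offset
    corr = np.correlate(frame, frame, mode="full")
    corr = corr[len(corr) // 2:]                # keep non-negative lags
    lag_min = int(sample_rate / fmax)           # shortest allowed period
    lag_max = int(sample_rate / fmin)           # longest allowed period
    lag = lag_min + np.argmax(corr[lag_min:lag_max])
    return sample_rate / lag

# Synthetic test tone: a 220 Hz sine sampled at 16 kHz
sr = 16000
t = np.arange(2048) / sr
f0 = estimate_f0(np.sin(2 * np.pi * 220.0 * t), sr)
```

On this clean synthetic tone the estimate lands within a few hertz of 220 Hz; the hard problems treated in the chapter arise when several such sounds overlap in a polyphonic recording.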
© 2008 Springer Science+Business Media, LLC
Klapuri, A., Virtanen, T. (2008). Automatic Music Transcription. In: Havelock, D., Kuwano, S., Vorländer, M. (eds) Handbook of Signal Processing in Acoustics. Springer, New York, NY. https://doi.org/10.1007/978-0-387-30441-0_20
Print ISBN: 978-0-387-77698-9
Online ISBN: 978-0-387-30441-0