Written musical notation describes music in a symbolic form suitable for performing a piece on the available musical instruments. Traditionally, musical notation indicates the pitch, target instrument, timing, and duration of each sound to be played. The aim of music transcription, whether by humans or by machines, is to infer these musical parameters given only an acoustic recording of a performance.
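To make the inference task concrete, the sketch below estimates one of the parameters mentioned above, the pitch (fundamental frequency) of a single monophonic frame, using simple time-domain autocorrelation. This is a minimal illustration, not any specific method from the chapter; the function name, the search range of 50–1000 Hz, and the frame length are all assumptions chosen for the example.

```python
import numpy as np

def estimate_f0(frame, sample_rate, fmin=50.0, fmax=1000.0):
    """Estimate the fundamental frequency of a monophonic frame.

    The lag of the strongest autocorrelation peak within the allowed
    period range gives the period; its inverse is the pitch in Hz.
    (Illustrative sketch only; real transcription systems are far
    more elaborate.)
    """
    frame = frame - np.mean(frame)              # remove DC offset
    corr = np.correlate(frame, frame, mode="full")
    corr = corr[len(corr) // 2:]                # keep non-negative lags
    lag_min = int(sample_rate / fmax)           # shortest allowed period
    lag_max = int(sample_rate / fmin)           # longest allowed period
    lag = lag_min + np.argmax(corr[lag_min:lag_max])
    return sample_rate / lag

# Synthetic test tone: a 220 Hz sine sampled at 16 kHz
sr = 16000
t = np.arange(2048) / sr
f0 = estimate_f0(np.sin(2 * np.pi * 220.0 * t), sr)
```

On this clean synthetic tone the estimate lands within a few hertz of 220 Hz; the hard problems treated in the chapter arise when several such sounds overlap in a polyphonic recording.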
© 2008 Springer Science+Business Media, LLC
Klapuri, A., Virtanen, T. (2008). Automatic Music Transcription. In: Havelock, D., Kuwano, S., Vorländer, M. (eds) Handbook of Signal Processing in Acoustics. Springer, New York, NY. https://doi.org/10.1007/978-0-387-30441-0_20
Print ISBN: 978-0-387-77698-9
Online ISBN: 978-0-387-30441-0