Written musical notation describes music in a symbolic form suitable for performing a piece with the available musical instruments. Traditionally, musical notation indicates the pitch, target instrument, timing, and duration of each sound to be played. The aim of music transcription, whether carried out by humans or by a machine, is to infer these musical parameters given only an acoustic recording of a performance.
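As a minimal illustration of inferring one such parameter (pitch) from an acoustic signal, the sketch below estimates the fundamental frequency of a short audio frame by time-domain autocorrelation, one of the simplest classical approaches. The function name, parameters, and test tone are illustrative assumptions, not taken from the chapter.

```python
import numpy as np

def estimate_f0(frame, sr, fmin=50.0, fmax=2000.0):
    """Estimate the fundamental frequency (Hz) of a signal frame.

    The autocorrelation of a periodic signal peaks at lags equal to
    multiples of its period, so the lag of the strongest peak within
    the plausible pitch range gives an estimate of the period.
    """
    frame = frame - np.mean(frame)                 # remove DC offset
    corr = np.correlate(frame, frame, mode="full")
    corr = corr[len(corr) // 2:]                   # keep non-negative lags
    lag_min = int(sr / fmax)                       # shortest period searched
    lag_max = int(sr / fmin)                       # longest period searched
    lag = lag_min + np.argmax(corr[lag_min:lag_max])
    return sr / lag

# Synthetic test tone: A4 (440 Hz) with two overtones.
sr = 44100
t = np.arange(0, 0.05, 1 / sr)
tone = (np.sin(2 * np.pi * 440 * t)
        + 0.5 * np.sin(2 * np.pi * 880 * t)
        + 0.3 * np.sin(2 * np.pi * 1320 * t))
f0 = estimate_f0(tone, sr)
```

Real transcription systems face much harder conditions than this single synthetic tone: multiple simultaneous sources, inharmonicity, and noise, which is why the more elaborate statistical and auditory-model-based methods exist.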



© 2008 Springer Science+Business Media, LLC

Cite this chapter

Klapuri, A., Virtanen, T. (2008). Automatic Music Transcription. In: Havelock, D., Kuwano, S., Vorländer, M. (eds) Handbook of Signal Processing in Acoustics. Springer, New York, NY. https://doi.org/10.1007/978-0-387-30441-0_20
