An Auditory Model Based Approach for Melody Detection in Polyphonic Musical Recordings

  • Rui Pedro Paiva
  • Teresa Mendes
  • Amílcar Cardoso
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3310)


We present a method for melody detection in polyphonic musical signals based on a model of the human auditory system. First, a set of pitch candidates is obtained for each frame, based on the output of an ear model and periodicity detection using correlograms. Trajectories of the most salient pitches are then constructed. Next, note candidates are obtained by segmenting each trajectory according to variations in frequency and pitch salience. Notes that are too short, of low salience, or harmonically related to other notes are then eliminated. Finally, the melody is extracted by selecting the most important notes at each time, based on their pitch salience. We tested our method with excerpts from 12 songs encompassing several genres. In the songs where the solo stands out clearly, most of the melody notes were successfully detected. For songs where the melody is less salient, however, the algorithm was not very accurate. Nevertheless, the approach seems promising.
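The last two stages described above (note elimination and salience-based melody selection) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Note` structure, the thresholds `min_frames` and `min_salience`, and the restriction of "harmonically related" to octave relations are all assumptions made for illustration, and the ear model, correlogram, and trajectory stages are not covered.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Note:
    """Hypothetical note candidate produced by trajectory segmentation."""
    onset: int        # first frame index (inclusive)
    offset: int       # end frame index (exclusive)
    midi: int         # pitch as a MIDI note number
    salience: float   # average pitch salience, arbitrary units


def overlaps(a: Note, b: Note) -> bool:
    """True when the two notes share at least one frame."""
    return a.onset < b.offset and b.onset < a.offset


def prune_notes(notes: List[Note], min_frames: int = 3,
                min_salience: float = 0.2) -> List[Note]:
    """Drop notes that are too short or too weak (assumed thresholds)."""
    return [n for n in notes
            if n.offset - n.onset >= min_frames and n.salience >= min_salience]


def drop_octave_ghosts(notes: List[Note]) -> List[Note]:
    """Drop a note when a more salient overlapping note is an exact number
    of octaves away. This is a simplification of 'harmonically related'."""
    kept = []
    for n in notes:
        ghost = any(
            m is not n
            and overlaps(n, m)
            and n.midi != m.midi
            and (n.midi - m.midi) % 12 == 0
            and m.salience > n.salience
            for m in notes
        )
        if not ghost:
            kept.append(n)
    return kept


def select_melody(notes: List[Note], n_frames: int) -> List[Optional[int]]:
    """At each frame, keep the pitch of the most salient active note."""
    melody: List[Optional[int]] = [None] * n_frames
    best = [float("-inf")] * n_frames
    for n in notes:
        for t in range(max(n.onset, 0), min(n.offset, n_frames)):
            if n.salience > best[t]:
                best[t] = n.salience
                melody[t] = n.midi
    return melody


def extract_melody(notes: List[Note], n_frames: int) -> List[Optional[int]]:
    """Chain the three steps: prune, remove octave ghosts, select."""
    return select_melody(drop_octave_ghosts(prune_notes(notes)), n_frames)


if __name__ == "__main__":
    candidates = [
        Note(0, 4, 60, 0.9),   # melody note
        Note(0, 4, 72, 0.5),   # octave ghost of the note above
        Note(2, 8, 43, 0.4),   # accompaniment
        Note(4, 8, 64, 0.8),   # melody note
        Note(5, 6, 69, 0.95),  # too short (one frame)
        Note(7, 10, 55, 0.1),  # too weak
    ]
    print(extract_melody(candidates, 10))
    # prints [60, 60, 60, 60, 64, 64, 64, 64, None, None]
```

In this toy example the accompaniment note survives pruning but never wins a frame, which mirrors the abstract's observation: selection by salience works when the solo stands out and would fail if the accompaniment were the more salient source.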





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Rui Pedro Paiva (1)
  • Teresa Mendes (1)
  • Amílcar Cardoso (1)
  1. CISUC – Center for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, University of Coimbra (Polo II), Coimbra, Portugal
