Exploring Perceptual Based Timbre Feature for Singer Identification

  • Swe Zin Kalayar Khine
  • Tin Lay Nwe
  • Haizhou Li
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4969)


Timbre can be defined as the attribute of an auditory stimulus that allows us to distinguish between sounds that have the same pitch and loudness. In this paper, we explore a timbre-based perceptual feature for singer identification. We start with a vocal detection process that extracts the vocal segments from a song. Cepstral coefficients, which reflect timbre characteristics, are then computed from these vocal segments. The timbre cepstral coefficients are formulated by combining harmonic information with dynamic characteristics of the sound, such as vibrato and the attack-decay envelope of the songs. Bandpass filters spaced according to the octave frequency scale are used to extract the vibrato and harmonic information. Experiments are conducted on a database of 84 popular songs. The results show that the proposed timbre-based perceptual feature is robust and effective: we achieve an average error rate of 12.2% in segment-level singer identification.
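The abstract does not specify the filter design, so the following is only a minimal sketch of the general idea of cepstral coefficients computed from octave-spaced bandpass filters. It assumes triangular filters, three bands per octave starting at 100 Hz, 15 bands, a DCT-II over log band energies, and 512-sample frames at 16 kHz. These settings and the function names `power_spectrum`, `octave_filterbank_energies`, and `octave_cepstral_coefficients` are hypothetical, not the authors' configuration. The vocal detection, vibrato, and attack-decay components are not shown.

```python
import cmath
import math
from typing import List, Sequence


def power_spectrum(frame: Sequence[float]) -> List[float]:
    """Power spectrum (bins 0..N/2) of a Hamming-windowed frame via a naive DFT."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(w * cmath.exp(-2j * math.pi * k * t / n)
                  for t, w in enumerate(windowed))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def octave_filterbank_energies(spec: Sequence[float], sample_rate: float, n_fft: int,
                               f_low: float, bands_per_octave: int,
                               n_bands: int) -> List[float]:
    """Weighted band energies from triangular filters on an octave (log2) scale.

    Assumption (hypothetical layout): band m rises from edges[m] to a peak at
    edges[m + 1] and falls to edges[m + 2]; edges are 2**(1/bands_per_octave) apart.
    """
    edges = [f_low * 2 ** (b / bands_per_octave) for b in range(n_bands + 2)]
    if edges[-1] > sample_rate / 2:
        raise ValueError("top band edge exceeds the Nyquist frequency")
    bin_hz = sample_rate / n_fft
    energies = []
    for m in range(n_bands):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        energy = 0.0
        for k, p in enumerate(spec):
            f = k * bin_hz
            if lo < f <= mid:
                energy += p * (f - lo) / (mid - lo)   # rising slope
            elif mid < f < hi:
                energy += p * (hi - f) / (hi - mid)   # falling slope
        energies.append(energy)
    return energies


def dct_ii(x: Sequence[float], n_coeffs: int) -> List[float]:
    """Unnormalized DCT-II, keeping the first n_coeffs coefficients."""
    n = len(x)
    return [sum(v * math.cos(math.pi * c * (i + 0.5) / n) for i, v in enumerate(x))
            for c in range(n_coeffs)]


def octave_cepstral_coefficients(frame: Sequence[float], sample_rate: float,
                                 f_low: float = 100.0, bands_per_octave: int = 3,
                                 n_bands: int = 15, n_coeffs: int = 12) -> List[float]:
    """Hypothetical octave-scale cepstral coefficients for one vocal frame."""
    if n_coeffs > n_bands:
        raise ValueError("n_coeffs cannot exceed n_bands")
    spec = power_spectrum(frame)
    energies = octave_filterbank_energies(spec, sample_rate, len(frame),
                                          f_low, bands_per_octave, n_bands)
    # Small floor avoids log(0) for bands that contain no DFT bins.
    log_energies = [math.log(e + 1e-10) for e in energies]
    return dct_ii(log_energies, n_coeffs)


if __name__ == "__main__":
    sr = 16000
    # Synthetic sung tone: 220 Hz fundamental, 5 harmonics, 5.5 Hz vibrato (+/-3%).
    phase, tone = 0.0, []
    for t in range(512):
        f0 = 220.0 * (1 + 0.03 * math.sin(2 * math.pi * 5.5 * t / sr))
        phase += 2 * math.pi * f0 / sr
        tone.append(sum(math.sin(h * phase) / h for h in range(1, 6)))
    coeffs = octave_cepstral_coefficients(tone, sr)
    print(len(coeffs))  # 12
```

Note that a single 32 ms frame covers only a fraction of one vibrato cycle, so capturing vibrato as a dynamic characteristic would require tracking these band outputs across frames. That step is outside this sketch.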


Keywords: Timbre, Singing Voice Detection, Vibrato, Harmonic





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Swe Zin Kalayar Khine (1)
  • Tin Lay Nwe (1)
  • Haizhou Li (1)

  1. Institute for Infocomm Research, Singapore
