Feature Analysis and Normalization Approach for Robust Content-Based Music Retrieval to Encoded Audio with Different Bit Rates

  • Shuhei Hamawaki
  • Shintaro Funasawa
  • Jiro Katto
  • Hiromi Ishizaki
  • Keiichiro Hoashi
  • Yasuhiro Takishima
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5371)

Abstract

To achieve highly accurate content-based music information retrieval (MIR), it is necessary to compensate for the various bit rates of the encoded songs stored in the music collection, since bit rate differences are expected to negatively affect content-based MIR results. In this paper, we examine how bit rate differences affect MIR results, propose methods to normalize MFCC features extracted from files encoded at various bit rates, and show how these methods stabilize MIR results.
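The kind of normalization the abstract refers to can be illustrated by cepstral mean and variance normalization (CMVN), the technique of reference [11]. The sketch below is an assumption-laden illustration, not the authors' exact method: it uses numpy, assumes the MFCC matrix has already been extracted (one coefficient vector per frame), and models a bit-rate-induced distortion as a constant cepstral offset.

```python
import numpy as np

def cmvn(mfcc: np.ndarray, eps: float = 1e-10) -> np.ndarray:
    """Cepstral mean and variance normalization (illustrative sketch).

    mfcc: array of shape (n_frames, n_coeffs), one MFCC vector per frame.
    Each coefficient dimension is shifted to zero mean and scaled to unit
    variance over the whole song, which cancels constant cepstral offsets
    such as (in this simplified model) a bias introduced by encoding.
    """
    mean = mfcc.mean(axis=0)
    std = mfcc.std(axis=0)
    return (mfcc - mean) / (std + eps)

# Synthetic illustration: two "encodings" of the same song that differ
# only by a constant cepstral offset end up identical after CMVN.
rng = np.random.default_rng(0)
base = rng.normal(size=(500, 13))        # hypothetical MFCCs, 500 frames
shifted = base + 0.5                     # same song with a constant bias
print(np.allclose(cmvn(base), cmvn(shifted)))  # True
```

Real encoder artifacts are of course not a pure constant offset, but the example shows why per-song statistics can make features from differently encoded copies of a song comparable.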

Keywords

Mel-Frequency Cepstral Coefficient (MFCC) · Content-based MIR · Normalization

References

  1. Sigurdsson, S., Petersen, K.B., Lehn-Schiøler, T.: Mel Frequency Cepstral Coefficients: An Evaluation of Robustness of MP3 Encoded Music. In: Proceedings of the International Conference on Music Information Retrieval (2006)
  2. Hoashi, K., Matsumoto, K., Sugaya, F., Ishizaki, H., Katto, J.: Feature space modification for content-based music retrieval based on user preferences. In: Proceedings of ICASSP, pp. 517–520 (2006)
  3. Foote, J.: Content-based retrieval of music and audio. In: Proceedings of SPIE, vol. 3229, pp. 138–147 (1997)
  4. Spevak, C., Favreau, E.: SOUNDSPOTTER - A prototype system for content-based audio retrieval. In: Proceedings of the International Conference on Digital Audio Effects, pp. 27–32 (2002)
  5. Deshpande, H., Singh, R., Nam, U.: Classification of musical signals in the visual domain. In: Proceedings of the International Conference on Digital Audio Effects (2001)
  6. Tzanetakis, G., Cook, P.: MARSYAS: A framework for audio analysis. Organized Sound 4(3), 169–175 (2000)
  7. Mermelstein, P.: Distance measures for speech recognition, psychological and instrumental. In: Pattern Recognition and Artificial Intelligence, pp. 374–388 (1976)
  8. Logan, B.: Mel frequency cepstral coefficients for music modeling. In: Proceedings of the International Symposium on Music Information Retrieval (2000)
  9. Slaney, M.: Auditory toolbox, version 2. Technical Report #1998-010, Interval Research Corporation (1998)
  10. Goto, M., et al.: RWC Music Database: Music Genre Database and Musical Instrument Sound Database. In: Proceedings of the International Conference on Music Information Retrieval, pp. 229–230 (2003)
  11. Viikki, O., Laurila, K.: Cepstral domain segmental feature vector normalization for noise robust speech recognition. Speech Communication 25, 133–147 (1998)

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Shuhei Hamawaki (1)
  • Shintaro Funasawa (1)
  • Jiro Katto (1)
  • Hiromi Ishizaki (2)
  • Keiichiro Hoashi (2)
  • Yasuhiro Takishima (2)
  1. Waseda University, Tokyo, Japan
  2. KDDI R&D Laboratories Inc., Saitama, Japan