Group Delay Function from All-Pole Models for Musical Instrument Recognition

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8905)


This work proposes a feature based on the group delay function from all-pole models (APGD) for pitched musical instrument recognition. Conventionally, spectrum-related features take into account only the magnitude information, whereas the phase is often overlooked because it is difficult to interpret. However, the phase often carries additional information that could be beneficial for recognition. The APGD is an elegant approach to inferring phase information: it avoids the issues related to interpreting the phase and does not require extensive parameter adjustment. Having shown applicability to speech-related problems, it is now explored for instrument recognition. The evaluation is performed with various instrument sets and shows noteworthy absolute accuracy gains of up to 7% compared to the baseline mel-frequency cepstral coefficients (MFCCs). Combined with the MFCCs and with feature selection, APGD outperforms the baseline on all the evaluated sets.
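The general idea of a group delay function derived from an all-pole model can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no algorithmic details, so the autocorrelation method for linear prediction, the model order, the FFT size, and the function names `lpc_autocorrelation` and `all_pole_group_delay` are all assumptions made for illustration. For an all-pole model H(z) = 1/A(z), the group delay is the negative derivative of the phase of H, which can be computed without phase unwrapping as -Re(B/A), where A and B are the DFTs of a[n] and n·a[n].

```python
import numpy as np

def lpc_autocorrelation(frame, order):
    """Estimate all-pole coefficients a = [1, a_1, ..., a_p]
    with the autocorrelation method (an assumed, standard choice)."""
    frame = np.asarray(frame, dtype=float)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(frame[: len(frame) - k], frame[k:])
                  for k in range(order + 1)])
    # Toeplitz normal equations: R a = -r[1:]
    R = np.array([[r[abs(i - j)] for j in range(order)]
                  for i in range(order)])
    a = np.linalg.solve(R, -r[1:])
    return np.concatenate(([1.0], a))

def all_pole_group_delay(a, n_fft=512):
    """Group delay (in samples) of H(z) = 1/A(z) on the rfft frequency grid."""
    a = np.asarray(a, dtype=float)
    n = np.arange(len(a))
    A = np.fft.rfft(a, n_fft)       # DFT of a[n]
    B = np.fft.rfft(n * a, n_fft)   # DFT of n * a[n]
    # Group delay of A is Re(B/A); for 1/A the sign flips.
    return -np.real(B / A)

# Illustrative input (hypothetical): impulse response of a two-pole
# resonator with pole radius 0.9 at normalized frequency 0.1 cycles/sample.
r_pole, theta = 0.9, 2 * np.pi * 0.1
a_true = np.array([1.0, -2 * r_pole * np.cos(theta), r_pole ** 2])
h = np.zeros(512)
for i in range(len(h)):
    h[i] = 1.0 if i == 0 else 0.0
    if i >= 1:
        h[i] -= a_true[1] * h[i - 1]
    if i >= 2:
        h[i] -= a_true[2] * h[i - 2]

a_est = lpc_autocorrelation(h, order=2)
tau = all_pole_group_delay(a_est, n_fft=512)
peak_bin = int(np.argmax(tau))  # expected to lie near the resonance frequency
```

In this toy case the group delay peaks close to the resonance frequency, which is the property that makes such a function usable as a spectral feature. A real feature extractor would additionally frame and window the audio and choose a model order suited to the signal; those steps are omitted here.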


Musical instrument recognition · Music information retrieval · All-pole group delay feature · Phase spectrum



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Department of Signal Processing, Tampere University of Technology, Tampere, Finland
  2. School of Computing and Electrical Engineering, Indian Institute of Technology, Mandi, India
