Instrument identification and pitch estimation in multi-timbre polyphonic musical signals based on probabilistic mixture model decomposition
In this paper, we propose a method based on probabilistic mixture model decomposition that simultaneously identifies musical instrument types, estimates pitches and assigns each pitch to its source instrument in monaural polyphonic audio containing multiple sources. In the proposed system, the probability density function (PDF) of the observed mixture note is approximated as a weighted sum of all possible note models. These note models, covering 14 instruments and all their possible pitches, describe the dynamic frequency envelopes of the notes in terms of probability. The weight coefficients, which indicate the probability that a given pitch of a given instrument type is present, are estimated using the Expectation-Maximization (EM) algorithm and are then used to detect the source instrument types and the pitches. Experiments involving 14 instruments within a designated pitch range of F3–F6 (37 pitches) demonstrate good discrimination capability, especially in instrument identification and instrument-pitch identification. For the entire system, including the note onset detection tool, evaluated on quartet polyphonic recordings, the average F-measure values of instrument-pitch identification, instrument identification and pitch estimation were 55.4%, 62.5% and 86%, respectively.
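The weight estimation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each note model is reduced to a fixed, static discrete distribution over frequency bins (the paper's models describe dynamic frequency envelopes), and the function name `em_mixture_weights`, the toy templates and the bin count are all hypothetical. Only the mixture weights are updated by EM; the note models stay fixed.

```python
def em_mixture_weights(observed, templates, n_iter=500):
    """Estimate mixture weights by EM with the component distributions held fixed.

    observed  : list of non-negative values summing to 1
                (the observed spectrum, normalised to a PDF over bins)
    templates : list of K lists, each a PDF over the same bins
                (hypothetical stand-ins for the instrument/pitch note models)
    Returns a list of K weights that sum to 1.
    """
    k = len(templates)
    n_bins = len(observed)
    weights = [1.0 / k] * k  # uniform initialisation
    for _ in range(n_iter):
        new_weights = [0.0] * k
        for f in range(n_bins):
            # Mixture density at bin f under the current weights
            mix = sum(weights[j] * templates[j][f] for j in range(k))
            if mix <= 0.0 or observed[f] <= 0.0:
                continue
            for j in range(k):
                # E-step: posterior responsibility of component j for bin f.
                # M-step: accumulate it, weighted by the observed mass in bin f.
                new_weights[j] += observed[f] * weights[j] * templates[j][f] / mix
        total = sum(new_weights)
        weights = [w / total for w in new_weights]
    return weights


# Toy example (assumed data): three note models over four frequency bins.
toy_templates = [
    [0.7, 0.2, 0.1, 0.0],
    [0.1, 0.6, 0.2, 0.1],
    [0.0, 0.1, 0.3, 0.6],
]
true_weights = [0.5, 0.2, 0.3]
toy_observed = [
    sum(w * t[f] for w, t in zip(true_weights, toy_templates))
    for f in range(4)
]
estimated = em_mixture_weights(toy_observed, toy_templates)
```

In a system of this kind, the estimated weights would then be compared against a threshold to decide which instrument and pitch combinations are present; the threshold and decision rule are not specified in the abstract and would have to be chosen separately.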
Journal of Intelligent Information Systems
Volume 40, Issue 1 , pp 141-158
- Springer US
- Instrument identification
- Instrument-pitch identification
- Pitch estimation
- EM algorithm
- Probabilistic model