Multiple Predominant Instruments Recognition in Polyphonic Music Using Spectro/Modgd-gram Fusion

Lekshmi , C. R.; Rajeev, Rajan

doi:10.1007/s00034-022-02278-y

Multiple Predominant Instruments Recognition in Polyphonic Music Using Spectro/Modgd-gram Fusion

Published: 18 January 2023

Volume 42, pages 3464–3484, (2023)
Cite this article

Circuits, Systems, and Signal Processing Aims and scope Submit manuscript

265 Accesses
3 Citations
1 Altmetric
Explore all metrics

Abstract

Identification of multiple predominant instruments in polyphonic music is addressed using convolutional neural networks (CNN) through Mel-spectrogram, modgd-gram, and its fusion. Modgd-gram, a visual representation, is obtained by stacking modified group delay functions of consecutive frames successively. CNN learns the distinctive local characteristics from the visual representation and classifies the instrument to the group to which it belongs. The proposed system is systematically evaluated using the IRMAS dataset. We trained our networks using fixed-length audio excerpts to recognize multiple predominant instruments from the variable-length testing files. A wave-generative adversarial network (WaveGAN) architecture is also employed to generate audio files for data augmentation. We experimented with different fusion techniques, early fusion, mid-level fusion, and late or score-level fusion. The late fusion experiment reports a micro and macro F1 score of 0.69 and 0.62, respectively. These metrics are 7.81% and 12.73% higher than those obtained by the state-of-the-art Han’s model. The architectural choice of CNN with score-level fusion on Mel-spectro/modgd-gram has merit in recognizing the predominant instruments in polyphonic music.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Transformer-based ensemble method for multiple predominant instruments recognition in polyphonic music

Article Open access 16 May 2022

Deep Convolutional Neural Networks for Predominant Instrument Recognition in Polyphonic Music Using Discrete Wavelet Transform

Article 19 March 2024

Exploiting cepstral coefficients and CNN for efficient musical instrument classification

Article 14 September 2023

Data availability

The datasets analyzed during the current study are available at https://www.upf.edu/web/mtg/irmas

References

M. Airaksinen, L. Juvela, P. Alku, O. Rsnen, Data augmentation strategies for neural network F0 estimation. In: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP),10-15 Brighton, UK, (2019)
R. Ajayakumar, R. Rajan, Predominant Instrument Recognition in Polyphonic Music Using GMM-DNN Framework. in Proc. of International Conference on Signal Processing and Communications (SPCOM), (2020),1-5
G. Atkar, P. Jayaraju, Speech synthesis using generative adversarial network for improving readability of Hindi words to recuperate from dyslexia. Neural Computing and Applications, 1-10 (2021)
J.J. Bosch, J. Janer, F. Fuhrmann, P. Herrera, A comparison of sound segregation techniques for predominant instrument recognition in musical audio signals. In: Proceedings of 13th International Society for Music Information Retrieval Conference (ISMIR) 552-564 (2012)
C. Chen, Q. Li, A multimodal music emotion classification method based on multi-feature combined network classifier. Math. Probl. Eng. 2020 (2020)
S. Davis, P. Mermelstein, Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans. Acoust. Speech Signal Process. 28(4), 357–366 (1980)
Article Google Scholar
A. Diment, P. Rajan, T. Heittola, T. Virtanen, Modified group delay feature for musical instrument recognition. In: Proceedings of 10th International Symposium on Computer Music Multidisciplinary Research (CMMR), Marseille, France, 431-438 (2013)
T.-B. Do, H.-H. Nguyen, T.-T.-N. Nguyen, H. Vu, T.-T.-H. Tran, T.-L. Le, Plant identification using score-based fusion of multi-organ images. In: Proceedings of 9th International Conference on Knowledge and Systems Engineering (KSE), 191-196 (2017)
C. Donahue, J.J. McAuley, M. Puckette, Adversarial audio synthesis. In: Proceedings of International Conference on Learning Representations (ICLR), 1-16 (2019)
Z. Duan, J. Han, B. Pardo, Multi-pitch streaming of harmonic sound mixtures. IEEE/ACM Trans. Audio Speech Language Process. 22(1), 138–150 (2013)
Article Google Scholar
F. Fuhrmann, P. Herrera, Polyphonic instrument recognition for exploring semantic similarities in music. In: Proceedings of 13th International Conference on Digital Audio Effects DAFx10, pp. 1-8 (2010)
J. Gao, P. Li, Z. Chen, J. Zhang, A survey on deep learning for multimodal data fusion. Neural Comput. 32(5), 829–864 (2020). https://doi.org/10.1162/necoa01273
Article MathSciNet MATH Google Scholar
D. Ghosal, M.H. Kolekar, Music genre recognition using deep neural networks and transfer learning. In: Proceedings of Interspeech, 2087-2091 (2018)
X. Glorot, Y. Bengio, Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the thirteenth International conference on artificial intelligence and statistics, 249-256 (2010). JMLR Workshop and Conference Proceedings
I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, A. Courville, Improved training of wasserstein GANs. In: Proceedings of Neural Information Processing System (NIPS) (2017)
S. Gururani, C. Summers, A. Lerch, Instrument activity detection in polyphonic music using deep neural networks. In: Proceedings of International Society for Music Information Retrieval Conference (ISMIR), 569-576 (2018)
Y. Han, J. Kim, K. Lee, Deep convolutional neural networks for predominant instrument recognition in polyphonic music. IEEE/ACM Trans Audio Speech Language Process. 25(1), 208–221 (2017)
Article Google Scholar
B. Hariharan, P. Arbeláez, R. Girshick, J. Malik, Hypercolumns for object segmentation and fine-grained localization. In: Proceedings of the IEEE conference on computer vision and pattern recognition, 447-456 (2015)
T. Heittola, A. Klapuri, T. Virtanen, Musical instrument recognition in polyphonic audio using source-filter model for sound separation. In: Proceedings of International Society of Music Information Retrieval Conference, 327-332 (ISMIR) (2009)
G.C. Juan, A. Jakob, E. Cano, Jazz solo instrument classification with convolutional neural networks, source separation, and transfer learning. In: Proceedings of International Society for Music Information Retrieval Conference, 577-584,(ISMIR) (2018)
T. Kitahara, M. Goto, K. Komatani, T. Ogata, H.G. Okuno, Instrument identification in polyphonic music: feature weighting to minimize influence of sound overlaps. EURASIP J. Adv. Signal Process. 2007, 1–15 (2006)
Article MATH Google Scholar
A. Kratimenos, K. Avramidis, C. Garoufis, A. Zlatintsi, P. Maragos, Augmentation methods on monophonic audio for instrument classification in polyphonic music. In: Proceedings of 28th European Signal Processing Conference, 156-160 (2021). IEEE
J. Kong, J. Kim, J. Bae, Hifi-gan: generative adversarial networks for efficient and high fidelity speech synthesis. Adv. Neural Inf. Process. Syst. 33, 17022–17033 (2020)
Google Scholar
P. Li, J. Qian, T. Wang, Automatic instrument recognition in polyphonic music using convolutional neural networks. arXiv:1511.05520 (2015)
C.-J. Lin, C.-H. Lin, S.-Y. Jeng, Using feature fusion and parameter optimization of dual-input convolutional neural network for face gender recognition. Appl. Sci. (2020). https://doi.org/10.3390/app10093166
Article Google Scholar
A. Madhu, S. Kumaraswamy, Data augmentation using generative adversarial network for environmental sound classification. In: Proceedings of 27th European Signal Processing Conference, 1-5 (2019). IEEE
B. McFee, C. Raffel, D. Liang, D. Ellis, M. Mcvicar, E. Battenberg, O. Nieto, librosa: Audio and music signal analysis in python, pp. 18-24 (2015). https://doi.org/10.25080/Majora-7b98e3ed-003
S. Motamed, P. Rogalla, F. Khalvati, Data augmentation using generative adversarial networks (gans) for gan-based detection of pneumonia and covid-19 in chest x-ray images. Inf. Med. Unlock. 27, 100779 (2021)
Article Google Scholar
H.A. Murthy, B. Yegnanarayana, Group delay functions and its applications in speech technology. Sadhana 36(5), 745–782 (2011)
Article Google Scholar
A.V. Oppenheim, R.W. Schafer, Discrete Time Signal Processing (Prentice Hall Inc, New Jersey, 1990)
MATH Google Scholar
S. Oramas, F. Barbieri, O. Nieto Caballero, X. Serra, Multimodal deep learning for music genre classification. Trans. Int. Soc. Music Inf. 4-21 (2018)
D. O’Shaughnessy, Speech communication: human and machine. Universities press, 1-5 (1987)
L. Perez, J. Wang, The effectiveness of data augmentation in image classification using deep learning. arXiv:1712.04621 (2017)
J. Pons, O. Slizovskaia, R. Gong, E. Gomez, X. Serra, Timbre analysis of music audio signals with convolutional neural networks. In: Proceedings of 25th European Signal Processing Conference, 2744-2748 (2017). IEEE
K. Racharla, V. Kumar, C.B. Jayant, A. Khairkar, P. Harish, Predominant musical instrument classification based on spectral features. In: Proceedings of 7th International Conference on Signal Processing and Integrated Networks (SPIN), 617-622 (2020)
R. Rajan, H.A. Murthy, Two-pitch tracking in co-channel speech using modified group delay functions. Speech Commun. 89, 37–46 (2017)
Article Google Scholar
R. Rajan, H.A. Murthy, Group delay based melody monopitch extraction from music. In: Proceedings of the IEEE International Conference on Audio, Speech and Signal Processing, 186-190 (2013)
R. Rajan, Estimating pitch in speech and music using modified group delay functions. Ph.D. dissertation, Indian Institute of Technology, Madras (2017)
R. Rajan, H.A. Murthy, Music genre classification by fusion of modified group delay and melodic features. In: Proceedings of Twenty-third National Conference on Communications (NCC), 1-6 (2017). https://doi.org/10.1109/NCC.2017.8077056
R. Rajan, H.A. Murthy, Melodic pitch extraction from music signals using modified group delay functions. In: Proceedings of 2013 National Conference on Communications (NCC), pp. 1-5. IEEE, (2013)
L.C. Reghunath, R. Rajan, Transformer-based ensemble method for multiple predominant instruments recognition in polyphonic music. EURASIP Journal on Audio, Speech, and Music Processing, 11 (2022),1–14, Springer. https://doi.org/10.1186/s13636-022-00245-8
L.C. Reghunath, R. Rajan, Attention-based predominant instruments recognition in polyphonic music. In: Proceedings of 18th Sound and Music Computing Conference (SMC),(2021),199-206
J. Sebastian, H.A. Murthy, Group delay-based music source separation using deep recurrent neural networks. In: Proceedings of International Conference on Signal Processing and Communications (SPCOM), 1-5 (2016). IEEE
M. Seeland, P. Mäder, Multi-view classification with convolutional neural networks. PLOS ONE 16, 0245230 (2021). https://doi.org/10.1371/journal.pone.0245230
Article Google Scholar
O. Slizovskaia, E. Gomez Gutierrez, G. Haro Ortega, Automatic musical instrument recognition in audiovisual recordings by combining image and audio classification strategies. In: Proceedings of 13th Sound and Music Computing Conference (SMC) 2016, 442-7 (2016)
M. Sukhavasi, S. Adapa, Music theme recognition using cnn and self-attention. arXiv preprint arXiv:1911.07041 (2019)
M. Uzair, N. Jamil, Effects of hidden layers on the efficiency of neural networks. In: Proceedings of IEEE 23rd International Multitopic Conference (INMIC), 1-6 (2020). IEEE
W. Yao, A. Moumtzidou, C.O. Dumitru, A. Stelios, I. Gialampoukidis, S. Vrochidis, M. Datcu, I. Kompatsiaris, Early and late fusion of multiple modalities in sentinel imagery and social media retrieval. In: Proceedings of International Conference of Pattern Recognition (ICPR) (2021)
D. Yu, H. Duan, J. Fang, B. Zeng, Predominant instrument recognition based on deep neural network with auxiliary classification. IEEE/ACM Trans. Audio Speech Language Process. 28, 852–861 (2020)
Article Google Scholar
M.D. Zeiler, R. Fergus, T visualizing and understanding convolutional networks. In: Proceedings of European conference on computer vision (ECCV), 818-8331 (2014)

Download references

Author information

Authors and Affiliations

Audio, Speech and Language Lab, Department of Electronics and Communication, College of Engineering Trivandrum, Trivandrum, Kerala, 695016, India
C. R. Lekshmi & Rajan Rajeev

Authors

C. R. Lekshmi
View author publications
You can also search for this author in PubMed Google Scholar
Rajan Rajeev
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

Lekshmi C. R and Rajeev Rajan jointly designed, implemented, and interpreted the computer simulations and prepared the manuscript. RR implemented the modgd-gram algorithm.

Corresponding authors

Correspondence to C. R. Lekshmi or Rajan Rajeev.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Lekshmi , C.R., Rajeev, R. Multiple Predominant Instruments Recognition in Polyphonic Music Using Spectro/Modgd-gram Fusion. Circuits Syst Signal Process 42, 3464–3484 (2023). https://doi.org/10.1007/s00034-022-02278-y

Download citation

Received: 11 March 2022
Revised: 19 December 2022
Accepted: 20 December 2022
Published: 18 January 2023
Issue Date: June 2023
DOI: https://doi.org/10.1007/s00034-022-02278-y

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Multiple Predominant Instruments Recognition in Polyphonic Music Using Spectro/Modgd-gram Fusion

Abstract

Access this article

Similar content being viewed by others

Transformer-based ensemble method for multiple predominant instruments recognition in polyphonic music

Deep Convolutional Neural Networks for Predominant Instrument Recognition in Polyphonic Music Using Discrete Wavelet Transform

Exploiting cepstral coefficients and CNN for efficient musical instrument classification

Data availability

References

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Competing interests

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Multiple Predominant Instruments Recognition in Polyphonic Music Using Spectro/Modgd-gram Fusion

Abstract

Access this article

Similar content being viewed by others

Transformer-based ensemble method for multiple predominant instruments recognition in polyphonic music

Deep Convolutional Neural Networks for Predominant Instrument Recognition in Polyphonic Music Using Discrete Wavelet Transform

Exploiting cepstral coefficients and CNN for efficient musical instrument classification

Data availability

References

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Competing interests

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation