Effects of Soft-Masking Function on Spectrogram-Based Instrument - Vocal Separation

  • Conference paper
Computational Linguistics (PACLING 2019)

Abstract

This paper analyzes the effects of a soft-masking function on spectrogram-based instrument-vocal separation of audio signals. The function considered is first-order, with two masking magnitude parameters: one for background separation and one for foreground separation. As the masking magnitude increases, the signal estimates improve: the background signal's spectrogram becomes closer to that of the original signal, while the foreground signal's spectrogram represents the vocal wiggle lines better than the original signal's spectrogram does. For the same increase in masking magnitude (up to ten-fold), the effect on the background signal's spectrogram is more significant than the effect on the foreground signal's spectrogram. This is evident from the significant (approximately three-fold) reduction in the background signal's root-mean-square (RMS) values, compared with the less significant reduction (approximately one-third) in the foreground signal's RMS values.
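The masking step described in the abstract can be illustrated with a minimal sketch. The exact function used in the paper is not given in this preview, so the ratio form below (a soft mask of the kind common in librosa-based separation pipelines) is an assumption, as are the names `soft_mask`, `separate`, `margin_bg`, and `margin_fg`, and the random toy spectrograms.

```python
import numpy as np


def soft_mask(x, x_ref, power=1.0, eps=1e-12):
    """Return x**power / (x**power + x_ref**power), elementwise, in [0, 1]."""
    x_p = np.power(x, power)
    ref_p = np.power(x_ref, power)
    return x_p / (x_p + ref_p + eps)


def rms(a):
    """Root-mean-square value of an array."""
    return float(np.sqrt(np.mean(np.square(a))))


def separate(s_full, s_bg, margin_bg, margin_fg, power=1.0):
    """Split a magnitude spectrogram into background and foreground estimates.

    s_bg is a hypothetical background estimate with 0 <= s_bg <= s_full.
    power=1.0 corresponds to the assumed first-order mask.
    """
    s_res = s_full - s_bg  # residual attributed to the foreground (vocals)
    mask_bg = soft_mask(s_bg, margin_bg * s_res, power)
    mask_fg = soft_mask(s_res, margin_fg * s_bg, power)
    return mask_bg * s_full, mask_fg * s_full


# Toy data standing in for real spectrograms (illustrative only).
rng = np.random.default_rng(0)
s_full = rng.uniform(0.1, 1.0, size=(8, 16))
s_bg = np.minimum(s_full, rng.uniform(0.1, 1.0, size=(8, 16)))

for margin in (1.0, 10.0):
    bg, fg = separate(s_full, s_bg, margin_bg=margin, margin_fg=margin)
    print(f"margin={margin:>4}: RMS background={rms(bg):.4f}, RMS foreground={rms(fg):.4f}")
```

In this sketch, a larger masking magnitude scales up the competing term in each mask's denominator, so both estimates shrink and their RMS values fall. The toy numbers are not expected to reproduce the ratios reported in the abstract.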

Acknowledgment

The authors would like to thank FPT University, Hanoi, Vietnam, and UCSI University, Kuala Lumpur, Malaysia, for supporting this research.

Author information

Corresponding author

Correspondence to Duc Chung Tran.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Tran, D.C., Ahamed Khan, M.K.A. (2020). Effects of Soft-Masking Function on Spectrogram-Based Instrument - Vocal Separation. In: Nguyen, LM., Phan, XH., Hasida, K., Tojo, S. (eds) Computational Linguistics. PACLING 2019. Communications in Computer and Information Science, vol 1215. Springer, Singapore. https://doi.org/10.1007/978-981-15-6168-9_28

  • DOI: https://doi.org/10.1007/978-981-15-6168-9_28

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-6167-2

  • Online ISBN: 978-981-15-6168-9

  • eBook Packages: Computer Science, Computer Science (R0)
