Ananthakrishnan KS, Dogancay K (2009) Recent trends and challenges in speech-separation systems research: a tutorial review. In: TENCON 2009-2009 IEEE region 10 conference, pp 1–6. IEEE
Bai S, Kolter JZ, Koltun V (2018) An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv:1803.01271
Chen J, Wang D (2017) Long short-term memory for speaker generalization in supervised speech separation. J Acoust Soc Am 141(6):4705–4714
Ephraim Y, Malah D (1984) Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator. IEEE Trans Acoust Speech Signal Process 32(6):1109–1121. https://doi.org/10.1109/TASSP.1984.1164453
Erdogan H, Hershey JR, Watanabe S, Le Roux J (2015) Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks. In: 2015 IEEE International conference on acoustics, speech and signal processing (ICASSP), pp 708–712. IEEE
Freesound. https://freesound.org/. Accessed: 2019-09-28
Garofalo J, Graff D, Paul D, Pallett D (2007) CSR-I (WSJ0) complete. Linguistic Data Consortium, Philadelphia
Garofolo JS (1993) TIMIT acoustic-phonetic continuous speech corpus. Linguistic Data Consortium
Grais EM, Plumbley MD (2017) Single channel audio source separation using convolutional denoising autoencoders. arXiv:1703.08019
Grzywalski T, Drgas S (2018) Application of recurrent U-Net architecture to speech enhancement. In: Signal processing: algorithms, architectures, arrangements, and applications (SPA), pp 82–87. IEEE
Grzywalski T, Drgas S (2019) Using recurrences in time and frequency within U-Net architecture for speech enhancement. In: ICASSP 2019-2019 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 6970–6974. IEEE
Guido RC, Pedroso F, Furlan A, Contreras RC, Caobianco LG, Neto JS (2020) CWT × DWT × DTWT × SDTWT: clarifying terminologies and roles of different types of wavelet transforms. Int J Wavelets Multiresolut Inf Process 18(06):2030001
He K, Zhang X, Ren S, Sun J (2016) Identity mappings in deep residual networks. arXiv:1603.05027
Huang PS, Kim M, Hasegawa-Johnson M, Smaragdis P (2014) Deep learning for monaural speech separation. In: 2014 IEEE International conference on acoustics, speech and signal processing (ICASSP), pp 1562–1566. IEEE
Huang PS, Kim M, Hasegawa-Johnson M, Smaragdis P (2015) Joint optimization of masks and deep recurrent neural networks for monaural source separation. IEEE/ACM Trans Audio Speech Lang Process 23(12):2136–2147
Hui L, Cai M, Guo C, He L, Zhang WQ, Liu J (2015) Convolutional maxout neural networks for speech separation. In: 2015 IEEE international symposium on signal processing and information technology (ISSPIT), pp 24–27. https://doi.org/10.1109/ISSPIT.2015.7394335
Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv:1412.6980
Le Roux J, Wichern G, Watanabe S, Sarroff A, Hershey JR (2019) The phasebook: building complex masks via discrete representations for source separation. In: ICASSP 2019-2019 IEEE International conference on acoustics, speech and signal processing (ICASSP), pp 66–70. IEEE
Le Roux J, Wisdom S, Erdogan H, Hershey JR (2019) SDR: half-baked or well done? In: ICASSP 2019-2019 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 626–630. IEEE
Luo Y, Mesgarani N (2018) TasNet: surpassing ideal time-frequency masking for speech separation. arXiv:1809.07454
Mesaros A, Heittola T, Virtanen T (2016) TUT database for acoustic scene classification and sound event detection. In: 2016 24th European signal processing conference (EUSIPCO), pp 1128–1132. IEEE
Mowlaee P, Saeidi R, Christensen MG, Martin R (2012) Subjective and objective quality assessment of single-channel speech separation algorithms. In: 2012 IEEE International conference on acoustics, speech and signal processing (ICASSP), pp 69–72. IEEE
Nicolson A, Paliwal KK (2019) Deep learning for minimum mean-square error approaches to speech enhancement. Speech Commun 111:44–55. https://doi.org/10.1016/j.specom.2019.06.002, https://www.sciencedirect.com/science/article/pii/S0167639318304308
Nicolson A, Paliwal KK (2020) Masked multi-head self-attention for causal speech enhancement. Speech Commun 125:80–96
Pandey A, Wang D (2019) A new framework for CNN-based speech enhancement in the time domain. IEEE/ACM Trans Audio Speech Lang Process 27(7):1179–1188
Pandey A, Wang D (2019) TCNN: temporal convolutional neural network for real-time speech enhancement in the time domain. In: ICASSP 2019-2019 IEEE International conference on acoustics, speech and signal processing (ICASSP), pp 6875–6879. IEEE
Park SR, Lee J (2016) A fully convolutional neural network for speech enhancement. arXiv:1609.07132
Pirhosseinloo S, Brumberg JS (2019) Monaural speech enhancement with dilated convolutions. In: Proc. Interspeech 2019, pp 3143–3147. https://doi.org/10.21437/Interspeech.2019-2782
Rix AW, Beerends JG, Hollier MP, Hekstra AP (2001) Perceptual evaluation of speech quality (PESQ): a new method for speech quality assessment of telephone networks and codecs. In: 2001 IEEE International conference on acoustics, speech, and signal processing. Proceedings (Cat. No. 01CH37221), vol 2, pp 749–752. IEEE
Ronneberger O, Fischer P, Brox T (2015) U-Net: convolutional networks for biomedical image segmentation. In: International conference on medical image computing and computer-assisted intervention, pp 234–241. Springer
Stowell D, Plumbley MD (2013) An open dataset for research on audio field recording archives: freefield1010. arXiv:1309.5275
Sun Y, Xian Y, Wang W, Naqvi SM (2019) Monaural source separation in complex domain with long short-term memory neural network. IEEE J Sel Top Signal Process 13(2):359–369
Taal CH, Hendriks RC, Heusdens R, Jensen J (2011) An algorithm for intelligibility prediction of time–frequency weighted noisy speech. IEEE Trans Audio Speech Lang Process 19(7):2125–2136
Tan K, Chen J, Wang D (2019) Gated residual networks with dilated convolutions for monaural speech enhancement. IEEE/ACM Trans Audio Speech Lang Process 27(1):189–198
Tan K, Wang D (2018) A convolutional recurrent neural network for real-time speech enhancement. In: Interspeech, pp 3229–3233
Varga A, Steeneken HJ (1993) Assessment for automatic speech recognition: II. NOISEX-92: a database and an experiment to study the effect of additive noise on speech recognition systems. Speech Commun 12(3):247–251
Wang D (2017) Deep learning reinvents the hearing aid. IEEE Spectr 54(3):32–37
Wang D, Chen J (2018) Supervised speech separation based on deep learning: an overview. IEEE/ACM Trans Audio Speech Lang Process
Wang Y, Narayanan A, Wang D (2014) On training targets for supervised speech separation. IEEE/ACM Trans Audio Speech Lang Process 22(12):1849–1858
Wang Y, Wang D (2013) Towards scaling up classification-based speech separation. IEEE Trans Audio Speech Lang Process 21(7):1381–1390
Wang Y, Wang D (2014) A structure-preserving training target for supervised speech separation. In: 2014 IEEE International conference on acoustics, speech and signal processing (ICASSP), pp 6107–6111. IEEE
Wang ZQ, Le Roux J, Wang D, Hershey JR (2018) End-to-end speech separation with unfolded iterative phase reconstruction. arXiv:1804.10204
Wang ZQ, Tan K, Wang D (2019) Deep learning based phase reconstruction for speaker separation: a trigonometric perspective. In: ICASSP 2019-2019 IEEE International conference on acoustics, speech and signal processing (ICASSP), pp 71–75. IEEE
Williamson DS, Wang D (2017) Time-frequency masking in the complex domain for speech dereverberation and denoising. IEEE/ACM Trans Audio Speech Lang Process 25(7):1492–1501
Yu F, Koltun V (2015) Multi-scale context aggregation by dilated convolutions. arXiv:1511.07122
Yuan W (2020) A time–frequency smoothing neural network for speech enhancement. Speech Commun 124:75–84. https://doi.org/10.1016/j.specom.2020.09.002, https://www.sciencedirect.com/science/article/pii/S0167639320302703
Zhang R (2019) Making convolutional networks shift-invariant again. arXiv:1904.11486