Abstract
Phase retrieval has been theoretically proven to be an efficient method for reconstructing a signal from the magnitude spectrum of its short-time Fourier transform (STFT) alone. Interest in this topic has recently grown because of its usefulness in applications such as compressive sensing, speech synthesis, speech enhancement and source separation. This paper therefore presents an efficient algorithm for audio signal reconstruction by phase retrieval from the STFT magnitude spectrum, based on an explicit relationship between the STFT magnitude and phase. First, the performance of the proposed algorithm is studied on different types of audio signals, i.e. monophonic (speech) and polyphonic (music), in order to tune its parameters. Then, a detailed comparison with state-of-the-art phase retrieval algorithms is presented. Two types of evaluation are carried out: (a) an objective evaluation using standard signal reconstruction metrics, i.e. the time-domain segmental signal-to-noise ratio (segSNR), the time-frequency-domain signal-to-error ratio (SER), and cepstrum-related distance measures, namely the log-likelihood ratio (LLR), the Itakura-Saito distortion (IS) and the cepstral distance; this evaluation is performed first for the proposed algorithm alone and then in comparison with state-of-the-art methods; (b) a subjective evaluation through a series of listening tests commonly used in audio quality rating, namely Mean Opinion Score (MOS), Degradation Mean Opinion Score (DMOS) and preference tests. The results of both evaluation protocols confirm the improvement brought by the proposed approach.
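The abstract names the objective metrics without giving their formulas. The Python/NumPy sketch below shows one way segSNR and a spectral SER might be computed for a signal rebuilt from its STFT magnitude. It is not the authors' algorithm, which relies on an explicit magnitude-phase relationship and is distributed as Matlab code; reconstruction here uses the classic Griffin-Lim iteration purely as a stand-in baseline. The periodic Hann window, the frame and hop sizes, the [-10, 35] dB clamping of per-frame SNRs and the exact SER formula are assumptions chosen for illustration, and all function names are hypothetical.

```python
# Sketch only: a Griffin-Lim baseline and assumed metric definitions,
# not the paper's magnitude-phase relationship method.
import numpy as np

def _window(n_fft):
    # Periodic Hann analysis/synthesis window (an assumed choice).
    return np.hanning(n_fft + 1)[:-1]

def stft(x, n_fft=512, hop=128):
    """Complex STFT as a (frames, n_fft // 2 + 1) array; trailing samples
    that do not fill a whole frame are ignored."""
    win = _window(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(X, n_fft=512, hop=128, length=None):
    """Least-squares inverse STFT (weighted overlap-add, Griffin-Lim style)."""
    win = _window(n_fft)
    frames = np.fft.irfft(X, n=n_fft, axis=1)
    n = n_fft + hop * (X.shape[0] - 1)
    y, wsum = np.zeros(n), np.zeros(n)
    for i, frame in enumerate(frames):
        y[i * hop:i * hop + n_fft] += frame * win
        wsum[i * hop:i * hop + n_fft] += win ** 2
    y /= np.maximum(wsum, 1e-8)  # samples with zero window weight stay 0
    if length is not None:
        y = np.pad(y, (0, max(0, length - n)))[:length]
    return y

def griffin_lim(mag, n_iter=50, n_fft=512, hop=128, length=None, seed=0):
    """Baseline phase retrieval: keep the target magnitude and replace the
    phase with that of the STFT of the current signal estimate."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))  # random initial phase
    for _ in range(n_iter):
        x = istft(mag * phase, n_fft, hop, length)
        phase = np.exp(1j * np.angle(stft(x, n_fft, hop)))
    return istft(mag * phase, n_fft, hop, length)

def seg_snr(ref, est, frame=256, lo=-10.0, hi=35.0, eps=1e-12):
    """Mean of per-frame time-domain SNRs in dB, each clamped to [lo, hi]."""
    n = min(len(ref), len(est)) // frame * frame
    r, e = ref[:n].reshape(-1, frame), est[:n].reshape(-1, frame)
    snr = 10 * np.log10((np.sum(r ** 2, axis=1) + eps)
                        / (np.sum((r - e) ** 2, axis=1) + eps))
    return float(np.mean(np.clip(snr, lo, hi)))

def ser(ref_mag, est, n_fft=512, hop=128, eps=1e-12):
    """Assumed spectral SER in dB: target magnitude energy divided by the
    energy of the magnitude error of the estimate's STFT."""
    est_mag = np.abs(stft(est, n_fft, hop))
    return float(10 * np.log10(np.sum(ref_mag ** 2)
                               / (np.sum((ref_mag - est_mag) ** 2) + eps)))

if __name__ == "__main__":
    fs = 8000
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 440 * t)
    mag = np.abs(stft(x))
    x_hat = griffin_lim(mag, length=len(x))
    print(f"segSNR = {seg_snr(x, x_hat):.2f} dB, SER = {ser(mag, x_hat):.2f} dB")
```

Griffin and Lim showed that each iteration of their algorithm does not increase the squared error between the target STFT magnitude and that of the current estimate, so SER typically rises with the number of iterations. segSNR, by contrast, compares waveforms sample by sample and can stay low even for a good magnitude match, since magnitude-only reconstruction does not recover the global sign of the waveform; this is one reason the time-domain and time-frequency metrics are reported separately.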
Additional information
A full description of this work, with Matlab code, is available at https://github.com/zied-mnasri/phase-retrieval.
About this article
Cite this article
Abdelmalek, R., Mnasri, Z. & Benzarti, F. Audio signal reconstruction using phase retrieval: Implementation and evaluation. Multimed Tools Appl 81, 15919–15946 (2022). https://doi.org/10.1007/s11042-022-12421-1
DOI: https://doi.org/10.1007/s11042-022-12421-1