Audio signal reconstruction using phase retrieval: Implementation and evaluation

Abdelmalek, Raja; Mnasri, Zied; Benzarti, Faouzi

doi:10.1007/s11042-022-12421-1

Published: 02 March 2022

Volume 81, pages 15919–15946 (2022)

Multimedia Tools and Applications

Abstract

Phase retrieval has been theoretically proven to be an efficient method for signal reconstruction given only the magnitude spectrum of the short-time Fourier transform (STFT). Recently, this topic has attracted renewed interest because of its usefulness in applications such as compressive sensing, speech synthesis, speech enhancement and source separation. This paper presents an efficient algorithm for audio signal reconstruction using phase retrieval from the STFT magnitude spectrum, based on an explicit relationship between STFT magnitude and phase. First, the performance of the proposed algorithm is studied for different types of audio signals, i.e. monophonic (speech) and polyphonic (music), in order to tune its parameters. Then, a detailed comparison with state-of-the-art phase retrieval algorithms is presented. Two types of evaluation are carried out: (a) an objective evaluation using standard signal reconstruction metrics, i.e. time-domain segmental signal-to-noise ratio (segSNR), time-frequency-domain signal-to-error ratio (SER), and cepstrum-related distance measures, namely log-likelihood ratio (LLR), Itakura-Saito distortion (IS) and cepstrum distance, performed first for the proposed algorithm alone and then in comparison with state-of-the-art methods; (b) a subjective evaluation conducted through listening tests commonly used in audio quality rating, namely Mean Opinion Score (MOS), Degradation Mean Opinion Score (DMOS) and preference tests. The results of both evaluation protocols confirm the improvement brought by the proposed approach.
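
To make the two headline objective metrics concrete, the following is a minimal sketch of segSNR and SER in commonly used textbook forms. It is not the authors' implementation: the exact definitions, frame sizes and clamping limits used in the paper are not given in this abstract, so the function names (`seg_snr`, `stft_mag`, `ser`), the Hann window, the 64-sample frame, the 16-sample hop and the [-10, 35] dB clamp are all assumptions chosen for illustration. The code deliberately uses only the Python standard library and a naive DFT so that it runs anywhere; a practical version would more likely rely on NumPy or SciPy.

```python
import cmath
import math


def seg_snr(ref, est, frame_len=256, eps=1e-12, lo=-10.0, hi=35.0):
    """Segmental SNR in dB: mean of per-frame SNRs, each clamped to [lo, hi].

    The clamp limits are a common convention, assumed here for illustration.
    """
    n = min(len(ref), len(est))
    if n < frame_len:
        raise ValueError("signals are shorter than one frame")
    snrs = []
    for start in range(0, n - frame_len + 1, frame_len):
        r = ref[start:start + frame_len]
        e = est[start:start + frame_len]
        sig = sum(x * x for x in r)
        err = sum((x - y) ** 2 for x, y in zip(r, e))
        snr = 10.0 * math.log10((sig + eps) / (err + eps))
        snrs.append(min(max(snr, lo), hi))
    return sum(snrs) / len(snrs)


def stft_mag(x, frame_len=64, hop=16):
    """STFT magnitude with a Hann window, computed by a naive DFT.

    Returns one list of non-negative-frequency bin magnitudes per frame.
    """
    win = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
           for n in range(frame_len)]
    bins = frame_len // 2 + 1
    # Precompute the DFT basis once: basis[k][n] = exp(-j*2*pi*k*n/N).
    basis = [[cmath.exp(-2j * math.pi * k * n / frame_len)
              for n in range(frame_len)] for k in range(bins)]
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = [x[start + n] * win[n] for n in range(frame_len)]
        frames.append([abs(sum(s * b for s, b in zip(seg, row)))
                       for row in basis])
    return frames


def ser(ref, est, frame_len=64, hop=16, eps=1e-12):
    """Signal-to-error ratio in dB between two STFT magnitude spectra."""
    ref_mag = stft_mag(ref, frame_len, hop)
    est_mag = stft_mag(est, frame_len, hop)
    num = sum(r * r for frame in ref_mag for r in frame)
    den = sum((r - e) ** 2
              for fr, fe in zip(ref_mag, est_mag)
              for r, e in zip(fr, fe))
    return 10.0 * math.log10((num + eps) / (den + eps))


if __name__ == "__main__":
    fs = 8000  # sampling rate in Hz (illustrative)
    ref = [math.sin(2 * math.pi * 440 * n / fs) for n in range(1024)]
    est = [0.9 * v for v in ref]  # toy "reconstruction" with 10% amplitude error
    print(round(seg_snr(ref, est), 2))
    print(round(ser(ref, est), 2))
```

In this toy case the error is exactly one tenth of the reference in both the time domain and the STFT magnitude domain, so both metrics come out at 20 dB. With a real phase retrieval output the two generally differ, since segSNR is sensitive to time-domain phase errors whereas SER compares magnitudes only.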

Author information

Authors and Affiliations

SITI Laboratory, ENIT, University of Tunis El Manar, Tunis, Tunisia
Raja Abdelmalek & Faouzi Benzarti
Electrical Engineering Department, ENIT, University of Tunis El Manar, Tunis, Tunisia
Zied Mnasri
DIBRIS, University of Genoa, Genova, Italy
Zied Mnasri

Corresponding author

Correspondence to Zied Mnasri.

Additional information

A full description of this work, with MATLAB code, is available at https://github.com/zied-mnasri/phase-retrieval

About this article

Cite this article

Abdelmalek, R., Mnasri, Z. & Benzarti, F. Audio signal reconstruction using phase retrieval: Implementation and evaluation. Multimed Tools Appl 81, 15919–15946 (2022). https://doi.org/10.1007/s11042-022-12421-1

Received: 25 January 2021
Revised: 25 June 2021
Accepted: 25 January 2022
Published: 02 March 2022
Issue Date: May 2022
DOI: https://doi.org/10.1007/s11042-022-12421-1
