Audio signal reconstruction using phase retrieval: Implementation and evaluation

Multimedia Tools and Applications

Abstract

Phase retrieval has been theoretically proven to be an efficient method for reconstructing a signal from only the magnitude spectrum of its short-time Fourier transform (STFT). The topic has recently attracted renewed interest because of its usefulness in applications such as compressive sensing, speech synthesis, speech enhancement, and source separation. This paper presents an efficient algorithm for audio signal reconstruction using phase retrieval from the STFT magnitude spectrum, based on an explicit relationship between STFT magnitude and phase. First, the performance of the proposed algorithm is studied on different types of audio signals, i.e. monophonic (speech) and polyphonic (music), in order to tune its parameters. Then, a detailed comparison with state-of-the-art phase retrieval algorithms is presented. Two types of evaluation are carried out: (a) an objective evaluation using standard signal reconstruction metrics, i.e. time-domain segmental signal-to-noise ratio (segSNR), time-frequency-domain signal-to-error ratio (SER), and cepstrum-related distance measures, namely log-likelihood ratio (LLR), Itakura-Saito distortion (IS) and cepstrum distance, performed first for the proposed algorithm alone and then in comparison with state-of-the-art methods; (b) a subjective evaluation using listening tests commonly used in audio quality rating, namely Mean Opinion Score (MOS), Degradation Mean Opinion Score (DMOS) and preference tests. The results of both evaluation protocols confirm the improvement brought by the proposed approach.
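To make the two headline objective metrics concrete, the following is a minimal Python sketch of segSNR and SER as they are commonly defined in the signal reconstruction literature. It is not the authors' Matlab implementation. The abstract does not give the exact formulas, so the frame length, hop size, Hann window, the [-10, 35] dB clamping range, and the function names seg_snr, stft_mag and ser are all assumptions made for illustration.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log(0)


def seg_snr(ref, est, frame_len=256, lo=-10.0, hi=35.0):
    """Segmental SNR in dB: per-frame SNR, clamped to [lo, hi], then averaged.

    frame_len and the clamping range are assumed values, not taken from the paper.
    """
    ref = np.asarray(ref, dtype=float)
    est = np.asarray(est, dtype=float)
    # Keep only whole, non-overlapping frames common to both signals.
    n = min(len(ref), len(est)) // frame_len * frame_len
    r = ref[:n].reshape(-1, frame_len)
    e = est[:n].reshape(-1, frame_len)
    signal_energy = np.sum(r ** 2, axis=1) + EPS
    error_energy = np.sum((r - e) ** 2, axis=1) + EPS
    frame_snr = 10.0 * np.log10(signal_energy / error_energy)
    return float(np.mean(np.clip(frame_snr, lo, hi)))


def stft_mag(x, win_len=256, hop=64):
    """STFT magnitude using a Hann window (assumed analysis settings)."""
    x = np.asarray(x, dtype=float)
    win = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack(
        [x[i * hop:i * hop + win_len] * win for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def ser(ref, est, win_len=256, hop=64):
    """Signal-to-error ratio in dB between the STFT magnitudes of ref and est."""
    m_ref = stft_mag(ref, win_len, hop)
    m_est = stft_mag(est, win_len, hop)
    signal_energy = np.sum(m_ref ** 2) + EPS
    error_energy = np.sum((m_ref - m_est) ** 2) + EPS
    return float(10.0 * np.log10(signal_energy / error_energy))


if __name__ == "__main__":
    # Toy usage: a 440 Hz tone sampled at 16 kHz and a noisy copy of it.
    rng = np.random.default_rng(0)
    t = np.arange(4096) / 16000.0
    clean = np.sin(2 * np.pi * 440.0 * t)
    noisy = clean + 0.1 * rng.standard_normal(clean.shape)
    print("segSNR (dB):", seg_snr(clean, noisy))
    print("SER (dB):", ser(clean, noisy))
```

Because segSNR compares waveforms sample by sample, it also penalises phase errors that may be inaudible, whereas SER compares magnitudes only. This is one reason evaluations of this kind usually report both, alongside listening tests.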

Corresponding author

Correspondence to Zied Mnasri.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A full description of this work with Matlab code is available at https://github.com/zied-mnasri/phase-retrieval

Cite this article

Abdelmalek, R., Mnasri, Z. & Benzarti, F. Audio signal reconstruction using phase retrieval: Implementation and evaluation. Multimed Tools Appl 81, 15919–15946 (2022). https://doi.org/10.1007/s11042-022-12421-1

  • DOI: https://doi.org/10.1007/s11042-022-12421-1
