
A Novel Pitch Detection Algorithm Based on Instantaneous Frequency for Clean and Noisy Speech

Circuits, Systems, and Signal Processing

Abstract

In this paper, a novel pitch detection algorithm (PDA) is proposed. Pitch detection is a classical problem that has been investigated since the very beginning of speech processing. The novelty of the proposed method lies in establishing an empirical relationship between the fundamental frequency (\(f_{0}\)) and the instantaneous frequency (\(f_{i}\)), which serves as the basis for the proposed PDA. Even though \(f_{0}\) and \(f_{i}\) are defined as attributes of two different transforms, namely the Fourier transform and the Hilbert transform, respectively, the relationship proposed in this paper shows some interaction between them, at least empirically. The first step of this work consists in validating the proposed relationship on a large set of speech signals. The relationship is then leveraged to develop an algorithm capable of (a) detecting the voiced and unvoiced parts of speech and (b) extracting the \(f_{0}\) contour from the \(f_{i}\) values in the voiced parts. For evaluation purposes, the resulting \(f_{0}\) contour is compared to those of some well-rated state-of-the-art PDAs. The main findings show that the quality of pitch detection obtained by the proposed technique is as satisfactory as that of some top PDAs, in both clean and simulated noisy speech. In addition, one of the main advantages of the method is that it bypasses the traditional short-time analysis, which is required to assume local stationarity of the speech signal.
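
As a point of orientation only, the instantaneous frequency mentioned in the abstract is conventionally obtained as the time derivative of the phase of the analytic signal, \(f_{i}(t) = \frac{1}{2\pi}\frac{d\phi(t)}{dt}\). The Python sketch below illustrates that textbook definition on a synthetic tone. It is not the authors' algorithm (their MATLAB code is referenced in the Notes) and it does not reproduce the empirical \(f_{0}\)/\(f_{i}\) relationship or the voicing decision. The function names, the 16 kHz sampling rate, and the 120 Hz test tone are all assumptions made for illustration.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal x + j*H{x}, built by zeroing the negative FFT frequencies."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                     # keep the DC bin once
    if n % 2 == 0:
        weights[n // 2] = 1.0            # keep the Nyquist bin once
        weights[1:n // 2] = 2.0          # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

def instantaneous_frequency(x, fs):
    """Instantaneous frequency in Hz, one sample shorter than x."""
    phase = np.unwrap(np.angle(analytic_signal(x)))   # continuous phase (rad)
    return np.diff(phase) * fs / (2.0 * np.pi)        # discrete phase derivative

# Hypothetical test signal: a pure 120 Hz tone sampled at 16 kHz
# (1600 samples, i.e. a whole number of periods).
fs = 16000
t = np.arange(1600) / fs
tone = np.sin(2.0 * np.pi * 120.0 * t)
f_i = instantaneous_frequency(tone, fs)
print(np.median(f_i))   # expected to be close to the tone frequency
```

For a single stationary sinusoid, \(f_{i}\) simply coincides with the tone frequency. Real speech is multi-component, so the raw \(f_{i}\) of the waveform does not equal \(f_{0}\) in general, which is why the paper relies on an empirically validated relationship between the two.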

Data Availability

The datasets analyzed during the current study are available in the PTDB-TUG repository [29]. These datasets were derived from the following public domain resources: https://www2.spsc.tugraz.at/databases/PTDB-TUG/

Notes

  1. MATLAB code is available at [25].

References

  1. T. Abe, T. Kobayashi, S. Imai, Harmonics tracking and pitch extraction based on instantaneous frequency, in 1995 International Conference on Acoustics, Speech, and Signal Processing, IEEE, vol. 1, pp. 756–759 (1995)

  2. T. Abe, T. Kobayashi, S. Imai, Robust pitch estimation with harmonics enhancement in noisy environments based on instantaneous frequency, in Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP’96. IEEE, vol. 2, pp. 1277–1280 (1996)

  3. Y. Agiomyrgiannakis, Yang: Yet-another-generalized vocoder. https://github.com/google/yang_vocoder/, last accessed: 31-05-2022 (2017)

  4. E. Azarov, M. Vashkevich, A. Petrovsky, Instantaneous pitch estimation based on rapt framework, in 2012 Proceedings of the 20th European Signal Processing Conference (EUSIPCO). IEEE, pp. 2787–2791 (2012)

  5. H. Ba, N. Yang, I. Demirkol, W. Heinzelman, Bana: a hybrid approach for noise resilient pitch detection, in 2012 IEEE Statistical Signal Processing Workshop (SSP). IEEE, pp. 369–372 (2012)

  6. B. Boashash, Estimating and interpreting the instantaneous frequency of a signal. II. Algorithms and applications. Proc. IEEE 80(4), 540–568 (1992)

  7. P. Boersma, D. Weenink, Praat: doing phonetics by computer. https://www.fon.hum.uva.nl/praat/, last accessed: 31-05-2022 (2006)

  8. A. Camacho, J.G. Harris, A sawtooth waveform inspired pitch estimator for speech and music. J. Acoust. Soc. Am. 124(3), 1638–1652 (2008)

  9. W. Chu, A. Alwan, Reducing f0 frame error of f0 tracking algorithms under noisy conditions with an unvoiced/voiced classification frontend, in 2009 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, pp. 3969–3972 (2009)

  10. A. De Cheveigné, H. Kawahara, Yin, a fundamental frequency estimator for speech and music. J. Acoust. Soc. Am. 111(4), 1917–1930 (2002)

  11. A. De Cheveigné, H. Kawahara, Yin algorithm. https://labrosa.ee.columbia.edu/doc/yin.html, last accessed: 31-05-2022 (2002)

  12. T. Drugman, A. Alwan, Joint robust voicing detection and pitch estimation based on residual harmonics, in Proceedings of the Interspeech 2011, Florence, Italy. IEEE, pp. 1973–1976 (2011)

  13. D. Gabor, Theory of communication. Part 1. The analysis of information. J. Inst. Electr. Eng. Part III Radio Commun. Eng. 93(26), 429–441 (1946)

  14. S. Gonzalez, M. Brookes, Pefac: a pitch estimation algorithm robust to high levels of noise. IEEE/ACM Trans. Audio Speech Lang. Process. 22(2), 518–530 (2014)

  15. S.W. Group et al., Speech signal processing toolkit (sptk) version 3.3, https://sourceforge.net/projects/sp-tk//, last accessed: 31-05-2022 (2009)

  16. D.J. Hermes, Measurement of pitch by subharmonic summation. J. Acoust. Soc. Am. 83(1), 257–264 (1988)

  17. W. Hess, Manual and instrumental pitch determination, voicing determination, in Pitch Determination of Speech Signals. Springer, pp. 92–151 (1983)

  18. H. Huang, J. Pan, Speech pitch determination based on Hilbert–Huang transform. Signal Process. 86(4), 792–803 (2006)

  19. N.E. Huang, Z. Shen, S.R. Long, M.C. Wu, H.H. Shih, Q. Zheng, N.C. Yen, C.C. Tung, H.H. Liu, The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 454(1971), 903–995 (1998)

  20. D. Jouvet, Y. Laprie, Performance analysis of several pitch detection algorithms on simulated and real noisy speech data, in 2017 25th European Signal Processing Conference (EUSIPCO). IEEE, pp. 1614–1618 (2017)

  21. S. Kadambe, G.F. Boudreaux-Bartels, Application of the wavelet transform for pitch detection of speech signals. IEEE Trans. Inf. Theory 38(2), 917–924 (1992)

  22. H. Kawahara, Y. Agiomyrgiannakis, H. Zen, Using instantaneous frequency and aperiodicity detection to estimate f0 for high-quality speech synthesis, in 9th ISCA Speech Synthesis Workshop (SSW9), ISCA, pp. 221–228 (2016)

  23. A. Kissling, R. Kompe, N. Niemann, A. Batliner, Dp-based determination of f0 contours from speech signals, in Acoustics, Speech, and Signal Processing, 1992. Proceedings. (ICASSP’92), IEEE, vol. 1, pp. 1–4 (1992)

  24. E. Liflyand, Interaction between the Fourier transform and the Hilbert transform. Acta et Commentationes Universitatis Tartuensis de Mathematica 18(1), 19–32 (2014)

  25. Z. Mnasri, Proposed algorithm, https://github.com/zied-mnasri/f0_IF_model, last accessed: 31-05-2022 (2021)

  26. Z. Mnasri, H. Amiri, On the relationship between instantaneous frequency and pitch in speech signals, in Studientexte zur Sprachkommunikation: Elektronische Sprachsignalverarbeitung 2018. ed. by A. Berton, U. Haiber, W. Minker (TUDpress, Dresden, 2018), pp. 23–29

  27. Z. Mnasri, S. Rovetta, F. Masulli, A novel pitch detection algorithm based on instantaneous frequency, in 2021 29th European Signal Processing Conference (EUSIPCO), IEEE, pp. 16–20 (2021). http://doi.org/10.23919/EUSIPCO54536.2021.9616047

  28. A.M. Noll, Cepstrum pitch determination. J. Acoust. Soc. Am. 41(2), 293–309 (1967)

  29. G. Pirker, M. Wohlmayr, S. Petrik, F. Pernkopf, A pitch tracking corpus with evaluation on multipitch tracking scenario, in Twelfth Annual Conference of the International Speech Communication Association. http://doi.org/10.21437/Interspeech.2011 (2011)

  30. B. Van der Pol, The fundamental principles of frequency modulation. J. Inst. Electr. Eng. Part III Radio Commun. Eng. 93(23), 153–158 (1946)

  31. L. Qiu, H. Yang, S.N. Koh, Fundamental frequency determination based on instantaneous frequency estimation. Signal Process. 44(2), 233–241 (1995)

  32. L. Rabiner, On the use of autocorrelation analysis for pitch detection. IEEE Trans. Acoust. Speech Signal Process. 25(1), 24–33 (1977)

  33. P. Rengaswamy, K.S. Rao, P. Dasgupta, Songf0: a spectrum-based fundamental frequency estimation for monophonic songs. Circuits Syst. Signal Process. 40(2), 772–797 (2021)

  34. M. Ross, H. Shaffer, A. Cohen, R. Freudberg, H. Manley, Average magnitude difference function pitch extractor. IEEE Trans. Acoust. Speech Signal Process. 22(5), 353–362 (1974)

  35. S. Shimauchi, S. Kudo, Y. Koizumi, K. Furuya, On relationships between amplitude and phase of short-time Fourier transform, in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp. 676–680 (2017)

  36. Y. Stylianou, Modeling speech based on harmonic plus noise models, in International School on Neural Networks, Initiated by IIASS and EMFCSC. (Springer, 2004), pp. 244–260

  37. L. Sukhostat, Y. Imamverdiyev, A comparative analysis of pitch detection methods under the influence of different noise conditions. J. Voice 29(4), 410–417 (2015)

  38. X. Sun, A pitch determination algorithm based on subharmonic-to-harmonic ratio, in Sixth International Conference on Spoken Language Processing (2000)

  39. X. Sun, Pitch determination algorithm. https://www.mathworks.com/matlabcentral/fileexchange/1230-pitch-determination-algorithm, last accessed: 31-05-2022 (2002)

  40. J. Tabrikian, S. Dubnov, Y. Dickalov, Maximum a-posteriori probability pitch tracking in noisy environments using harmonic model. IEEE Trans. Speech Audio Process. 12(1), 76–87 (2004)

  41. D. Talkin, W.B. Kleijn, A robust algorithm for pitch tracking (rapt). Speech Coding Synth. 495–518 (1995)

  42. L.N. Tan, A. Alwan, Multi-band summary correlogram-based pitch detection for noisy speech. Speech Commun. 55(7–8), 841–856 (2013)

  43. J. Ville, Theorie et application de la notion de signal analytique. Câbles et transmissions 2(1), 61–74 (1948)

  44. K. Wu, D. Zhang, G. Lu, Ipeeh: improving pitch estimation by enhancing harmonics. Expert Syst. Appl. 64, 317–329 (2016)

  45. S.A. Zahorian, H. Hu, A spectral/temporal method for robust fundamental frequency tracking. J. Acoust. Soc. Am. 123(6), 4559–4571 (2008)

Acknowledgements

This work was supported by the University of Tunis El Manar, Tunisia, and by the University of Genoa, Italy.

Author information

Corresponding author

Correspondence to Zied Mnasri.

About this article

Cite this article

Mnasri, Z., Rovetta, S. & Masulli, F. A Novel Pitch Detection Algorithm Based on Instantaneous Frequency for Clean and Noisy Speech. Circuits Syst Signal Process 41, 6266–6294 (2022). https://doi.org/10.1007/s00034-022-02082-8

  • DOI: https://doi.org/10.1007/s00034-022-02082-8
