Pitch estimation of speech and music sound based on multi-scale product with auditory feature extraction

Ben Messaoud, Mohamed Anouar; Bouzid, Aïcha

doi:10.1007/s10772-015-9325-1

Pitch estimation of speech and music sound based on multi-scale product with auditory feature extraction

Published: 28 November 2015

Volume 19, pages 65–73, (2016)
Cite this article

International Journal of Speech Technology Aims and scope Submit manuscript

Mohamed Anouar Ben Messaoud¹ &
Aïcha Bouzid¹

333 Accesses
4 Citations
Explore all metrics

Abstract

The pitch is a crucial parameter in speech and music signals. However, due to severe noisy conditions, missing harmonics, unsuitable physical vibration, the determination of pitch presents a great challenge when desiring to get a good accuracy. In this paper, we propose a method for pitch estimation of speech and music sounds. Our method is based on the fast Fourier transform (FFT) of the multi-scale product (MP) provided by a feature auditory model of the sound signals. The auditory model simulates the spectral behaviour of the cochlea by a gammachirp filter-bank, and the out/middle ear filtering by a low-pass filter. For the two output channels, the FFT function of the MP is computed over frames. The MP is based on constituting the product of the speech and music wavelet transform coefficients at three scales. The experimental results show that our method estimates the pitch with high accuracy. Besides, our proposed method outperforms several other pitch detection algorithms in clean and noisy environments.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Pitch Estimation Based on the Cepstrum Analysis by the Multi Scale Product of Clean and Noisy Speech

A Novel Pitch Detection Algorithm Based on Instantaneous Frequency for Clean and Noisy Speech

Article 25 June 2022

Multiple Pitch Estimation Based on Modified Harmonic Product Spectrum

References

Bello, J. P., Daudet, L., Abdallah, S., & Duxbury, C. (2005). A tutorial on onset detection in music signals. IEEE Transactions on Speech, Audio Processing, 13, 1035–1048.
Article Google Scholar
Ben Messaoud, M. A., Bouzid, A., & Ellouze, N. (2015). Automatic segmentation of the clean speech signal. World Academy of Science, Engineering and Technology International Journal of Electrical, Computer, Electronics and Communication Engineering, 9, 114–117.
Google Scholar
Brown, J., & Zhang, B. (1991). Musical frequency tracking using the methods of conventional and ’narrowed’ autocorrelation. Journal of the Acoustic Society of America, 89, 2346–2354.
Article Google Scholar
Camacho, A., & Harris, J. (2008). A sawtooth waveform inspired pitch estimator for speech and music. Journal of the Acoustic Society of America, 124, 1638–1652.
Article Google Scholar
De Cheveigné, A., & Kawahara, H. (2002). YIN, a fundamental frequency estimator for speech and music. Journal of the Acoustic Society of America, 111, 1917–1930.
Article Google Scholar
Gavat, I., Zira, M., & Sabac, B. (2002). Pitch estimation by block and instantaneous methods. International Journal of Speech Technology, 5, 269–279.
Article MATH Google Scholar
Hess, W. J. (1992). Pitch and voicing determination. In S. Furni, M. Sondhi, & M. Dekker (Eds.), Advances in speech signal processing. New York: Marcel Dekker, Inc.,
Google Scholar
Irino, T., & Patterson, R. D. (2006). A dynamic compressive gammachirp auditory filterbank. IEEE Transactions on Audio, Speech and Language Processing, 14, 2222–2253.
Article Google Scholar
Kawahara, H., Katayose, H., De Cheveigné, A., & Patterson, R. D. (1999). Fixed point analysis of frequency to instantaneous frequency mapping for accurate estimation of F0 and periodicity. Proceedings 6th EUROSPEECH (pp. 2781–2784).
Klapuri, A. (2000). Qualitative and quantitative aspects in the design of periodicity estimation algorithms. European signal processing conference proceedings (pp. 2069–2072).
Klapuri, A. (2004). Automatic music transcription as we know it today. Journal of New Music Research, 33, 269–282.
Article Google Scholar
Kunieda, N., Shimamura, T., & Suzuki, J. (1996). Robust method of measurement of fundamental frequency by aclos: autocorrelation of log spectrum. International conference on acoustics, speech, and signal processing proceedings (pp. 232–235). Atlanta, GA.
Li, H., Dai, B., & Lu, W. (2006). A pitch detection algorithm based on AMDF and ACF. International conference on acoustics, speech and signal processing proceedings. Toulouse (pp. 377–380).
Lyon, R. F., Katsiamis, A. G., & Drakakis, E. M. (2010). History and future of auditory filter models. Proceedings of 2010 IEEE international symposium on circuits and systems (ISCAS) (pp. 3809–3820).
Mahmoodzadeh, A., Abutalebi, H. R., Soltanian-Zadeh, H., Sheikhzadeh, H. (2012). Single channel speech separation with a frame-based pitch range estimation method in modulation frequency. International symposium on telecommunications (pp. 609–613).
Mallat, S. (1999). A wavelet tour of signal processing. San Diego: Academic Press.
MATH Google Scholar
Meddis, R., Lopez-Poveda, E. A., Fay, R. R., & Popper, A. N. (2010). Computational models of the auditory system., Springer Handbook of Auditory Research New York: Springer.
Book Google Scholar
Meddis, R., & O’Mard, L. (1997). A unitary model for pitch perception. Journal of the Acoustic Society of America, 102, 1811–1820.
Article Google Scholar
Meyer, G., Plante, F., & Ainsworth, W. A. (1995). A pitch extraction reference database. 4th European Conference on Speech Communication and Technology. EUROSPEECH’95, Madrid, pp. 837–840.
Muller, M., Ellis, D., Klapuri, A., & Richard, G. (2011). Signal processing for music analysis. IEEE Journal of Selected Topics in Signal Processing, 5, 1088–1110.
Article Google Scholar
Patterson, R. D., Unoki, M., & Irino, T. (2003). Extending the domain of centre frequencies for the compressive gammachirp auditory filter. Journal of the Acoustic Society of America, 114, 1529–1570.
Article Google Scholar
Prasanna, S. R. M., & Yegnanarayana, B. (2004). Extraction of pitch in adverse conditions. International conference on acoustics, speech and signal processing proceedings (pp. 109–112).
Roy, S. J., Molla, M. K. I., Hirose, K., & Hasan, M. K. (2011). Harmonic modification and data adaptive filtering based approach to robust pitch estimation. International Journal of Speech Technology, 14, 339–349.
Article Google Scholar
Shahnaz, C., Zhu, W. P., & Ahmad, M. O. (2007). A robust pitch estimation algorithm in noise. International conference on acoustics, speech and signal processing proceedings (pp. 1037–1076).
Shahnaz, C. Zhu, W. P., & Ahmad, M. O. (2008). A pitch extraction algorithm in noise based on temporal and spectral representations. International conference on acoustics, speech and signal processing proceedings (pp. 4477–4480).
Shimamura, T., & Kobayashi, H. (2001). Weighted autocorrelation for pitch extraction of noisy speech. IEEE Transactions on Speech and Audio Processing, 9, 727–730.
Article Google Scholar
Sun, X. (2000). A pitch determination algorithm based on subharmonic-to-harmonic ratio. International conference on spoken language processing proceedings (pp. 676–679). Beijing.
Tolonen, M., & Karjalainen, M. (2000). A computationally efficient multipitch analysis model. IEEE Transactions on Speech and Audio Process, 8, 708–716.
Article Google Scholar
University of lowa. (2012). Electronic music studios. http://theremin.music.uiowa.edu.
Van Immerseel, L. M., & Martens, J. P. (1992). Pitch and voiced/unvoiced determination with an auditory model. Journal of the Acoustic Society of America, 91, 3511–3526.
Article Google Scholar
Varga, A. (1993). Assessment for automatic speech recognition: II. Noisex-92: A database and an experiment to study the effect of additive noise on speech recognition systems. Elsevier Speech Communication, 12, 247–251.
Article Google Scholar
Wang, D. L., & Brown, G. J. (2006). Principles, computational auditory scene analysis: Algorithms, and applications. Hoboken, NJ: Wiley/IEEE Press.
Book Google Scholar
Xu, Y., Weaver, J., Healy, D., & Lu, J. (1994). Wavelet transform domain filters: A spatially selective noise filtration technique. IEEE Transactions on Image Processing, 3, 747–758.
Article Google Scholar

Download references

Author information

Authors and Affiliations

Department of Electrical Engineering, National Engineering School of Tunis, Tunis, Tunisia
Mohamed Anouar Ben Messaoud & Aïcha Bouzid

Authors

Mohamed Anouar Ben Messaoud
View author publications
You can also search for this author in PubMed Google Scholar
Aïcha Bouzid
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Mohamed Anouar Ben Messaoud.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Ben Messaoud, M.A., Bouzid, A. Pitch estimation of speech and music sound based on multi-scale product with auditory feature extraction. Int J Speech Technol 19, 65–73 (2016). https://doi.org/10.1007/s10772-015-9325-1

Download citation

Received: 06 July 2015
Accepted: 20 November 2015
Published: 28 November 2015
Issue Date: March 2016
DOI: https://doi.org/10.1007/s10772-015-9325-1

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Pitch estimation of speech and music sound based on multi-scale product with auditory feature extraction

Abstract

Access this article

Similar content being viewed by others

Pitch Estimation Based on the Cepstrum Analysis by the Multi Scale Product of Clean and Noisy Speech

A Novel Pitch Detection Algorithm Based on Instantaneous Frequency for Clean and Noisy Speech

Multiple Pitch Estimation Based on Modified Harmonic Product Spectrum

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Pitch estimation of speech and music sound based on multi-scale product with auditory feature extraction

Abstract

Access this article

Similar content being viewed by others

Pitch Estimation Based on the Cepstrum Analysis by the Multi Scale Product of Clean and Noisy Speech

A Novel Pitch Detection Algorithm Based on Instantaneous Frequency for Clean and Noisy Speech

Multiple Pitch Estimation Based on Modified Harmonic Product Spectrum

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation