
Pitch estimation of speech and music sound based on multi-scale product with auditory feature extraction


Abstract

Pitch is a crucial parameter of speech and music signals. However, severe noise, missing harmonics, and irregular physical vibration make accurate pitch determination a considerable challenge. In this paper, we propose a method for pitch estimation of speech and music sounds. Our method is based on the fast Fourier transform (FFT) of the multi-scale product (MP) provided by an auditory feature model of the sound signal. The auditory model simulates the spectral behaviour of the cochlea with a gammachirp filter-bank and the outer/middle-ear filtering with a low-pass filter. The MP is formed by multiplying the wavelet transform coefficients of the signal at three scales, and the FFT of the MP is computed over frames for the two output channels. The experimental results show that our method estimates the pitch with high accuracy. Moreover, it outperforms several other pitch detection algorithms in both clean and noisy environments.
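
To make the pipeline concrete, the following is a minimal sketch of pitch estimation from the FFT of a multi-scale product. It is not the authors' implementation: the auditory front-end (gammachirp filter-bank and outer/middle-ear low-pass filter) is omitted, the wavelet transform coefficients are approximated here by derivative-of-Gaussian filters at three assumed dyadic scales, and the pitch search range, frame length, and function names (multiscale_product, estimate_pitch) are illustrative choices.

# Sketch only: approximates the MP-plus-FFT stage described in the abstract.
# Assumptions (not from the paper): wavelet coefficients ~ derivative-of-Gaussian
# filters at scales 2, 4, 8 samples; no auditory front-end; F0 searched in 50-500 Hz.
import numpy as np
from scipy.ndimage import gaussian_filter1d

def multiscale_product(x, scales=(2, 4, 8)):
    """Product of smoothed-derivative (wavelet-like) coefficients at three scales."""
    details = [gaussian_filter1d(x, sigma=s, order=1) for s in scales]
    return np.prod(details, axis=0)

def estimate_pitch(frame, fs, f_min=50.0, f_max=500.0):
    """Pitch estimate (Hz): dominant peak of the MP spectrum within the F0 search range."""
    mp = multiscale_product(frame)
    spectrum = np.abs(np.fft.rfft(mp * np.hanning(len(mp))))
    freqs = np.fft.rfftfreq(len(mp), d=1.0 / fs)
    band = (freqs >= f_min) & (freqs <= f_max)
    return freqs[band][np.argmax(spectrum[band])]

if __name__ == "__main__":
    fs = 16000
    t = np.arange(0, 0.04, 1.0 / fs)                      # one 40 ms frame
    frame = sum(np.sin(2 * np.pi * k * 200.0 * t) / k**2  # synthetic voiced frame, F0 = 200 Hz
                for k in range(1, 6))
    print("Estimated pitch: %.1f Hz" % estimate_pitch(frame, fs))

In this sketch the product across scales reinforces components that are coherent at all three scales (the pitch-related structure) while attenuating scale-specific noise, which is why the FFT of the MP yields a cleaner fundamental peak than the FFT of the raw frame.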

Author information

Corresponding author

Correspondence to Mohamed Anouar Ben Messaoud.

About this article

Cite this article

Ben Messaoud, M.A., Bouzid, A. Pitch estimation of speech and music sound based on multi-scale product with auditory feature extraction. Int J Speech Technol 19, 65–73 (2016). https://doi.org/10.1007/s10772-015-9325-1
