A New Probabilistic Spectral Pitch Estimator: Exact and MCMC-approximate Strategies

  • Harvey D. Thornburg
  • Randal J. Leistikow
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3310)

Abstract

We propose a probabilistic pitch (f0) estimator that is robust to interference and low-SNR conditions, without the computational requirements of optimal time-domain methods. Our analysis is driven by sinusoidal peaks extracted by a windowed STFT. Given f0 and a reference amplitude (A0), peak frequency/amplitude observations are modeled probabilistically in order to be robust to undetected harmonics, spurious peaks, skewed peak estimates, and inherent deviations from ideal or other assumed harmonic structure. Parameters f0 and A0 are estimated by maximizing the observations’ likelihood (here A0 is treated as a nuisance parameter). Some previous spectral pitch estimation methods, most notably the work of Goldstein [3], introduce a probabilistic framework with a corresponding maximum likelihood approach. However, our method significantly extends Goldstein's approach in order to guarantee robustness under adverse conditions, facilitating possible extensions to the polyphonic context. For instance, because we account for spurious as well as undetected peaks, the estimator avoids a sudden breakdown under low-SNR conditions. Furthermore, our use of peak amplitudes facilitates the incorporation of timbral knowledge. Our method uses a hidden, discrete-valued descriptor variable that identifies spurious/undetected peaks. The likelihood evaluation, which requires a computationally unwieldy summation over all descriptor states, is successfully approximated by an MCMC traversal chiefly among high-probability states. The MCMC traversal obtains virtually identical evaluations for the entire likelihood surface at a fraction of the computational cost.
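The idea of replacing a full sum over descriptor states with a traversal of high-probability states can be illustrated with a toy sketch. This is not the authors' model: it assumes a simplified binary descriptor (each peak is either harmonic or spurious), ignores amplitudes and undetected harmonics, and uses hypothetical names and parameters (`joint`, `sigma`, `p_spur`, `fmax`). The approximate likelihood sums the joint probability over the distinct states that a Metropolis walk evaluates.

```python
import math
import random
from itertools import product


def joint(peaks, f0, state, sigma=3.0, p_spur=0.2, fmax=4000.0):
    """Joint probability of the peak frequencies and one descriptor state.

    Assumed toy model: state[i] == 1 marks peak i as spurious (uniform on
    [0, fmax]); state[i] == 0 marks it as a harmonic (Gaussian around the
    nearest positive multiple of f0).
    """
    p = 1.0
    for f, s in zip(peaks, state):
        if s:
            p *= p_spur / fmax
        else:
            k = max(1, round(f / f0))
            z = (f - k * f0) / sigma
            gauss = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))
            p *= (1 - p_spur) * gauss
    return p


def exact_likelihood(peaks, f0):
    """Sum the joint probability over all 2**N descriptor states."""
    states = product((0, 1), repeat=len(peaks))
    return sum(joint(peaks, f0, s) for s in states)


def mcmc_likelihood(peaks, f0, steps=2000, seed=0):
    """Approximate the sum using only states evaluated by a Metropolis walk."""
    rng = random.Random(seed)
    state = tuple(0 for _ in peaks)
    evaluated = {state: joint(peaks, f0, state)}
    for _ in range(steps):
        # Propose flipping the descriptor of one randomly chosen peak.
        i = rng.randrange(len(peaks))
        prop = state[:i] + (1 - state[i],) + state[i + 1:]
        if prop not in evaluated:
            evaluated[prop] = joint(peaks, f0, prop)
        p_cur, p_prop = evaluated[state], evaluated[prop]
        # Metropolis acceptance; always move away from a zero-probability state.
        if p_cur == 0.0 or rng.random() < p_prop / p_cur:
            state = prop
    # Each distinct evaluated state contributes once to the approximate sum.
    return sum(evaluated.values())


# Hypothetical peak frequencies (Hz): four near-harmonics of 220 Hz plus one
# spurious peak at 1000 Hz.
peaks = [220.5, 439.0, 661.2, 880.3, 1000.0]
grid = [200.0, 210.0, 220.0, 230.0, 240.0]
best_exact = max(grid, key=lambda f0: exact_likelihood(peaks, f0))
best_mcmc = max(grid, key=lambda f0: mcmc_likelihood(peaks, f0))
```

With only five peaks the exact sum is cheap, so the two evaluations can be compared directly. The saving from a traversal only matters when the number of descriptor states is too large to enumerate.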

References

  1. Cinlar, E.: Introduction to Stochastic Processes. Prentice-Hall, Englewood Cliffs (1975)
  2. Fitzgerald, W.J.: Markov chain Monte Carlo methods with applications to signal processing. Elsevier Signal Processing 81(1), 3–18 (2001)
  3. Goldstein, J.: An optimum processor theory for the central formation of the pitch of complex tones. J. Acoust. Soc. Amer. 54, 1496–1516 (1973)
  4. Hory, C., Martin, N., Chehikian, A.: Spectrogram segmentation by means of statistical features for non-stationary signal interpretation. IEEE Trans. ASSP 50(12), 2915–2925 (2002)
  5. Knuth, D., Vardi, I., Richberg, R.: 6581 (The asymptotic expansion of the middle binomial coefficient). American Mathematical Monthly 97(7), 626–630 (1990)
  6. McAulay, R.J., Quatieri, T.F.: Speech analysis/synthesis based on a sinusoidal representation. IEEE Trans. ASSP 34(4), 744–754 (1986)
  7. Rabiner, L.R.: A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77(2), 257–286 (1989)
  8. Leistikow, R., Thornburg, H., et al.: Bayesian Identification of Closely-Spaced Chords from Single-Frame STFT Peaks. In: Proc. 7th International Conference on Digital Audio Effects (DAFx 2004), Naples (2004)

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Harvey D. Thornburg (1)
  • Randal J. Leistikow (1)
  1. Center for Computer Research in Music and Acoustics (CCRMA), Department of Music, Stanford University, Stanford, USA
