The Pitch Mode Modulation Model and Its Application in Speech Processing

  • Michael A. Ramalho
  • Richard J. Mammone
Part of The Springer International Series in Engineering and Computer Science book series (SECS, volume 327)


The techniques currently used for speech coding or enhancement depend critically on some form of statistical stationarity in the speech signal, the noise signal, or both. Virtually all speech processing techniques use a speech model to reduce the amount of information needed to characterize the speech signal. Although the speech signal is known to be highly redundant, it is also non-stationary. This non-stationarity requires that the model parameters be extracted from short-duration signal segments, over which the stationarity assumption in the models is not seriously violated. Unfortunately, the use of short speech frames makes the model parameters difficult to estimate and sometimes obscures the very redundancy on which the model was based. A longer frame size is desirable for many signal processing techniques that require increased frequency-domain resolution.


Speech Signal; Instantaneous Frequency; Speech Enhancement; Clean Speech; Speaker Identification
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer Science+Business Media New York 1995

Authors and Affiliations

  • Michael A. Ramalho (1, 2)
  • Richard J. Mammone (1, 2)
  1. Bellcore, New Jersey, USA
  2. CAIP Center, Rutgers University, Piscataway, New Jersey, USA
