Multirate STC and Its Application to Multi-Speaker Conferencing
The problem of conferencing over systems that employ parametric vocoders has long been of interest to the military. In analog or wideband digital conferencing, overlapping speakers are handled by signal summation at a conferencing bridge. Such a scheme is not feasible for parametric vocoders, because it would require synthesis and reanalysis of the aggregate speech signal, a process called tandeming, which results in a severe loss of quality in the synthetic speech. Moreover, further degradations occur when multiple speakers are active, since parametric vocoders are not designed to model more than one voice. One narrowband technique currently in use is based on the idea of signal selection: a speaker has the channel until finished or until replaced by someone with a higher priority, and speakers contend for the open channel when it becomes available. The advantage of such a technique is that it avoids the degradations due to tandeming, but it is cumbersome. A more natural form of conference control relies on interruptions, in which multiple speakers produce overlapping speech. One scheme that permits two-speaker overlaps assigns one-half of the available bandwidth to each speech coder and defers signal summation to the terminal. This approach limits the overall quality of the conference by forcing each coder to work at half the bandwidth. Since there will be only a single active speaker for the majority of a conference, this technique degrades the overall perceived quality in order to model an event that occurs relatively infrequently.
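The signal-selection scheme described above can be illustrated with a minimal sketch. It is not taken from the cited systems: the class and method names (`Speaker`, `SelectionBridge`, `request`, `release`) are hypothetical, and two rules are assumptions made only for illustration, namely that a displaced speaker returns to the pool of contenders, and that contention is resolved in favor of the highest priority, with the earliest request breaking ties.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Speaker:
    name: str
    priority: int  # assumed convention: larger value means higher priority


class SelectionBridge:
    """Hypothetical bridge that forwards exactly one speaker's coded stream.

    Because only one stream is ever forwarded, no synthesis and
    reanalysis (tandeming) is needed at the bridge.
    """

    def __init__(self) -> None:
        self.holder: Optional[Speaker] = None
        self.waiting: List[Speaker] = []

    def request(self, speaker: Speaker) -> None:
        """A speaker starts talking and asks for the channel."""
        if self.holder is None:
            self.holder = speaker
        elif speaker.priority > self.holder.priority:
            # Assumption: the displaced speaker goes back to contending.
            self.waiting.append(self.holder)
            self.holder = speaker
        else:
            self.waiting.append(speaker)

    def release(self) -> None:
        """The current holder finishes; waiting speakers contend."""
        self.holder = None
        if self.waiting:
            # Assumed contention rule: highest priority wins, and
            # max() keeps the earliest request among equal priorities.
            winner = max(self.waiting, key=lambda s: s.priority)
            self.waiting.remove(winner)
            self.holder = winner
```

In this sketch overlapping speech is never heard: a second speaker either waits or displaces the first, which is the behavior the text describes as cumbersome.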
Keywords: Vocal Tract; Synthetic Speech; Signal Summation; Sinusoidal Model; Lincoln Laboratory
- J.W. Forgie, C.E. Feehrer, and P.L. Weene, “Voice Conferencing Technology Problem,” MIT Lincoln Laboratory Final Report, 31 March 1979.
- D. Busson, N. Irisarry, and C. Stengel, “Secure Conferencing HF Communications,” RADC-TR-86-55, April 1986.
- R.J. McAulay and T.F. Quatieri, “Pitch Estimation and Voicing Detection Based on a Sinusoidal Model,” IEEE Proc. Int. Conf. Acoustics, Speech and Signal Processing 1990, Albuquerque, NM, pp. 249–252, April 1990.
- R.J. McAulay and T.F. Quatieri, “Sine-Wave Phase Modelling at Low Data Rates,” IEEE Proc. Int. Conf. Acoustics, Speech and Signal Processing 1991, Toronto, Canada, May 1991.
- R.J. McAulay and T.F. Quatieri, “Low-Rate Speech Coding Based on the Sinusoidal Model,” Chapter 1.6, pp. 165–207, in Advances in Acoustics and Speech Processing, M. Sondhi and S. Furui, Eds., Marcel Dekker, 1992.
- R.J. McAulay and T.F. Quatieri, “The Sinusoidal Transform Coder at 2400 b/s,” to be published in Proc. MILCOM’92, San Diego, CA, October 1992.
- D. Lin, “Statistical Analysis of the BNR Half-Rate MOS Data Set,” TIA Speech Codec Working Group, Toronto, June 1992.
- R.J. McAulay and T.F. Quatieri, “Computationally Efficient Sine-Wave Synthesis and Its Application to Sinusoidal Transform Coding,” IEEE Proc. Int. Conf. Acoustics, Speech and Signal Processing 1988, New York City, NY, April 1988.