Using Normalized RBF Networks to Map Hand Gestures to Speech

Chapter in: Radial Basis Function Networks 2

Part of the book series: Studies in Fuzziness and Soft Computing ((STUDFUZZ,volume 67))

Abstract

Glove-TalkII is a system that translates hand gestures to speech through an adaptive interface. Hand gestures are mapped continuously to 10 control parameters of a parallel formant speech synthesizer. The mapping allows the hand to act as an artificial vocal tract that produces speech in real time. This gives an unlimited vocabulary in addition to direct control of fundamental frequency and volume. Currently, the best version of Glove-TalkII uses several input devices (a Cyberglove, a 3-space tracker, a keyboard and a foot pedal), a parallel formant speech synthesizer and three neural networks. The gesture-to-speech task is divided into vowel and consonant production by a mixture-of-experts architecture in which a gating network weights the outputs of a vowel network and a consonant network. The gating network and the consonant network are trained with examples from the user. The vowel network implements a fixed, user-defined relationship between hand position and vowel sound and requires no training examples from the user. Volume, fundamental frequency and stop consonants are produced with a fixed mapping from the input devices. One subject has trained to speak intelligibly with Glove-TalkII. He speaks slowly, with speech quality similar to that of a text-to-speech synthesizer but with far more natural-sounding pitch variations.
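The abstract gives no equations, so the following is only a minimal Python sketch of the two ideas it names: a normalized RBF layer and a gating value that blends a vowel expert with a consonant expert. Everything specific here is an assumption made for illustration, not the chapter's implementation: the Gaussian basis functions, the single shared `width`, the convex blend `gate * vowel + (1 - gate) * consonant`, and the function names `normalized_rbf`, `rbf_network` and `blend` are all hypothetical.

```python
import math


def normalized_rbf(x, centers, width):
    """Gaussian RBF activations, normalized so they sum to 1.

    Assumption: one shared width for every unit (illustrative only).
    """
    sq_dists = [sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centers]
    # Subtracting the smallest distance leaves the normalized values
    # unchanged but avoids dividing by zero when x is far from all centers.
    nearest = min(sq_dists)
    acts = [math.exp(-(d - nearest) / (2.0 * width ** 2)) for d in sq_dists]
    total = sum(acts)
    return [a / total for a in acts]


def rbf_network(x, centers, width, targets):
    """Output is the activation-weighted average of per-center target vectors."""
    weights = normalized_rbf(x, centers, width)
    dim = len(targets[0])
    return [sum(w * t[k] for w, t in zip(weights, targets)) for k in range(dim)]


def blend(gate, vowel_out, consonant_out):
    """Mix two expert outputs; gate in [0, 1] is the weight on the vowel expert."""
    return [gate * v + (1.0 - gate) * c for v, c in zip(vowel_out, consonant_out)]


# Hypothetical usage: 2-D hand position, two synthesizer parameters per expert.
vowel = rbf_network((0.5, 0.0), [(0.0, 0.0), (1.0, 0.0)], 0.5,
                    [[1.0, 0.0], [0.0, 1.0]])
consonant = [0.2, 0.8]  # stand-in for a trained consonant network's output
controls = blend(0.25, vowel, consonant)
```

Because the activations are normalized, the output of `rbf_network` always lies between the target vectors of the nearby centers, which is one plausible reason to use this form for a fixed, user-defined mapping from hand position to vowel sound.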

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Fels, S.S. (2001). Using Normalized RBF Networks to Map Hand Gestures to Speech. In: Howlett, R.J., Jain, L.C. (eds) Radial Basis Function Networks 2. Studies in Fuzziness and Soft Computing, vol 67. Physica, Heidelberg. https://doi.org/10.1007/978-3-7908-1826-0_3

  • DOI: https://doi.org/10.1007/978-3-7908-1826-0_3

  • Publisher Name: Physica, Heidelberg

  • Print ISBN: 978-3-7908-2483-4

  • Online ISBN: 978-3-7908-1826-0

  • eBook Packages: Springer Book Archive
