Current Problems in Automatic Speech Recognition

Pattern Recognition

Abstract

Automatic speech recognition may be defined as any process which decodes the acoustic signal produced by the human voice into a sequence of linguistic units which contain the message that the speaker wishes to convey. At one extreme this includes the “phonetic typewriter,” a hypothetical device which types any words spoken into it, and at the other, “speech understanding systems” which extract the intended meaning from the sounds and carry out some appropriate action such as replying to a question or controlling a robot. During the last two decades the emphasis in research in automatic speech recognition has gradually shifted from the former type of device to the latter.

Copyright information

© 1978 Plenum Press, New York

About this chapter

Cite this chapter

Ainsworth, W.A., Green, P.D. (1978). Current Problems in Automatic Speech Recognition. In: Batchelor, B.G. (ed.) Pattern Recognition. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-4154-3_14

  • DOI: https://doi.org/10.1007/978-1-4613-4154-3_14

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-4156-7

  • Online ISBN: 978-1-4613-4154-3

  • eBook Packages: Springer Book Archive
