Eyes and Ears for Computers

Reddy, R.

doi:10.1007/978-3-642-80749-7_1

Part of the book series: Lecture Notes in Economics and Mathematical Systems (LNE, volume 83)

Abstract

Visual and speech perception tasks, which can be performed with no apparent effort by people, have proved to be difficult for machines. This may be in part due to the absence of cognitive models of perception of the type proposed above by Jakobson. In this paper we attempt to give a unified view of the research in machine perception of speech and vision in the hope that a clear appreciation of similarities and differences may lead to better information processing models of perception. Being active in research in both computer vision and speech, we have found it useful to look at the problems that have arisen in one domain and anticipate corresponding problems in the other (Reddy, 1969). Thus, this paper represents a comparative study of the issues, systems and unsolved problems that are, at present, of interest to visual and speech recognition research.

It is clear that all the (visual and speech) phenomena occur in both space and time. In visual signs it is the spatial dimension which takes priority, whereas the temporal dimension takes priority in auditory signs ... what is the substantial difference between spatial and auditory signs? We observe a strong tendency to reify visual signs, to connect them with objects, to ascribe mimesis to such signs, and to view them as elements of an “imitative art”. ... On the other hand verbal and musical signs show us two essential features. First, both music and language present a consistently hierarchized structure, and, second, both are resolvable into ultimate, discrete, rigorously patterned components which, as such, have no existence in nature but are built ad hoc.

One should not draw the frequently suggested but over-simplified conclusion that speech displays a purely linear character or that visual perception is performed by purely simultaneous synthesis. Luria shows that in our perception of a painting, we first deploy step-by-step efforts to go over from certain selected details, from parts, to the whole, and for the contemplator of a painting the integration follows as a further phase, as a goal. In the fifth century, Bhartrhari, the great master of Indic linguistic theory, distinguished three stages in a speech event: conceptualization, production and audition, and comprehension. While production and audition are naturally sequential, both conceptualization and comprehension of the whole message is done at one and the same time. This conception is akin to the modern psychological problem of “short-term memory”.

Jakobson (1964)

This research was supported in part by the Advanced Research Projects Agency of the Department of Defense under contract no. F44620-70-C-0107 and monitored by Air Force Office of Scientific Research.

References

Barnett, J. (1972), A Vocal Data Management System, International Conference on Speech Communication and Processing, Boston, 340–343.
Chase, W.G. and H.A. Simon (1971), Perception in Chess, CIP-182, Dept. of Psychology, Carnegie-Mellon Univ., Pittsburgh, Pa.
Chomsky, N. and M. Halle (1968), The Sound Pattern of English, Harper and Row, New York.
Erman, L.D. (1973), An Environment and System for Machine Recognition of Continuous Speech, Ph.D. Thesis, Computer Science Dept., Stanford Univ., to appear as a Technical Report, Computer Science Dept., Carnegie-Mellon Univ., Pittsburgh, Pa.
Fant, G. (1960), Acoustic Theory of Speech Production, Mouton and Company, The Hague.
Fant, G. (1970), Automatic Recognition and Speech Research, Quarterly Progress Report, 16–31, Dept. of Speech Communication, KTH, Stockholm.
Feldman, J.A., et al. (1969), The Stanford Hand Eye Project, Proc. IJCAI, May 7–9, Washington, D.C.
Feldman, J.A., et al. (1971), The Use of Vision and Manipulation to Solve the “Instant Insanity” Puzzle, Proc. Second IJCAI, London, 359–365.
Fikes, R.E. and N.J. Nilsson (1971), STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving, Proc. Second IJCAI, London, 608–621.
Flanagan, J.L. (1965), Speech Analysis, Synthesis, and Perception, Academic Press, New York. Second edition, 1971.
Forgie, J. (1972), Personal Communication, MIT Lincoln Laboratories, Lexington, Mass.
Fry, D.B. and P.B. Denes (1959), The Design and Operation of a Mechanical Speech Recognizer, J. British IRE, 19, 211–229.
Gillogly, J.J. (1972), The TECHNOLOGY Chess Program, Artificial Intelligence, 3, 145–163.
Hughes, G.W. and J.F. Hemdal (1965), Speech Analysis, Tech. Rept. AFCRL-65-681, Purdue Univ., Lafayette, Ind.
Jakobson, R. (1964), About the Relation between Visual and Auditory Signs, Models for the Perception of Speech and Visual Form (Ed. Wathen-Dunn), MIT Press, Cambridge, Mass., 1–7.
Kelly, M.D. (1970), Visual Identification of People by Computers, AIM-130, Ph.D. Thesis, Computer Science Dept., Stanford Univ., Stanford, Ca.
Krakauer, L.J. (1971), Computer Analysis of Visual Properties of Curved Objects, Ph.D. Thesis, Electrical Engineering Dept., MIT, Cambridge, Mass.
Lehiste, I. (1967), Readings in Acoustic-Phonetics, MIT Press, Cambridge, Mass.
Minsky, M. and S. Papert (1972), Artificial Intelligence, Technical Report, AI Group, MIT, Cambridge, Mass.
Narasimhan, R. (1966), Syntax-Directed Interpretation of Classes of Pictures, CACM, 9, 3, 166–173.
Neely, R.B. (1973), On the Use of Syntax and Semantics in a Speech Understanding System, Ph.D. Thesis, Stanford Univ., to appear as a Technical Report, Computer Science Dept., Carnegie-Mellon Univ., Pittsburgh, Pa.
Newell, A. (1970), Remarks on the Relationship between Artificial Intelligence and Cognitive Psychology, in Banerji and Mesarovic (eds.), Non-Numerical Problem Solving, 363–400, Springer-Verlag.
Newell, A. and H.A. Simon (1972), Human Problem Solving, Prentice-Hall.
Newell, A., et al. (1973), Visualization, unpublished research, Carnegie-Mellon Univ., Pittsburgh, Pa.
Nilsson, N.J. (1969), A Mobile Automaton: An Application of Artificial Intelligence Techniques, Proc. IJCAI, May 7–9, Washington, D.C.
Pierce, J.R. (1969), Whither Speech Recognition, J. Acoust. Soc. Am., 46, 1049–1051.
Reddy, D.R. (1967), Computer Recognition of Connected Speech, J. Acoust. Soc. Am., 42, 2, 329–347.
Reddy, D.R. (1969), On the Use of Environmental, Syntactic, and Probabilistic Constraints in Vision and Speech, AIM 78, Computer Science Dept., Stanford Univ., Stanford, Ca.
Reddy, D.R., L.D. Erman, and R.B. Neely (1972), A Model and A System for Machine Recognition of Speech (to be published in IEEE Trans. on Audio and Electroacoustics, 1973).
Reddy, D.R., W.J. Davis, R.B. Ohlander, and D.J. Bihary (1972a), Computer Analysis of Neuronal Structure, Technical Report, Computer Science Dept., Carnegie-Mellon Univ., Pittsburgh, Pa.
Reddy, D.R., B. Broadley, L. Erman, R. Johnsson, J. Newcomer, G. Robertson, and J. Wright (1972b), XCRIBL, A Hardcopy Scan Line Graphics System for Document Generation, Technical Report, Computer Science Dept., Carnegie-Mellon Univ., Pittsburgh, Pa.
Reddy, D.R., L.D. Erman, R. Fennell, and R.B. Neely (1973), The HEARSAY Speech Understanding System, to be published.
Rosenfeld, A. (1969), Picture Processing by Computer, Academic Press, N.Y.
Rosenfeld, A. (1973), Progress in Picture Processing: 1969–71, Computing Surveys, 5, in press.
Simon, H.A. and M. Barenfeld (1969), Information Processing Analysis of Perceptual Processes in Problem Solving, Psychological Review, 76, 473–483.
Tenenbaum, J.M. (1970), Accommodation in Computer Vision, Ph.D. Thesis, Computer Science Dept., Stanford Univ., Stanford, Ca.
Vicens, P.J. (1969), Aspects of Speech Recognition by a Computer, Ph.D. Thesis, AIM 85, Computer Science Dept., Stanford Univ., Stanford, Ca.
Walker, D. (1972), Personal Communication, Stanford Research Institute, Menlo Park, Ca.
Winston (1971), Learning Structural Descriptions from Visual Scenes, Ph.D. Thesis, Electrical Engineering Dept., MIT, Cambridge, Mass.
Woods, W. (1972), Personal Communication, Bolt, Beranek and Newman, Cambridge, Mass.

Editor information

Editors and Affiliations

Institut für Datenverarbeitung, Arcisstraße 21, 8 München 2, Deutschland
Theodor Einsele
Institut für Angewandte Mathematik, Im Stadtwald, 66 Saarbrücken, Deutschland
Wolfgang Giloi
Institut für Informatik, Schlüterstraße 70, 2 Hamburg 13, Deutschland
Hans-Hellmut Nagel

Cite this paper

Reddy, R. (1973). Eyes and Ears for Computers. In: Einsele, T., Giloi, W., Nagel, HH. (eds) NTG/GI Gesellschaft für Informatik Nachrichtentechnische Gesellschaft Fachtagung „Cognitive Verfahren und Systeme“. Lecture Notes in Economics and Mathematical Systems, vol 83. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-80749-7_1

DOI: https://doi.org/10.1007/978-3-642-80749-7_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-06268-4
Online ISBN: 978-3-642-80749-7
eBook Packages: Springer Book Archive