A maximum-likelihood approach to visual event classification

  • Jeffrey Mark Siskind
  • Quaid Morris
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1065)

Abstract

This paper presents a novel framework, based on maximum likelihood, for training models to recognise simple spatial-motion events, such as those described by the verbs pick up, put down, push, pull, drop, and throw, and for classifying novel observations into previously trained classes. The model that we employ does not presuppose prior recognition or tracking of 3D object pose, shape, or identity. We describe our general framework for using maximum-likelihood techniques for visual event classification, the details of the generative model that we use to characterise observations as instances of event types, and the implemented computational techniques used to support training and classification for this generative model. We conclude by illustrating the operation of our implementation on a small example.
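The train/classify recipe the abstract describes — fit one generative model per event class, then assign a novel observation to the class whose model gives it maximum likelihood — can be sketched in a few lines. The sketch below is illustrative only: it substitutes a single diagonal-Gaussian model per class for the paper's actual generative model over feature-vector sequences, and the function names and two-feature toy data (here, rough vertical-velocity statistics for "pick up" vs. "put down") are invented for the example.

```python
import math
from collections import defaultdict

def fit_gaussian(samples):
    """Fit an independent (diagonal) Gaussian to a list of feature vectors."""
    n, d = len(samples), len(samples[0])
    mean = [sum(x[j] for x in samples) / n for j in range(d)]
    # Clamp the variance away from zero so log-likelihoods stay finite.
    var = [max(sum((x[j] - mean[j]) ** 2 for x in samples) / n, 1e-6)
           for j in range(d)]
    return mean, var

def log_likelihood(x, model):
    """Log-density of feature vector x under a fitted diagonal Gaussian."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def train(labelled):
    """labelled: list of (event_label, feature_vector) pairs.
    Returns one maximum-likelihood model per event class."""
    by_class = defaultdict(list)
    for label, x in labelled:
        by_class[label].append(x)
    return {label: fit_gaussian(xs) for label, xs in by_class.items()}

def classify(x, models):
    """Assign x to the class whose trained model maximises its likelihood."""
    return max(models, key=lambda label: log_likelihood(x, models[label]))

# Toy demonstration: positive vertical velocity suggests "pick up",
# negative suggests "put down".
models = train([("pick up", [1.0, 0.2]), ("pick up", [0.9, 0.1]),
                ("put down", [-1.0, 0.3]), ("put down", [-1.1, 0.2])])
print(classify([0.95, 0.15], models))  # prints "pick up"
```

The paper's model is richer than this per-frame Gaussian (it characterises whole observations as instances of event types), but the classification rule — argmax over per-class likelihoods — is the same.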

Keywords

Feature Vector, Object Recognition, Event Type, Event Classification, Event Recognition

Copyright information

© Springer-Verlag Berlin Heidelberg 1996

Authors and Affiliations

  • Jeffrey Mark Siskind
  • Quaid Morris
  1. Department of Computer Science, University of Toronto, Toronto, Canada