Integrating Vision and Language: Semantic Description of Traffic Events from Image Sequences

  • Takashi Hirano
  • Shogo Yoneyama
  • Yasuhiro Okada
  • Yukio Kosugi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4842)


We propose a method for extracting events from traffic image sequences. The method extracts moving objects and their trajectories from image sequences recorded by a stationary camera. These trajectories are mapped into a 3D virtual space, where physical parameters such as velocity and direction are estimated. Traffic events are then extracted from the trajectories and physical parameters using case-frame analysis, a technique from the field of natural language processing. Our method makes it easy to describe events and to detect both general traffic events and abnormal situations. Experimental results on image sequences of an actual intersection demonstrate the effectiveness of the method.
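The case-frame idea above can be illustrated with a minimal sketch: each frame pairs a verb with slot constraints on the physical parameters estimated from a trajectory, and a description is generated by filling the first matching frame. The class names, verbs, and thresholds below are hypothetical, chosen only to show the matching mechanism; the paper's actual frames and parameter values are not given in this abstract.

```python
# Minimal case-frame matching sketch (hypothetical frames and thresholds,
# not the authors' actual knowledge database).
from dataclasses import dataclass


@dataclass
class TrackState:
    object_id: int
    category: str      # e.g. "car", "pedestrian"
    velocity: float    # m/s, estimated in the 3D virtual space
    direction: str     # e.g. "north", "south"


# Each case frame pairs a verb with a constraint on physical parameters.
CASE_FRAMES = [
    {"verb": "stop",  "test": lambda s: s.velocity < 0.5},
    {"verb": "speed", "test": lambda s: s.velocity > 16.7},  # ~60 km/h
    {"verb": "move",  "test": lambda s: 0.5 <= s.velocity <= 16.7},
]


def describe(state: TrackState) -> str:
    """Fill the first matching case frame: (agent, verb, direction)."""
    for frame in CASE_FRAMES:
        if frame["test"](state):
            return f"{state.category} {state.object_id} {frame['verb']}s ({state.direction})"
    return f"{state.category} {state.object_id}: no matching frame"


print(describe(TrackState(3, "car", 20.0, "north")))  # → car 3 speeds (north)
```

In the full method, frames would also carry slots for location and interacting objects, so that abnormal situations can be flagged when a trajectory matches no normal frame.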


Keywords: Natural Language Processing · Semantic Category · Semantic Description · Knowledge Database · Stationary Camera




References

  1. Kollnig, H., Nagel, H.-H., Otte, M.: Association of motion verbs with vehicle movements extracted from dense optical flow fields. In: Eklundh, J.-O. (ed.) ECCV 1994. LNCS, vol. 801, pp. 338–347. Springer, Heidelberg (1994)
  2. Nagel, H.-H.: A vision of ‘vision and language’ comprises action: An example from road traffic. Artificial Intelligence Review 8, 189–214 (1994)
  3. Herzog, G., Wazinski, P.: Visual translator: Linking perceptions and natural language descriptions. Artificial Intelligence Review 8, 175–187 (1994)
  4. Herzog, G., Rohr, K.: Integrating vision and language: Towards automatic description of human movements. In: Proc. 19th Annual German Conf. on Artificial Intelligence, pp. 257–268 (1995)
  5. Okada, N.: Integrating vision, motion, and language through mind. Artificial Intelligence Review 9, 209–234 (1996)
  6. Kojima, A., Tahara, N., Tamura, T., Fukunaga, K.: Natural language description of human behavior from image sequences. IEICE J81-D-II(8), 1867–1875 (1998) (in Japanese)
  7. Porikli, F., Tuzel, O.: Bayesian background modeling for foreground detection. In: ACM International Workshop on Video Surveillance and Sensor Networks (VSSN), pp. 55–58 (November 2005)
  8. Tuzel, O., Porikli, F., Meer, P.: A Bayesian approach to background modeling. In: IEEE Workshop on Machine Vision for Intelligent Vehicles (MVIV), vol. 3, p. 58 (June 2005)
  9. Porikli, F., Tuzel, O.: Multi-kernel object tracking. In: IEEE International Conference on Multimedia and Expo (ICME), pp. 1234–1237 (2005)
  10. Fillmore, C.J.: The case for case. In: Bach, E., Harms, R. (eds.) Universals in Linguistic Theory. Holt, Rinehart and Winston (1968)
  11. Ivanov, Y.A., Bobick, A.F.: Recognition of visual activities and interactions by stochastic parsing. IEEE Trans. on Pattern Analysis and Machine Intelligence 22(8), 852–872 (2000)

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Takashi Hirano (1)
  • Shogo Yoneyama (1)
  • Yasuhiro Okada (1)
  • Yukio Kosugi (2)
  1. Mitsubishi Electric Corporation, Information Technology R & D Center
  2. Tokyo Institute of Technology, Interdisciplinary Graduate School of Science and Engineering
