Towards High-Level Human Activity Recognition through Computer Vision and Temporal Logic

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6359)

Abstract

Most approaches to the visual perception of humans do not include high-level activity recognition. This paper presents a system that fuses and interprets the outputs of several computer vision components, together with speech recognition, to obtain a high-level understanding of the perceived scene. Our laboratory for investigating new forms of human-machine interaction and teamwork support is equipped with an array of cameras, several close-talking microphones, and a video wall as the main interaction device. In this environment, we develop state-of-the-art real-time computer vision systems that track and identify users and estimate their visual focus of attention and gesture activity. We also monitor the users' speech activity in real time. This paper explains our approach to high-level activity recognition based on these perceptual components and a temporal logic engine.
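
To make the fusion step concrete, the following minimal sketch (not taken from the paper; every event name, field, and the toy "discussion" rule below is an assumption made for illustration) shows how time-stamped outputs from person tracking, focus-of-attention estimation, and speech activity detection could be combined through two of Allen's interval relations to flag a simple high-level activity.

    # Illustrative sketch only: the event schema and the "discussion" rule are
    # assumptions for this example, not the authors' implementation or rule set.
    from dataclasses import dataclass

    @dataclass
    class Interval:
        start: float  # seconds
        end: float

    def overlaps(a: Interval, b: Interval) -> bool:
        """Allen's 'overlaps' relation: a starts first and ends inside b."""
        return a.start < b.start < a.end < b.end

    def during(a: Interval, b: Interval) -> bool:
        """Allen's 'during' relation: a lies strictly inside b."""
        return b.start < a.start and a.end < b.end

    @dataclass
    class Event:
        label: str        # e.g. "speech" or "focus_on" (hypothetical labels)
        subject: str      # id of the tracked person producing the event
        target: str       # id of the person being looked at ("" if unused)
        interval: Interval

    def detect_discussion(events: list[Event]) -> bool:
        """Toy rule: two people are in discussion if their speech intervals
        overlap while each one's visual focus is directed at the other person
        throughout their own speech turn."""
        speech = [e for e in events if e.label == "speech"]
        focus = [e for e in events if e.label == "focus_on"]
        for s1 in speech:
            for s2 in speech:
                if s1.subject == s2.subject:
                    continue
                if not overlaps(s1.interval, s2.interval):
                    continue
                a_looks_at_b = any(
                    f.subject == s1.subject and f.target == s2.subject
                    and during(s1.interval, f.interval)
                    for f in focus
                )
                b_looks_at_a = any(
                    f.subject == s2.subject and f.target == s1.subject
                    and during(s2.interval, f.interval)
                    for f in focus
                )
                if a_looks_at_b and b_looks_at_a:
                    return True
        return False

    if __name__ == "__main__":
        events = [
            Event("speech", "A", "", Interval(1.0, 5.0)),
            Event("speech", "B", "", Interval(4.0, 8.0)),
            Event("focus_on", "A", "B", Interval(0.0, 6.0)),
            Event("focus_on", "B", "A", Interval(3.0, 9.0)),
        ]
        print(detect_discussion(events))  # prints True for this toy input

In a full system, rules of this kind would be evaluated continuously by the temporal logic engine over the event streams produced by the perceptual components, rather than on a fixed list of events.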

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ijsselmuiden, J., Stiefelhagen, R. (2010). Towards High-Level Human Activity Recognition through Computer Vision and Temporal Logic. In: Dillmann, R., Beyerer, J., Hanebeck, U.D., Schultz, T. (eds.) KI 2010: Advances in Artificial Intelligence. Lecture Notes in Computer Science (LNAI), vol. 6359. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16111-7_49

  • DOI: https://doi.org/10.1007/978-3-642-16111-7_49

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-16110-0

  • Online ISBN: 978-3-642-16111-7

  • eBook Packages: Computer Science (R0)
