
Multi-level Particle Filter Fusion of Features and Cues for Audio-Visual Person Tracking

  • Conference paper
Multimodal Technologies for Perception of Humans (RT 2007, CLEAR 2007)

Abstract

In this paper, two multimodal systems for tracking multiple users in smart environments are presented. The first is a multi-view particle filter tracker using foreground, color, and special upper-body detection and person-region features. The second is a wide-angle overhead-view person tracker relying on foreground segmentation and model-based blob tracking. Both systems are complemented by a joint probabilistic data association filter-based source localizer that uses input from several microphone arrays. While the first system fuses audio and visual cues at the feature level, the second incorporates them at the decision level using state-based heuristics.
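The difference between the two fusion levels can be illustrated with a minimal sketch. It is not the implementation of either system: the Gaussian likelihoods, the sigma values, the prior spread, and the `speaker_active` rule are all assumptions chosen for illustration. In the feature-level variant, both cues weight the same particle set; in the decision-level variant, each modality has already produced its own estimate and a simple rule chooses between them.

```python
import numpy as np


def gaussian_likelihood(particles, observation, sigma):
    """Unnormalized Gaussian likelihood of each particle given a 3D observation."""
    sq_dist = np.sum((particles - observation) ** 2, axis=1)
    return np.exp(-0.5 * sq_dist / sigma ** 2)


def feature_level_fusion(particles, visual_obs, audio_obs, sigma_v=0.2, sigma_a=0.4):
    """Fuse cues inside one filter: particle weights are the product of
    the per-cue likelihoods (assumed conditionally independent)."""
    weights = gaussian_likelihood(particles, visual_obs, sigma_v)
    weights = weights * gaussian_likelihood(particles, audio_obs, sigma_a)
    weights = weights / np.sum(weights)
    return weights @ particles  # weighted-mean 3D position estimate


def decision_level_fusion(visual_estimate, audio_estimate, speaker_active):
    """Choose between two finished estimates with a hypothetical state-based rule:
    trust the acoustic estimate only while the person is speaking."""
    if speaker_active and audio_estimate is not None:
        return audio_estimate
    return visual_estimate


# Hypothetical scene: particles drawn around a predicted position (metres).
rng = np.random.default_rng(0)
predicted = np.array([2.0, 3.0, 1.7])
particles = rng.normal(loc=predicted, scale=0.5, size=(2000, 3))

visual_obs = np.array([2.0, 3.0, 1.7])
audio_obs = np.array([2.2, 3.1, 1.6])

fused = feature_level_fusion(particles, visual_obs, audio_obs)
chosen_silent = decision_level_fusion(visual_obs, audio_obs, speaker_active=False)
chosen_speaking = decision_level_fusion(visual_obs, audio_obs, speaker_active=True)
```

In this sketch the feature-level estimate is pulled toward both observations in proportion to their assumed reliabilities, whereas the decision-level output is always exactly one of the two unimodal estimates.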

The systems are designed to estimate the 3D scene locations of room occupants and are evaluated based on their precision in estimating person locations, their accuracy in recognizing person configurations, and their ability to keep track identities consistent over time.
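The evaluation criteria above can be made concrete with a small sketch. It assumes the scores take the form of the multiple object tracking precision (MOTP) and accuracy (MOTA) metrics used in the CLEAR evaluations; the abstract does not name the metrics, so this mapping is an assumption. The function names and all input numbers are hypothetical and are not results from the paper.

```python
def motp(total_match_distance_mm, num_matches):
    """Precision: mean distance between matched hypotheses and ground truth positions."""
    if num_matches == 0:
        raise ValueError("MOTP is undefined when there are no matches")
    return total_match_distance_mm / num_matches


def mota(misses, false_positives, mismatches, num_ground_truth):
    """Accuracy: one minus the ratio of all configuration and identity errors
    (misses, false positives, identity mismatches) to ground truth objects."""
    if num_ground_truth == 0:
        raise ValueError("MOTA is undefined without ground truth objects")
    return 1.0 - (misses + false_positives + mismatches) / num_ground_truth


# Hypothetical counts accumulated over all frames of a sequence.
precision_mm = motp(total_match_distance_mm=15000.0, num_matches=100)
accuracy = mota(misses=10, false_positives=5, mismatches=5, num_ground_truth=200)
```

Under this reading, misses and false positives reflect errors in recognizing person configurations, and mismatches count the frames in which a track identity changes.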

The trackers are extensively tested and compared, for each separate modality and for the combined modalities, on the CLEAR 2007 Evaluation Database.

Editor information

Rainer Stiefelhagen Rachel Bowers Jonathan Fiscus

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bernardin, K., Gehrig, T., Stiefelhagen, R. (2008). Multi-level Particle Filter Fusion of Features and Cues for Audio-Visual Person Tracking. In: Stiefelhagen, R., Bowers, R., Fiscus, J. (eds) Multimodal Technologies for Perception of Humans. RT 2007, CLEAR 2007. Lecture Notes in Computer Science, vol 4625. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68585-2_5

  • DOI: https://doi.org/10.1007/978-3-540-68585-2_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-68584-5

  • Online ISBN: 978-3-540-68585-2

  • eBook Packages: Computer Science, Computer Science (R0)
