
Detecting small group activities from multimodal observations


Abstract

This article addresses the problem of detecting configurations and activities of small groups of people in an augmented environment. The proposed approach takes as input a continuous stream of observations coming from different sensors in the environment. The goal is to separate distinct distributions of these observations corresponding to distinct group configurations and activities. This article describes an unsupervised method based on the calculation of the Jeffrey divergence between histograms over observations. These histograms are generated from adjacent windows of variable size slid from the beginning to the end of a meeting recording. The peaks of the resulting Jeffrey divergence curves are detected using successive robust mean estimation. After a merging and filtering process, the retained peaks are used to select the best model, i.e. the best allocation of observation distributions for a meeting recording. These distinct distributions can be interpreted as distinct segments of group configuration and activity. To evaluate this approach, five small group meetings, one seminar, and one cocktail party meeting were recorded. The observations of the small group meetings and the seminar were generated by a speech activity detector, while the observations of the cocktail party meeting were generated by both the speech activity detector and a visual tracking system. The authors measured the correspondence between detected segments and labeled group configurations and activities. The obtained results are promising, in particular because the method is completely unsupervised.
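As a rough, self-contained illustration of the segmentation idea (not the authors' implementation), the Python sketch below computes a Jeffrey divergence curve between histograms of two adjacent fixed-size windows slid over a sequence of discrete observation codes, then keeps local maxima that survive an iterative outlier-trimmed threshold. The divergence formulation follows Puzicha et al. [15], which the article cites; the fixed window size, the simplified peak detector, and all function and parameter names are illustrative assumptions. The article itself uses windows of variable size, successive robust mean estimation, and a subsequent merging, filtering, and model-selection step.

```python
import numpy as np

def jeffrey_divergence(h, k, eps=1e-12):
    """Symmetric divergence between two histograms (formulation of
    Puzicha et al. [15]); zero-mass bins contribute nothing."""
    h = h / (h.sum() + eps)
    k = k / (k.sum() + eps)
    m = (h + k) / 2.0
    d_h = np.where(h > 0, h * np.log((h + eps) / (m + eps)), 0.0)
    d_k = np.where(k > 0, k * np.log((k + eps) / (m + eps)), 0.0)
    return float(np.sum(d_h + d_k))

def divergence_curve(codes, n_symbols, window):
    """Slide two adjacent windows of equal size over a sequence of
    discrete observation codes (e.g. speech-activity states) and return
    the Jeffrey divergence at each split point."""
    codes = np.asarray(codes, dtype=int)
    curve = np.zeros(len(codes))
    for t in range(window, len(codes) - window):
        left = np.bincount(codes[t - window:t], minlength=n_symbols).astype(float)
        right = np.bincount(codes[t:t + window], minlength=n_symbols).astype(float)
        curve[t] = jeffrey_divergence(left, right)
    return curve

def detect_peaks(curve, n_iter=3, c=2.0):
    """Crude stand-in for the successive robust mean estimation step:
    iteratively trim outliers when estimating mean and spread, then keep
    local maxima above the resulting threshold."""
    values = curve[curve > 0]
    if values.size == 0:
        return []
    mask = np.ones_like(values, dtype=bool)
    for _ in range(n_iter):
        mu, sigma = values[mask].mean(), values[mask].std()
        new_mask = np.abs(values - mu) <= c * sigma
        if not new_mask.any():
            break
        mask = new_mask
    threshold = mu + c * sigma
    return [t for t in range(1, len(curve) - 1)
            if curve[t] > threshold
            and curve[t] >= curve[t - 1] and curve[t] >= curve[t + 1]]
```

In practice, split points retained this way would be computed for several window sizes, merged and filtered, and the resulting segmentations compared to pick the one whose observation distributions best explain the recording, as the article describes.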


References

  1. Aoki PM, Romaine M, Szymanski MH, Thornton JD, Wilson D, Woodruff A (2003) The mad hatter’s cocktail party: a social mobile audio space supporting multiple simultaneous conversations. In: Proceedings of the SIGCHI conference on human factors in computing systems, pp 425–432

  2. Basu S (2002) Conversational scene analysis. PhD thesis, MIT Department of EECS, Cambridge, MA

  3. Bilmes JA (1998) A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Technical Report ICSI-TR-97-021, International Computer Science Institute, Berkeley, CA

  4. Bobick A, Intille S, Davis J, Baird F, Pinhanez C, Campbell L, Ivanov Y, Schutte A, Wilson A (1999) The KidsRoom: a perceptually-based interactive and immersive story environment. In: Presence (USA), vol 8, pp 369–393

  5. Brdiczka O, Maisonnasse J, Reignier P (2005) Automatic detection of interaction groups. In: Proceedings of the international conference multimodal interfaces, pp 32–36, October 2005

  6. Brdiczka O, Vaufreydaz D, Maisonnasse J, Reignier P (2006) Unsupervised segmentation of meeting configurations and activities using speech activity detection. In: Maglogiannis I, Karpouzis K, Bramer M (eds) IFIP international federation of information processing. Artificial intelligence applications and innovations, vol 204. Springer, Boston, pp 195–203


  7. Brumitt B, Meyers B, Krumm J, Kern A, Shafer SA (2000) EasyLiving: technologies for intelligent environments. In: Proceedings of the international conference on handheld and ubiquitous computing, pp 12–29

  8. Burger S, MacLaren V, Yu H (2002) The ISL meeting corpus: the impact of meeting type on speech style. In: Proceedings of the international conference on spoken language processing, pp 301–304

  9. Caporossi A, Hall D, Reignier P, Crowley JL (2004) Robust visual tracking from dynamic control of processing. In: Proceedings of the international workshop on performance evaluation for tracking and surveillance, pp 23–32

  10. Choudhury T, Pentland A (2004) Characterizing social interactions using the sociometer. In: Proceedings of NAACSOS 2004, June 2004

  11. Le Gal Ch, Martin J, Lux A, Crowley JL (2001) SmartOffice: design of an intelligent environment. IEEE Intell Syst 16(4): 60–66


  12. McCowan I, Gatica-Perez D, Bengio S, Lathoud G, Barnard M, Zhang D (2005) Automatic analysis of multimodal group actions in meetings. IEEE Trans Pattern Anal Mach Intell 27(3): 305–317


  13. Muehlenbrock M, Brdiczka O, Snowdon D, Meunier J-L (2004) Learning to detect user activity and availability from a variety of sensor data. In: Proceedings of the IEEE international conference on pervasive computing and communications, March 2004, pp 13–22

  14. Oliver N, Rosario B, Pentland A (2000) A Bayesian computer vision system for modeling human interactions. IEEE Trans Pattern Anal Mach Intell 22(8): 831–843


  15. Puzicha J, Hofmann Th, Buhmann J (1997) Non-parametric similarity measures for unsupervised texture segmentation and image retrieval. In: Proceedings of the international conference on computer vision and pattern recognition, pp 267–272

  16. Qian RJ, Sezan MI, Mathews KE (1998) Face tracking using robust statistical estimation. In: Proceedings workshop on perceptual user interfaces, San Francisco

  17. Suchman L (1987) Plans and situated actions: the problem of human–machine communication. Cambridge University Press, Cambridge


  18. Stiefelhagen R, Steusloff H, Waibel A (2004) CHIL—computers in the human interaction loop. In: Proceedings of the international workshop on image analysis for multimedia interactive services

  19. Vaufreydaz D (2001) IST-2000-28323 FAME: facilitating agent for multi-cultural exchange (WP4). European Commission project IST-2000-28323, October 2001

  20. Zaidenberg S, Brdiczka O, Reignier P, Crowley JL (2006) Learning context models for the recognition of scenarios. In: Maglogiannis I, Karpouzis K, Bramer M (eds) IFIP international federation of information processing. Artificial intelligence applications and innovations, vol 204. Springer, Boston, pp 86–97


  21. Zhang D, Gatica-Perez D, Bengio S, McCowan I, Lathoud G (2004) Multimodal group action clustering in meetings. In: Proceedings of the international workshop on video surveillance & sensor networks


Author information


Correspondence to Oliver Brdiczka.

Additional information

A short version of this article [6] received the Best Paper Award at the 3rd IFIP Conference on Artificial Intelligence Applications and Innovations (AIAI) 2006.



Cite this article

Brdiczka, O., Maisonnasse, J., Reignier, P. et al. Detecting small group activities from multimodal observations. Appl Intell 30, 47–57 (2009). https://doi.org/10.1007/s10489-007-0074-y
