Working together: a DBN approach for individual and group activity recognition

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

Human activity recognition is attracting growing attention from researchers because of its applicability in fields such as health monitoring and smart environments. Activity recognition solutions typically focus on classifying single-user behavior. However, living and working environments usually host multiple inhabitants acting together, so it makes sense to interpret activities by aggregating information from different subjects. In this paper, we address the problem of group activity recognition (GAR) hierarchically. We first examine each individual's actions, reconstructed by correlating data from body-worn and external positioning sensors. We then aggregate this information by treating each individual as an input to a hierarchical deep belief network (DBN), with the aim of extracting common temporal and spatial dynamics at the level of group activity. We evaluated the proposed approach in a laboratory environment, where participants labeled their daily activities using a mobile phone app. The collected data were used to create two datasets, containing labeled single and group activities, respectively. Experimental results on these datasets and on a public one demonstrate the effectiveness of the proposed model with respect to a support vector machine (SVM) baseline.

Acknowledgements

The authors would like to thank Roberto Capasso for his help in developing the code used in this work during his Master's thesis.

Author information

Corresponding author

Correspondence to Mariacarla Staffa.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Rossi, S., Acampora, G. & Staffa, M. Working together: a DBN approach for individual and group activity recognition. J Ambient Intell Human Comput 11, 6007–6019 (2020). https://doi.org/10.1007/s12652-020-01851-0

  • DOI: https://doi.org/10.1007/s12652-020-01851-0
