
Predicting and recognizing human interactions in public spaces

Journal of Real-Time Image Processing

Abstract

We present an extensive survey of methods for recognizing human interactions and propose a method for predicting rendezvous areas in observable and unobservable regions using sparse motion information. Rendezvous areas indicate where people are likely to interact with each other or with static objects (e.g., a door, an information desk or a meeting point). The proposed method infers the direction of motion by computing prediction lines from displacement vectors and temporally accumulates the locations where these prediction lines intersect. The intersections are then used as candidate rendezvous areas and modeled as spatial probability density functions using Gaussian Mixture Models. We validate the proposed method for predicting dynamic and static rendezvous areas on real-world datasets and compare it with related approaches.
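The prediction-line, accumulation and density-modeling steps can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not confirm: each prediction line is taken to be a forward ray that starts at a person's current position and points along the displacement vector between two consecutive frames; intersections are collected for every pair of rays in every frame and simply concatenated over time; and a small diagonal-covariance expectation-maximization (EM) loop with a fixed number of components `k` and a deterministic initialization stands in for the paper's Gaussian Mixture Model estimation. The function names `ray_intersection`, `accumulate_intersections` and `fit_gmm_2d` are hypothetical and do not come from the authors' implementation.

```python
import math
from itertools import combinations

# Hypothetical helpers illustrating the pipeline; not the authors' code.


def ray_intersection(p1, d1, p2, d2, eps=1e-9):
    """Return the point where rays p1 + s*d1 and p2 + u*d2 (s, u >= 0) meet, or None."""
    denom = d1[0] * d2[1] - d1[1] * d2[0]  # 2D cross product d1 x d2
    if abs(denom) < eps:  # parallel rays or zero displacement
        return None
    wx, wy = p2[0] - p1[0], p2[1] - p1[1]
    s = (wx * d2[1] - wy * d2[0]) / denom
    u = (wx * d1[1] - wy * d1[0]) / denom
    if s < 0 or u < 0:  # the crossing lies behind at least one person
        return None
    return (p1[0] + s * d1[0], p1[1] + s * d1[1])


def accumulate_intersections(frames):
    """frames: list of {track_id: (x, y)} dicts, one per time step.

    Returns every pairwise prediction-line intersection, accumulated over time.
    """
    points = []
    for prev, curr in zip(frames, frames[1:]):
        # Assumption: a prediction line is the ray from the current position
        # along the displacement vector since the previous frame.
        rays = [
            (pos, (pos[0] - prev[tid][0], pos[1] - prev[tid][1]))
            for tid, pos in curr.items()
            if tid in prev
        ]
        for (p1, d1), (p2, d2) in combinations(rays, 2):
            hit = ray_intersection(p1, d1, p2, d2)
            if hit is not None:
                points.append(hit)
    return points


def fit_gmm_2d(points, k, iters=100, var_floor=1e-3):
    """Fit a k-component, diagonal-covariance 2D GMM with a fixed number of EM steps.

    Returns (weights, means, variances).
    """
    n = len(points)
    # Deterministic initialization: means at evenly spaced points of the sorted data.
    srt = sorted(points)
    means = [list(srt[(j * (n - 1)) // max(k - 1, 1)]) for j in range(k)]
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    vx0 = sum((p[0] - cx) ** 2 for p in points) / n + var_floor
    vy0 = sum((p[1] - cy) ** 2 for p in points) / n + var_floor
    variances = [[vx0, vy0] for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibilities, computed in log space to avoid underflow.
        resp = []
        for x, y in points:
            logs = [
                math.log(w)
                - 0.5 * ((x - m[0]) ** 2 / v[0] + (y - m[1]) ** 2 / v[1])
                - math.log(2 * math.pi * math.sqrt(v[0] * v[1]))
                for w, m, v in zip(weights, means, variances)
            ]
            top = max(logs)
            r = [math.exp(lg - top) for lg in logs]
            total = sum(r)
            resp.append([ri / total for ri in r])
        # M-step: re-estimate weights, means and per-axis variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            mx = sum(r[j] * p[0] for r, p in zip(resp, points)) / nj
            my = sum(r[j] * p[1] for r, p in zip(resp, points)) / nj
            vx = sum(r[j] * (p[0] - mx) ** 2 for r, p in zip(resp, points)) / nj
            vy = sum(r[j] * (p[1] - my) ** 2 for r, p in zip(resp, points)) / nj
            weights[j] = nj / n
            means[j] = [mx, my]
            variances[j] = [vx + var_floor, vy + var_floor]
    return weights, means, variances
```

For example, two people moving from (0, 0) to (1, 1) and from (10, 0) to (9, 1) yield prediction lines that cross at (5, 5), whereas a third person moving from (5, -3) to (5, -4) contributes no intersection with either of them because the crossing would lie behind that person. Restricting intersections to forward rays reflects the assumption that only locations ahead of a moving person are candidate rendezvous areas; the rays are not clipped to the image, so intersections may also fall in unobserved regions. The accumulated points would then be passed to `fit_gmm_2d`, whose component means serve as candidate rendezvous locations in this sketch.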



Notes

  1. Definition taken from the Cambridge Dictionary, Cambridge University Press, 2014.

  2. iLIDS, Home Office multiple camera tracking scenario definition (UK), 2008.

  3. http://www.eecs.qmul.ac.uk/~andrea/avss2007_d.html. Last accessed: December 2013.

  4. http://www.robots.ox.ac.uk/ActiveVision/Research/Projects/2009bbenfold_headpose/project.html. Last accessed: December 2013.

  5. http://www.cvg.rdg.ac.uk/PETS2009/a.html. Last accessed: December 2013.

  6. Video results on the full sequence can be found here: ftp://motinas.elec.qmul.ac.uk/pub/ra_results/students003_border.zip.

  7. Video results on the full sequence can be found here: ftp://motinas.elec.qmul.ac.uk/pub/ra_results/pets2009_noborder.zip.

  8. Video results on the full sequence can be found here: ftp://motinas.elec.qmul.ac.uk/pub/ra_results/trainstation_border_a.zip and ftp://motinas.elec.qmul.ac.uk/pub/ra_results/trainstation_border_b.zip.

Author information

Correspondence to Fabio Poiesi.

Additional information

This work was supported in part by the Artemis JU and in part by the UK Technology Strategy Board through the COPCAMS Project under Grant 332913.

About this article

Cite this article

Poiesi, F., Cavallaro, A. Predicting and recognizing human interactions in public spaces. J Real-Time Image Proc 10, 785–803 (2015). https://doi.org/10.1007/s11554-014-0428-8


  • DOI: https://doi.org/10.1007/s11554-014-0428-8
