First-person activity recognition from micro-action representations using convolutional neural networks and object flow histograms

Giannakeris, Panagiotis; Petrantonakis, Panagiotis C.; Avgerinakis, Konstantinos; Vrochidis, Stefanos; Kompatsiaris, Ioannis

doi:10.1007/s11042-020-09902-6

First-person activity recognition from micro-action representations using convolutional neural networks and object flow histograms

Published: 29 September 2020

Volume 80, pages 22487–22507, (2021)
Cite this article

Multimedia Tools and Applications Aims and scope Submit manuscript

Panagiotis Giannakeris ORCID: orcid.org/0000-0002-3774-3161¹,
Panagiotis C. Petrantonakis¹,
Konstantinos Avgerinakis¹,
Stefanos Vrochidis¹ &
…
Ioannis Kompatsiaris¹

355 Accesses
1 Altmetric
Explore all metrics

Abstract

A novel first-person human activity recognition framework is proposed in this work. Our proposed methodology is inspired by the central role moving objects have in egocentric activity videos. Using a Deep Convolutional Neural Network we detect objects and develop discriminant object flow histograms in order to represent fine-grained micro-actions during short temporal windows. Our framework is based on the assumption that large scale activities are synthesized by fine-grained micro-actions. We gather all the micro-actions and perform Gaussian Mixture Model clusterization, so as to build a micro-action vocabulary that is later used in a Fisher encoding schema. Results show that our method can reach 60% recognition rate on the benchmark ADL dataset. The capabilities of the proposed framework are also showcased by profoundly evaluating for a great deal of hyper-parameters and comparing to other State-of-the-Art works.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Micro-network-based deep convolutional neural network for human activity recognition from realistic and multi-view visual data

Article 13 March 2023

Action representation and recognition through temporal co-occurrence of flow fields and convolutional neural networks

Article 01 July 2020

Vision-Based Human Activity Recognition

References

Avgerinakis K, Briassouli A, Kompatsiaris Y (2016) Activity detection using sequential statistical boundary detection (ssbd). Comput Vis Image Underst 144:46–61
Article Google Scholar
Bambach S, Crandall DJ, Yu C (2015) Viewpoint integration for hand-based recognition of social interactions from a first-person view. In: Proceedings of the 2015 ACM on International Conference on Multimodal Interaction. ACM, pp 351–354
Bambach S, Lee S, Crandall DJ, Yu C (2015) Lending a hand: Detecting hands and recognizing activities in complex egocentric interactions. In: 2015 IEEE international conference on Computer vision (ICCV). IEEE, pp 1949–1957
Chéron G, Laptev I, Schmid C (2015) P-cnn: Pose-based cnn features for action recognition. In: Proceedings of the IEEE international conference on computer vision, pp 3218–3226
Crispim-Junior CF, Buso V, Avgerinakis K, Meditskos G, Briassouli A, Benois-Pineau J, Kompatsiaris I, Bremond F (2016) Semantic event fusion of different visual modality concepts for activity recognition. IEEE Trans Pattern Anal Mach Intell 38(8):1598–1611
Article Google Scholar
Crispim-Junior CF, Gómez uría A, Strumia C, Koperski M, König A, Negin F, Cosar S, Nghiem AT, Chau DP, Charpiat G et al (2017) Online recognition of daily activities by color-depth sensing and knowledge models. Sensors 17(7):1528
Article Google Scholar
Dai J, Li Y, He K, Sun J (2016) R-fcn: Object detection via region-based fully convolutional networks. In: Advances in neural information processing systems, pp 379–387
Dalal N, Triggs B, Schmid C (2006) Human detection using oriented histograms of flow and appearance. In: European conference on computer vision. Springer, pp 428–441
Damen D, Doughty H, Maria Farinella G, Fidler S, Furnari A, Kazakos E, Moltisanti D, Munro J, Perrett T, Price W et al (2018) Scaling egocentric vision: The epic-kitchens dataset. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 720–736
Deng J, Dong W, Socher R, Li LJ, Li K, Fei-fei L (2009) Imagenet: A large-scale hierarchical image database
Du W, Wang Y, Qiao Y (2017) Rpan: an end-to-end recurrent pose-attention network for action recognition in videos. In: Proceedings of the IEEE International Conference on Computer Vision, pp 3725–3734
Everingham M, Van Gool L, Williams CK, Winn J, Zisserman A (2010) The pascal visual object classes (voc) challenge. Int J Comput Vis 88 (2):303–338
Article Google Scholar
Fathi A, Farhadi A, Rehg JM (2011) Understanding egocentric activities. In: 2011 IEEE international conference on Computer vision (ICCV). IEEE, pp 407–414
Feichtenhofer C, Pinz A, Wildes R (2016) Spatiotemporal residual networks for video action recognition. In: Advances in neural information processing systems, pp 3468–3476
Feichtenhofer C, Pinz A, Zisserman A (2016) Convolutional two-stream network fusion for video action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1933–1941
Frankl P, Maehara H (1988) The johnson-lindenstrauss lemma and the sphericity of some graphs. J Comb Theory Ser B 44(3):355–362
Article MathSciNet Google Scholar
Gaidon A, Harchaoui Z, Schmid C (2011) Actom sequence models for efficient action detection. In: CVPR 2011. IEEE, pp 3201–3208
Giannakeris P, Avgerinakis K, Vrochidis S, Kompatsiaris I (2018) Activity recognition from wearable cameras. In: 2018 International conference on content-based multimedia indexing (CBMI). IEEE, pp 1–6
Girshick R (2015) Fast r-cnn. In: Proceedings of the IEEE international conference on computer vision, pp 1440–1448
González-Díaz I, Benois-Pineau J, Domenger JP, Cattaert D, de Rugy A (2019) Perceptually-guided deep neural networks for ego-action prediction: Object grasping. Pattern Recogn 88:223–235
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
Henriques JF, Caseiro R, Martins P, Batista J (2015) High-speed tracking with kernelized correlation filters. IEEE Trans Pattern Anal Mach Intell 37(3):583–596
Article Google Scholar
Huang J, Rathod V, Sun C, Zhu M, Korattikara A, Fathi A, Fischer I, Wojna Z, Song Y, Guadarrama S et al (2017) Speed/accuracy trade-offs for modern convolutional object detectors. In: IEEE CVPR
Jain M, Jegou H, Bouthemy P (2013) Better exploiting motion for better action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2555–2562
Jhuang H, Gall J, Zuffi S, Schmid C, Black MJ (2013) Towards understanding action recognition. In: Proceedings of the IEEE international conference on computer vision, pp 3192–3199
Jin P, Rathod V, Zhu X (2018) Pooling pyramid network for object detection. arXiv:1807.03284
Johnson WB, Lindenstrauss J (1984) Extensions of lipschitz mappings into a hilbert space. Contemp Math 26(189-206):1
MathSciNet MATH Google Scholar
Kong T, Sun F, Yao A, Liu H, Lu M, Chen Y (2017) Ron: Reverse connection with objectness prior networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 5936–5944
Kroeger T, Timofte R, Dai D, Van Gool L (2016) Fast optical flow using dense inverse search. In: European conference on computer vision. Springer, pp 471–488
Kuznetsova A, Rom H, Alldrin N, Uijlings J, Krasin I, Pont-Tuset J, Kamali S, Popov S, Malloci M, Duerig T et al (2018) The open images dataset v4: Unified image classification, object detection, and visual relationship detection at scale. arXiv:1811.00982
Li Y, Liu M, Rehg JM (2018) In the eye of beholder: Joint learning of gaze and actions in first person video. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 619–635
Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft coco: Common objects in context. In: European conference on computer vision. Springer, pp 740–755
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg AC (2016) Ssd: Single shot multibox detector. In: European conference on computer vision. Springer, pp 21–37
McCandless T, Grauman K (2013) Object-centric spatio-temporal pyramids for egocentric activity recognition. In: BMVC, vol 2, pp 3
Meditskos G, Plans PM, Stavropoulos TG, Benois-Pineau J, Buso V, Kompatsiaris I (2018) Multi-modal activity recognition from egocentric vision, semantic enrichment and lifelogging applications for the care of dementia. J Vis Commun Image Represent 51:169–190
Article Google Scholar
Ohnishi K, Kanehira A, Kanezaki A, Harada T (2016) Recognizing activities of daily living with a wrist-mounted camera. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 3103–3111
Papadimitriou CH, Raghavan P, Tamaki H, Vempala S (2000) Latent semantic indexing: a probabilistic analysis. J Comput Syst Sci 61(2):217–235
Article MathSciNet Google Scholar
Pirsiavash H, Ramanan D (2012) Detecting activities of daily living in first-person camera views. In: 2012 IEEE conference on Computer vision and pattern recognition (CVPR). IEEE, pp 2847–2854
Pirsiavash H, Ramanan D (2014) Parsing videos of actions with segmental grammars. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 612–619
Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: Unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788
Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: Towards real-time object detection with region proposal networks. In: Advances in neural information processing systems, pp 91–99
Sharma S, Kiros R, Salakhutdinov R (2015) Action recognition using visual attention. arXiv:1511.04119
Singh B, Najibi M, Davis LS (2018) Sniper: Efficient multi-scale training. In: Advances in neural information processing systems, pp 9333–9343
Sun S, Kuang Z, Sheng L, Ouyang W, Zhang W (2018) Optical flow guided feature: a fast and robust motion representation for video action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1390–1399
Tran A, Cheong LF (2017) Two-stream flow-guided convolutional attention networks for action recognition. In: Proceedings of the IEEE International Conference on Computer Vision, pp 3110–3119
Uijlings JR, Van De Sande KE, Gevers T, Smeulders AW (2013) Selective search for object recognition. Int J Comput Vis 104(2):154–171
Vaca-Castano G, Das S, Sousa JP, Lobo ND, Shah M (2017) Improved scene identification and object detection on egocentric vision of daily activities. Comput Vis Image Underst 156:92–103
Article Google Scholar
Wang H, Kläser A, Schmid C, Liu CL (2011) Action recognition by dense trajectories. In: 2011 IEEE conference on Computer vision and pattern recognition (CVPR). IEEE, pp 3169–3176
Wang H, Kläser A, Schmid C, Liu CL (2013) Dense trajectories and motion boundary descriptors for action recognition. Int J Comput Vis 103 (1):60–79
Article MathSciNet Google Scholar
Wang H, Schmid C (2013) Action recognition with improved trajectories. In: Proceedings of the IEEE international conference on computer vision, pp 3551–3558
Wang L, Xiong Y, Wang Z, Qiao Y (2015) Towards good practices for very deep two-stream convnets. arXiv:1507.02159
Wang L, Xiong Y, Wang Z, Qiao Y, Lin D, Tang X, Van Gool L (2018) Temporal segment networks for action recognition in videos IEEE transactions on pattern analysis and machine intelligence
Wang X, Gao L, Song J, Zhen X, Sebe N, Shen HT (2018) Deep appearance and motion learning for egocentric activity recognition. Neurocomputing 275:438–447
Article Google Scholar
Yan Y, Ricci E, Liu G, Sebe N (2015) Egocentric daily activity recognition via multitask clustering. IEEE Trans Image Process 24(10):2984–2995
Article MathSciNet Google Scholar
Zhang S, Wen L, Bian X, Lei Z, Li SZ (2018) Single-shot refinement neural network for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 4203–4212
Zhao Q, Sheng T, Wang Y, Tang Z, Chen Y, Cai L, Ling H (2018) M2det: A single-shot object detector based on multi-level feature pyramid network. arXiv:1811.04533
Zhou Y, Ni B, Hong R, Yang X, Tian Q (2016) Cascaded interactional targeting network for egocentric video analysis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 1904–1913
Zhu Y, Zhao C, Wang J, Zhao X, Wu Y, Lu H (2017) Couplenet: Coupling global structure with local parts for object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp 4126–4134

Download references

Acknowledgements

This work is supported by the project SUITCEYES that has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No 780814.

Author information

Authors and Affiliations

ITI-CERTH, Thermi, Greece
Panagiotis Giannakeris, Panagiotis C. Petrantonakis, Konstantinos Avgerinakis, Stefanos Vrochidis & Ioannis Kompatsiaris

Authors

Panagiotis Giannakeris
View author publications
You can also search for this author in PubMed Google Scholar
Panagiotis C. Petrantonakis
View author publications
You can also search for this author in PubMed Google Scholar
Konstantinos Avgerinakis
View author publications
You can also search for this author in PubMed Google Scholar
Stefanos Vrochidis
View author publications
You can also search for this author in PubMed Google Scholar
Ioannis Kompatsiaris
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Panagiotis Giannakeris.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Giannakeris, P., Petrantonakis, P.C., Avgerinakis, K. et al. First-person activity recognition from micro-action representations using convolutional neural networks and object flow histograms. Multimed Tools Appl 80, 22487–22507 (2021). https://doi.org/10.1007/s11042-020-09902-6

Download citation

Received: 20 March 2019
Revised: 16 August 2020
Accepted: 16 September 2020
Published: 29 September 2020
Issue Date: June 2021
DOI: https://doi.org/10.1007/s11042-020-09902-6

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

First-person activity recognition from micro-action representations using convolutional neural networks and object flow histograms

Abstract

Access this article

Similar content being viewed by others

Micro-network-based deep convolutional neural network for human activity recognition from realistic and multi-view visual data

Action representation and recognition through temporal co-occurrence of flow fields and convolutional neural networks

Vision-Based Human Activity Recognition

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher’s note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

First-person activity recognition from micro-action representations using convolutional neural networks and object flow histograms

Abstract

Access this article

Similar content being viewed by others

Micro-network-based deep convolutional neural network for human activity recognition from realistic and multi-view visual data

Action representation and recognition through temporal co-occurrence of flow fields and convolutional neural networks

Vision-Based Human Activity Recognition

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher’s note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation