Egocentric upper limb segmentation in unconstrained real-life scenarios

Gruosso, Monica; Capece, Nicola; Erra, Ugo

doi:10.1007/s10055-022-00725-4


S.I.: New Trends on Immersive Healthcare
Published: 03 December 2022

Volume 27, pages 3421–3433, (2023)

Virtual Reality


Abstract

The segmentation of bare and clothed upper limbs in unconstrained real-life environments has received little attention. We tackled this challenging task by training a deep neural network based on the DeepLabv3+ architecture. We collected and carefully labeled about 46,000 real-life RGB egocentric images covering a wide variety of skin tones, clothes, occlusions, and lighting conditions. We then extensively evaluated the proposed approach and compared it with state-of-the-art methods for hand and arm segmentation, such as Ego2Hands, EgoArm, and HGRNet. We used our test set and a subset of the EgoGesture dataset (EgoGestureSeg) to assess how well the model generalizes to challenging scenarios. Moreover, we tested our network on hand-only segmentation, since it is a closely related task. We performed a quantitative analysis using standard image segmentation metrics and a qualitative evaluation by visually comparing the obtained predictions. Our approach outperforms all compared models in both tasks, demonstrating its robustness to hand-to-hand and hand-to-object occlusions, dynamic user/camera movements, different lighting conditions, skin colors, clothes, and limb/hand poses.
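The abstract does not name the individual metrics, so the following is only a hedged illustration of what "standard metrics for image segmentation" usually means for a binary task: pixel accuracy, precision, recall, F1 (Dice), and intersection over union (IoU) computed for the upper-limb class from a predicted and a ground-truth mask. The metric set, the function names `confusion_counts` and `segmentation_metrics`, and the empty-mask convention are assumptions for illustration, not the authors' evaluation code.

```python
from typing import Dict, Sequence, Tuple

# A 2-D binary mask: 1 marks an upper-limb pixel, 0 marks background.
Mask = Sequence[Sequence[int]]


def confusion_counts(pred: Mask, target: Mask) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) pixel counts for two same-shaped binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of rows")
    tp = fp = fn = tn = 0
    for row_pred, row_true in zip(pred, target):
        if len(row_pred) != len(row_true):
            raise ValueError("masks must have the same number of columns")
        for p, t in zip(row_pred, row_true):
            if p and t:
                tp += 1  # limb predicted where there is limb
            elif p:
                fp += 1  # limb predicted on background
            elif t:
                fn += 1  # limb pixel missed
            else:
                tn += 1  # background correctly left as background
    return tp, fp, fn, tn


def segmentation_metrics(pred: Mask, target: Mask) -> Dict[str, float]:
    """Pixel accuracy, precision, recall, F1 (Dice) and IoU for the limb class."""
    tp, fp, fn, tn = confusion_counts(pred, target)
    # Assumed convention: if neither mask contains any limb pixel, the
    # prediction is treated as perfect; other zero denominators give 0.0.
    both_empty = tp + fp + fn == 0
    fallback = 1.0 if both_empty else 0.0

    def ratio(num: int, den: int) -> float:
        return num / den if den else fallback

    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "iou": ratio(tp, tp + fp + fn),
    }


if __name__ == "__main__":
    pred = [[0, 1, 1], [0, 1, 0]]
    target = [[0, 1, 0], [0, 1, 1]]
    metrics = segmentation_metrics(pred, target)
    print(metrics["iou"])  # → 0.5
```

IoU and F1 are computed here for the limb class only. A dataset-level score would typically either average the per-image values or pool the counts over the whole test set; which aggregation the paper uses is not stated in this section.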


Data availability

The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request. The source code, video, and test dataset (example samples) generated or analysed during this study are included in this published article and its supplementary information files.

Notes

The pre-trained weights used are publicly available on the DeepLab project page: https://github.com/tensorflow/models/tree/master/research/deeplab.
The original subset (Gonzalez-Sosa et al. 2020) contains 277 images. After careful inspection, we removed some images whose labels showed small imperfections or errors.
We tested the best Ego2Hands model without custom scene adaptation, which is publicly available at https://github.com/AlextheEngineer/Ego2Hands.
Since neither the code nor the EgoArm network is publicly available, inference was performed by one of the EgoArm authors, Ester Gonzalez-Sosa, who sent us the resulting predictions.

References

Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado GS, Davis A, Dean J, Devin M, Ghemawat S, Goodfellow I, Harp A, Irving G, Isard M, Jia Y, Jozefowicz R, Kaiser L, Kudlur M, Levenberg J, Mané D, Monga R, Moore S, Murray D, Olah C, Schuster M, Shlens J, Steiner B, Sutskever I, Talwar K, Tucker P, Vanhoucke V, Vasudevan V, Viégas F, Vinyals O, Warden P, Wattenberg M, Wicke M, Yu Y, Zheng X (2015) TensorFlow: large-scale machine learning on heterogeneous systems. Software available from tensorflow.org https://www.tensorflow.org/
Alletto S, Serra G, Calderara S, Cucchiara R (2015) Understanding social relationships in egocentric vision. Pattern Recognit 48(12):4082–4096
Bambach S, Lee S, Crandall DJ, Yu C (2015) Lending a hand: detecting hands and recognizing activities in complex egocentric interactions. In: The IEEE international conference on computer vision (ICCV)
Bandini A, Zariffa J (2020) Analysis of the hands in egocentric vision: A survey. IEEE Trans Pattern Anal Mach Intell
Betancourt A, Morerio P, Regazzoni CS, Rauterberg M (2015) The evolution of first person vision methods: a survey. IEEE Trans Circuits Syst Video Technol 25(5):744–760
Betancourt A, Morerio P, Barakova E, Marcenaro L, Rauterberg M, Regazzoni C (2017) Left/right hand segmentation in egocentric videos. Comput Vis Image Underst 154:73–81
Bojja AK, Mueller F, Malireddi SR, Oberweger M, Lepetit V, Theobalt C, Yi KM, Tagliasacchi A (2019) Handseg: an automatically labeled dataset for hand segmentation from depth images. In: 2019 16th conference on computer and robot vision (CRV), pp 151–158. IEEE
Brancati N, Caggianese G, Frucci M, Gallo L, Neroni P (2015) Robust fingertip detection in egocentric vision under varying illumination conditions. In: 2015 IEEE international conference on multimedia and expo workshops (ICMEW), pp 1–6. IEEE
Caggianese G, Gallo L, Neroni P (2015) Design and preliminary evaluation of free-hand travel techniques for wearable immersive virtual reality systems with egocentric sensing. In: International conference on augmented and virtual reality, pp 399–408. Springer
Caggianese G, Capece N, Erra U, Gallo L, Rinaldi M (2020) Freehand-steering locomotion techniques for immersive virtual environments: a comparative evaluation. Int J Hum Comput Interact 36(18):1734–1755
Cai M, Kitani KM, Sato Y (2017) An ego-vision system for hand grasp analysis. IEEE Trans Hum Mach Syst 47(4):524–535
Cai M, Lu F, Sato Y (2020) Generalizing hand segmentation in egocentric videos with uncertainty-guided model adaptation. In: 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 14380–14389. https://doi.org/10.1109/CVPR42600.2020.01440
Capece N, Erra U, Gruosso M, Anastasio M (2020) Archaeo puzzle: an educational game using natural user interface for historical artifacts
Chalasani T, Ondrej J, Smolic A (2018) Egocentric gesture recognition for head-mounted ar devices. In: 2018 IEEE international symposium on mixed and augmented reality adjunct (ISMAR-Adjunct), pp 109–114. https://doi.org/10.1109/ISMAR-Adjunct.2018.00045
Chen LC, Papandreou G, Schroff F, Adam H (2017) Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587
Chen L-C, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2017) Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans Pattern Anal Mach Intell 40(4):834–848
Chen LC, Zhu Y, Papandreou G, Schroff F, Adam H (2018) Encoder–decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the European conference on computer vision (ECCV), pp 801–818
Chollet F (2017) Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1251–1258
Dadashzadeh A, Targhi AT, Tahmasbi M, Mirmehdi M (2019) Hgr-net: a fusion network for hand gesture segmentation and recognition. IET Comput Vis 13(8):700–707
Dave IR, Chaudhary V, Upla KP (2019) Simulation of analytical chemistry experiments on augmented reality platform. In: Panigrahi CR, Pujari AK, Misra S, Pati B, Li K-C (eds) Progress in advanced computing and intelligent engineering. Springer, Singapore, pp 393–403
Fathi A, Ren X, Rehg JM (2011) Learning to recognize objects in egocentric activities. In: CVPR 2011, pp 3281–3288. IEEE
Ferracani A, Pezzatini D, Bianchini J, Biscini G, Del Bimbo A (2016) Locomotion by natural gestures for immersive virtual environments. AltMM ’16, pp 21–24. Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/2983298.2983307
Garcia-Hernando G, Yuan S, Baek S, Kim TK (2018) First-person hand action benchmark with rgb-d videos and 3d hand pose annotations. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)
Gonzalez-Sosa E, Perez P, Tolosana R, Kachach R, Villegas A (2020) Enhanced self-perception in mixed reality: egocentric arm segmentation and database with automatic labeling. IEEE Access 8:146887–146900
Gruosso M, Capece N, Erra U, Angiolillo FA (2020) Preliminary investigation into a deep learning implementation for hand tracking on mobile devices. In: 2020 IEEE international conference on artificial intelligence and virtual reality (AIVR), pp 380–385. IEEE
Gruosso M, Capece N, Erra U (2021a) Human segmentation in surveillance video with deep learning. Multimed Tools Appl 80(1):1175–1199
Gruosso M, Capece N, Erra U (2021b) Exploring upper limb segmentation with deep learning for augmented virtuality. In: Frosini P, Giorgi D, Melzi S, Rodolà E (eds) Smart tools and apps for graphics: Eurographics Italian chapter conference. The Eurographics Association. https://doi.org/10.2312/stag.20211483
Gruosso M, Capece N, Erra U (2021c) Solid and effective upper limb segmentation in egocentric vision. In: The 26th international conference on 3D Web technology. https://doi.org/10.1145/3485444.3495179
Haria A, Subramanian A, Asokkumar N, Poddar S, Nayak JS (2017) Hand gesture recognition for human computer interaction. Procedia Comput Sci 115:367–374
Harkat H, Nascimento J, Bernardino A (2020) Fire segmentation using a deeplabv3+ architecture. In: Image and signal processing for remote sensing XXVI, vol 11533, p 115330. International Society for Optics and Photonics
Herumurti D, Yuniarti A, Kuswardayan I, Nurul W, Hariadi RR, Suciati N, Manggala MG (2017) Mixed reality in the 3d virtual room arrangement. In: 2017 11th international conference on information communication technology and system (ICTS), pp 303–306. https://doi.org/10.1109/ICTS.2017.8265688
Ju Z, Ji X, Li J, Liu H (2017) An integrative framework of human hand gesture segmentation for human–robot interaction. IEEE Syst J 11(3):1326–1336. https://doi.org/10.1109/JSYST.2015.2468231
Kapidis G, Poppe R, Van Dam E, Noldus L, Veltkamp R (2019) Egocentric hand track and object-based human action recognition. In: 2019 IEEE SmartWorld, ubiquitous intelligence & computing, advanced & trusted computing, scalable computing & communications, cloud & big data computing, internet of people and smart city innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), pp 922–929. IEEE
Kok VJ, Chan CS (2016) Grcs: granular computing-based crowd segmentation. IEEE Trans Cybern 47(5):1157–1168
Kong Y, Liu Y, Yan B, Leung H, Peng X (2021) A novel deeplabv3+ network for sar imagery semantic segmentation based on the potential energy loss function of gibbs distribution. Remote Sensing 13(3):454
Lateef F, Ruichek Y (2019) Survey on semantic segmentation using deep learning techniques. Neurocomputing 338:321–348
Lee K, Kacorri H (2019) Hands holding clues for object recognition in teachable machines. In: Proceedings of the 2019 CHI conference on human factors in computing systems. ACM
Lee S, Bambach S, Crandall DJ, Franchak JM, Yu C (2014) This hand is my hand: a probabilistic approach to hand disambiguation in egocentric video. In: 2014 IEEE conference on computer vision and pattern recognition workshops, pp 557–564. https://doi.org/10.1109/CVPRW.2014.86
Li C, Kitani KM (2013) Pixel-level hand detection in ego-centric videos. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3570–3577
Li Y, Ye Z, Rehg JM (2015) Delving into egocentric actions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 287–295
Li Y, Jia L, Wang Z, Qian Y, Qiao H (2019) Un-supervised and semi-supervised hand segmentation in egocentric images with noisy label learning. Neurocomputing 334:11–24. https://doi.org/10.1016/j.neucom.2018.12.010
Lin F, Martinez T (2020) Ego2hands: A dataset for egocentric two-hand segmentation and detection. arXiv preprint arXiv:2011.07252
Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft coco: common objects in context. In: European conference on computer vision, pp 740–755. Springer
Lin G, Milan A, Shen C, Reid I (2017) Refinenet: multi-path refinement networks for high-resolution semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1925–1934
Lin F, Wilhelm C, Martinez T (2021) Two-hand global 3d pose estimation using monocular rgb. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 2373–2381
Liu W, Rabinovich A, Berg AC (2015) Parsenet: looking wider to see better. arXiv preprint arXiv:1506.04579
Maricchiolo F, Bonaiuto M, Gnisci A (2005) Hand gestures in speech: studies of their roles in social interaction. In: Proceedings of the conference of the international society for gesture studies
Matilainen M, Sangi P, Holappa J, Silvén O (2016) Ouhands database for hand detection and pose recognition. In: 2016 Sixth international conference on image processing theory, tools and applications (IPTA), pp 1–5. IEEE
Maurya J, Hebbalaguppe R, Gupta P (2018) Real time hand segmentation on frugal headmounted device for gestural interface. In: 2018 25th IEEE international conference on image processing (ICIP), pp 4023–4027. https://doi.org/10.1109/ICIP.2018.8451213
Minaee S, Boykov YY, Porikli F, Plaza AJ, Kehtarnavaz N, Terzopoulos D (2021) Image segmentation using deep learning: a survey. IEEE Trans Pattern Anal Mach Intell
Mueller F, Bernard F, Sotnychenko O, Mehta D, Sridhar S, Casas D, Theobalt C (2018) Ganerated hands for real-time 3d hand tracking from monocular rgb. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 49–59
Narasimhaswamy S, Wei Z, Wang Y, Zhang J, Hoai M (2019) Contextual attention for hand detection in the wild. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 9567–9576
Nguyen T-H-C, Nebel J-C, Florez-Revuelta F (2018) Recognition of activities of daily living from egocentric videos using hands detected by a deep convolutional network. In: Campilho A, Karray F, ter Haar Romeny B (eds) Image analysis and recognition. Springer, Cham, pp 390–398
Papandreou G, Kokkinos I, Savalle PA (2015) Modeling local and global deformations in deep learning: epitomic convolution, multiple instance learning, and sliding window detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 390–399
Paul S, Bhattacharyya A, Mollah AF, Basu S, Nasipuri M (2020) Hand segmentation from complex background for gesture recognition. In: Mandal JK, Bhattacharya D (eds) Emerging technology in modelling and graphics. Springer, Singapore, pp 775–782
Pirsiavash H, Ramanan D (2012) Detecting activities of daily living in first-person camera views. In: 2012 IEEE conference on computer vision and pattern recognition, pp 2847–2854. IEEE
Poularakis S, Katsavounidis I (2015) Low-complexity hand gesture recognition system for continuous streams of digits and letters. IEEE Trans Cybern 46(9):2094–2108
Rautaray SS, Agrawal A (2015) Vision based hand gesture recognition for human computer interaction: a survey. Artif Intell Rev 43(1):1–54
Ren Y, Kong AWK, Jiao L (2020) A survey on image and video cosegmentation: methods, challenges and analyses. Pattern Recognit 103:107297
Rogez G, Khademi M, Supančič JS III, Montiel JMM, Ramanan D (2015) 3d hand pose detection in egocentric rgb-d images. In: Agapito L, Bronstein MM, Rother C (eds) Computer vision: ECCV 2014 workshops. Springer, Cham, pp 356–371
Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M (2015) Imagenet large scale visual recognition challenge. Int J Comput Vis 115(3):211–252
Sharma S, Huang S (2021) An end-to-end framework for unconstrained monocular 3d hand pose estimation. Pattern Recognit 115:107892
Shilkrot R, Narasimhaswamy S, Vazir S, Hoai M (2019) Workinghands: a hand-tool assembly dataset for image segmentation and activity mining. In: BMVC, p 258
Tang Y, Wang Z, Lu J, Feng J, Zhou J (2018) Multi-stream deep neural networks for rgb-d egocentric action recognition. IEEE Trans Circuits Syst Video Technol 29(10):3001–3015
Thalmann D, Liang H, Yuan J (2015) First-person palm pose tracking and gesture recognition in augmented reality. In: International joint conference on computer vision, imaging and computer graphics, pp 3–15. Springer
Urooj A, Borji A (2018) Analysis of hand segmentation in the wild. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4710–4719
Valli A (2008) The design of natural interaction. Multimed Tools Appl 38(3):295–305
Wang J, Liu X (2021) Medical image recognition and segmentation of pathological slices of gastric cancer based on deeplab v3+ neural network. Comput Methods Programs Biomed 207:106210
Wang W, Yu K, Hugonot J, Fua P, Salzmann M (2019) Recurrent u-net for resource-constrained segmentation. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 2142–2151
Wu W, Gan J, Zhou J, Wang J (2021) A lightweight and effective semantic segmentation network for ethnic clothing images based on deeplab. In: 2021 9th international conference on communications and broadband networking, pp 34–40
Yuan S, Ye Q, Garcia-Hernando G, Kim TK (2017) The 2017 hands in the million challenge on 3d hand pose estimation. arXiv preprint arXiv:1707.02237
Yueming W, Hanwu H, Tong R, Detao Z (2007) Hand segmentation for augmented reality system. In: Second workshop on digital media and its application in museum heritages (DMAMH 2007), pp 395–401. https://doi.org/10.1109/DMAMH.2007.39
Zhang Y, Cao C, Cheng J, Lu H (2018) Egogesture: a new dataset and benchmark for egocentric hand gesture recognition. IEEE Trans Multimed 20(5):1038–1050
Zimmermann C, Brox T (2017) Learning to estimate 3d hand pose from single rgb images. In: Proceedings of the IEEE international conference on computer vision, pp 4903–4911


Acknowledgements

The authors would like to thank NVIDIA’s Academic Research Team for providing the Titan Xp cards under the Hardware Donation Program. The authors also thank Ester Gonzalez-Sosa for the availability and support.

Author information

Monica Gruosso, Nicola Capece and Ugo Erra have contributed equally to this work.

Authors and Affiliations

Department of Mathematics, Computer Science and Economics, University of Basilicata, Via dell’Ateneo Lucano 10, Potenza, 85100, Italy
Monica Gruosso, Nicola Capece & Ugo Erra


Corresponding author

Correspondence to Nicola Capece.

Ethics declarations

Conflict of interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

The electronic supplementary material is available as Supplementary file 1.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Gruosso, M., Capece, N. & Erra, U. Egocentric upper limb segmentation in unconstrained real-life scenarios. Virtual Reality 27, 3421–3433 (2023). https://doi.org/10.1007/s10055-022-00725-4


Received: 26 April 2022
Accepted: 18 November 2022
Published: 03 December 2022
Issue Date: December 2023
DOI: https://doi.org/10.1007/s10055-022-00725-4
