Abstract
The segmentation of bare and clothed upper limbs in unconstrained real-life environments has received little attention. We tackle this challenging task by training a deep neural network based on the DeepLabv3+ architecture. We collected about 46 thousand carefully labeled real-life RGB egocentric images covering a great variety of skin tones, clothes, occlusions, and lighting conditions. We then extensively evaluated the proposed approach and compared it with state-of-the-art methods for hand and arm segmentation, e.g., Ego2Hands, EgoArm, and HGRNet. We used our test set and a subset of the EgoGesture dataset (EgoGestureSeg) to assess how well the model generalizes to challenging scenarios. Moreover, we tested our network on hand-only segmentation, since it is a closely related task. We performed a quantitative analysis using standard image segmentation metrics and a qualitative evaluation by visually comparing the obtained predictions. Our approach outperforms all compared models in both tasks, demonstrating its robustness to hand-to-hand and hand-to-object occlusions, dynamic user/camera movements, different lighting conditions, skin colors, clothes, and limb/hand poses.
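The abstract does not name the specific metrics used. As an illustration only, the sketch below computes two measures commonly reported for binary segmentation, intersection over union (IoU) and pixel accuracy, on masks where 1 marks an upper-limb pixel. The function names, the list-of-lists mask representation, and the choice of these two metrics are assumptions made for this example, not the authors' evaluation code.

```python
from typing import Sequence, Tuple

# Assumed representation: a binary mask as rows of 0/1 values
# (1 = upper limb, 0 = background). Hypothetical, for illustration only.
Mask = Sequence[Sequence[int]]


def confusion_counts(pred: Mask, target: Mask) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) counted over all pixels of two same-shape masks."""
    if len(pred) != len(target) or any(
        len(p) != len(t) for p, t in zip(pred, target)
    ):
        raise ValueError("pred and target must have the same shape")
    tp = fp = fn = tn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1  # limb predicted, limb in ground truth
            elif p:
                fp += 1  # limb predicted, background in ground truth
            elif t:
                fn += 1  # background predicted, limb in ground truth
            else:
                tn += 1  # background in both
    return tp, fp, fn, tn


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of the foreground class."""
    tp, fp, fn, _ = confusion_counts(pred, target)
    union = tp + fp + fn
    # Convention assumed here: two empty masks agree perfectly.
    return tp / union if union else 1.0


def pixel_accuracy(pred: Mask, target: Mask) -> float:
    """Fraction of pixels whose predicted label matches the ground truth."""
    tp, fp, fn, tn = confusion_counts(pred, target)
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 1.0


if __name__ == "__main__":
    prediction = [[1, 1, 0], [0, 1, 0]]
    ground_truth = [[1, 0, 0], [0, 1, 1]]
    print(iou(prediction, ground_truth))
    print(pixel_accuracy(prediction, ground_truth))
```

In practice such scores are usually averaged over every image in a test set. Pixel accuracy can look high when limbs occupy only a small part of the frame, which is one reason IoU is often reported alongside it.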
Data availability
The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request. Source code, video, and test dataset (example samples) generated or analysed during this study are included in this published article and its supplementary information files.
Notes
The pre-trained weights used are publicly available on the DeepLab project page: https://github.com/tensorflow/models/tree/master/research/deeplab.
The original subset (Gonzalez-Sosa et al. 2020) contains 277 images. After a careful inspection, we deleted some data whose labels showed small imperfections or errors.
We tested the best Ego2Hands model without custom scene adaptation, which is publicly available at https://github.com/AlextheEngineer/Ego2Hands.
Since the code and the EgoArm network are not publicly available, network inference was performed by one of the authors, Ester Gonzalez-Sosa, who sent us the obtained predictions.
References
Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado GS, Davis A, Dean J, Devin M, Ghemawat S, Goodfellow I, Harp A, Irving G, Isard M, Jia Y, Jozefowicz R, Kaiser L, Kudlur M, Levenberg J, Mané D, Monga R, Moore S, Murray D, Olah C, Schuster M, Shlens J, Steiner B, Sutskever I, Talwar K, Tucker P, Vanhoucke V, Vasudevan V, Viégas F, Vinyals O, Warden P, Wattenberg M, Wicke M, Yu Y, Zheng X (2015) TensorFlow: large-scale machine learning on heterogeneous systems. Software available from tensorflow.org https://www.tensorflow.org/
Alletto S, Serra G, Calderara S, Cucchiara R (2015) Understanding social relationships in egocentric vision. Pattern Recognit 48(12):4082–4096
Bambach S, Lee S, Crandall DJ, Yu C (2015) Lending a hand: detecting hands and recognizing activities in complex egocentric interactions. In: The IEEE international conference on computer vision (ICCV)
Bandini A, Zariffa J (2020) Analysis of the hands in egocentric vision: A survey. IEEE Trans Pattern Anal Mach Intell
Betancourt A, Morerio P, Regazzoni CS, Rauterberg M (2015) The evolution of first person vision methods: a survey. IEEE Trans Circuits Syst Video Technol 25(5):744–760
Betancourt A, Morerio P, Barakova E, Marcenaro L, Rauterberg M, Regazzoni C (2017) Left/right hand segmentation in egocentric videos. Comput Vis Image Underst 154:73–81
Bojja AK, Mueller F, Malireddi SR, Oberweger M, Lepetit V, Theobalt C, Yi KM, Tagliasacchi A (2019) Handseg: an automatically labeled dataset for hand segmentation from depth images. In: 2019 16th conference on computer and robot vision (CRV), pp 151–158. IEEE
Brancati N, Caggianese G, Frucci M, Gallo L, Neroni P (2015) Robust fingertip detection in egocentric vision under varying illumination conditions. In: 2015 IEEE international conference on multimedia and expo workshops (ICMEW), pp 1–6. IEEE
Caggianese G, Gallo L, Neroni P (2015) Design and preliminary evaluation of free-hand travel techniques for wearable immersive virtual reality systems with egocentric sensing. In: International conference on augmented and virtual reality, pp 399–408. Springer
Caggianese G, Capece N, Erra U, Gallo L, Rinaldi M (2020) Freehand-steering locomotion techniques for immersive virtual environments: a comparative evaluation. Int J Hum Comput Interact 36(18):1734–1755
Cai M, Kitani KM, Sato Y (2017) An ego-vision system for hand grasp analysis. IEEE Trans Hum Mach Syst 47(4):524–535
Cai M, Lu F, Sato Y (2020) Generalizing hand segmentation in egocentric videos with uncertainty-guided model adaptation. In: 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 14380–14389. https://doi.org/10.1109/CVPR42600.2020.01440
Capece N, Erra U, Gruosso M, Anastasio M (2020) Archaeo puzzle: an educational game using natural user interface for historical artifacts
Chalasani T, Ondrej J, Smolic A (2018) Egocentric gesture recognition for head-mounted ar devices. In: 2018 IEEE international symposium on mixed and augmented reality adjunct (ISMAR-Adjunct), pp 109–114. https://doi.org/10.1109/ISMAR-Adjunct.2018.00045
Chen LC, Papandreou G, Schroff F, Adam H (2017) Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587
Chen L-C, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2017) Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans Pattern Anal Mach Intell 40(4):834–848
Chen LC, Zhu Y, Papandreou G, Schroff F, Adam H (2018) Encoder–decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the European conference on computer vision (ECCV), pp 801–818
Chollet F (2017) Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1251–1258
Dadashzadeh A, Targhi AT, Tahmasbi M, Mirmehdi M (2019) Hgr-net: a fusion network for hand gesture segmentation and recognition. IET Comput Vis 13(8):700–707
Dave IR, Chaudhary V, Upla KP (2019) Simulation of analytical chemistry experiments on augmented reality platform. In: Panigrahi CR, Pujari AK, Misra S, Pati B, Li K-C (eds) Progress in advanced computing and intelligent engineering. Springer, Singapore, pp 393–403
Fathi A, Ren X, Rehg JM (2011) Learning to recognize objects in egocentric activities. In: CVPR 2011, pp 3281–3288. IEEE
Ferracani A, Pezzatini D, Bianchini J, Biscini G, Del Bimbo A (2016) Locomotion by natural gestures for immersive virtual environments. AltMM ’16, pp 21–24. Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/2983298.2983307
Garcia-Hernando G, Yuan S, Baek S, Kim TK (2018) First-person hand action benchmark with rgb-d videos and 3d hand pose annotations. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)
Gonzalez-Sosa E, Perez P, Tolosana R, Kachach R, Villegas A (2020) Enhanced self-perception in mixed reality: egocentric arm segmentation and database with automatic labeling. IEEE Access 8:146887–146900
Gruosso M, Capece N, Erra U, Angiolillo F (2020) A preliminary investigation into a deep learning implementation for hand tracking on mobile devices. In: 2020 IEEE international conference on artificial intelligence and virtual reality (AIVR), pp 380–385. IEEE
Gruosso M, Capece N, Erra U (2021a) Human segmentation in surveillance video with deep learning. Multimed Tools Appl 80(1):1175–1199
Gruosso M, Capece N, Erra U (2021b) Exploring upper limb segmentation with deep learning for augmented virtuality. In: Frosini P, Giorgi D, Melzi S, Rodolá E (eds) Smart tools and apps for graphics: Eurographics Italian chapter conference. The Eurographics Association. https://doi.org/10.2312/stag.20211483
Gruosso M, Capece N, Erra U (2021c) Solid and effective upper limb segmentation in egocentric vision. In: The 26th international conference on 3D Web technology. https://doi.org/10.1145/3485444.3495179
Haria A, Subramanian A, Asokkumar N, Poddar S, Nayak JS (2017) Hand gesture recognition for human computer interaction. Procedia Comput Sci 115:367–374
Harkat H, Nascimento J, Bernardino A (2020) Fire segmentation using a deeplabv3+ architecture. In: Image and signal processing for remote sensing XXVI, vol 11533, p 115330. International Society for Optics and Photonics
Herumurti D, Yuniarti A, Kuswardayan I, Nurul W, Hariadi RR, Suciati N, Manggala MG (2017) Mixed reality in the 3d virtual room arrangement. In: 2017 11th international conference on information communication technology and system (ICTS), pp 303–306. https://doi.org/10.1109/ICTS.2017.8265688
Ju Z, Ji X, Li J, Liu H (2017) An integrative framework of human hand gesture segmentation for human–robot interaction. IEEE Syst J 11(3):1326–1336. https://doi.org/10.1109/JSYST.2015.2468231
Kapidis G, Poppe R, Van Dam E, Noldus L, Veltkamp R (2019) Egocentric hand track and object-based human action recognition. In: 2019 IEEE SmartWorld, ubiquitous intelligence & computing, advanced & trusted computing, scalable computing & communications, cloud & big data computing, internet of people and smart city innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), pp 922–929. IEEE
Kok VJ, Chan CS (2016) Grcs: granular computing-based crowd segmentation. IEEE Trans Cybern 47(5):1157–1168
Kong Y, Liu Y, Yan B, Leung H, Peng X (2021) A novel deeplabv3+ network for sar imagery semantic segmentation based on the potential energy loss function of gibbs distribution. Remote Sensing 13(3):454
Lateef F, Ruichek Y (2019) Survey on semantic segmentation using deep learning techniques. Neurocomputing 338:321–348
Lee K, Kacorri H (2019) Hands holding clues for object recognition in teachable machines. In: Proceedings of the 2019 CHI conference on human factors in computing systems. ACM
Lee S, Bambach S, Crandall DJ, Franchak JM, Yu C (2014) This hand is my hand: a probabilistic approach to hand disambiguation in egocentric video. In: 2014 IEEE conference on computer vision and pattern recognition workshops, pp 557–564. https://doi.org/10.1109/CVPRW.2014.86
Li C, Kitani KM (2013) Pixel-level hand detection in ego-centric videos. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3570–3577
Li Y, Ye Z, Rehg JM (2015) Delving into egocentric actions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 287–295
Li Y, Jia L, Wang Z, Qian Y, Qiao H (2019) Un-supervised and semi-supervised hand segmentation in egocentric images with noisy label learning. Neurocomputing 334:11–24. https://doi.org/10.1016/j.neucom.2018.12.010
Lin F, Martinez T (2020) Ego2hands: A dataset for egocentric two-hand segmentation and detection. arXiv preprint arXiv:2011.07252
Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft coco: common objects in context. In: European conference on computer vision, pp 740–755. Springer
Lin G, Milan A, Shen C, Reid I (2017) Refinenet: multi-path refinement networks for high-resolution semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1925–1934
Lin F, Wilhelm C, Martinez T (2021) Two-hand global 3d pose estimation using monocular rgb. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 2373–2381
Liu W, Rabinovich A, Berg AC (2015) Parsenet: looking wider to see better. arXiv preprint arXiv:1506.04579
Maricchiolo F, Bonaiuto M, Gnisci A (2005) Hand gestures in speech: studies of their roles in social interaction. In: Proceedings of the conference of the international society for gesture studies
Matilainen M, Sangi P, Holappa J, Silvén O (2016) Ouhands database for hand detection and pose recognition. In: 2016 Sixth international conference on image processing theory, tools and applications (IPTA), pp 1–5. IEEE
Maurya J, Hebbalaguppe R, Gupta P (2018) Real time hand segmentation on frugal headmounted device for gestural interface. In: 2018 25th IEEE international conference on image processing (ICIP), pp 4023–4027. https://doi.org/10.1109/ICIP.2018.8451213
Minaee S, Boykov YY, Porikli F, Plaza AJ, Kehtarnavaz N, Terzopoulos D (2021) Image segmentation using deep learning: a survey. IEEE Trans Pattern Anal Mach Intell
Mueller F, Bernard F, Sotnychenko O, Mehta D, Sridhar S, Casas D, Theobalt C (2018) Ganerated hands for real-time 3d hand tracking from monocular rgb. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 49–59
Narasimhaswamy S, Wei Z, Wang Y, Zhang J, Hoai M (2019) Contextual attention for hand detection in the wild. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 9567–9576
Nguyen T-H-C, Nebel J-C, Florez-Revuelta F (2018) Recognition of activities of daily living from egocentric videos using hands detected by a deep convolutional network. In: Campilho A, Karray F, ter Haar Romeny B (eds) Image analysis and recognition. Springer, Cham, pp 390–398
Papandreou G, Kokkinos I, Savalle PA (2015) Modeling local and global deformations in deep learning: epitomic convolution, multiple instance learning, and sliding window detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 390–399
Paul S, Bhattacharyya A, Mollah AF, Basu S, Nasipuri M (2020) Hand segmentation from complex background for gesture recognition. In: Mandal JK, Bhattacharya D (eds) Emerging technology in modelling and graphics. Springer, Singapore, pp 775–782
Pirsiavash H, Ramanan D (2012) Detecting activities of daily living in first-person camera views. In: 2012 IEEE conference on computer vision and pattern recognition, pp 2847–2854. IEEE
Poularakis S, Katsavounidis I (2015) Low-complexity hand gesture recognition system for continuous streams of digits and letters. IEEE Trans Cybern 46(9):2094–2108
Rautaray SS, Agrawal A (2015) Vision based hand gesture recognition for human computer interaction: a survey. Artif Intell Rev 43(1):1–54
Ren Y, Kong AWK, Jiao L (2020) A survey on image and video cosegmentation: methods, challenges and analyses. Pattern Recognit 103:107297
Rogez G, Khademi M, Supančič JS III, Montiel JMM, Ramanan D (2015) 3d hand pose detection in egocentric rgb-d images. In: Agapito L, Bronstein MM, Rother C (eds) Computer vision: ECCV 2014 workshops. Springer, Cham, pp 356–371
Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M (2015) Imagenet large scale visual recognition challenge. Int J Comput Vis 115(3):211–252
Sharma S, Huang S (2021) An end-to-end framework for unconstrained monocular 3d hand pose estimation. Pattern Recognit 115:107892
Shilkrot R, Narasimhaswamy S, Vazir S, Hoai M (2019) Workinghands: a hand-tool assembly dataset for image segmentation and activity mining. In: BMVC, p 258
Tang Y, Wang Z, Lu J, Feng J, Zhou J (2018) Multi-stream deep neural networks for rgb-d egocentric action recognition. IEEE Trans Circuits Syst Video Technol 29(10):3001–3015
Thalmann D, Liang H, Yuan J (2015) First-person palm pose tracking and gesture recognition in augmented reality. In: International joint conference on computer vision, imaging and computer graphics, pp 3–15. Springer
Urooj A, Borji A (2018) Analysis of hand segmentation in the wild. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4710–4719
Valli A (2008) The design of natural interaction. Multimed Tools Appl 38(3):295–305
Wang J, Liu X (2021) Medical image recognition and segmentation of pathological slices of gastric cancer based on deeplab v3+ neural network. Comput Methods Programs Biomed 207:106210
Wang W, Yu K, Hugonot J, Fua P, Salzmann M (2019) Recurrent u-net for resource-constrained segmentation. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 2142–2151
Wu W, Gan J, Zhou J, Wang J (2021) A lightweight and effective semantic segmentation network for ethnic clothing images based on deeplab. In: 2021 9th international conference on communications and broadband networking, pp 34–40
Yuan S, Ye Q, Garcia-Hernando G, Kim TK (2017) The 2017 hands in the million challenge on 3d hand pose estimation. arXiv preprint arXiv:1707.02237
Yueming W, Hanwu H, Tong R, Detao Z (2007) Hand segmentation for augmented reality system. In: Second workshop on digital media and its application in museum heritages (DMAMH 2007), pp 395–401. https://doi.org/10.1109/DMAMH.2007.39
Zhang Y, Cao C, Cheng J, Lu H (2018) Egogesture: a new dataset and benchmark for egocentric hand gesture recognition. IEEE Trans Multimed 20(5):1038–1050
Zimmermann C, Brox T (2017) Learning to estimate 3d hand pose from single rgb images. In: Proceedings of the IEEE international conference on computer vision, pp 4903–4911
Acknowledgements
The authors would like to thank NVIDIA’s Academic Research Team for providing the Titan Xp cards under the Hardware Donation Program. The authors also thank Ester Gonzalez-Sosa for her availability and support.
Ethics declarations
Conflict of interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Supplementary file 1
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Gruosso, M., Capece, N. & Erra, U. Egocentric upper limb segmentation in unconstrained real-life scenarios. Virtual Reality 27, 3421–3433 (2023). https://doi.org/10.1007/s10055-022-00725-4
DOI: https://doi.org/10.1007/s10055-022-00725-4