Occlusion-robust workflow recognition with context-aware compositional ConvNet

Zhang, Min; Hu, Haiyang; Li, Zhongjin; Chen, Jie

doi:10.1007/s00500-023-09225-2

Application of soft computing
Published: 03 October 2023

Volume 28, pages 5125–5135 (2024)

Soft Computing

Min Zhang^1,2, Haiyang Hu^2, Zhongjin Li^2 & Jie Chen^2


Abstract

Workflow recognition based on deep convolutional neural networks has achieved promising performance. Although impressive results have been obtained on standard industrial workflows, performance on heavily occluded workflows remains far from satisfactory. In this paper, we present an effective context-aware compositional ConvNet (CA-CompNet) for occluded workflow detection, with the following contributions. First, we combine a compositional model with a standard ConvNet in a unified deep architecture for occluded workflow detection; compositional models have shown innate robustness for object classification under occlusion. Second, to overcome the limitations posed by variable occlusion, bounding box annotations are used during training to segment the context from the target workflow instance. These segmentations are then used to train the proposed CA-CompNet, which enables the network to disentangle the feature representation of the workflow instance from its context. Third, a robust voting mechanism over candidate bounding boxes is introduced to improve detection accuracy, allowing the model to precisely localize the bounding box of a specific workflow instance. Comprehensive experiments demonstrate that the proposed context-aware network robustly detects workflow instances under occlusion in industrial environments, increasing detection performance on the MS COCO dataset by 4.6 percentage points in absolute terms (from 45.1% to 49.7%) compared with the CenterNet detector.
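The abstract describes two training and inference mechanisms only at a high level: separating a workflow instance from its context using the ground-truth bounding box, and voting over candidate bounding boxes. The following is a minimal sketch of one plausible reading of each, not the authors' implementation. It assumes boxes are given as `(x0, y0, x1, y1)` on a feature-map grid; the names `context_mask`, `iou`, and `box_vote` are hypothetical, and the voting rule shown is the common score-weighted box voting, swapped in for illustration because the paper's exact mechanism is not described here.

```python
from typing import List, Sequence, Tuple

# Assumed box convention: (x0, y0, x1, y1) with x1 > x0 and y1 > y0.
Box = Tuple[float, float, float, float]


def context_mask(height: int, width: int, box: Box) -> List[List[int]]:
    """Hypothetical instance/context split from a bounding box.

    Returns a height x width grid with 0 for cells inside the box (instance)
    and 1 for cells outside it (context). A cell (r, c) counts as inside when
    its center (c + 0.5, r + 0.5) falls in the half-open box.
    """
    x0, y0, x1, y1 = box
    mask = []
    for r in range(height):
        row = []
        for c in range(width):
            cx, cy = c + 0.5, r + 0.5
            inside = x0 <= cx < x1 and y0 <= cy < y1
            row.append(0 if inside else 1)
        mask.append(row)
    return mask


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def box_vote(boxes: Sequence[Box], scores: Sequence[float],
             iou_thr: float = 0.5) -> List[Tuple[Box, float]]:
    """Greedy score-weighted box voting (illustrative stand-in).

    Repeatedly takes the highest-scoring remaining candidate, groups every
    remaining candidate whose IoU with it is >= iou_thr, and replaces the
    group by the score-weighted mean of its coordinates.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    results = []
    while order:
        top = order[0]
        group = [top] + [i for i in order[1:]
                         if iou(boxes[top], boxes[i]) >= iou_thr]
        total = sum(scores[i] for i in group)
        if total > 0:
            voted = tuple(sum(scores[i] * boxes[i][k] for i in group) / total
                          for k in range(4))
        else:
            voted = tuple(float(v) for v in boxes[top])
        results.append((voted, scores[top]))
        order = [i for i in order if i not in group]
    return results


if __name__ == "__main__":
    for row in context_mask(4, 4, (1, 1, 3, 3)):
        print(row)
    candidates = [(10, 10, 50, 50), (12, 10, 52, 50), (100, 100, 140, 140)]
    for box, score in box_vote(candidates, [0.9, 0.6, 0.8]):
        print(box, score)
```

In this example the two heavily overlapping candidates (IoU of about 0.90) merge into a single box of approximately (10.8, 10.0, 50.8, 50.0) with score 0.9, while the distant candidate is kept on its own. Averaging over a group, rather than keeping only the top box as plain non-maximum suppression does, lets several partially occluded candidates jointly refine the final localization.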

Data availability

Enquiries about data availability should be directed to the authors.

References

Arthur D, Vassilvitskii S (2007) k-means++: the advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on discrete algorithms, pp 1027–1035
Banerjee A, Dhillon IS, Ghosh J, Sra S (2005) Clustering on the unit hypersphere using von Mises-Fisher distributions. J Mach Learn Res 6:1345–1382
Bienenstock E, Geman S (1998) Compositionality in neural systems. In: The handbook of brain theory and neural networks, pp 223–226
Bochkovskiy A, Wang CY, Liao HYM (2020) YOLOv4: optimal speed and accuracy of object detection. arXiv:2004.10934
Cai Z, Vasconcelos N (2018) Cascade R-CNN: delving into high quality object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6154–6162
Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 248–255
DeVries T, Taylor GW (2017) Improved regularization of convolutional neural networks with cutout. arXiv:1708.04552
Fidler S, Boben M, Leonardis A (2014) Learning a hierarchical compositional shape vocabulary for multi-class object representation. arXiv:1408.5516
Fodor JA, Pylyshyn ZW (1988) Connectionism and cognitive architecture: a critical analysis. Cognition 28(1–2):3–71
George D et al (2017) A generative vision model that trains with high data efficiency and breaks text-based CAPTCHAs. Science 358(6368):eaag2612
Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 580–587
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
Hu H, Cheng K, Li Z, Chen J, Hu H (2020) Workflow recognition with structured two-stream convolutional networks. Pattern Recogn Lett 130:267–274
Jin Y, Geman S (2006) Context and hierarchy in a probabilistic image model. In: IEEE computer society conference on computer vision and pattern recognition workshops, pp 2145–2152
Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv:1412.6980
Kortylewski A (2017) Model-based image analysis for forensic shoe print recognition. PhD Thesis, University of Basel
Kortylewski A, He J, Liu Q, Yuille AL (2020a) Compositional convolutional neural networks: a deep architecture with innate robustness to partial occlusion. In: Proceedings of computer vision and pattern recognition, pp 8940–8949
Kortylewski A, Liu Q, Wang H, Zhang Z, Yuille A (2020b) Combining compositional models and deep networks for robust object classification under occlusion. In: IEEE Winter conference on applications of computer vision workshops, pp 1333–1341
Lampert CH, Blaschko MB, Hofmann T (2008) Beyond sliding windows: object localization by efficient subwindow search. In: Proceedings of computer vision and pattern recognition, pp 1–8
Law H, Deng J (2018) CornerNet: detecting objects as paired keypoints. In: Proceedings of European conference on computer vision, pp 734–750
Liao R, Schwing A, Zemel R, Urtasun R (2016) Learning deep parsimonious representations. In: Proceedings of international conference neural information processing systems, pp 5076–5084
Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Zitnick CL (2014) Microsoft COCO: common objects in context. In: Proceedings of European conference on computer vision, pp 740–755
Makantasis K, Doulamis A, Doulamis N, Psychas K (2016) Deep learning based human behavior recognition in industrial workflows. In: Proceedings of international conference on image processing, pp 1609–1613
Mannhardt F, Bovo R, Oliveira MF, Julier S (2018) A taxonomy for combining activity recognition and process discovery in industrial environments. In: Lecture notes in computer science, pp 84–93
Reddy ND, Vo M, Narasimhan SG (2019) Occlusion-Net: 2D/3D occluded keypoint localization using graph networks. In: Proceedings of computer vision and pattern recognition, pp 7326–7335
Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. In: Proceedings of international conference on neural information processing systems, pp 91–99
Sasikumar D, Emeric E, Stuphorn V, Connor CE (2018) First-pass processing of value cues in the ventral visual pathway. Curr Biol 28(4):538–548
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556
Voulodimos A et al (2011) A dataset for workflow recognition in industrial scenes. In: Proceedings of international conference on image processing, pp 3249–3252
Voulodimos A et al (2012) A threefold dataset for activity and workflow recognition in complex industrial environments. IEEE Multimed 3:42–52
Wang J, Xie C, Zhang Z, Zhu J, Xie L, Yuille A (2017a) Detecting semantic parts on partially occluded objects. arXiv:1707.07819
Wang J et al (2017b) Detecting semantic parts on partially occluded objects. arXiv:1707.07819
Xiang Y, Savarese S (2013) Object detection by 3D aspectlets and occlusion reasoning. In: IEEE international conference on computer vision workshops, ICCVW, pp 530–537
Yan S, Liu Q (2015) Inferring occluded features for fast object detection. Signal Process 110:188–198
Yang Y, Ma Z, Nie F, Chang X, Hauptmann AG (2015) Multi-class active learning by uncertainty sampling with diversity maximization. Int J Comput Vis 113(2):113–127
Zhang L, Wang QW (2018) Xiolift database. https://pan.baidu.com/s/1ySILNURWDN40q5TpAvGKUA
Zhang S, Wen L, Bian X, Lei Z, Li SZ (2018a) Occlusion-aware R-CNN: detecting pedestrians in a crowd. In: Proceedings of European conference on computer vision, pp 637–653
Zhang Z, Xie C, Wang J, Xie L, Yuille AL (2018b) DeepVoting: a robust and explainable deep network for semantic part detection under partial occlusion. In: Proceedings of computer vision and pattern recognition, pp 1372–1380
Zhou X, Wang D, Krähenbühl P (2019) Objects as points. arXiv:1904.07850
Zhu H, Tang F, Park J, Park S, Yuille A (2019) Robustness of object recognition under extreme occlusion in humans and computational models. arXiv:1905.04598

Funding

This research is based upon work partially supported by the National Natural Science Foundation of China (Grant Nos. 61572251, 61572162, 61702144 and 61802095), the Zhejiang Provincial Key Science and Technology “LingYan” Project Foundation (No. 2023C01145), the Natural Science Foundation of Zhejiang Province (LQ17F020003), and the Key Science and Technology Project Foundation of Zhejiang Province (2018C01012).

Author information

Authors and Affiliations

Department of Design and Art, Zhejiang Industry Polytechnic College, Shaoxing, China
Min Zhang
School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
Min Zhang, Haiyang Hu, Zhongjin Li & Jie Chen

Contributions

HH: conception and design of the study; MZ: analysis/interpretation of data and drafting of the manuscript; ZL: acquisition of data; JC: critical revision of the manuscript for important intellectual content.

Corresponding author

Correspondence to Haiyang Hu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This research does not contain any studies with human participants performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, M., Hu, H., Li, Z. et al. Occlusion-robust workflow recognition with context-aware compositional ConvNet. Soft Comput 28, 5125–5135 (2024). https://doi.org/10.1007/s00500-023-09225-2

Accepted: 09 September 2023
Published: 03 October 2023
Issue Date: March 2024
DOI: https://doi.org/10.1007/s00500-023-09225-2
