Abstract
Workflow recognition based on deep convolutional neural networks has achieved promising performance. Although impressive results have been reported on standard industrial workflows, performance on heavily occluded workflows remains far from satisfactory. In this paper, we present a context-aware compositional ConvNet (CA-CompNet) for occluded workflow detection, with the following contributions. First, we combine a compositional model with a standard ConvNet to build a unified deep architecture for occluded workflow detection; this combination has shown innate robustness in object classification under occlusion. Second, to cope with variable occlusion, bounding box annotations are used to segment the context from the target workflow instance during training. These segmentations are then used to train the proposed CA-CompNet, which enables the network to disentangle the feature representation of the workflow instance from that of the context. Third, a robust voting mechanism for candidate bounding boxes is introduced to improve detection accuracy, helping the model localize the bounding box of a specific workflow instance precisely. Comprehensive experiments demonstrate that the proposed context-aware network robustly detects workflow instances under occlusion in industrial environments, improving detection performance on the MS COCO dataset by 4.6 percentage points (from 45.1% to 49.7%) compared with CenterNet.
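The second contribution, separating context from the target instance using only bounding box annotations, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `context_mask` and `split_features`, the `(x0, y0, x1, y1)` box convention with exclusive upper bounds, and the toy feature map are all assumptions made for illustration.

```python
import numpy as np


def context_mask(height, width, box):
    """Return a boolean (H, W) mask that is True for context positions.

    `box` is (x0, y0, x1, y1) in feature-map coordinates; the upper
    bounds are treated as exclusive (an assumption of this sketch).
    """
    x0, y0, x1, y1 = box
    mask = np.ones((height, width), dtype=bool)
    mask[y0:y1, x0:x1] = False  # positions inside the box belong to the object
    return mask


def split_features(features, box):
    """Split an (H, W, C) feature map into object and context vectors."""
    h, w, _ = features.shape
    ctx = context_mask(h, w, box)
    # Boolean indexing over the first two axes yields arrays of shape (N, C).
    return features[~ctx], features[ctx]


# Toy 4x4 feature map with 2 channels and a 2x2 box in the middle.
features = np.arange(4 * 4 * 2, dtype=float).reshape(4, 4, 2)
obj, ctx = split_features(features, box=(1, 1, 3, 3))
print(obj.shape, ctx.shape)  # (4, 2) (12, 2)
```

Keeping the two sets of feature vectors apart is what would let object and context representations be learned separately during training; how CA-CompNet uses them is described in the full paper, not in this sketch.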
Data availability
Enquiries about data availability should be directed to the authors.
Funding
This research is based upon work partially supported by the National Natural Science Foundation of China (Grant Nos. 61572251, 61572162, 61702144 and 61802095), the Zhejiang Provincial Key Science and Technology “LingYan” Project Foundation (No. 2023C01145), the Natural Science Foundation of Zhejiang Province (LQ17F020003), and the Key Science and Technology Project Foundation of Zhejiang Province (2018C01012).
Author information
Contributions
HH: conception and design of the study; MZ: analysis and interpretation of data and drafting of the manuscript; ZL: acquisition of data; JC: critical revision of the manuscript for important intellectual content.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This research does not contain any studies with human participants performed by any of the authors.
About this article
Cite this article
Zhang, M., Hu, H., Li, Z. et al. Occlusion-robust workflow recognition with context-aware compositional ConvNet. Soft Comput 28, 5125–5135 (2024). https://doi.org/10.1007/s00500-023-09225-2