Occlusion-robust workflow recognition with context-aware compositional ConvNet

  • Application of soft computing
Soft Computing

Abstract

Workflow recognition based on deep convolutional neural networks has achieved promising performance. Although results on standard industrial workflows are impressive, performance on heavily occluded workflows remains far from satisfactory. In this paper, we present a context-aware compositional ConvNet (CA-CompNet) for occluded workflow detection, with the following contributions. First, we combine a compositional model with the original ConvNet to build a unified deep architecture for occluded workflow detection, a combination that has shown innate robustness for object classification under occlusion. Second, to cope with variable occlusion, bounding box annotations are used during training to segment the context from the target workflow instance. These segmentations are then used to train CA-CompNet, which enables the network to disentangle the feature representation of the workflow instance from that of the context. Third, a robust voting mechanism over candidate bounding boxes is introduced, which helps the model precisely localize the bounding box of a specific workflow instance and thereby improves detection accuracy. Comprehensive experiments demonstrate that the proposed context-aware network robustly detects workflow instances under occlusion in industrial environments, improving detection performance on the MS COCO dataset by 4.6 percentage points (from 45.1% to 49.7%) compared with CenterNet.
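
The abstract gives no implementation details, so the following is only a minimal sketch of the second contribution's general idea: using a bounding box annotation to separate the feature vectors of a target instance from those of its surrounding context. The function name, the (H, W, C) feature-map layout, and the box convention (feature-map cells, exclusive upper bounds) are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def split_object_and_context(features, box):
    """Split a feature map into object and context feature vectors.

    features: array of shape (H, W, C), e.g. a ConvNet feature map (assumed layout).
    box: (x0, y0, x1, y1) in feature-map cells, with x1 and y1 exclusive
         (hypothetical convention chosen for this sketch).
    """
    x0, y0, x1, y1 = box
    # Boolean mask that is True inside the annotated bounding box.
    mask = np.zeros(features.shape[:2], dtype=bool)
    mask[y0:y1, x0:x1] = True
    object_feats = features[mask]     # shape (n_inside, C)
    context_feats = features[~mask]   # shape (n_outside, C)
    return object_feats, context_feats


# Toy 4x5 feature map with 2 channels and a 3x2 box.
features = np.arange(4 * 5 * 2, dtype=float).reshape(4, 5, 2)
obj, ctx = split_object_and_context(features, (1, 1, 4, 3))
print(obj.shape, ctx.shape)  # (6, 2) (14, 2)
```

In a setup like this, the two sets of vectors could be used to fit separate object and context models during training; how CA-CompNet actually does so is described in the full article, not here.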

Data availability

Enquiries about data availability should be directed to the authors.

Funding

This research is based upon work partially supported by the National Natural Science Foundation of China (Grant Nos. 61572251, 61572162, 61702144 and 61802095), the Zhejiang Provincial Key Science and Technology “LingYan” Project Foundation (No. 2023C01145), the Natural Science Foundation of Zhejiang Province (LQ17F020003), and the Key Science and Technology Project Foundation of Zhejiang Province (2018C01012).

Author information

Contributions

HH: conception and design of the study; MZ: analysis and interpretation of data and drafting of the manuscript; ZL: acquisition of data; JC: critical revision of the manuscript for important intellectual content.

Corresponding author

Correspondence to Haiyang Hu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This research does not contain any studies with human participants performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, M., Hu, H., Li, Z. et al. Occlusion-robust workflow recognition with context-aware compositional ConvNet. Soft Comput 28, 5125–5135 (2024). https://doi.org/10.1007/s00500-023-09225-2

  • DOI: https://doi.org/10.1007/s00500-023-09225-2
