Disassembling Convolutional Segmentation Network

Abstract

In recent years, convolutional segmentation networks have achieved remarkable performance in computer vision. However, training a practical segmentation network is time- and resource-consuming. In this paper, focusing on the semantic image segmentation task, we attempt to disassemble a convolutional segmentation network into category-aware convolution kernels and use those kernels to accomplish customizable tasks without additional training. The core of disassembling a convolutional segmentation network is identifying the convolution kernels relevant to a specific category. Following the encoder-decoder network architecture, the disassembling framework, named Disassembler, is devised to consist of forward channel-wise activation attribution and backward gradient attribution. In the forward channel-wise activation attribution process, for each image, the activation values of each feature map within the high-confidence mask area are summed into category-aware probability vectors. In the backward gradient attribution process, the positive gradients w.r.t. each feature map within the high-confidence mask area are summed into a relative coefficient vector for each category. With the cooperation of the two vectors, the Disassembler can effectively disassemble category-aware convolution kernels. Extensive experiments demonstrate that the proposed Disassembler can accomplish the category-customizable task without additional training. The disassembled category-aware sub-network achieves comparable performance without any finetuning and outperforms existing state-of-the-art methods with one epoch of finetuning.

Data availability

The MSCOCO (Lin et al., 2014) dataset can be obtained from https://cocodataset.org/. The Pascal VOC 2012 (Everingham et al., 2015) dataset can be obtained from http://host.robots.ox.ac.uk/pascal/VOC/voc2012/. The Birds (Welinder et al., 2010) dataset can be obtained from http://www.vision.caltech.edu/visipedia/CUB-200.html. The Flowers (Nilsback & Zisserman, 2008) dataset can be downloaded from https://www.robots.ox.ac.uk/~vgg/data/bicos/. The HumanMatting dataset can be downloaded from https://github.com/aisegmentcn/matting_human_datasets.

Notes

  1. https://github.com/aisegmentcn/matting_human_datasets.

References

  • Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K. R., & Samek, W. (2015). On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLOS ONE, 10(7), e0130140.

  • Baehrens, D., Schroeter, T., Harmeling, S., Kawanabe, M., Hansen, K., & Müller, K.-R. (2010). How to explain individual classification decisions. Journal of Machine Learning Research, 11(61), 1803–1831.

  • Berthelier, A., Chateau, T., Duffner, S., Garcia, C., & Blanc, C. (2020). Deep model compression and architecture optimization for embedded systems: A survey. Journal of Signal Processing Systems, 93(8), 863–878.

  • Bucila, C., Caruana, R., & Niculescu-Mizil, A. (2006). Model compression. In ACM SIGKDD international conference on knowledge discovery and data mining (KDD '06).

  • Chang, H., Han, J., Zhong, C., Snijders, A., & Mao, J. H. (2018). Unsupervised transfer learning via multi-scale convolutional sparse coding for biomedical applications. IEEE Transactions on Pattern Analysis & Machine Intelligence, 40(5), 1182–1194.

  • Chen, L., Papandreou, G., Schroff, F., & Adam, H. (2017). Rethinking Atrous convolution for semantic image segmentation. CoRR. arXiv:1706.05587.

  • Chen, T., Sui, Y., Chen, X., Zhang, A., & Wang, Z. (2021). A unified lottery ticket hypothesis for graph neural networks. In International conference on machine learning, pp. 1695–1706. PMLR.

  • Chen, T., Frankle, J., Chang, S., Liu, S., Zhang, Y., Wang, Z., & Carbin, M. (2020). The lottery ticket hypothesis for pre-trained BERT networks. Advances in Neural Information Processing Systems, 33, 15834–15846.

  • Chen, L. C., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2014). Semantic image segmentation with deep convolutional nets and fully connected CRFs. CoRR. arXiv:1412.7062.

  • Chen, J., Wang, J., Wang, X., Wang, X., Feng, Z., Liu, R., & Song, M. (2021). CoEvo-Net: Coevolution network for video highlight detection. IEEE Transactions on Circuits and Systems for Video Technology, 32(6), 3788–3797.

  • Choudhary, T., Mishra, V., Goswami, A., & Sarangapani, J. (2020). A comprehensive survey on model compression and acceleration. Artificial Intelligence Review, 53(3), 5113–5155.

  • Crowley, E. J., Gray, G., & Storkey, A. (2017). Moonshine: Distilling with cheap convolutions. In Conference on neural information processing systems.

  • Deng, J., Dong, W., Socher, R., Li, L., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In 2009 IEEE computer society conference on computer vision and pattern recognition (CVPR 2009), 20–25 June 2009, Miami, Florida, USA, pp. 248–255. https://doi.org/10.1109/CVPR.2009.5206848

  • Desai, S., & Ramaswamy, H. G. (2020). Ablation-CAM: Visual explanations for deep convolutional network via gradient-free localization. In IEEE winter conference on applications of computer vision, pp. 983–991.

  • Essen, D. V., & Deyoe, E. A. (1995). Concurrent processing in the primate visual cortex. In Cognitive neurosciences (pp. 383–400).

  • Everingham, M., Eslami, S., Van Gool, L., Williams, C. K., Winn, J., & Zisserman, A. (2015). The pascal visual object classes challenge: A retrospective. International Journal of Computer Vision, 111(1), 98–136.

  • Fang, G., Bao, Y., Song, J., Wang, X., Xie, D., Shen, C., & Song, M. (2021). Mosaicking to distill: Knowledge distillation from out-of-domain data. In Conference on neural information processing systems.

  • Feng, Z., Cheng, L., Wang, X., Wang, X., Liu, Y., Du, X., & Song, M. (2021). Visual boundary knowledge translation for foreground segmentation. In AAAI conference on artificial intelligence.

  • Feng, Z., Hu, J., Wu, S., Yu, X., Song, J., & Song, M. (2022). Model doctor: A simple gradient aggregation strategy for diagnosing and treating CNN classifiers. In AAAI conference on artificial intelligence.

  • Feng, Z., Wang, Z., Wang, X., Zhang, X., & Song, M. (2021). Edge-competing pathological liver vessel segmentation with limited labels. In AAAI conference on artificial intelligence.

  • Feng, Y., Wu, F., Shao, X., Wang, Y., & Zhou, X. (2018). Joint 3d face reconstruction and dense alignment with position map regression network. In European conference on computer vision, pp. 557–574.

  • Feng, Z., Liang, W., Tao, D., Sun, L., & Song, M. (2019). CU-NET: Component unmixing network for textile fiber identification. International Journal of Computer Vision, 127(10), 1443–1454.

  • Feng, Z., Wang, Z., Wang, X., Mao, Y., Li, T., Lei, J., Wang, Y., & Song, M. (2021). Mutual-complementing framework for nuclei detection and segmentation in pathology image. In IEEE international conference on computer vision.

  • Flennerhag, S., Moreno, P. G., Lawrence, N. D., & Damianou, A. (2018). Transferring knowledge across learning processes. In International conference on learning representations.

  • Frankle, J., & Carbin, M. (2018). The lottery ticket hypothesis: Finding sparse, trainable neural networks. arXiv preprint arXiv:1803.03635

  • Girish, S., Maiya, S. R., Gupta, K., Chen, H., Davis, L. S., & Shrivastava, A. (2021). The lottery ticket hypothesis for object recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 762–771.

  • Gou, J., Yu, B., Maybank, S. J., & Tao, D. (2021). Knowledge distillation: A survey. International Journal of Computer Vision, 129(6), 1789–1819.

  • Gupta, S., Hoffman, J., & Malik, J. (2016). Cross modal distillation for supervision transfer. In IEEE computer society, pp. 2827–2836.

  • He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep residual learning for image recognition. CoRR. arXiv:1512.03385

  • Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531.

  • Hong, Y., Pan, H., Sun, W., & Jia, Y. (2021). Deep dual-resolution networks for real-time and accurate semantic segmentation of road scenes. arXiv preprint arXiv:2101.06085

  • Hu, J., Cao, L., Tong, T., Ye, Q., Zhang, S., Li, K., Huang, F., Shao, L., & Ji, R. (2021). Architecture disentanglement for deep neural networks. In Proceedings of the IEEE/CVF international conference on computer vision, pp. 672–681.

  • Hu, J., Gao, J., Feng, Z., Cheng, L., Lei, J., Bao, H., & Song, M. (2022). CNN LEGO: Disassembling and assembling convolutional neural network.

  • Hu, H., Peng, R., Tai, Y., & Tang, C. (2016). Network trimming: A data-driven neuron pruning approach towards efficient deep architectures. CoRR. arXiv:1607.03250

  • Jie, L., Luan, Q., Song, X., Xiao, L., Tao, D., & Song, M. (2019). Action parsing-driven video summarization based on reinforcement learning. IEEE Transactions on Circuits & Systems for Video Technology, 29(7), 2126–2137.

  • Jing, Z., Li, W., & Ogunbona, P. (2017). Joint geometrical and statistical alignment for visual domain adaptation. In Computer vision and pattern recognition.

  • Jing, Y., Liu, X., Ding, Y., Wang, X., Ding, E., Song, M., & Wen, S. (2020). Dynamic instance normalization for arbitrary style transfer. In AAAI.

  • Jing, Y., Mao, Y., Yang, Y., Zhan, Y., Song, M., Wang, X., & Tao, D. (2022). Learning graph neural networks for image style transfer. In ECCV.

  • Jing, Y., Yang, Y., Wang, X., Song, M., & Tao, D. (2021a). Amalgamating knowledge from heterogeneous graph neural networks. In CVPR.

  • Jing, Y., Yang, Y., Wang, X., Song, M., & Tao, D. (2021b). Meta-aggregator: learning to aggregate for 1-bit graph neural networks. In ICCV.

  • Kang, M., Mun, J., & Han, B. (2019). Towards oracle knowledge distillation with neural architecture search. In International joint conference on artificial intelligence.

  • Kapoor, R., Sharma, D., & Gulati, T. (2021). State of the art content based image retrieval techniques using deep learning: A survey. Multimedia Tools and Applications, 80(19), 29561–29583.

  • Khakzar, A., Baselizadeh, S., Khanduja, S., Rupprecht, C., Kim, S. T., & Navab, N. (2021). Neural response interpretation through the lens of critical pathways. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 13528–13538.

  • Lalonde, J. F. (2018). Deep learning for augmented reality. In 2018 17th workshop on information optics (WIO).

  • Li, H., Kadav, A., Durdanovic, I., Samet, H., & Graf, H. P. (2016). Pruning filters for efficient convnets. CoRR. arXiv:1608.08710

  • Li, G., Wang, J., Shen, H. W., Chen, K., & Lu, Z. (2021). CNNPruner: Pruning convolutional neural networks with visual analytics. IEEE Transactions on Visualization and Computer Graphics, 27(2), 1364–1373.

  • Li, J., Cheng, H., Guo, H., & Qiu, S. (2018). Survey on artificial intelligence for vehicles. Automotive Innovation, 1, 2–14.

  • Lin, M., Ji, R., Wang, Y., Zhang, Y., Zhang, B., Tian, Y., & Shao, L. (2020). HRank: Filter pruning using high-rank feature map. In 2020 IEEE/CVF conference on computer vision and pattern recognition, CVPR 2020, Seattle, WA, USA, June 13–19, 2020, pp. 1526–1535. https://doi.org/10.1109/CVPR42600.2020.00160

  • Lin, T., Maire, M., Belongie, S. J., Bourdev, L. D., Girshick, R. B., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. L. (2014). Microsoft COCO: Common objects in context. CoRR. arXiv:1405.0312

  • Liu, Y., Chen, K., Liu, C., Qin, Z., Luo, Z., & Wang, J. (2019). Structured knowledge distillation for semantic segmentation. In IEEE conference on computer vision and pattern recognition, CVPR 2019, Long Beach, CA, USA, June 16–20, 2019, pp. 2604–2613. https://doi.org/10.1109/CVPR.2019.00271

  • Liu, Z., Li, J., Shen, Z., Huang, G., Yan, S., & Zhang, C. (2017). Learning efficient convolutional networks through network slimming. In Proceedings of the IEEE international conference on computer vision. pp. 2755–2763

  • Liu, X., Liu, Z., Wang, G., Cai, Z., & Zhang, H. (2018). Ensemble transfer learning algorithm. IEEE Access, 6, 2389–2396.

  • Livingstone, M. S., & Hubel, D. H. (1987). Psychophysical evidence for separate channels for the perception of form, color, movement, and depth. Journal of Neuroscience, 7(11), 3416–3468.

  • Long, J., Shelhamer, E., & Darrell, T. (2014). Fully convolutional networks for semantic segmentation. CoRR. arXiv:1411.4038

  • Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(4), 640–651.

  • Luo, S., Pan, W., Wang, X., Wang, D., & Song, M. (2020). Collaboration by competition: Self-coordinated knowledge amalgamation for multi-talent student learning. In European conference on computer vision.

  • Marcel, S., & Rodriguez, Y. (2010). Torchvision the machine-vision package of torch. In Proceedings of the 18th ACM international conference on multimedia. MM ’10, pp. 1485–1488. Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/1873951.1874254

  • Naidu, R., & Michael, J. (2020). SS-CAM: Smoothed Score-CAM for sharper visual feature localization. arXiv preprint arXiv:2006.14255

  • Nilsback, M. -E., & Zisserman, A. (2008). Automated flower classification over a large number of classes. In Indian conference on computer vision, graphics and image processing.

  • Panigrahi, S., Nanda, A., & Swarnkar, T. (2021). A survey on transfer learning.

  • Pawar, K., & Attar, V. (2019). Deep learning approaches for video-based anomalous activity detection. World Wide Web, 22(2), 571–601.

  • Polino, A., Pascanu, R., & Alistarh, D. (2018). Model compression via distillation and quantization. In International conference on learning representations.

  • Ren, S., He, K., Girshick, R., & Sun, J. (2017). Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis & Machine Intelligence, 39(6), 1137–1149.

  • Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. In Medical image computing and computer-assisted intervention (MICCAI), pp. 234–241. Springer.

  • Shen, C., Xue, M., Wang, X., Song, J., Sun, L., & Song, M. (2019). Customizing student networks from heterogeneous teachers via adaptive knowledge amalgamation. In IEEE international conference on computer vision.

  • Shrikumar, A., Greenside, P., & Kundaje, A. (2017). Learning important features through propagating activation differences. In International conference on machine learning.

  • Shrikumar, A., Greenside, P., Shcherbina, A., & Kundaje, A. (2016). Not just a black box: Learning important features through propagating activation differences. arXiv preprint arXiv:1605.01713

  • Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In International conference on learning representations.

  • Simonyan, K., Vedaldi, A., & Zisserman, A. (2013). Deep inside convolutional networks: Visualising image classification models and saliency maps. In International conference on learning representations workshop.

  • Sundararajan, M., Taly, A., & Yan, Q. (2017). Axiomatic attribution for deep networks. In International conference on machine learning, pp. 3319–3328. PMLR.

  • Tan, C., Sun, F., Kong, T., Zhang, W., Yang, C., & Liu, C. (2018). A survey on deep transfer learning. In International conference on artificial neural networks.

  • Treisman, A. M. (1964). Selective attention in man. British Medical Bulletin, 20(1), 12–16.

  • Tzeng, E., Hoffman, J., Darrell, T., & Saenko, K. (2017). Simultaneous deep transfer across domains and tasks. In IEEE international conference on computer vision.

  • Wang, Y., Su, H., Zhang, B., & Hu, X. (2018). Interpret neural networks by identifying critical data routing paths. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 8906–8914.

  • Wang, H., Wang, Z., Du, M., Yang, F., Zhang, Z., Ding, S., Mardziel, P., & Hu, X. (2020). Score-CAM: Score-weighted visual explanations for convolutional neural networks. In IEEE conference on computer vision and pattern recognition workshops, pp. 111–119.

  • Wang, W., Zhang, B., Cui, T., Chai, Y., & Li, Y. (2021). Research on knowledge distillation of generative adversarial networks. In Data compression conference.

  • Wang, Y., Zhou, W., Jiang, T., Bai, X., & Xu, Y. (2020). Intra-class feature variation distillation for semantic segmentation. In A. Vedaldi, H. Bischof, T. Brox, & J. Frahm (Eds.), Computer vision—ECCV 2020—16th European conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part VII. Lecture Notes in Computer Science, vol. 12352, pp. 346–362. https://doi.org/10.1007/978-3-030-58571-6_21

  • Wang, J., Zhu, H., Wang, S., & Zhang, Y. D. (2021). A review of deep learning on medical image analysis. Mobile Networks and Applications, 26(2), 351–380.

  • Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., & Perona, P. (2010). Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, Caltech. http://www.vision.caltech.edu/visipedia/CUB-200.html.

  • Yang, Y., Qiu, J., Song, M., Tao, D., & Wang, X. (2020). Distilling knowledge from graph convolutional networks. In IEEE conference on computer vision and pattern recognition.

  • Ye, J., Ji, Y., Wang, X., Gao, X., & Song, M. (2020). Data-free knowledge amalgamation via group-stack dual-GAN. In IEEE conference on computer vision and pattern recognition.

  • Ye, J., Wang, X., Ji, Y., Ou, K., & Song, M. (2019). Amalgamating filtered knowledge: Learning task-customized student from multi-task teachers. In International joint conference on artificial intelligence.

  • Yosinski, J., Clune, J., Bengio, Y., & Lipson, H. (2014). How transferable are features in deep neural networks? Advances in Neural Information Processing Systems, 27. https://proceedings.neurips.cc/paper_files/paper/2014/file/375c71349b295fbe2dcdca9206f20a06-Paper.pdf

  • Yu, X., Liu, T., Wang, X., & Tao, D. (2017). On compressing deep models by low rank and sparse decomposition. In IEEE conference on computer vision and pattern recognition.

  • Yu, F., Qin, Z., & Chen, X. (2018). Distilling critical paths in convolutional neural networks. arXiv preprint arXiv:1811.02643.

  • Zagoruyko, S., & Komodakis, N. (2016). Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer. CoRR. arXiv:1612.03928

  • Zhou, Y., Chen, L., Xie, R., Song, L., & Zhang, W. (2019). Low-precision CNN model quantization based on optimal scaling factor estimation. In IEEE international symposium on broadband multimedia systems and broadcasting.

  • Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., & Torralba, A. (2016). Learning deep features for discriminative localization. In IEEE computer society.

Funding

This work is funded by National Key Research and Development Project (Grant No: 2022YFB2703100), Ningbo Natural Science Foundation (2022J182), the Starry Night Science Fund of Zhejiang University Shanghai Institute for Advanced Study (Grant No. SN-ZJU-SIAS-001), Response-driven Intelligent Enhanced Control Technology for AC/DC Hybrid Power Grid with High Proportion of New Energy (5100-202155426A-0-0-00), the Fundamental Research Funds for the Central Universities (2021FZZX001-23), and Zhejiang Lab (No.2019KD0AD01/014).

Author information

Corresponding author

Correspondence to Zunlei Feng.

Ethics declarations

Conflict of interest

There are no conflicts to declare.

Additional information

Communicated by Bumsub Ham.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 4402 KB)

Appendix A

1.1 Visualization

1.1.1 Decision Route

Inspired by the biological visual perception mechanism, convolutional layers are designed as feature extractors that aggregate features useful for the final predictions and filter out irrelevant ones. The decision route of the model for a sub-task is exactly the sub-network that extracts and delivers task-relevant features. The decision route reveals a wealth of category-aware information in the model, which can be exploited for many downstream tasks, including model diagnosis, model interpretation, and knowledge distillation. In this section, we present the decision route visualization of UNet (VGG-16) on the BFH dataset (omitted here due to its size; it can be downloaded from the Online Resource). In the visualization, each column represents a convolutional layer (the leftmost column is the first layer of the UNet, and the rightmost is the output layer), and the circles within a column represent filters, whose colors indicate the filter type in the network. The connections of a specific category k (not drawn due to the complexity of the model) are all possible connections between filters related to k in adjacent layers.
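
To make the construction concrete, the sketch below enumerates the route's edges from per-layer sets of category-relevant filter indices. The function name and the representation of the filter sets are our own illustration, not the paper's released implementation.

```python
def decision_route_edges(filter_sets):
    """Enumerate the decision route of category k as all possible connections
    between relevant filters in adjacent layers.

    `filter_sets` is a hypothetical list of per-layer index sets, e.g. obtained
    by thresholding the attribution vectors described in the paper."""
    edges = []
    for layer, (src, dst) in enumerate(zip(filter_sets, filter_sets[1:])):
        edges.extend((layer, i, j) for i in src for j in dst)
    return edges

# Toy usage: three layers with 2, 2, and 1 relevant filters yield 2*2 + 2*1 edges.
print(len(decision_route_edges([{0, 3}, {1, 5}, {2}])))  # 6
```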

1.1.2 Backward Patterns

Here, we present the visualization of the embedding vectors produced by our backward gradient attribution process. Figure 9 shows that the positive gradients w.r.t. feature maps of the same category in the penultimate layer of the segmentation model are highly consistent, whereas the patterns of different categories are not, mirroring the visualizations of our forward channel-wise activation attribution presented in the paper.
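
A minimal PyTorch sketch of this attribution for one image and one layer is given below. The hook-based layer access, the softmax-derived confidence mask, and the threshold name `tau` are our assumptions about the plumbing; the positive-gradient summation inside the high-confidence mask follows the description above.

```python
import torch
import torch.nn.functional as F

def backward_gradient_attribution(model, feature_layer, image, category, tau=0.9):
    """Sum the positive gradients w.r.t. one layer's feature maps inside the
    high-confidence mask of `category`, giving one coefficient per channel.
    `feature_layer` is any module whose output we hook; `tau` plays the role
    of the paper's tau_2 threshold."""
    feats = {}

    def hook(module, inputs, output):
        output.retain_grad()        # keep the gradient of this intermediate map
        feats["maps"] = output

    handle = feature_layer.register_forward_hook(hook)
    logits = model(image)           # (1, C, H, W) segmentation logits
    handle.remove()

    probs = torch.softmax(logits, dim=1)[0, category]   # (H, W)
    mask = (probs > tau).float()                        # high-confidence area

    # Backpropagate the masked category score to the hooked feature maps.
    model.zero_grad()
    (logits[0, category] * mask).sum().backward()

    grads = feats["maps"].grad[0]                       # (K, h, w)
    # Resize the mask to the feature resolution; keep positive gradients only.
    m = F.interpolate(mask[None, None], size=grads.shape[-2:], mode="nearest")[0, 0]
    return (grads.clamp(min=0) * m).sum(dim=(1, 2))     # (K,) coefficient vector
```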

Fig. 10 The visualization of convolution filters (attributed with the proposed forward channel-wise activation attribution) for samples of different categories in convolutional layers of DeepLabV3 (ResNet-50) trained on the MSCOCO (Lin et al., 2014) dataset. The y-axis of each subfigure indexes the category id, and the x-axis indexes the channels in the layer

Fig. 11 Visualization results of the transfer learning demo experiment. 'Disassembler' and 'Original' refer to the disassembled sub-network and the entire DeepLabV3 (ResNet-50), respectively

Fig. 12 Visualization results of the disassembled sub-network with different gradient policies

Fig. 13 Visualization results of knowledge distillation methods on the BFH validation set

Fig. 14 Visualization results of knowledge distillation methods on the Pascal VOC 2012 validation set

1.1.3 Forward Patterns for Different Convolutional Layers

In this section, we visualize the forward patterns of DeepLabv3 (ResNet-50) in different convolutional layers, as in Fig. 1b, to better understand the mechanism inside networks. We manually select the activations of four convolutional layers (the 1st, 15th, 30th, and 39th) for visualization, as shown in Fig. 10. From Fig. 10a, b, we find that all categories share similar activation patterns in the shallow and middle convolutional layers of the segmentation network, resulting from category-irrelevant low-level feature extraction. In contrast, in Fig. 10c, d, the activation patterns diverge among categories, which makes it possible to disassemble the network into category-aware components. Note that some channels remain highly activated for two or more categories (even across all classes); we interpret these kernels as channels that produce shared features.
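
For completeness, a sketch of the forward channel-wise activation attribution that produces these patterns is given below, mirroring the backward sketch above; the hook plumbing and variable names are again our own illustration.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def forward_activation_attribution(model, feature_layer, image, category, tau=0.9):
    """Sum each channel's activation inside the high-confidence mask of
    `category`, giving a category-aware probability vector for one layer."""
    feats = {}
    handle = feature_layer.register_forward_hook(
        lambda module, inputs, output: feats.update(maps=output))
    logits = model(image)                               # (1, C, H, W)
    handle.remove()

    probs = torch.softmax(logits, dim=1)[0, category]
    mask = (probs > tau).float()                        # high-confidence area

    acts = feats["maps"][0]                             # (K, h, w)
    m = F.interpolate(mask[None, None], size=acts.shape[-2:], mode="nearest")[0, 0]
    vec = (acts * m).sum(dim=(1, 2))                    # per-channel activation sums
    return vec / (vec.sum() + 1e-8)                     # normalize to a distribution
```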

1.1.4 More Visualization Results

Transfer Learning We also visualize segmentation results for images from the MSCOCO-birds dataset produced by the finetuned disassembled sub-network and by the entire DeepLabV3 (ResNet-50) in Fig. 11, where our Disassembler segments small bird parts, e.g., feet and heads, slightly better.

Choice of gradient To compare the impact on pruning of the three gradient strategies in the backward gradient attribution process, we visualize the predictions of the disassembled sub-networks in this section. The results in Fig. 12 illustrate that, at a similar model size, the sub-network attributed with negative gradients is much worse than those of the other two policies, and the sub-network attributed with positive gradients is slightly better than that of the absolute strategy.
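
The three policies differ only in how the gradient tensor is aggregated into per-channel scores; a hypothetical sketch, with `g` standing in for one layer's masked feature-map gradients:

```python
import torch

g = torch.randn(64, 32, 32)                   # stand-in for one layer's masked gradients
positive = g.clamp(min=0).sum(dim=(1, 2))     # keep positive gradients only
negative = (-g).clamp(min=0).sum(dim=(1, 2))  # keep negative gradients only
absolute = g.abs().sum(dim=(1, 2))            # gradient magnitude, sign ignored
```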

Knowledge Distillation In this section, we provide segmentation results of student models trained with different knowledge distillation methods, together with the finetuned sub-network, on the BFH and Pascal VOC 2012 datasets, as shown in Figs. 13 and 14.

Disassembled Sub-network In Fig. 15, we visualize the segmentation results of our disassembled sub-network on the MSCOCO validation dataset (categories 11–20 are selected). It is clear that the proposed Disassembler can realize the category-customizable task without additional training and performs better after one-epoch finetuning.

1.2 Experiments

In this section, we present additional experiments exploring the potential of our Disassembler on other architectures and applications.

1.2.1 Model Disassembling with DDRNet-23

In this section, we conduct the model disassembling experiment with a more advanced semantic segmentation network, DDRNet (Hong et al., 2021). All baselines in this section are trained with the settings provided in Sect. 4.1 'Parameter Settings', and all other settings are kept the same as in Sect. 4.2. Detailed settings, such as the disassembled layer numbers, are provided in Sect. A.3. From Table 4, we conclude that the proposed method is also effective on a modern semantic segmentation network with complex skip connections.

1.2.2 Model Disassembling with GCN

In this section, we explore the potential to disassemble models with different architectures, attempting to disassemble a GCN trained on the Cora dataset. Because this is a node classification task, we do not use high-confidence masks in the forward and backward attribution; per-channel relevance can instead be accumulated over the nodes predicted as each class, as sketched below. Table 5 shows the baseline results, where, unlike in the segmentation setting, we disassemble all convolutional layers. From Table 5, we conclude that the proposed method is also effective on entirely different architectures such as GCNs.
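
A minimal sketch of this mask-free attribution, assuming `hidden` holds the penultimate-layer node embeddings (the function name and tensor layout are our illustration):

```python
import torch

def gcn_forward_attribution(hidden, pred, num_classes):
    """Hypothetical mask-free forward attribution for a GCN node classifier:
    sum the (non-negative) activations of each hidden channel over all nodes
    predicted as class c. `hidden` is (num_nodes, K), `pred` is (num_nodes,)."""
    return torch.stack([
        hidden[pred == c].clamp(min=0).sum(dim=0)   # (K,) vector for class c
        for c in range(num_classes)
    ])
```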

1.2.3 Model Optimization

The proposed Disassembler also has the potential to improve a model's performance. In this section, we attempt to optimize the performance of U-Net (ResNet-50) trained on the Pascal VOC dataset. We first attribute the forward and backward patterns to obtain the correct decision path of the model. We then optimize the model by aligning the decision path of wrongly predicted samples to the correct path, i.e., activations outside the decision path are suppressed. Table 6 shows the result of this optimization on U-Net (ResNet-50), which yields an extra 1.2% mIoU improvement.
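
One way to realize this suppression is a forward hook that zeroes the channels outside the category's decision path during the alignment pass; a sketch with hypothetical names, not the paper's exact procedure:

```python
import torch

def suppress_off_path(keep_idx):
    """Forward hook that keeps only the channels in the decision path
    (`keep_idx`, a tensor of channel indices) and suppresses the rest."""
    def hook(module, inputs, out):
        mask = torch.zeros(out.shape[1], device=out.device)
        mask[keep_idx] = 1.0
        return out * mask.view(1, -1, 1, 1)   # zero off-path activations
    return hook

# Hypothetical usage on one disassembled layer while realigning wrong samples:
# handle = model.layer3.register_forward_hook(suppress_off_path(path_idx))
```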

Fig. 15 Visualization results of the disassembled sub-network on the MSCOCO validation dataset. 'Input' and 'GT' represent input images and ground-truth masks. 'disassemble' and 'finetune' represent the visualization results of the disassembled sub-networks before and after an additional one-epoch finetuning

1.3 Detailed Settings

In this section, we provide detailed settings of the experiments.

1.3.1 Model Disassembling

Besides \(\alpha \) and \(\beta \), the most important parameters are \(\tau _2\) and the number of disassembled layers. The disassembled layer numbers of the models are given in Table 7. \(\tau _2\) is chosen mainly according to the dataset: for the MSCOCO, Pascal VOC 2012, and BFH datasets, we set \(\tau _2\) to 0.9, 0.8, and 0.9, respectively. As exceptions, \(\tau _2\) for FCN (ResNet-50) on the Pascal VOC 2012 dataset is set to 0.9, and all \(\tau _2\) values for DDRNet-23 are set to 0.8.
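
For reference, the high-confidence mask that \(\tau _2\) controls can be realized as below, assuming per-pixel softmax confidences (the function and variable names are illustrative):

```python
import torch

def high_confidence_mask(logits, k, tau_2):
    """High-confidence area of category k: pixels predicted as k whose
    softmax confidence exceeds tau_2. `logits` is (1, C, H, W)."""
    probs = torch.softmax(logits, dim=1)
    conf, pred = probs.max(dim=1)            # per-pixel confidence and label
    return (pred == k) & (conf > tau_2)      # boolean mask, shape (1, H, W)
```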

1.3.2 Model Compression

All model compression methods compared in the paper are kernel-scoring algorithms: they assign a relative score to each kernel in the network and then remove the low-scoring kernels in each layer. For the compression strategy, we keep all pruned models at the same structure. For the finetuning stage of all pruned models, we adopt the Adam optimizer with a learning rate of \(10^{-3}\) for the BFH dataset, and Momentum with a learning rate of \(10^{-4}\) for the MSCOCO and Pascal VOC 2012 datasets. The number of finetuning epochs is set to 10.
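
As an illustration of the kernel-scoring family, the \(\ell _1\) criterion of Li et al. (2016) and the finetuning optimizers described above can be sketched as follows, assuming 'Momentum' denotes SGD with momentum:

```python
import torch

def l1_kernel_scores(conv):
    """One relevance score per output kernel of a Conv2d, following the
    L1-norm criterion of Li et al. (2016); low-scoring kernels are removed."""
    return conv.weight.detach().abs().sum(dim=(1, 2, 3))

# Finetuning setups used above (`model` is the pruned network):
# opt = torch.optim.Adam(model.parameters(), lr=1e-3)               # BFH
# opt = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)  # MSCOCO / VOC
```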

Table 4 The performance of model disassembling
Table 5 The performance of GCN disassembling on the Cora dataset. 'original' and 'disassemble' denote the accuracy of the original model and the disassembled sub-network, respectively
Table 6 The results of model optimization experiments
Table 7 The disassembled layer number of models

1.3.3 Knowledge Distillation

In the knowledge distillation experiment, all SOTA methods use their recommended hyperparameters. For fairness, the number of epochs is set to 30 for all methods. The Disassembler adopts the Adam optimizer with a learning rate of \(10^{-4}\) during distillation.

1.3.4 Transfer Learning

In the transfer learning experiment, we finetune the sub-network and the full network trained on the BFH dataset with the MSCOCO-birds dataset. We adopt the Adam optimizer with a learning rate of \(10^{-4}\) in the finetuning stage.


Cite this article

Hu, K., Gao, J., Mao, F. et al. Disassembling Convolutional Segmentation Network. Int J Comput Vis 131, 1741–1760 (2023). https://doi.org/10.1007/s11263-023-01776-z
