Disassembling Convolutional Segmentation Network

Abstract

In recent years, convolutional segmentation networks have achieved remarkable performance in computer vision. However, training a practical segmentation network is time- and resource-consuming. In this paper, focusing on the semantic image segmentation task, we attempt to disassemble a convolutional segmentation network into category-aware convolution kernels and use those kernels to accomplish customizable tasks without additional training. The core of disassembling a convolutional segmentation network is identifying the convolution kernels relevant to a specific category. Following the encoder-decoder network architecture, the disassembling framework, named Disassembler, is devised to consist of forward channel-wise activation attribution and backward gradient attribution. In the forward channel-wise activation attribution process, for each image, the activation values of each feature map within the high-confidence mask area are summed into category-aware probability vectors. In the backward gradient attribution process, the positive gradients w.r.t. each feature map within the high-confidence mask area are summed into a relative coefficient vector for each category. With the cooperation of the two vectors, the Disassembler can effectively disassemble category-aware convolution kernels. Extensive experiments demonstrate that the proposed Disassembler can accomplish the category-customizable task without additional training. The disassembled category-aware sub-network achieves comparable performance without any finetuning and outperforms existing state-of-the-art methods with one epoch of finetuning.

Data availability

The MSCOCO (Lin et al., 2014) dataset can be obtained from https://cocodataset.org/. The Pascal VOC 2012 (Everingham et al., 2015) dataset can be obtained from http://host.robots.ox.ac.uk/pascal/VOC/voc2012/. The Birds (Welinder et al., 2010) dataset can be obtained from http://www.vision.caltech.edu/visipedia/CUB-200.html. The Flowers (Nilsback & Zisserman, 2008) dataset can be downloaded from https://www.robots.ox.ac.uk/~vgg/data/bicos/. The HumanMatting dataset can be downloaded from https://github.com/aisegmentcn/matting_human_datasets.

Notes

  1. https://github.com/aisegmentcn/matting_human_datasets.

References

  • Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K. R., & Samek, W. (2015). On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLOS ONE, 10(7), e0130140.

  • Baehrens, D., Schroeter, T., Harmeling, S., Kawanabe, M., Hansen, K., & Müller, K.-R. (2010). How to explain individual classification decisions. Journal of Machine Learning Research, 11(61), 1803–1831.

  • Berthelier, A., Chateau, T., Duffner, S., Garcia, C., & Blanc, C. (2020). Deep model compression and architecture optimization for embedded systems: A survey. Journal of Signal Processing Systems, 93(8), 863–878.

  • Bucila, C., Caruana, R., & Niculescu-Mizil, A. (2006). Model compression. In ACM SIGKDD international conference on knowledge discovery and data mining (KDD '06).

  • Chang, H., Han, J., Zhong, C., Snijders, A., & Mao, J. H. (2018). Unsupervised transfer learning via multi-scale convolutional sparse coding for biomedical applications. IEEE Transactions on Pattern Analysis & Machine Intelligence, 40(5), 1182–1194.

  • Chen, L., Papandreou, G., Schroff, F., & Adam, H. (2017). Rethinking Atrous convolution for semantic image segmentation. CoRR. arXiv:1706.05587.

  • Chen, T., Sui, Y., Chen, X., Zhang, A., & Wang, Z. (2021). A unified lottery ticket hypothesis for graph neural networks. In International conference on machine learning, pp. 1695–1706. PMLR.

  • Chen, T., Frankle, J., Chang, S., Liu, S., Zhang, Y., Wang, Z., & Carbin, M. (2020). The lottery ticket hypothesis for pre-trained BERT networks. Advances in Neural Information Processing Systems, 33, 15834–15846.

  • Chen, L. C., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2014). Semantic image segmentation with deep convolutional nets and fully connected CRFs. CoRR. arXiv:1412.7062.

  • Chen, J., Wang, J., Wang, X., Wang, X., Feng, Z., Liu, R., & Song, M. (2021). CoEvo-Net: Coevolution network for video highlight detection. IEEE Transactions on Circuits and Systems for Video Technology, 32(6), 3788–3797.

  • Choudhary, T., Mishra, V., Goswami, A., & Sarangapani, J. (2020). A comprehensive survey on model compression and acceleration. Artificial Intelligence Review, 53(3), 5113–5155.

  • Crowley, E. J., Gray, G., & Storkey, A. (2017). Moonshine: Distilling with cheap convolutions. In Conference on neural information processing systems.

  • Deng, J., Dong, W., Socher, R., Li, L., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In 2009 IEEE computer society conference on computer vision and pattern recognition (CVPR 2009), 20–25 June 2009, Miami, Florida, USA, pp. 248–255. https://doi.org/10.1109/CVPR.2009.5206848

  • Desai, S., & Ramaswamy, H. G. (2020). Ablation-CAM: Visual explanations for deep convolutional network via gradient-free localization. In IEEE winter conference on applications of computer vision, pp. 983–991.

  • Essen, D. V., & Deyoe, E. A. (1995). Concurrent processing in the primate visual cortex. In Cognitive neurosciences (pp. 383–400).

  • Everingham, M., Eslami, S., Van Gool, L., Williams, C. K., Winn, J., & Zisserman, A. (2015). The pascal visual object classes challenge: A retrospective. International Journal of Computer Vision, 111(1), 98–136.

  • Fang, G., Bao, Y., Song, J., Wang, X., Xie, D., Shen, C., & Song, M. (2021). Mosaicking to distill: Knowledge distillation from out-of-domain data. In Conference on neural information processing systems.

  • Feng, Z., Cheng, L., Wang, X., Wang, X., Liu, Y., Du, X., & Song, M. (2021). Visual boundary knowledge translation for foreground segmentation. In AAAI conference on artificial intelligence.

  • Feng, Z., Hu, J., Wu, S., Yu, X., Song, J., & Song, M. (2022). Model doctor: A simple gradient aggregation strategy for diagnosing and treating CNN classifiers. In AAAI conference on artificial intelligence.

  • Feng, Z., Wang, Z., Wang, X., Zhang, X., & Song, M. (2021). Edge-competing pathological liver vessel segmentation with limited labels. In AAAI conference on artificial intelligence.

  • Feng, Y., Wu, F., Shao, X., Wang, Y., & Zhou, X. (2018). Joint 3d face reconstruction and dense alignment with position map regression network. In European conference on computer vision, pp. 557–574.

  • Feng, Z., Liang, W., Tao, D., Sun, L., & Song, M. (2019). CU-NET: Component unmixing network for textile fiber identification. International Journal of Computer Vision, 127(10), 1443–1454.

  • Feng, Z., Wang, Z., Wang, X., Mao, Y., Li, T., Lei, J., Wang, Y., & Song, M. (2021). Mutual-complementing framework for nuclei detection and segmentation in pathology image. In IEEE international conference on computer vision.

  • Flennerhag, S., Moreno, P. G., Lawrence, N. D., & Damianou, A. (2018). Transferring knowledge across learning processes. In International conference on learning representations.

  • Frankle, J., & Carbin, M. (2018). The lottery ticket hypothesis: Finding sparse, trainable neural networks. arXiv preprint arXiv:1803.03635

  • Girish, S., Maiya, S. R., Gupta, K., Chen, H., Davis, L. S., & Shrivastava, A. (2021). The lottery ticket hypothesis for object recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 762–771.

  • Gou, J., Yu, B., Maybank, S. J., & Tao, D. (2021). Knowledge distillation: A survey. International Journal of Computer Vision, 129(6), 1789–1819.

  • Gupta, S., Hoffman, J., & Malik, J. (2016). Cross modal distillation for supervision transfer. In IEEE computer society, pp. 2827–2836.

  • He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep residual learning for image recognition. CoRR. arXiv:1512.03385

  • Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531.

  • Hong, Y., Pan, H., Sun, W., & Jia, Y. (2021). Deep dual-resolution networks for real-time and accurate semantic segmentation of road scenes. arXiv preprint arXiv:2101.06085

  • Hu, J., Cao, L., Tong, T., Ye, Q., Zhang, S., Li, K., Huang, F., Shao, L., & Ji, R. (2021). Architecture disentanglement for deep neural networks. In Proceedings of the IEEE/CVF international conference on computer vision, pp. 672–681.

  • Hu, J., Gao, J., Feng, Z., Cheng, L., Lei, J., Bao, H., & Song, M. (2022). CNN LEGO: Disassembling and assembling convolutional neural network.

  • Hu, H., Peng, R., Tai, Y., & Tang, C. (2016). Network trimming: A data-driven neuron pruning approach towards efficient deep architectures. CoRR. arXiv:1607.03250

  • Jie, L., Luan, Q., Song, X., Xiao, L., Tao, D., & Song, M. (2019). Action parsing-driven video summarization based on reinforcement learning. IEEE Transactions on Circuits & Systems for Video Technology, 29(7), 2126–2137.

  • Jing, Z., Li, W., & Ogunbona, P. (2017). Joint geometrical and statistical alignment for visual domain adaptation. In Computer vision and pattern recognition.

  • Jing, Y., Liu, X., Ding, Y., Wang, X., Ding, E., Song, M., & Wen, S. (2020). Dynamic instance normalization for arbitrary style transfer. In AAAI.

  • Jing, Y., Mao, Y., Yang, Y., Zhan, Y., Song, M., Wang, X., & Tao, D. (2022). Learning graph neural networks for image style transfer. In ECCV.

  • Jing, Y., Yang, Y., Wang, X., Song, M., & Tao, D. (2021a). Amalgamating knowledge from heterogeneous graph neural networks. In CVPR.

  • Jing, Y., Yang, Y., Wang, X., Song, M., & Tao, D. (2021b). Meta-aggregator: learning to aggregate for 1-bit graph neural networks. In ICCV.

  • Kang, M., Mun, J., & Han, B. (2019). Towards oracle knowledge distillation with neural architecture search. In International joint conference on artificial intelligence.

  • Kapoor, R., Sharma, D., & Gulati, T. (2021). State of the art content based image retrieval techniques using deep learning: A survey. Multimedia Tools and Applications, 80(19), 29561–29583.

  • Khakzar, A., Baselizadeh, S., Khanduja, S., Rupprecht, C., Kim, S. T., & Navab, N. (2021). Neural response interpretation through the lens of critical pathways. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 13528–13538.

  • Lalonde, J. F. (2018). Deep learning for augmented reality. In 2018 17th workshop on information optics (WIO).

  • Li, H., Kadav, A., Durdanovic, I., Samet, H., & Graf, H. P. (2016). Pruning filters for efficient convnets. CoRR. arXiv:1608.08710

  • Li, G., Wang, J., Shen, H. W., Chen, K., & Lu, Z. (2021). CNNPruner: Pruning convolutional neural networks with visual analytics. IEEE Transactions on Visualization and Computer Graphics, 27(2), 1364–1373.

  • Li, J., Cheng, H., Guo, H., & Qiu, S. (2018). Survey on artificial intelligence for vehicles. Automotive Innovation, 1, 2–14.

  • Lin, M., Ji, R., Wang, Y., Zhang, Y., Zhang, B., Tian, Y., & Shao, L. (2020). HRank: Filter pruning using high-rank feature map. In 2020 IEEE/CVF conference on computer vision and pattern recognition, CVPR 2020, Seattle, WA, USA, June 13–19, 2020, pp. 1526–1535. https://doi.org/10.1109/CVPR42600.2020.00160

  • Lin, T., Maire, M., Belongie, S. J., Bourdev, L. D., Girshick, R. B., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. L. (2014). Microsoft COCO: Common objects in context. CoRR. arXiv:1405.0312

  • Liu, Y., Chen, K., Liu, C., Qin, Z., Luo, Z., & Wang, J. (2019). Structured knowledge distillation for semantic segmentation. In IEEE conference on computer vision and pattern recognition, CVPR 2019, Long Beach, CA, USA, June 16–20, 2019, pp. 2604–2613. https://doi.org/10.1109/CVPR.2019.00271

  • Liu, Z., Li, J., Shen, Z., Huang, G., Yan, S., & Zhang, C. (2017). Learning efficient convolutional networks through network slimming. In Proceedings of the IEEE international conference on computer vision. pp. 2755–2763

  • Liu, X., Liu, Z., Wang, G., Cai, Z., & Zhang, H. (2018). Ensemble transfer learning algorithm. IEEE Access, 6, 2389–2396.

  • Livingstone, M. S., & Hubel, D. H. (1987). Psychophysical evidence for separate channels for the perception of form, color, movement, and depth. Journal of Neuroscience, 7(11), 3416–3468.

  • Long, J., Shelhamer, E., & Darrell, T. (2014). Fully convolutional networks for semantic segmentation. CoRR. arXiv:1411.4038

  • Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(4), 640–651.

  • Luo, S., Pan, W., Wang, X., Wang, D., & Song, M. (2020). Collaboration by competition: Self-coordinated knowledge amalgamation for multi-talent student learning. In European conference on computer vision.

  • Marcel, S., & Rodriguez, Y. (2010). Torchvision the machine-vision package of torch. In Proceedings of the 18th ACM international conference on multimedia. MM ’10, pp. 1485–1488. Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/1873951.1874254

  • Naidu, R., & Michael, J. (2020). SS-CAM: Smoothed Score-CAM for sharper visual feature localization. arXiv preprint arXiv:2006.14255

  • Nilsback, M. -E., & Zisserman, A. (2008). Automated flower classification over a large number of classes. In Indian conference on computer vision, graphics and image processing.

  • Panigrahi, S., Nanda, A., & Swarnkar, T. (2021). A survey on transfer learning.

  • Pawar, K., & Attar, V. (2019). Deep learning approaches for video-based anomalous activity detection. World Wide Web, 22(2), 571–601.

  • Polino, A., Pascanu, R., & Alistarh, D. (2018). Model compression via distillation and quantization. In International conference on learning representations.

  • Ren, S., He, K., Girshick, R., & Sun, J. (2017). Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis & Machine Intelligence, 39(6), 1137–1149.

  • Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. In Medical image computing and computer-assisted intervention (MICCAI), pp. 234–241. Springer.

  • Shen, C., Xue, M., Wang, X., Song, J., Sun, L., & Song, M. (2019). Customizing student networks from heterogeneous teachers via adaptive knowledge amalgamation. In IEEE international conference on computer vision.

  • Shrikumar, A., Greenside, P., & Kundaje, A. (2017). Learning important features through propagating activation differences. In International conference on machine learning.

  • Shrikumar, A., Greenside, P., Shcherbina, A., & Kundaje, A. (2016). Not just a black box: Learning important features through propagating activation differences. arXiv preprint arXiv:1605.01713

  • Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In International conference on learning representations.

  • Simonyan, K., Vedaldi, A., & Zisserman, A. (2013). Deep inside convolutional networks: Visualising image classification models and saliency maps. In International conference on learning representations workshop.

  • Sundararajan, M., Taly, A., & Yan, Q. (2017). Axiomatic attribution for deep networks. In International conference on machine learning, pp. 3319–3328. PMLR.

  • Tan, C., Sun, F., Kong, T., Zhang, W., Yang, C., & Liu, C. (2018). A survey on deep transfer learning. In International conference on artificial neural networks.

  • Treisman, A. M. (1964). Selective attention in man. British Medical Bulletin, 20(1), 12–16.

  • Tzeng, E., Hoffman, J., Darrell, T., & Saenko, K. (2017). Simultaneous deep transfer across domains and tasks. In IEEE international conference on computer vision.

  • Wang, Y., Su, H., Zhang, B., & Hu, X. (2018). Interpret neural networks by identifying critical data routing paths. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 8906–8914.

  • Wang, H., Wang, Z., Du, M., Yang, F., Zhang, Z., Ding, S., Mardziel, P., & Hu, X. (2020). Score-CAM: Score-weighted visual explanations for convolutional neural networks. In IEEE conference on computer vision and pattern recognition workshops, pp. 111–119.

  • Wang, W., Zhang, B., Cui, T., Chai, Y., & Li, Y. (2021). Research on knowledge distillation of generative adversarial networks. In Data compression conference.

  • Wang, Y., Zhou, W., Jiang, T., Bai, X., & Xu, Y. (2020). Intra-class feature variation distillation for semantic segmentation. In A. Vedaldi, H. Bischof, T. Brox, & J. Frahm (Eds.), Computer vision—ECCV 2020—16th European conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part VII. Lecture Notes in Computer Science, vol. 12352, pp. 346–362. https://doi.org/10.1007/978-3-030-58571-6_21

  • Wang, J., Zhu, H., Wang, S., & Zhang, Y. D. (2021). A review of deep learning on medical image analysis. Mobile Networks and Applications, 26(2), 351–380.

  • Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., & Perona, P. (2010). Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, Caltech. http://www.vision.caltech.edu/visipedia/CUB-200.html.

  • Yang, Y., Qiu, J., Song, M., Tao, D., & Wang, X. (2020). Distilling knowledge from graph convolutional networks. In IEEE conference on computer vision and pattern recognition.

  • Ye, J., Ji, Y., Wang, X., Gao, X., & Song, M. (2020). Data-free knowledge amalgamation via group-stack dual-GAN. In IEEE conference on computer vision and pattern recognition.

  • Ye, J., Wang, X., Ji, Y., Ou, K., & Song, M. (2019). Amalgamating filtered knowledge: Learning task-customized student from multi-task teachers. In International joint conference on artificial intelligence.

  • Yosinski, J., Clune, J., Bengio, Y., & Lipson, H. (2014). How transferable are features in deep neural networks? Advances in Neural Information Processing Systems, 27. https://proceedings.neurips.cc/paper_files/paper/2014/file/375c71349b295fbe2dcdca9206f20a06-Paper.pdf

  • Yu, X., Liu, T., Wang, X., & Tao, D. (2017). On compressing deep models by low rank and sparse decomposition. In IEEE conference on computer vision and pattern recognition.

  • Yu, F., Qin, Z., & Chen, X. (2018). Distilling critical paths in convolutional neural networks. arXiv preprint arXiv:1811.02643.

  • Zagoruyko, S., & Komodakis, N. (2016). Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer. CoRR. arXiv:1612.03928

  • Zhou, Y., Chen, L., Xie, R., Song, L., & Zhang, W. (2019). Low-precision CNN model quantization based on optimal scaling factor estimation. In IEEE international symposium on broadband multimedia systems and broadcasting.

  • Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., & Torralba, A. (2016). Learning deep features for discriminative localization. In IEEE computer society.

Funding

This work is funded by National Key Research and Development Project (Grant No: 2022YFB2703100), Ningbo Natural Science Foundation (2022J182), the Starry Night Science Fund of Zhejiang University Shanghai Institute for Advanced Study (Grant No. SN-ZJU-SIAS-001), Response-driven Intelligent Enhanced Control Technology for AC/DC Hybrid Power Grid with High Proportion of New Energy (5100-202155426A-0-0-00), the Fundamental Research Funds for the Central Universities (2021FZZX001-23), and Zhejiang Lab (No.2019KD0AD01/014).

Author information

Corresponding author

Correspondence to Zunlei Feng.

Ethics declarations

Conflict of interest

There are no conflicts to declare.

Additional information

Communicated by Bumsub Ham.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 4402 KB)

Appendix A

1.1 Visualization

1.1.1 Decision Route

Inspired by the biological visual perception mechanism, convolutional layers are designed as feature extractors that aggregate features useful for the final predictions and filter out irrelevant ones. The decision route of the model for a sub-task is exactly the sub-network that extracts and delivers task-relevant features. The decision route reveals a wealth of category-aware information in the model, which can be exploited for many downstream tasks, including model diagnosis, model interpretation, and knowledge distillation. In this section, we present the decision route visualization of UNet (VGG-16) on the BFH dataset (omitted here due to its size; it can be downloaded from the Online Resource). In the visualization, each column represents a convolutional layer (the leftmost column is the first layer of the UNet, and the rightmost is the output layer), and the circles within a column represent filters, whose colors indicate the filter type in the network. The connections of a specific category k (not drawn due to the complexity of the model) are all possible connections between filters related to k in adjacent layers.
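
To make the construction concrete, the sketch below enumerates the route's edges from per-layer sets of category-relevant filter indices. The function name and the representation of the filter sets are our own illustration, not the paper's released implementation.

```python
def decision_route_edges(filter_sets):
    """Enumerate the decision route of category k as all possible connections
    between relevant filters in adjacent layers.

    `filter_sets` is a hypothetical list of per-layer index sets, e.g. obtained
    by thresholding the attribution vectors described in the paper."""
    edges = []
    for layer, (src, dst) in enumerate(zip(filter_sets, filter_sets[1:])):
        edges.extend((layer, i, j) for i in src for j in dst)
    return edges

# Toy usage: three layers with 2, 2, and 1 relevant filters yield 2*2 + 2*1 edges.
print(len(decision_route_edges([{0, 3}, {1, 5}, {2}])))  # 6
```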

1.1.2 Backward Patterns

Here, we present the visualization of the embedding vectors produced by our backward gradient attribution process. Figure 9 shows that the positive gradients w.r.t. feature maps of the same category in the penultimate layer of the segmentation model are highly consistent, whereas the patterns of different categories are not, mirroring the visualizations of our forward channel-wise activation attribution presented in the paper.
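
A minimal PyTorch sketch of this attribution for one image and one layer is given below. The hook-based layer access, the softmax-derived confidence mask, and the threshold name `tau` are our assumptions about the plumbing; the positive-gradient summation inside the high-confidence mask follows the description above.

```python
import torch
import torch.nn.functional as F

def backward_gradient_attribution(model, feature_layer, image, category, tau=0.9):
    """Sum the positive gradients w.r.t. one layer's feature maps inside the
    high-confidence mask of `category`, giving one coefficient per channel.
    `feature_layer` is any module whose output we hook; `tau` plays the role
    of the paper's tau_2 threshold."""
    feats = {}

    def hook(module, inputs, output):
        output.retain_grad()        # keep the gradient of this intermediate map
        feats["maps"] = output

    handle = feature_layer.register_forward_hook(hook)
    logits = model(image)           # (1, C, H, W) segmentation logits
    handle.remove()

    probs = torch.softmax(logits, dim=1)[0, category]   # (H, W)
    mask = (probs > tau).float()                        # high-confidence area

    # Backpropagate the masked category score to the hooked feature maps.
    model.zero_grad()
    (logits[0, category] * mask).sum().backward()

    grads = feats["maps"].grad[0]                       # (K, h, w)
    # Resize the mask to the feature resolution; keep positive gradients only.
    m = F.interpolate(mask[None, None], size=grads.shape[-2:], mode="nearest")[0, 0]
    return (grads.clamp(min=0) * m).sum(dim=(1, 2))     # (K,) coefficient vector
```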

Fig. 10 The visualization of convolution filters (attributed with the proposed forward channel-wise activation attribution) for samples of different categories in convolutional layers of DeepLabV3 (ResNet-50) trained on the MSCOCO (Lin et al., 2014) dataset. The y-axis of each subfigure indexes the category id, and the x-axis indexes the channels in the layer

Fig. 11 Visualization results of the transfer learning demo experiment. 'Disassembler' and 'Original' refer to the disassembled sub-network and the entire DeepLabV3 (ResNet-50), respectively

Fig. 12 Visualization results of the disassembled sub-network with different gradient policies

Fig. 13 Visualization results of knowledge distillation methods on the BFH validation set

Fig. 14 Visualization results of knowledge distillation methods on the Pascal VOC 2012 validation set

1.1.3 Forward Patterns for Different Convolutional Layers

In this section, we visualize the forward patterns of DeepLabv3 (ResNet-50) in different convolutional layers, as in Fig. 1b, to better understand the mechanism inside networks. We manually select the activations of four convolutional layers (the 1st, 15th, 30th, and 39th) for visualization, as shown in Fig. 10. From Fig. 10a, b, we find that all categories share similar activation patterns in the shallow and middle convolutional layers of the segmentation network, resulting from category-irrelevant low-level feature extraction. In contrast, in Fig. 10c, d, the activation patterns diverge among categories, which makes it possible to disassemble the network into category-aware components. Note that some channels remain highly activated for two or more categories (even across all classes); we interpret these kernels as channels that produce shared features.
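
For completeness, a sketch of the forward channel-wise activation attribution that produces these patterns is given below, mirroring the backward sketch above; the hook plumbing and variable names are again our own illustration.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def forward_activation_attribution(model, feature_layer, image, category, tau=0.9):
    """Sum each channel's activation inside the high-confidence mask of
    `category`, giving a category-aware probability vector for one layer."""
    feats = {}
    handle = feature_layer.register_forward_hook(
        lambda module, inputs, output: feats.update(maps=output))
    logits = model(image)                               # (1, C, H, W)
    handle.remove()

    probs = torch.softmax(logits, dim=1)[0, category]
    mask = (probs > tau).float()                        # high-confidence area

    acts = feats["maps"][0]                             # (K, h, w)
    m = F.interpolate(mask[None, None], size=acts.shape[-2:], mode="nearest")[0, 0]
    vec = (acts * m).sum(dim=(1, 2))                    # per-channel activation sums
    return vec / (vec.sum() + 1e-8)                     # normalize to a distribution
```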

1.1.4 More Visualization Results

Transfer Learning We also visualize segmentation results for images from the MSCOCO-birds dataset produced by the finetuned disassembled sub-network and by the entire DeepLabV3 (ResNet-50) in Fig. 11, where our Disassembler segments small bird parts, e.g., feet and heads, slightly better.

Choice of gradient To compare the impact on pruning of the three gradient strategies in the backward gradient attribution process, we visualize the predictions of the disassembled sub-networks in this section. The results in Fig. 12 illustrate that, at a similar model size, the sub-network attributed with negative gradients is much worse than those of the other two policies, and the sub-network attributed with positive gradients is slightly better than that of the absolute strategy.
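
The three policies differ only in how the gradient tensor is aggregated into per-channel scores; a hypothetical sketch, with `g` standing in for one layer's masked feature-map gradients:

```python
import torch

g = torch.randn(64, 32, 32)                   # stand-in for one layer's masked gradients
positive = g.clamp(min=0).sum(dim=(1, 2))     # keep positive gradients only
negative = (-g).clamp(min=0).sum(dim=(1, 2))  # keep negative gradients only
absolute = g.abs().sum(dim=(1, 2))            # gradient magnitude, sign ignored
```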

Knowledge Distillation In this section, we provide segmentation results of student models trained with different knowledge distillation methods, together with the finetuned sub-network, on the BFH and Pascal VOC 2012 datasets, as shown in Figs. 13 and 14.

Disassembled Sub-network In Fig. 15, we visualize the segmentation results of our disassembled sub-network on the MSCOCO validation dataset (categories 11–20 are selected). It is clear that the proposed Disassembler can realize the category-customizable task without additional training and performs better after one-epoch finetuning.

1.2 Experiments

In this section, we present additional experiments exploring the potential of our Disassembler on other architectures and applications.

1.2.1 Model Disassembling with DDRNet-23

In this section, we conduct the model disassembling experiment with a more advanced semantic segmentation network, DDRNet (Hong et al., 2021). All baselines in this section are trained with the settings provided in Sect. 4.1 'Parameter Settings', and all other settings are kept the same as in Sect. 4.2. Detailed settings, such as the disassembled layer numbers, are provided in Sect. A.3. From Table 4, we conclude that the proposed method is also effective on a modern semantic segmentation network with complex skip connections.

1.2.2 Model Disassembling with GCN

In this section, we explore the potential to disassemble models with different architectures, attempting to disassemble a GCN trained on the Cora dataset. Because this is a node classification task, we do not use high-confidence masks in the forward and backward attribution; per-channel relevance can instead be accumulated over the nodes predicted as each class, as sketched below. Table 5 shows the baseline results, where, unlike in the segmentation setting, we disassemble all convolutional layers. From Table 5, we conclude that the proposed method is also effective on entirely different architectures such as GCNs.
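
A minimal sketch of this mask-free attribution, assuming `hidden` holds the penultimate-layer node embeddings (the function name and tensor layout are our illustration):

```python
import torch

def gcn_forward_attribution(hidden, pred, num_classes):
    """Hypothetical mask-free forward attribution for a GCN node classifier:
    sum the (non-negative) activations of each hidden channel over all nodes
    predicted as class c. `hidden` is (num_nodes, K), `pred` is (num_nodes,)."""
    return torch.stack([
        hidden[pred == c].clamp(min=0).sum(dim=0)   # (K,) vector for class c
        for c in range(num_classes)
    ])
```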

1.2.3 Model Optimization

The proposed Disassembler also has the potential to improve a model's performance. In this section, we attempt to optimize the performance of U-Net (ResNet-50) trained on the Pascal VOC dataset. We first attribute the forward and backward patterns to obtain the correct decision path of the model. We then optimize the model by aligning the decision path of wrongly predicted samples to the correct path, i.e., activations outside the decision path are suppressed. Table 6 shows the result of this optimization on U-Net (ResNet-50), which yields an extra 1.2% mIoU improvement.
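
One way to realize this suppression is a forward hook that zeroes the channels outside the category's decision path during the alignment pass; a sketch with hypothetical names, not the paper's exact procedure:

```python
import torch

def suppress_off_path(keep_idx):
    """Forward hook that keeps only the channels in the decision path
    (`keep_idx`, a tensor of channel indices) and suppresses the rest."""
    def hook(module, inputs, out):
        mask = torch.zeros(out.shape[1], device=out.device)
        mask[keep_idx] = 1.0
        return out * mask.view(1, -1, 1, 1)   # zero off-path activations
    return hook

# Hypothetical usage on one disassembled layer while realigning wrong samples:
# handle = model.layer3.register_forward_hook(suppress_off_path(path_idx))
```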

Fig. 15 Visualization results of the disassembled sub-network on the MSCOCO validation dataset. 'Input' and 'GT' represent input images and ground-truth masks. 'disassemble' and 'finetune' represent the visualization results of the disassembled sub-networks before and after an additional one-epoch finetuning

1.3 Detailed Settings

In this section, we provide detailed settings of the experiments.

1.3.1 Model Disassembling

Besides \(\alpha \) and \(\beta \), the most important parameters are \(\tau _2\) and the number of disassembled layers. The disassembled layer numbers of the models are given in Table 7. \(\tau _2\) is chosen mainly according to the dataset: for the MSCOCO, Pascal VOC 2012, and BFH datasets, we set \(\tau _2\) to 0.9, 0.8, and 0.9, respectively. As exceptions, \(\tau _2\) for FCN (ResNet-50) on the Pascal VOC 2012 dataset is set to 0.9, and all \(\tau _2\) values for DDRNet-23 are set to 0.8.
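
For reference, the high-confidence mask that \(\tau _2\) controls can be realized as below, assuming per-pixel softmax confidences (the function and variable names are illustrative):

```python
import torch

def high_confidence_mask(logits, k, tau_2):
    """High-confidence area of category k: pixels predicted as k whose
    softmax confidence exceeds tau_2. `logits` is (1, C, H, W)."""
    probs = torch.softmax(logits, dim=1)
    conf, pred = probs.max(dim=1)            # per-pixel confidence and label
    return (pred == k) & (conf > tau_2)      # boolean mask, shape (1, H, W)
```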

1.3.2 Model Compression

All model compression methods compared in the paper are kernel-scoring algorithms: they assign a relative score to each kernel in the network and then remove the low-scoring kernels in each layer. For the compression strategy, we keep all pruned models at the same structure. For the finetuning stage of all pruned models, we adopt the Adam optimizer with a learning rate of \(10^{-3}\) for the BFH dataset, and Momentum with a learning rate of \(10^{-4}\) for the MSCOCO and Pascal VOC 2012 datasets. The number of finetuning epochs is set to 10.
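
As an illustration of the kernel-scoring family, the \(\ell _1\) criterion of Li et al. (2016) and the finetuning optimizers described above can be sketched as follows, assuming 'Momentum' denotes SGD with momentum:

```python
import torch

def l1_kernel_scores(conv):
    """One relevance score per output kernel of a Conv2d, following the
    L1-norm criterion of Li et al. (2016); low-scoring kernels are removed."""
    return conv.weight.detach().abs().sum(dim=(1, 2, 3))

# Finetuning setups used above (`model` is the pruned network):
# opt = torch.optim.Adam(model.parameters(), lr=1e-3)               # BFH
# opt = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)  # MSCOCO / VOC
```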

Table 4 The performance of model disassembling
Table 5 The performance of GCN disassembling on the Cora dataset. 'original' and 'disassemble' denote the accuracy of the original model and the disassembled sub-network, respectively
Table 6 The results of model optimization experiments
Table 7 The disassembled layer number of models

1.3.3 Knowledge Distillation

In the knowledge distillation experiment, all SOTA methods use their recommended hyperparameters. For fairness, the number of epochs is set to 30 for all methods. The Disassembler adopts the Adam optimizer with a learning rate of \(10^{-4}\) during distillation.

1.3.4 Transfer Learning

In the transfer learning experiment, we finetune the sub-network and the full network trained on the BFH dataset with the MSCOCO-birds dataset. We adopt the Adam optimizer with a learning rate of \(10^{-4}\) in the finetuning stage.


Cite this article

Hu, K., Gao, J., Mao, F. et al. Disassembling Convolutional Segmentation Network. Int J Comput Vis 131, 1741–1760 (2023). https://doi.org/10.1007/s11263-023-01776-z
