Constructing a Convolutional Neural Network with a Suitable Capacity for a Semantic Segmentation Task

  • Yalong Jiang
  • Zheru Chi
Chapter
Part of the Studies in Computational Intelligence book series (SCI, volume 866)

Abstract

Although Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance in many computer vision tasks such as image classification, object detection, saliency prediction and depth estimation, they still perform unsatisfactorily in some difficult tasks such as human parsing, which is the focus of our research. An inappropriate model capacity and insufficient training data both contribute to the failure to perceive the semantic information of detailed regions. Because of overfitting, the feature representations learned by a high-capacity model cannot generalize to the variations in viewpoint, human pose and occlusion found in real-world scenarios. On the other hand, under-fitting prevents a low-capacity model from developing sufficiently expressive representations. In this chapter, we propose an approach that estimates the complexity of a task and matches the capacity of a CNN model to that complexity while avoiding both under-fitting and overfitting. Firstly, a novel training scheme is proposed to fully explore the potential of low-capacity CNN models. The scheme outperforms existing end-to-end training schemes and enables low-capacity models to outperform models with higher capacity. Secondly, three methods are proposed to optimize the capacity of a CNN model on a task. The first method improves the orthogonality among kernels, which contributes to higher computational efficiency and better performance. In the second method, the convolutional kernels within each layer are evaluated according to their semantic functions and their contributions to the training and test accuracy. Kernels that contribute only to the training accuracy but have no effect on the test accuracy are removed to avoid overfitting. In the third method, the capacity of a CNN model is optimized by adjusting the dependency among convolutional kernels.
A novel structure of convolutional layers is proposed to reduce the number of parameters while maintaining similar performance. Besides capacity optimization, we further propose a method to evaluate the complexity of a human parsing task. An independent CNN model is trained for this purpose using pose estimation labels, and the complexity is evaluated from the pose information estimated in the images. The complexity evaluation scheme was tested on the Pascal Person Part dataset and the Look into Person dataset, both of which are human parsing datasets. The capacity optimization schemes were applied to our human parsing models, which were trained on these two datasets. Both quantitative and qualitative results demonstrate that the proposed algorithms can match the capacity of a CNN model well to the complexity of a task.
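
The abstract does not specify how orthogonality among kernels is measured. As an illustration only, one common proxy is the mean squared cosine similarity between the flattened kernels of a layer. The sketch below assumes a weight tensor of shape (out_channels, in_channels, height, width); the function name `mean_squared_kernel_cosine` is hypothetical and this is not claimed to be the measure used in the chapter.

```python
import numpy as np


def mean_squared_kernel_cosine(weights: np.ndarray) -> float:
    """Mean squared cosine similarity between distinct kernels of one layer.

    `weights` is assumed to have shape (out_channels, in_channels, kh, kw).
    The result is 0.0 when all kernels are mutually orthogonal and 1.0 when
    all kernels are parallel. This is an illustrative proxy, not the
    chapter's own criterion.
    """
    n = weights.shape[0]
    if n < 2:
        return 0.0  # a single kernel has no pairs to compare

    # Flatten each kernel into a row vector and normalise it to unit length.
    flat = weights.reshape(n, -1)
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    unit = flat / np.maximum(norms, 1e-12)

    # Gram matrix of unit vectors holds pairwise cosine similarities.
    gram = unit @ unit.T
    off_diag = gram - np.diag(np.diag(gram))

    # Average the squared similarity over the n * (n - 1) ordered pairs.
    return float(np.sum(off_diag ** 2) / (n * (n - 1)))
```

A lower value indicates less redundancy among the kernels of a layer, so a quantity of this kind could be monitored during training or added to the loss as a regularization term.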

Keywords

Convolutional neural networks (CNNs), Under-fitting, Over-fitting, Capacity optimization, Complexity evaluation

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
  2. Hong Kong Polytechnic University Shenzhen Research Institute, Shenzhen, China
