
Dense In Dense: Training Segmentation from Scratch

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11366)

Abstract

In recent years, image segmentation networks have typically been trained by fine-tuning models pre-trained on large-scale classification datasets such as ImageNet. Such fine-tuning is confronted with three problems: (1) a domain gap, (2) a mismatch between data size and model size, and (3) poor controllability. A more practical solution is to train the segmentation model from scratch, which motivates our Dense In Dense (DID) network. In DID, we put forward an efficient architecture based on DenseNet that further accelerates the information flow both inside and outside the dense blocks. Deep supervision is also applied along a progressive upsampling path rather than the traditional single-step upsampling. Trained from scratch with fewer parameters, our DID network performs favorably on the CamVid, Inria Aerial Image Labeling, and Cityscapes datasets.
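The abstract names two architectural ingredients: DenseNet-style dense connectivity and a progressively upsampling decoder with deep supervision at every scale. The paper's exact configuration is not given on this page, so the PyTorch sketch below is only illustrative; the layer widths, growth rate, number of upsampling stages, and the 11-class output are assumptions, not the authors' DID design.

```python
# Illustrative sketch: DenseNet-style encoder plus a progressively
# upsampling, deeply supervised decoder.  All sizes are assumptions,
# not the exact DID configuration from the paper.
import torch
import torch.nn as nn


class DenseLayer(nn.Module):
    """BN-ReLU-Conv layer whose output is concatenated with its input."""
    def __init__(self, in_ch, growth):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, growth, 3, padding=1, bias=False))

    def forward(self, x):
        return torch.cat([x, self.body(x)], dim=1)  # dense connectivity


class DenseBlock(nn.Sequential):
    def __init__(self, in_ch, growth, n_layers):
        super().__init__(*[DenseLayer(in_ch + i * growth, growth)
                           for i in range(n_layers)])
        self.out_ch = in_ch + n_layers * growth


class ProgressiveDecoder(nn.Module):
    """Upsamples in x2 steps; each step emits an auxiliary prediction
    so a deep-supervision loss can be attached at every scale."""
    def __init__(self, in_ch, n_classes, n_stages=3):
        super().__init__()
        self.stages, self.aux_heads = nn.ModuleList(), nn.ModuleList()
        ch = in_ch
        for _ in range(n_stages):
            self.stages.append(nn.Sequential(
                nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
                nn.Conv2d(ch, ch // 2, 3, padding=1, bias=False),
                nn.BatchNorm2d(ch // 2), nn.ReLU(inplace=True)))
            ch //= 2
            self.aux_heads.append(nn.Conv2d(ch, n_classes, 1))

    def forward(self, x):
        aux = []
        for stage, head in zip(self.stages, self.aux_heads):
            x = stage(x)
            aux.append(head(x))  # one prediction per upsampling step
        return aux  # last entry is the full-resolution prediction


if __name__ == "__main__":
    enc = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1),
                        DenseBlock(32, growth=12, n_layers=4),
                        nn.MaxPool2d(2), nn.MaxPool2d(2))
    dec = ProgressiveDecoder(in_ch=32 + 4 * 12, n_classes=11, n_stages=3)
    preds = dec(enc(torch.randn(1, 3, 256, 256)))
    print([p.shape for p in preds])  # maps at 1/4, 1/2, and full input resolution
```

In training, attaching one cross-entropy term per auxiliary output (against a correspondingly resized ground-truth map) is the usual way such deep supervision is realised; only the final, full-resolution head would be used at inference time.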

Keywords

Image segmentation 

References

  1. Badrinarayanan, V., Kendall, A., Cipolla, R.: SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39, 2481–2495 (2017)
  2. Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 40, 834–848 (2018)
  3. Chen, L.C., Papandreou, G., Schroff, F., Adam, H.: Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587 (2017)
  4. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: IEEE Conference on Computer Vision and Pattern Recognition (2009)
  5. Gadde, R., Jampani, V., Gehler, P.V.: Semantic video CNNs through representation warping. CoRR (2017)
  6. Ghiasi, G., Fowlkes, C.C.: Laplacian pyramid reconstruction and refinement for semantic segmentation. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9907, pp. 519–534. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46487-9_32
  7. Girshick, R.: Fast R-CNN. In: International Conference on Computer Vision (2015)
  8. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: International Conference on Artificial Intelligence and Statistics, pp. 249–256 (2010)
  9. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition (2016)
  10. He, K., Zhang, X., Ren, S., Sun, J.: Identity mappings in deep residual networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 630–645. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46493-0_38
  11. Hu, T., Wang, Y., Chen, Y., Lu, P., Wang, H., Wang, G.: SOBEL heuristic kernel for aerial semantic segmentation. In: International Conference on Image Processing (2018)
  12. Huang, G., Liu, Z., Weinberger, K.Q., van der Maaten, L.: Densely connected convolutional networks. In: IEEE Conference on Computer Vision and Pattern Recognition (2017)
  13. Jégou, S., Drozdzal, M., Vazquez, D., Romero, A., Bengio, Y.: The one hundred layers tiramisu: fully convolutional DenseNets for semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition Workshops (2017)
  14. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: International Conference on Learning Representations (2015)
  15. Krähenbühl, P., Koltun, V.: Efficient inference in fully connected CRFs with Gaussian edge potentials. In: Advances in Neural Information Processing Systems (2011)
  16. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems (2012)
  17. Lin, G., Shen, C., van den Hengel, A., Reid, I.: Efficient piecewise training of deep structured models for semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition (2016)
  18. Liu, S., De Mello, S., Gu, J., Zhong, G., Yang, M.H., Kautz, J.: Learning affinity via spatial propagation networks. In: Advances in Neural Information Processing Systems (2017)
  19. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)
  20. Maggiori, E., Tarabalka, Y., Charpiat, G., Alliez, P.: Can semantic labeling methods generalize to any city? The Inria aerial image labeling benchmark. In: IEEE International Geoscience and Remote Sensing Symposium (IGARSS) (2017)
  21. Noh, H., Hong, S., Han, B.: Learning deconvolution network for semantic segmentation. In: International Conference on Computer Vision (2015)
  22. Paszke, A., Chaurasia, A., Kim, S., Culurciello, E.: ENet: a deep neural network architecture for real-time semantic segmentation (2016)
  23. Pohlen, T., Hermans, A., Mathias, M., Leibe, B.: Full-resolution residual networks for semantic segmentation in street scenes. In: IEEE Conference on Computer Vision and Pattern Recognition (2017)
  24. Romera, E., Alvarez, J.M., Bergasa, L.M., Arroyo, R.: ERFNet: efficient residual factorized ConvNet for real-time semantic segmentation. IEEE Trans. Intell. Transp. Syst. (2018)
  25. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
  26. Shen, Z., Liu, Z., Li, J., Jiang, Y.G., Chen, Y., Xue, X.: DSOD: learning deeply supervised object detectors from scratch. In: International Conference on Computer Vision (2017)
  27. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)
  28. Szegedy, C., et al.: Going deeper with convolutions. In: IEEE Conference on Computer Vision and Pattern Recognition (2015)
  29. Visin, F., et al.: ReSeg: a recurrent neural network-based model for semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition Workshops (2016)
  30. Wang, P., et al.: Understanding convolution for semantic segmentation. In: IEEE Winter Conference on Applications of Computer Vision (2018)
  31. Xie, S., Tu, Z.: Holistically-nested edge detection. In: International Conference on Computer Vision (2015)
  32. Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. In: International Conference on Learning Representations (2016)
  33. Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: IEEE Conference on Computer Vision and Pattern Recognition (2017)
  34. Zheng, S., et al.: Conditional random fields as recurrent neural networks. In: IEEE Conference on Computer Vision and Pattern Recognition (2015)

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. University of Amsterdam, Amsterdam, The Netherlands
