The Devil is in the Decoder: Classification, Regression and GANs

  • Zbigniew Wojna (corresponding author)
  • Vittorio Ferrari
  • Sergio Guadarrama
  • Nathan Silberman
  • Liang-Chieh Chen
  • Alireza Fathi
  • Jasper Uijlings
Article

Abstract

Many machine vision applications, such as semantic segmentation and depth prediction, require predictions for every pixel of the input image. Models for such problems usually consist of encoders, which decrease spatial resolution while learning a high-dimensional representation, followed by decoders, which recover the original input resolution and produce low-dimensional, pixel-wise predictions. While encoders have been studied rigorously, relatively few studies address the decoder side. This paper presents an extensive comparison of a variety of decoders on a variety of pixel-wise tasks ranging from classification and regression to synthesis. Our contributions are: (1) decoders matter: we observe significant variance in results between different types of decoders on various problems. (2) We introduce new residual-like connections for decoders. (3) We introduce a novel decoder: bilinear additive upsampling. (4) We explore prediction artifacts.
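The abstract names the proposed bilinear additive upsampling decoder without spelling out its mechanics. As a rough illustration only (not the paper's reference implementation), the operation can be read as: bilinearly upsample the feature map by 2x, then sum groups of consecutive channels to reduce channel depth, yielding a parameter-free upsampling step. The NumPy function names and the group size `r` below are illustrative assumptions:

```python
import numpy as np

def bilinear_upsample_2x(x):
    """Naive 2x bilinear upsampling of an (H, W, C) array.

    Output pixel centers are mapped back into the input grid and
    interpolated from their four nearest input pixels.
    """
    h, w, _ = x.shape
    ys = (np.arange(2 * h) + 0.5) / 2 - 0.5
    xs = (np.arange(2 * w) + 0.5) / 2 - 0.5
    y0 = np.clip(np.floor(ys).astype(int), 0, h - 1)
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 1)
    y1 = np.clip(y0 + 1, 0, h - 1)
    x1 = np.clip(x0 + 1, 0, w - 1)
    wy = np.clip(ys - y0, 0, 1)[:, None, None]   # vertical weights
    wx = np.clip(xs - x0, 0, 1)[None, :, None]   # horizontal weights
    top = x[y0][:, x0] * (1 - wx) + x[y0][:, x1] * wx
    bot = x[y1][:, x0] * (1 - wx) + x[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

def bilinear_additive_upsample(x, r=4):
    """Sketch of bilinear additive upsampling: upsample spatially by 2,
    then sum every r consecutive channels, so (H, W, C) -> (2H, 2W, C/r)
    with no learned parameters."""
    up = bilinear_upsample_2x(x)                 # (2H, 2W, C)
    h2, w2, c = up.shape
    assert c % r == 0, "channel count must be divisible by the group size"
    return up.reshape(h2, w2, c // r, r).sum(axis=-1)
```

Because the spatial resolution doubles while the channel count shrinks by the same factor (for r = 4, the total activation volume is preserved), the output can be combined with a learned convolution or a residual connection at the new resolution.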

Keywords

Machine vision · Computer vision · Neural network architectures · Decoders · 2D imagery · Per-pixel prediction · Semantic segmentation · Depth prediction · GANs


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  • Zbigniew Wojna (1, corresponding author)
  • Vittorio Ferrari (2)
  • Sergio Guadarrama (2)
  • Nathan Silberman (2)
  • Liang-Chieh Chen (2)
  • Alireza Fathi (2)
  • Jasper Uijlings (2)

  1. University College London, London, UK
  2. Google Inc., Mountain View, USA
