
Joint Learning of Intrinsic Images and Semantic Segmentation

  • Anil S. Baslamisli (corresponding author)
  • Thomas T. Groenestege
  • Partha Das
  • Hoang-An Le
  • Sezer Karaoglu
  • Theo Gevers
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11210)

Abstract

Semantic segmentation of outdoor scenes is problematic when there are variations in imaging conditions. It is known that albedo (reflectance) is invariant to all kinds of illumination effects. Thus, using reflectance images for the semantic segmentation task can be favorable. Additionally, not only may segmentation benefit from reflectance, but segmentation may also be useful for reflectance computation. Therefore, in this paper, the tasks of semantic segmentation and intrinsic image decomposition are considered as a combined process by exploring their mutual relationship in a joint fashion. To that end, we propose a supervised end-to-end CNN architecture to jointly learn intrinsic image decomposition and semantic segmentation. We analyze the gains of addressing these two problems jointly. Moreover, new cascade CNN architectures for intrinsic-for-segmentation and segmentation-for-intrinsic are proposed as single tasks. Furthermore, a dataset of 35K synthetic images of natural environments is created with corresponding albedo and shading (intrinsics), as well as semantic labels (segmentation) assigned to each object/scene. The experiments show that joint learning of intrinsic image decomposition and semantic segmentation is beneficial for both tasks on natural scenes. The dataset and models are available at https://ivi.fnwi.uva.nl/cv/intrinseg.
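
The joint set-up described above can be pictured as a multi-task network: a shared encoder feeding separate heads for albedo, shading, and semantic labels, trained with a summed loss. The following is a minimal, illustrative sketch in PyTorch, not the paper's architecture; layer sizes, head designs, and the loss weights w_intrinsic and w_seg are assumptions made only for the example.

    # Illustrative sketch of joint intrinsic decomposition + semantic segmentation.
    # NOT the paper's architecture: layers, heads, and loss weights are assumptions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class JointIntrinsicSegNet(nn.Module):
        def __init__(self, num_classes: int):
            super().__init__()
            # Shared encoder over RGB input (hypothetical layer sizes).
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True),
            )
            # Task-specific heads: albedo (3 channels), shading (1), class logits.
            self.albedo_head = nn.Conv2d(64, 3, 3, padding=1)
            self.shading_head = nn.Conv2d(64, 1, 3, padding=1)
            self.seg_head = nn.Conv2d(64, num_classes, 3, padding=1)

        def forward(self, image):
            feats = self.encoder(image)
            return self.albedo_head(feats), self.shading_head(feats), self.seg_head(feats)

    def joint_loss(albedo, shading, seg_logits, gt_albedo, gt_shading, gt_labels,
                   w_intrinsic=1.0, w_seg=1.0):
        # Reconstruction losses for the intrinsics plus cross-entropy for segmentation.
        l_albedo = F.mse_loss(albedo, gt_albedo)
        l_shading = F.mse_loss(shading, gt_shading)
        l_seg = F.cross_entropy(seg_logits, gt_labels)
        return w_intrinsic * (l_albedo + l_shading) + w_seg * l_seg

    if __name__ == "__main__":
        net = JointIntrinsicSegNet(num_classes=5)
        img = torch.rand(2, 3, 64, 64)
        gt_a, gt_s = torch.rand(2, 3, 64, 64), torch.rand(2, 1, 64, 64)
        gt_l = torch.randint(0, 5, (2, 64, 64))
        a, s, logits = net(img)
        loss = joint_loss(a, s, logits, gt_a, gt_s, gt_l)
        loss.backward()
        print(float(loss))

The point of the sketch is the coupling: both tasks back-propagate into the same encoder, so illumination-invariant reflectance cues and semantic cues can inform each other, which is the mutual benefit the abstract reports.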

Notes

Acknowledgements

This project was funded by the EU Horizon 2020 program No. 688007 (TrimBot2020). We thank Gjorgji Strezoski for his contributions to the website.

Supplementary material

Supplementary material 1: 474211_1_En_18_MOESM1_ESM.pdf (PDF, 16.5 MB)

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Anil S. Baslamisli (1) (corresponding author)
  • Thomas T. Groenestege (1, 2)
  • Partha Das (1, 2)
  • Hoang-An Le (1)
  • Sezer Karaoglu (1, 2)
  • Theo Gevers (1, 2)

  1. University of Amsterdam, Amsterdam, The Netherlands
  2. 3DUniversum B.V., Amsterdam, The Netherlands
