
CGIntrinsics: Better Intrinsic Image Decomposition Through Physically-Based Rendering

  • Zhengqi Li
  • Noah Snavely
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11207)

Abstract

Intrinsic image decomposition is a challenging, long-standing computer vision problem for which ground truth data is very difficult to acquire. We explore the use of synthetic data to train CNN-based intrinsic image decomposition models, which we then apply to real-world images. To that end, we present CGIntrinsics, a new, large-scale dataset of physically-based rendered images of scenes with full ground truth decompositions. The rendering process we use is carefully designed to yield high-quality, realistic images, which we find to be crucial for this problem domain. We also propose a new end-to-end training method that learns better decompositions by leveraging CGIntrinsics and, optionally, IIW and SAW, two recent datasets of sparse annotations on real-world images. Surprisingly, we find that a decomposition network trained solely on our synthetic data outperforms the state of the art on both IIW and SAW, and that performance improves further when IIW and SAW data are added during training. Our work demonstrates the surprising effectiveness of carefully rendered synthetic data for the intrinsic images task.
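For context, the decomposition model underlying this task is multiplicative: an image I is assumed to factor pixel-wise into a reflectance (albedo) layer R and a shading layer S, so that I = R · S, or equivalently log I = log R + log S. The PyTorch sketch below is purely illustrative and is not the authors' released code; the function names and the scale-invariant formulation are our assumptions about how such a decomposition is commonly supervised.

```python
import torch

def reconstruction_residual(log_R, log_S, image, eps=1e-6):
    # Intrinsic image model: I = R * S per pixel, so in log space
    # log I = log R + log S. The residual measures how well a
    # predicted (reflectance, shading) pair explains the input image.
    return log_R + log_S - torch.log(image.clamp(min=eps))

def si_mse(pred_log, gt_log, mask):
    # Scale-invariant MSE in log space: reflectance and shading are
    # only recoverable up to a global scale, so first solve for the
    # scalar log-offset alpha that best aligns the prediction to the
    # ground truth, then measure the remaining squared error over
    # the valid (masked) pixels.
    n = mask.sum().clamp(min=1.0)
    alpha = ((pred_log - gt_log) * mask).sum() / n
    return (((pred_log - alpha - gt_log) * mask) ** 2).sum() / n
```

Because any constant factor can be moved between R and S without changing their product, losses of this kind compare decompositions only up to that inherent scale ambiguity.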

Notes

Acknowledgments

We thank Jingguang Zhou for his help with data generation. This work was funded by the National Science Foundation through grant IIS-1149393, and by a grant from Schmidt Sciences.

Supplementary material

Supplementary material 1: 474178_1_En_23_MOESM1_ESM.pdf (PDF, 4123 KB)


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Department of Computer Science & Cornell Tech, Cornell University, Ithaca, USA
