Learning Transparent Object Matting

  • Guanying Chen
  • Kai Han
  • Kwan-Yee K. Wong


This paper addresses the problem of image matting for transparent objects. Existing approaches often require tedious capturing procedures and long processing times, which limit their practical use. We formulate transparent object matting as a refractive flow estimation problem and propose a deep learning framework, called TOM-Net, for learning the refractive flow. The framework comprises two parts: a multi-scale encoder-decoder network that produces a coarse prediction, and a residual network that refines it. At test time, TOM-Net takes a single image as input and outputs a matte (consisting of an object mask, an attenuation mask and a refractive flow field) in a fast feed-forward pass. As no off-the-shelf dataset is available for transparent object matting, we create a large-scale synthetic dataset of 178K images of transparent objects rendered in front of images sampled from the Microsoft COCO dataset. We also capture a real dataset of 876 samples using 14 transparent objects and 60 background images. In addition, we show that our method can be easily extended to handle cases where a trimap or a background image is available. Promising experimental results on both synthetic and real data demonstrate the effectiveness of our approach.
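To make the three components of the matte concrete, the following is a minimal sketch of how such a matte might be used to composite a transparent object onto a new background. It assumes a compositing model of the form C = (1 - m) B + m ρ B(p + F), where m is the object mask, ρ the attenuation mask, F the refractive flow in pixels and B the background. This model, the function name `composite`, and the nearest-neighbour sampling are illustrative assumptions and are not specified in the abstract above.

```python
import numpy as np


def composite(background, mask, attenuation, flow):
    """Composite a transparent object onto a background (illustrative sketch).

    background:  (H, W, 3) float array
    mask:        (H, W) array, 1 inside the object and 0 outside
    attenuation: (H, W) array in [0, 1]
    flow:        (H, W, 2) array of pixel offsets (dx, dy); assumed convention
    """
    h, w = mask.shape
    ys, xs = np.mgrid[0:h, 0:w]

    # Where each pixel "looks" in the background after refraction.
    # Nearest-neighbour sampling keeps the sketch short; bilinear
    # interpolation would be a natural alternative.
    src_x = np.clip(np.rint(xs + flow[..., 0]), 0, w - 1).astype(int)
    src_y = np.clip(np.rint(ys + flow[..., 1]), 0, h - 1).astype(int)
    refracted = background[src_y, src_x]

    m = mask[..., None].astype(background.dtype)
    rho = attenuation[..., None]

    # Outside the object the background is unchanged; inside, the
    # refracted background is dimmed by the attenuation.
    return (1.0 - m) * background + m * rho * refracted
```

With an all-zero mask the function returns the background unchanged, and with zero flow and unit attenuation the object is invisible, which matches the intuition that the matte describes only how the object alters the light coming from behind it.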


Keywords: Transparent object; Image matting; Convolutional neural network



This project is supported by a grant from the Research Grants Council of Hong Kong (SAR), China, under Project HKU 718113E. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU used for this research.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. The University of Hong Kong, Hong Kong, China
  2. University of Oxford, Oxford, UK
