Learning Transparent Object Matting


Abstract

This paper addresses the problem of image matting for transparent objects. Existing approaches often require tedious capturing procedures and long processing times, which limit their practical use. In this paper, we formulate transparent object matting as a refractive flow estimation problem and propose a deep learning framework, called TOM-Net, for learning the refractive flow. Our framework comprises two parts, namely a multi-scale encoder-decoder network for producing a coarse prediction, and a residual network for refinement. At test time, TOM-Net takes a single image as input and outputs a matte (consisting of an object mask, an attenuation mask and a refractive flow field) in a fast feed-forward pass. As no off-the-shelf dataset is available for transparent object matting, we create a large-scale synthetic dataset consisting of 178K images of transparent objects rendered in front of images sampled from the Microsoft COCO dataset. We also capture a real dataset consisting of 876 samples using 14 transparent objects and 60 background images. Furthermore, we show that our method can be easily extended to handle cases where a trimap or a background image is available. Promising experimental results have been achieved on both synthetic and real data, clearly demonstrating the effectiveness of our approach.
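The matte described above (object mask, attenuation mask and refractive flow field) supports compositing the transparent object onto a novel background. The following is a minimal sketch of that compositing step, assuming the standard refractive-flow model in which pixels inside the object sample the background at flow-displaced locations and are scaled by the attenuation; function and variable names are illustrative, and nearest-neighbour sampling stands in for the bilinear sampling typically used.

```python
import numpy as np

def composite(background, mask, attenuation, flow):
    """Composite a transparent-object matte over a background image.

    background:  (h, w, 3) float image
    mask:        (h, w) object mask in {0, 1}
    attenuation: (h, w) per-pixel attenuation in [0, 1]
    flow:        (h, w, 2) refractive flow, per-pixel (dy, dx) offsets

    Inside the object, each output pixel samples the background at a
    flow-displaced location and is scaled by the attenuation; outside
    the object, the background is kept unchanged.  Nearest-neighbour
    sampling is used here for brevity (bilinear is more typical).
    """
    h, w = mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Sample positions displaced by the refractive flow, clamped to the image.
    sy = np.clip(np.round(ys + flow[..., 0]).astype(int), 0, h - 1)
    sx = np.clip(np.round(xs + flow[..., 1]).astype(int), 0, w - 1)
    refracted = attenuation[..., None] * background[sy, sx]
    m = mask[..., None].astype(float)
    return m * refracted + (1.0 - m) * background
```

With a zero flow field and unit attenuation, the object region simply reproduces the background, which provides a quick sanity check of the model.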




Footnotes

  1. For an image of size \(512\times 512\), 18 pictures and around 20 min of processing time are needed.

  2. For an image with n pixels, we have 7 unknowns for each pixel (3 for B, 2 for P, 1 for m, and 1 for \(\rho \)), resulting in a total of 7n unknowns.

  3. Other large-scale datasets, such as ImageNet (Deng et al. 2009), can also be used.

  4. The objects consist of 7 glasses, 1 lens and 6 complex objects. Glasses with water are implicitly included.

  5. Complex shapes are excluded from the experiments here to speed up training.

  6. The first value is measured on the whole image and the second within the object region.

  7. Glass \(\times \)12, glass with water \(\times \)4, lens \(\times \)2, and complex shape \(\times \)2.

  8. We simply multiply the refractive flow field by a scaling factor (\(<1\)).
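The flow scaling in the last footnote amounts to a single uniform multiplication. A minimal sketch, with made-up flow values and an arbitrarily chosen factor:

```python
import numpy as np

# A hypothetical 2x2 refractive flow field with per-pixel (dy, dx) offsets.
flow = np.array([[[4.0, -2.0], [0.0, 3.0]],
                 [[1.0, 0.5], [-2.0, 2.0]]])

# Weakening the apparent refraction: scale all offsets uniformly.
scale = 0.5  # any factor < 1 reduces the distortion
weak_flow = scale * flow
```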


  1. Chen, G., Han, K., & Wong, K. Y. K. (2018). TOM-Net: Learning transparent object matting from a single image. In CVPR.

  2. Cho, D., Tai, Y. W., & Kweon, I. (2016). Natural image matting using deep convolutional neural networks. In ECCV.

  3. Chuang, Y. Y., Zongker, D. E., Hindorff, J., Curless, B., Salesin, D. H., & Szeliski, R. (2000). Environment matting extensions: Towards higher accuracy and real-time capture. In SIGGRAPH.

  4. Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In CVPR.

  5. Duan, Q., Cai, J., & Zheng, J. (2015). Compressive environment matting. The Visual Computer, 31, 1587–1600.


  6. Duan, Q., Cai, J., Zheng, J., & Lin, W. (2011). Fast environment matting extraction using compressive sensing. In ICME.

  7. Duan, Q., Zheng, J., & Cai, J. (2011). Flexible and accurate transparent-object matting and compositing using refractive vector field. Computer Graphics Forum.

  8. Eigen, D., Puhrsch, C., & Fergus, R. (2014). Depth map prediction from a single image using a multi-scale deep network. In NIPS.

  9. Fischer, P., Dosovitskiy, A., Ilg, E., Häusser, P., Hazırbaş, C., Golkov, V., van der Smagt, P., Cremers, D., & Brox, T. (2015). FlowNet: Learning optical flow with convolutional networks. In ICCV.

  10. He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In CVPR.

  11. Ilg, E., Mayer, N., Saikia, T., Keuper, M., Dosovitskiy, A., & Brox, T. (2017). FlowNet 2.0: Evolution of optical flow estimation with deep networks. In CVPR.

  12. Kim, J., Lee, J. K., & Lee, K. M. (2016). Accurate image super-resolution using very deep convolutional networks. In CVPR.

  13. Kingma, D., & Ba, J. (2015). Adam: A method for stochastic optimization. In ICLR.

  14. Lin, T. Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. L. (2014). Microsoft COCO: Common objects in context. In ECCV.

  15. Nah, S., Kim, T. H., & Lee, K. M. (2017). Deep multi-scale convolutional neural network for dynamic scene deblurring. In CVPR.

  16. Peers, P., & Dutré, P. (2003). Wavelet environment matting. In Eurographics workshop on rendering.

  17. Persistence of Vision Raytracer (POV-Ray). http://www.povray.org/.

  18. Qian, Y., Gong, M., & Yang, Y. H. (2015). Frequency-based environment matting by compressive sensing. In ICCV.

  19. Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. In MICCAI.

  20. Shen, X., Tao, X., Gao, H., Zhou, C., & Jia, J. (2016). Deep automatic portrait matting. In ECCV.

  21. Shi, J., Dong, Y., Su, H., & Yu, S. X. (2017). Learning non-Lambertian object intrinsics across ShapeNet categories. In CVPR.

  22. Smith, A. R., & Blinn, J. F. (1996). Blue screen matting. In SIGGRAPH.

  23. Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: From error visibility to structural similarity. IEEE TIP, 13, 600–612.


  24. Wexler, Y., Fitzgibbon, A. W., & Zisserman, A. (2002). Image-based environment matting. In Rendering techniques.

  25. Xu, N., Price, B., Cohen, S., & Huang, T. (2017). Deep image matting. In CVPR.

  26. Yeung, S. K., Tang, C. K., Brown, M. S., & Kang, S. B. (2011). Matting and compositing of transparent and refractive objects. ACM TOG, 30, 2.


  27. Zhu, J., & Yang, Y. H. (2004). Frequency-based environment matting. In Computer graphics and applications.

  28. Zongker, D. E., Werner, D. M., Curless, B., & Salesin, D. H. (1999). Environment matting and compositing. In SIGGRAPH.



Acknowledgments

This project is supported by a grant from the Research Grants Council of Hong Kong (SAR), China, under Project HKU 718113E. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU used for this research.

Author information



Corresponding author

Correspondence to Guanying Chen.

Additional information


Communicated by Patrick Perez.



Cite this article

Chen, G., Han, K., & Wong, K. Y. K. (2019). Learning Transparent Object Matting. International Journal of Computer Vision, 127, 1527–1544. https://doi.org/10.1007/s11263-019-01202-3



Keywords

  • Transparent object
  • Image matting
  • Convolutional neural network