
Synthesizing Training Images for Semantic Segmentation

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 875)

Abstract

Semantic segmentation is one of the key problems in computer vision. Recently, Convolutional Neural Networks (CNNs) have achieved strong performance on the semantic segmentation task. However, CNNs require a large number of annotated training images, which are costly to obtain because pixel-level annotation demands massive human labour. In this paper, we propose to use 3D models to automatically generate synthetic images with pixel-level annotations. We take advantage of 3D models to generate synthetic images with high diversity in object appearance and background clutter, by randomly sampling rendering parameters and adding random background patterns. We then use the synthetic images, combined with publicly available real-world images, to augment the training set for semantic segmentation. Experimental results demonstrate that CNNs trained with our synthetic images achieve improved performance on the semantic segmentation task on the PASCAL VOC 2012 dataset.
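The pipeline described in the abstract (sample rendering parameters at random, render a 3D model, composite it onto a random background pattern, and derive the pixel-level label from the object silhouette) can be illustrated with a short script. The sketch below is not the authors' released code: render_model is a hypothetical stand-in for an actual rendering backend (e.g. a Blender or OpenGL script), and the sampling ranges are illustrative assumptions.

```python
import random

import numpy as np
from PIL import Image


def render_model(model_path, azimuth, elevation, distance, size):
    """Hypothetical renderer: returns an RGBA image of `size` whose alpha
    channel is the object silhouette. Replace with a real backend."""
    raise NotImplementedError


def synthesize(model_path, background_paths, class_id, size=(500, 375)):
    # Randomly sample rendering parameters for diversity in appearance.
    azimuth = random.uniform(0.0, 360.0)     # illustrative range
    elevation = random.uniform(-10.0, 40.0)  # illustrative range
    distance = random.uniform(1.5, 3.0)      # illustrative range
    rendered = render_model(model_path, azimuth, elevation, distance, size)

    # Add a random background pattern for clutter.
    background = Image.open(random.choice(background_paths))
    background = background.convert("RGB").resize(size)

    # Composite the rendered object using its alpha channel.
    alpha = rendered.split()[-1]
    background.paste(rendered, (0, 0), mask=alpha)

    # The pixel-level annotation comes for free: class_id on the
    # object silhouette, 0 (background) everywhere else.
    label = np.where(np.array(alpha) > 0, class_id, 0).astype(np.uint8)
    return background, Image.fromarray(label)
```

Because the label mask is derived from the same silhouette used for compositing, the annotation is pixel-exact at no extra labelling cost, which is the central appeal of rendering-based data augmentation.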



Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (No. 61602139) and the Zhejiang Province Science and Technology Planning Project (2018C01030).

Author information

Corresponding author

Correspondence to Yunhui Zhang.


Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Zhang, Y., Wu, Z., Zhou, Z., Wang, Y. (2018). Synthesizing Training Images for Semantic Segmentation. In: Wang, Y., Jiang, Z., Peng, Y. (eds) Image and Graphics Technologies and Applications. IGTA 2018. Communications in Computer and Information Science, vol 875. Springer, Singapore. https://doi.org/10.1007/978-981-13-1702-6_22

  • DOI: https://doi.org/10.1007/978-981-13-1702-6_22

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-1701-9

  • Online ISBN: 978-981-13-1702-6

  • eBook Packages: Computer Science, Computer Science (R0)
