Computational Visual Media, Volume 4, Issue 4, pp 385–397

DeepPrimitive: Image decomposition by layered primitive detection

  • Jiahui Huang
  • Jun Gao
  • Vignesh Ganapathi-Subramanian
  • Hao Su
  • Yin Liu
  • Chengcheng Tang
  • Leonidas J. Guibas
Open Access
Research Article


Perceiving the visual world through basic building blocks, such as cubes, spheres, and cones, gives human beings a parsimonious understanding of it. Efforts to find primitive-based geometric interpretations of visual data therefore date back to studies of visual media in the 1970s. However, because primitive fitting was difficult in the pre-deep-learning age, this line of research faded from the main stage, and the vision community turned primarily to semantic image understanding. In this paper, we revisit the classical problem of building geometric interpretations of images, using supervised deep learning tools. We build a framework that detects primitives from images in a layered manner by modifying the YOLO network; an RNN with a novel loss function then equips this network with the capability to predict primitives with a variable number of parameters. We compare our pipeline to traditional methods and to other learning-based baselines, demonstrating that our layered detection model achieves higher accuracy and produces better reconstructions.


Keywords: layered image decomposition; primitive detection; biologically inspired vision; deep learning



Chengcheng Tang would like to acknowledge NSF grant IIS-1528025, a Google Focused Research award, a gift from the Adobe Corporation, and a gift from the NVIDIA Corporation.

Supplementary material

41095_2018_128_MOESM1_ESM.pdf (8.1 MB)
DeepPrimitive: Image Decomposition by Layered Primitive Detection Electronic Supplementary Material



Copyright information

© The Author(s) 2018

Open Access: The articles published in this journal are distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


Authors and Affiliations

  • Jiahui Huang (1)
  • Jun Gao (2)
  • Vignesh Ganapathi-Subramanian (3)
  • Hao Su (4)
  • Yin Liu (5)
  • Chengcheng Tang (3)
  • Leonidas J. Guibas (3)

  1. Tsinghua University, Beijing, China
  2. Computer Science Department, University of Toronto, Toronto, Canada
  3. Stanford University, Stanford, USA
  4. University of California San Diego, La Jolla, USA
  5. University of Wisconsin-Madison, Madison, USA
