DeepPrimitive: Image decomposition by layered primitive detection
Similar articles being viewed by others

  • Dual Convolutional Neural Networks for Low-Level Vision. Jinshan Pan, Deqing Sun, … Ming-Hsuan Yang (06 April 2022)
  • Scalable image decomposition. Hwanbok Mun, Gang-Joon Yoon, … Sang Min Yoon (26 January 2021)
  • Spatially-Adaptive Filter Units for Compact and Efficient Deep Neural Networks. Domen Tabernik, Matej Kristan & Aleš Leonardis (02 January 2020)
  • SDNet: A Versatile Squeeze-and-Decomposition Network for Real-Time Image Fusion. Hao Zhang & Jiayi Ma (30 July 2021)
  • Hierarchical MVSNet with cost volume separation and fusion based on U-shape feature extraction. Wanjun Liu, Junkai Wang, … Lei Shen (27 September 2022)
  • αILP: thinking visual scenes as differentiable logic programs. Hikaru Shindo, Viktor Pfanschilling, … Kristian Kersting (14 March 2023)
  • Conv-PVT: a fusion architecture of convolution and pyramid vision transformer. Xin Zhang & Yi Zhang (22 December 2022)
  • Neural Networks Trained on Natural Scenes Exhibit Gestalt Closure. Been Kim, Emily Reif, … Michael C. Mozer (09 April 2021)
  • Bio-inspired interactive feedback neural networks for edge detection. Chuan Lin, Yakun Qiao & Yongcai Pan (02 December 2022)

  • Research Article
  • Open Access
  • Published: 23 December 2018


  • Jiahui Huang1,
  • Jun Gao2,
  • Vignesh Ganapathi-Subramanian3,
  • Hao Su4,
  • Yin Liu5,
  • Chengcheng Tang3 &
  • Leonidas J. Guibas3

Computational Visual Media volume 4, pages 385–397 (2018)

  • 1240 Accesses

  • 2 Citations


Abstract

Perceiving the visual world through basic building blocks, such as cubes, spheres, and cones, gives human beings a parsimonious understanding of it. Accordingly, efforts to find primitive-based geometric interpretations of visual data date back to studies of visual media in the 1970s. However, because primitive fitting was difficult in the pre-deep-learning age, this research approach faded from the main stage, and the vision community turned primarily to semantic image understanding. In this paper, we revisit the classical problem of building geometric interpretations of images, using supervised deep learning tools. We build a framework that detects primitives from images in a layered manner by modifying the YOLO network; an RNN with a novel loss function then equips this network with the capability to predict primitives with variable numbers of parameters. We compare our pipeline to traditional and other baseline learning methods, demonstrating that our layered detection model achieves higher accuracy and better reconstruction.
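The abstract's mention of an RNN that predicts primitives with variable numbers of parameters can be pictured as a recurrent decoder that emits one parameter per step together with a stop signal. The sketch below is a framework-free toy mock-up of that control flow only; the names `decode_primitive_params`, `step_fn`, and the stop threshold are illustrative assumptions, not the paper's implementation.

```python
def decode_primitive_params(step_fn, h0, max_steps=8, stop_thresh=0.5):
    """Greedily decode a variable-length parameter list.

    step_fn(h) -> (param, stop_prob, h_next) models one recurrent step.
    Decoding halts when stop_prob exceeds stop_thresh, or after
    max_steps steps, so different primitives (e.g., circles vs.
    polygons) can yield parameter lists of different lengths.
    """
    params, h = [], h0
    for _ in range(max_steps):
        param, stop_prob, h = step_fn(h)
        params.append(param)
        if stop_prob > stop_thresh:
            break
    return params

# Toy step function: emits a counter and signals "stop" after 3 params.
def toy_step(h):
    return float(h), (1.0 if h >= 3 else 0.0), h + 1

print(decode_primitive_params(toy_step, 1))  # -> [1.0, 2.0, 3.0]
```

In a real network the step function would be a learned RNN cell conditioned on image features, and the stop signal would be trained jointly with the parameter regression loss.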



Acknowledgements

Chengcheng Tang would like to acknowledge NSF grant IIS-1528025, a Google Focused Research award, a gift from the Adobe Corporation, and a gift from the NVIDIA Corporation.

Author information

Authors and Affiliations

  1. Tsinghua University, Beijing, 100084, China

    Jiahui Huang

  2. Computer Science Department, University of Toronto, Toronto, M5S2E4, Canada

    Jun Gao

  3. Stanford University, Stanford, 94305, USA

    Vignesh Ganapathi-Subramanian, Chengcheng Tang & Leonidas J. Guibas

  4. University of California San Diego, La Jolla, 92093, USA

    Hao Su

  5. University of Wisconsin-Madison, Madison, 53715, USA

    Yin Liu


Corresponding author

Correspondence to Jiahui Huang.

Additional information

Jiahui Huang received his B.S. degree in computer science and technology from Tsinghua University in 2018. He is currently a Ph.D. candidate in computer science at Tsinghua University. His research interests include computer vision and computer graphics.

Jun Gao received his B.S. degree in computer science from Peking University in 2018. He is a graduate student in the Machine Learning Group at the University of Toronto and is also affiliated with the Vector Institute. His research interests are in deep learning and computer vision.

Vignesh G. Subramanian is a Ph.D. candidate in the Department of Electrical Engineering, Stanford University. He previously obtained his dual degrees (B.Tech. in EE and M.Tech. in communication engineering) from IIT Madras, India. His research interests include shape correspondences, 3D geometry, graphics, and vision.

Hao Su received his Ph.D. degree from Stanford University, under the supervision of Leonidas Guibas. He joined UC San Diego in 2017 and is currently an assistant professor of computer science and engineering. His research interests include computer vision, computer graphics, machine learning, robotics, and optimization. More details of his research can be found at https://ai.ucsd.edu/haosu.

Yin Liu received his B.S. degree from the Department of Automation at Tsinghua University in 2018. He is currently a Ph.D. candidate in computer science at the University of Wisconsin-Madison. His research interest is in machine learning.

Chengcheng Tang received his Ph.D. and M.S. degrees from King Abdullah University of Science and Technology (KAUST) in 2015 and 2011, respectively, and his bachelor's degree from Jilin University in 2009. He is currently a postdoctoral scholar in the Computer Science Department at Stanford University. His research interests include computer graphics, geometric computing, computational design, and machine learning.

Leonidas J. Guibas received his Ph.D. degree from Stanford University in 1976, under the supervision of Donald Knuth. His main subsequent employers were Xerox PARC, MIT, and DEC/SRC. Since 1984, he has been at Stanford University, where he is a professor of computer science. His research interests include computational geometry, geometric modeling, computer graphics, computer vision, sensor networks, robotics, and discrete algorithms. He is a senior member of the IEEE and the IEEE Computer Society. More details about his research can be found at https://geometry.stanford.edu/member/guibas/.

Electronic supplementary material

DeepPrimitive: Image Decomposition by Layered Primitive Detection Electronic Supplementary Material

Rights and permissions

Open Access The articles published in this journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Other papers from this open access journal are available free of charge from https://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.


About this article

Cite this article

Huang, J., Gao, J., Ganapathi-Subramanian, V. et al. DeepPrimitive: Image decomposition by layered primitive detection. Comp. Visual Media 4, 385–397 (2018). https://doi.org/10.1007/s41095-018-0128-6

Download citation

  • Received: 30 November 2018

  • Accepted: 03 December 2018

  • Published: 23 December 2018

  • Issue Date: December 2018

  • DOI: https://doi.org/10.1007/s41095-018-0128-6


Keywords

  • layered image decomposition
  • primitive detection
  • biologically inspired vision
  • deep learning