Abstract
Perceiving the visual world in terms of basic building blocks, such as cubes, spheres, and cones, gives human beings a parsimonious understanding of it. Efforts to find primitive-based geometric interpretations of visual data thus date back to studies of visual media in the 1970s. However, because primitive fitting was difficult in the pre-deep-learning era, this line of research faded from the main stage, and the vision community turned primarily to semantic image understanding. In this paper, we revisit the classical problem of building geometric interpretations of images using supervised deep learning tools. We build a framework that detects primitives from images in a layered manner by modifying the YOLO network; an RNN with a novel loss function then equips this network with the capability to predict primitives with variable numbers of parameters. We compare our pipeline to traditional and other baseline learning methods, demonstrating that our layered detection model achieves higher accuracy and better reconstruction.
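To make the architecture described in the abstract concrete, the following is a minimal PyTorch sketch, not the authors' implementation, of the general idea: a YOLO-style grid detection head proposes primitives per cell, and a small GRU decoder emits a variable-length parameter sequence for each proposal so that primitive types with different parameter counts (e.g., circles versus rounded rectangles) can share one head. The backbone, layer sizes, parameter feedback, and stop-logit convention are all illustrative assumptions.

# Minimal sketch (assumed design, not the paper's code): grid detection head + GRU
# decoder for variable-length primitive parameters.
import torch
import torch.nn as nn

class PrimitiveDetector(nn.Module):
    def __init__(self, grid=7, n_classes=4, feat_dim=256, max_params=12):
        super().__init__()
        self.grid, self.max_params = grid, max_params
        # Toy convolutional backbone standing in for a YOLO-like trunk.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(grid),
        )
        # Per-cell detection head: objectness + box (x, y, w, h) + class scores.
        self.det_head = nn.Conv2d(feat_dim, 1 + 4 + n_classes, 1)
        # GRU decoder: one scalar parameter per step plus a stop logit, so the
        # number of predicted parameters can vary by primitive type.
        self.decoder = nn.GRUCell(1, feat_dim)
        self.param_out = nn.Linear(feat_dim, 2)  # (parameter value, stop logit)

    def forward(self, images):
        feats = self.backbone(images)                 # (B, C, G, G)
        det = self.det_head(feats)                    # (B, 1+4+K, G, G)
        B, C, G, _ = feats.shape
        cell_feats = feats.permute(0, 2, 3, 1).reshape(-1, C)
        # Decode up to max_params steps per cell; at inference time the stop
        # logit would truncate each primitive's parameter sequence.
        h = cell_feats
        inp = cell_feats.new_zeros(cell_feats.size(0), 1)
        params, stops = [], []
        for _ in range(self.max_params):
            h = self.decoder(inp, h)
            value, stop = self.param_out(h).split(1, dim=-1)
            params.append(value)
            stops.append(stop)
            inp = value                               # feed back the last parameter
        params = torch.stack(params, 1).view(B, G, G, self.max_params)
        stops = torch.stack(stops, 1).view(B, G, G, self.max_params)
        return det, params, stops

if __name__ == "__main__":
    det, params, stops = PrimitiveDetector()(torch.randn(2, 3, 224, 224))
    print(det.shape, params.shape, stops.shape)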
Acknowledgements
Chengcheng Tang would like to acknowledge NSF grant IIS-1528025, a Google Focused Research award, a gift from the Adobe Corporation, and a gift from the NVIDIA Corporation.
Author information
Jiahui Huang received his B.S. degree in computer science and technology from Tsinghua University in 2018. He is currently a Ph.D. candidate in computer science at Tsinghua University. His research interests include computer vision and computer graphics.
Jun Gao received his B.S. degree in computer science from Peking University in 2018. He is a graduate student in the Machine Learning Group at the University of Toronto and is also affiliated with the Vector Institute. His research interests are in deep learning and computer vision.
Vignesh Ganapathi-Subramanian is a Ph.D. candidate in the Department of Electrical Engineering, Stanford University. He previously obtained dual degrees (a B.Tech. in electrical engineering and an M.Tech. in communication engineering) from IIT Madras, India. His research interests include shape correspondences, 3D geometry, graphics, and vision.
Hao Su received his Ph.D. degree from Stanford University, under the supervision of Leonidas Guibas. He joined UC San Diego in 2017 and is currently an assistant professor of computer science and engineering. His research interests include computer vision, computer graphics, machine learning, robotics, and optimization. More details of his research can be found at ai.ucsd.edu/haosu.
Yin Liu received his B.S. degree from the Department of Automation at Tsinghua University in 2018. He is currently a Ph.D. candidate in computer science at the University of Wisconsin-Madison. His research interest is in machine learning.
Chengcheng Tang received his Ph.D. and M.S. degrees from King Abdullah University of Science and Technology (KAUST) in 2015 and 2011, respectively, and his bachelor's degree from Jilin University in 2009. He is currently a postdoctoral scholar in the Computer Science Department at Stanford University. His research interests include computer graphics, geometric computing, computational design, and machine learning.
Leonidas J. Guibas received his Ph.D. degree from Stanford University in 1976, under the supervision of Donald Knuth. His main subsequent employers were Xerox PARC, MIT, and DEC/SRC. Since 1984, he has been at Stanford University, where he is a professor of computer science. His research interests include computational geometry, geometric modeling, computer graphics, computer vision, sensor networks, robotics, and discrete algorithms. He is a senior member of the IEEE and the IEEE Computer Society. More details about his research can be found at geometry.stanford.edu/member/guibas/.
Rights and permissions
Open Access The articles published in this journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Other papers from this open access journal are available free of charge from https://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
About this article
Cite this article
Huang, J., Gao, J., Ganapathi-Subramanian, V. et al. DeepPrimitive: Image decomposition by layered primitive detection. Comp. Visual Media 4, 385–397 (2018). https://doi.org/10.1007/s41095-018-0128-6
DOI: https://doi.org/10.1007/s41095-018-0128-6
Keywords
- layered image decomposition
- primitive detection
- biologically inspired vision
- deep learning