Interactive Annotation of 3D Object Geometry Using 2D Scribbles

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12362)


Inferring detailed 3D geometry of a scene is crucial for robotics applications, simulation, and 3D content creation. However, such information is hard to obtain, and thus very few datasets support it. In this paper, we propose an interactive framework for annotating 3D object geometry from both point cloud data and RGB imagery. The key idea behind our approach is to exploit the strong priors that humans have about the 3D world in order to interactively annotate complete 3D shapes. Our framework targets naive users without artistic or graphics expertise. We introduce two simple-to-use interaction modules. In the first, we make an automatic guess of the 3D shape and allow the user to give feedback on large errors by drawing scribbles in desired 2D views. In the second, users correct minor errors by dragging and dropping mesh vertices, assisted by a neural interactive module implemented as a Graph Convolutional Network. Experimentally, we show that only a few user interactions are needed to produce good-quality 3D shapes on popular benchmarks such as ShapeNet, Pix3D and ScanNet. We implement our framework as a web service and conduct a user study, in which we show that data annotated by users with our method effectively facilitates real-world learning tasks. Web service:
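As a rough illustration of the vertex-editing module described in the abstract, the sketch below applies a single linear graph-convolution step (normalized adjacency times features times a weight matrix) to spread one user drag over neighbouring mesh vertices. It is not the paper's network: the toy mesh, the function names `normalized_adjacency` and `graph_conv`, and the identity weight matrix standing in for learned parameters are all assumptions made for this example.

```python
import numpy as np


def normalized_adjacency(num_vertices, edges):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected mesh graph."""
    a = np.eye(num_vertices)  # self-loops keep each vertex's own signal
    for i, j in edges:
        a[i, j] = 1.0
        a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def graph_conv(a_hat, features, weight):
    """One linear graph-convolution step: aggregate neighbours, then mix channels."""
    return a_hat @ features @ weight


# Hypothetical toy mesh: a unit square split into two triangles.
vertices = np.array(
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
)
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]

# Hypothetical user edit: vertex 1 is dragged 0.4 units along z.
user_offset = np.zeros_like(vertices)
user_offset[1] = [0.0, 0.0, 0.4]

a_hat = normalized_adjacency(len(vertices), edges)
weight = np.eye(3)  # assumption: placeholder for weights a trained model would learn
predicted_offset = graph_conv(a_hat, user_offset, weight)

# Keep the dragged vertex exactly where the user dropped it.
predicted_offset[1] = user_offset[1]
corrected = vertices + predicted_offset
```

In this toy setting, vertices that share an edge with the dragged vertex move a fraction of the way with it, while the vertex that is not adjacent to it stays put. A trained network would stack several such layers with nonlinearities and learned weights.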



We thank Louis Clergue for assistance with developing the web tool and extended discussion. This work was supported by NSERC. SF acknowledges the Canada CIFAR AI Chair award at the Vector Institute.




Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. University of Toronto, Toronto, Canada
  2. Vector Institute, Toronto, Canada
  3. Nvidia, Santa Clara, USA
