CAD-Deform: Deformable Fitting of CAD Models to 3D Scans

Conference paper in Computer Vision – ECCV 2020 (ECCV 2020).

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12358).

Abstract

Shape retrieval and alignment are a promising avenue towards turning 3D scans into lightweight CAD representations that can be used for content creation such as mobile or AR/VR gaming scenarios. Unfortunately, CAD model retrieval is limited by the availability of models in standard 3D shape collections (e.g., ShapeNet). In this work, we address this shortcoming by introducing CAD-Deform (The code for the project: https://github.com/alexeybokhovkin/CAD-Deform), a method which obtains more accurate CAD-to-scan fits by non-rigidly deforming retrieved CAD models. Our key contribution is a new non-rigid deformation model incorporating smooth transformations and preservation of sharp features that simultaneously achieves very tight fits from CAD models to the 3D scan and maintains the clean, high-quality surface properties of hand-modeled CAD objects. A series of thorough experiments demonstrates that our method achieves significantly tighter scan-to-CAD fits, allowing a more accurate digital replica of the scanned real-world environment while preserving important geometric features present in synthetic CAD environments.

V. Ishimtsev and A. Bokhovkin—equal contribution.

A. Artemov—Technical lead.


References

  1. Achenbach, J., Zell, E., Botsch, M.: Accurate face reconstruction through anisotropic fitting and eye correction. In: VMV, pp. 1–8 (2015)

  2. Achlioptas, P., Diamanti, O., Mitliagkas, I., Guibas, L.: Learning representations and generative models for 3D point clouds. In: International Conference on Machine Learning, pp. 40–49 (2018)

  3. Amberg, B., Romdhani, S., Vetter, T.: Optimal step nonrigid ICP algorithms for surface registration. In: 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8 (2007)

  4. Aubry, M., Maturana, D., Efros, A., Russell, B., Sivic, J.: Seeing 3D chairs: exemplar part-based 2D–3D alignment using a large dataset of CAD models. In: CVPR (2014)

  5. Avetisyan, A., Dahnert, M., Dai, A., Savva, M., Chang, A.X., Nießner, M.: Scan2CAD: learning CAD model alignment in RGB-D scans. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2614–2623 (2019)

  6. Avetisyan, A., Dai, A., Nießner, M.: End-to-end CAD model retrieval and 9DoF alignment in 3D scans. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2551–2560 (2019)

  7. Botsch, M., Kobbelt, L.: An intuitive framework for real-time freeform modeling. ACM Trans. Graph. (TOG) 23(3), 630–634 (2004)

  8. Cagniart, C., Boyer, E., Ilic, S.: Iterative mesh deformation for dense surface tracking. In: 2009 IEEE 12th International Conference on Computer Vision Workshops (ICCV Workshops), pp. 1465–1472. IEEE (2009)

  9. Chang, A.X., et al.: ShapeNet: an information-rich 3D model repository. arXiv preprint arXiv:1512.03012 (2015)

  10. Curless, B., Levoy, M.: A volumetric method for building complex models from range images. In: Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH 1996, pp. 303–312. Association for Computing Machinery, New York (1996). https://doi.org/10.1145/237170.237269

  11. Dai, A., Chang, A.X., Savva, M., Halber, M., Funkhouser, T., Nießner, M.: ScanNet: richly-annotated 3D reconstructions of indoor scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5828–5839 (2017)

  12. Dai, A., Nießner, M., Zollhöfer, M., Izadi, S., Theobalt, C.: BundleFusion: real-time globally consistent 3D reconstruction using on-the-fly surface re-integration. ACM Trans. Graph. (TOG) (2017)

  13. Dai, A., Ruizhongtai Qi, C., Nießner, M.: Shape completion using 3D-encoder-predictor CNNs and shape synthesis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5868–5877 (2017)

  14. Deng, H., Birdal, T., Ilic, S.: 3D local features for direct pairwise registration. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019

  15. Dey, T.K., Fu, B., Wang, H., Wang, L.: Automatic posing of a meshed human model using point clouds. Comput. Graph. 46, 14–24 (2015)

  16. Drost, B., Ilic, S.: 3D object detection and localization using multimodal point pair features. In: 3DIMPVT, pp. 9–16. IEEE Computer Society (2012)

  17. Egiazarian, V., et al.: Latent-space Laplacian pyramids for adversarial representation learning with 3D point clouds (2019)

  18. Firman, M., Mac Aodha, O., Julier, S., Brostow, G.J.: Structured prediction of unobserved voxels from a single depth image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5431–5440 (2016)

  19. Fröhlich, S., Botsch, M.: Example-driven deformations based on discrete shells. Comput. Graph. Forum 30, 2246–2257 (2011). https://doi.org/10.1111/j.1467-8659.2011.01974.x

  20. Grinspun, E., Hirani, A.N., Desbrun, M., Schröder, P.: Discrete shells. In: Proceedings of the 2003 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, SCA 2003, pp. 62–67. Eurographics Association, Goslar (2003)

  21. Guo, R., Zou, C., Hoiem, D.: Predicting complete 3D models of indoor scenes. arXiv preprint arXiv:1504.02437 (2015)

  22. Gupta, S., Arbeláez, P., Girshick, R., Malik, J.: Aligning 3D models to RGB-D images of cluttered scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4731–4740 (2015)

  23. He, L., Schaefer, S.: Mesh denoising via L0 minimization. In: Proceedings of ACM SIGGRAPH, pp. 64:1–64:8 (2013)

  24. Huang, J., Su, H., Guibas, L.: Robust watertight manifold surface generation method for ShapeNet models. arXiv preprint arXiv:1802.01698 (2018)

  25. Izadi, S., et al.: KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera. In: Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, UIST 2011, pp. 559–568. ACM (2011)

  26. Jacobson, A., Tosun, E., Sorkine, O., Zorin, D.: Mixed finite elements for variational surface modeling. In: Computer Graphics Forum, vol. 29, pp. 1565–1574. Wiley Online Library (2010)

  27. Koch, S., et al.: ABC: a big CAD model dataset for geometric deep learning. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019

  28. Li, Y., Dai, A., Guibas, L., Nießner, M.: Database-assisted object retrieval for real-time 3D reconstruction. Comput. Graph. Forum 34(2), 435–446 (2015)

  29. Liao, M., Zhang, Q., Wang, H., Yang, R., Gong, M.: Modeling deformable objects from a single depth camera. In: IEEE International Conference on Computer Vision (ICCV), pp. 167–174 (2009). https://doi.org/10.1109/ICCV.2009.5459161

  30. Mattausch, O., Panozzo, D., Mura, C., Sorkine-Hornung, O., Pajarola, R.: Object detection and classification from large-scale cluttered indoor scans. In: Computer Graphics Forum, vol. 33, pp. 11–21. Wiley Online Library (2014)

  31. Mo, K., et al.: PartNet: a large-scale benchmark for fine-grained and hierarchical part-level 3D object understanding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 909–918 (2019)

  32. Newcombe, R.A., et al.: KinectFusion: real-time dense surface mapping and tracking. In: IEEE ISMAR, October 2011

  33. Nießner, M., Zollhöfer, M., Izadi, S., Stamminger, M.: Real-time 3D reconstruction at scale using voxel hashing. ACM Trans. Graph. (TOG) (2013)

  34. Park, S.I., Lim, S.J.: Template-based reconstruction of surface mesh animation from point cloud animation. ETRI J. 36(6), 1008–1015 (2014)

  35. Rusinkiewicz, S., Levoy, M.: Efficient variants of the ICP algorithm. In: Proceedings Third International Conference on 3-D Digital Imaging and Modeling, pp. 145–152. IEEE (2001)

  36. Salas-Moreno, R.F., Newcombe, R.A., Strasdat, H., Kelly, P.H.J., Davison, A.J.: SLAM++: simultaneous localisation and mapping at the level of objects. In: CVPR, pp. 1352–1359. IEEE Computer Society (2013)

  37. Song, S., Yu, F., Zeng, A., Chang, A.X., Savva, M., Funkhouser, T.: Semantic scene completion from a single depth image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1746–1754 (2017)

  38. Sorkine, O., Alexa, M.: As-rigid-as-possible surface modeling. In: Symposium on Geometry Processing, vol. 4, pp. 109–116 (2007)

  39. Stoll, C., Karni, Z., Rössl, C., Yamauchi, H., Seidel, H.P.: Template deformation for point cloud fitting. In: SPBG, pp. 27–35 (2006)

  40. Choi, S., Zhou, Q., Koltun, V.: Robust reconstruction of indoor scenes. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5556–5565 (2015). https://doi.org/10.1109/CVPR.2015.7299195

  41. Váša, L., Rus, J.: Dihedral angle mesh error: a fast perception correlated distortion measure for fixed connectivity triangle meshes. Comput. Graph. Forum 31(5), 1715–1724 (2012). https://doi.org/10.1111/j.1467-8659.2012.03176.x

  42. Whelan, T., Leutenegger, S., Salas-Moreno, R.F., Glocker, B., Davison, A.J.: ElasticFusion: dense SLAM without a pose graph. In: Robotics: Science and Systems (RSS), Rome, Italy, July 2015

  43. Zhou, X., Karpur, A., Gan, C., Luo, L., Huang, Q.: Unsupervised domain adaptation for 3D keypoint estimation via view consistency. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11216, pp. 141–157. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01258-8_9


Acknowledgements

The authors acknowledge the usage of the Skoltech CDISE HPC cluster Zhores for obtaining the results presented in this paper. The work was partially supported by the Russian Science Foundation under Grant 19-41-04109.

Author information


Correspondence to Alexey Bokhovkin.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 7601 KB)

Appendices

A Statistics on the Used Datasets

In Tables 4 and 5, we summarize statistical information on the number of instances and categories considered in our evaluation. As parts annotations are an important ingredient in our deformation, we only select instances in Scan2CAD [5] for which the associated parts annotation in PartNet [31] is available, resulting in a total of 9 categories (25%), 572 instances (18%), and 1979 annotated correspondences (14%). Note that the vast majority of cases remain within our consideration, keeping our evaluation comprehensive.

Table 4. Overall statistics on the numbers of categories, instances, and correspondences present in our datasets.
Table 5. The top 15 most frequent ShapeNet categories in the Scan2CAD dataset, including detailed information on those with available parts annotations.

We further select the six most well-represented shape categories as our core evaluation set, outlined in Table 5. Note that since our method is non-learnable, we can just as easily experiment with the remaining categories, at the cost of somewhat reduced statistical power.
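As a rough illustration of the instance-selection step described above, the sketch below filters Scan2CAD correspondences by PartNet availability. The file layout and field names (`aligned_models`, `id_cad`, `id_scan`) are assumptions for illustration, not a documented format.

```python
# Hypothetical sketch: keep only Scan2CAD correspondences whose ShapeNet model
# has a PartNet parts annotation (field names are assumptions for illustration).
import json

def select_annotated(scan2cad_path: str, partnet_ids: set) -> list:
    with open(scan2cad_path) as f:
        scenes = json.load(f)
    kept = []
    for scene in scenes:
        for model in scene.get("aligned_models", []):
            if model.get("id_cad") in partnet_ids:  # PartNet annotation exists
                kept.append((scene.get("id_scan"), model))
    return kept
```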

B Optimization Details

Our full experimental pipeline is a sequence of deformation stages with different optimization parameters, with the Hessian recomputed before each stage. Specifically, we perform one part-to-part optimization with parameters \(\alpha_{\text{shape}} = 1, \alpha_{\text{smooth}} = 0, \alpha_{\text{sharp}} = 0, \alpha_{\text{data}} = 5\times 10^{4}\) for 100 iterations, followed by 5 runs of nearest-neighbor deformation for 50 iterations each with parameters \(\alpha_{\text{shape}} = 1, \alpha_{\text{smooth}} = 10, \alpha_{\text{sharp}} = 10, \alpha_{\text{data}} = 10^3\). This number of iterations was sufficient to reach convergence, with energy changes below \(10^{-1}\), in our experiments. For a typical mesh of \(10^4\) vertices, the runtime of our method breaks down into cost computation (\(\sim 0.3\) s), the backward pass (\(\sim 0.2\) s), and optimization steps (\(\sim 1.2\) s), the latter containing the main bottleneck (sparse matrix-vector multiplication). All operations could be optimized further.
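For concreteness, here is a minimal sketch of this staged schedule. It assumes a hypothetical `energy_fn` combining the four weighted terms, and uses Adam only as a stand-in for the Hessian-based step; the actual solver is in the project repository (https://github.com/alexeybokhovkin/CAD-Deform).

```python
# A minimal sketch of the two-stage optimization schedule described above
# (not the reference implementation; `energy_fn` is a hypothetical callable).
import torch

STAGES = [
    (dict(shape=1.0, smooth=0.0, sharp=0.0, data=5e4), 100),  # part-to-part stage
] + [(dict(shape=1.0, smooth=10.0, sharp=10.0, data=1e3), 50)] * 5  # 5 NN stages

def deform(verts, energy_fn, lr=1e-3, tol=1e-1):
    verts = verts.clone().requires_grad_(True)
    for weights, n_iters in STAGES:
        opt = torch.optim.Adam([verts], lr=lr)  # stand-in for the Hessian-based step
        prev = float("inf")
        for _ in range(n_iters):
            opt.zero_grad()
            e = energy_fn(verts, weights)   # weighted sum of the four energy terms
            e.backward()
            opt.step()
            if abs(prev - e.item()) < tol:  # converged: energy change below 1e-1
                break
            prev = e.item()
    return verts.detach()
```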

C Qualitative Fitting Results

In Fig. 6, we display a series of qualitative results covering a variety of shape deformations across different instance classes. Compared to the baselines, our framework achieves an accurate fit while preserving perceptual quality.

Table 6. Quantitative results of local surface quality evaluation using the DAME measure [41] (the smaller, the better; normalized to a maximum score of 100), where our CAD-Deform compares favourably to the baselines across all considered categories. Note, however, how surface quality decreases significantly when the smoothness and sharp-feature terms are dropped.
Table 7. Results of LSLP-GAN reconstruction in terms of Earth-Mover’s Distance between reconstructed and original point clouds of mesh vertices.

Table 6 reports the results of surface quality evaluation for deformations obtained with our CAD-Deform vs. the baselines, category-wise. While our method outperforms the baselines across all categories, we find the smoothness and sharpness energy terms to be the crucial ingredients for maintaining high-quality meshes.
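As a rough illustration of what such a surface quality measure computes, the sketch below is a simplified stand-in for DAME, assuming both meshes share identical connectivity so adjacent-face pairs correspond one-to-one; the published measure [41] adds perceptual weighting, which we omit.

```python
# Simplified stand-in for the DAME measure: mean absolute difference of dihedral
# angles over corresponding edges of two meshes with identical connectivity.
import numpy as np
import trimesh

def dame_simplified(mesh_a: trimesh.Trimesh, mesh_b: trimesh.Trimesh) -> float:
    da = mesh_a.face_adjacency_angles  # dihedral angle per adjacent face pair
    db = mesh_b.face_adjacency_angles  # same edge ordering, since faces coincide
    return float(np.abs(da - db).mean())
```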

Figure 7 visualizes the deformation results for three distinct classes, highlighting differences in the surfaces obtained using the three methods.

Table 7 reports shape abnormality evaluation results across the six considered categories. The baselines show low reconstruction quality (Fig. 8), as evidenced by a larger number of black points; in other words, compared to CAD-Deform, the distance from these meshes to the undeformed ones is much larger.
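For reference, below is a hedged sketch of the Earth-Mover's Distance reported in Table 7: an exact optimal matching between two equally sized point sets, feasible for small clouds. The LSLP-GAN reconstruction pipeline itself is not reproduced here.

```python
# Sketch of Earth-Mover's Distance between two equally sized point clouds,
# computed via exact optimal one-to-one assignment (suitable for small N).
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def emd(points_a: np.ndarray, points_b: np.ndarray) -> float:
    """points_a, points_b: (N, 3) arrays of equal size."""
    cost = cdist(points_a, points_b)          # pairwise Euclidean distances
    rows, cols = linear_sum_assignment(cost)  # optimal one-to-one matching
    return float(cost[rows, cols].mean())     # mean matched distance
```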

In Fig. 9, we show a series of examples from the CAD-Deform ablation study. Perceptual quality degrades whenever any term is excluded from the energy.

Table 8. Comparative evaluation of our approach in terms of Accuracy on different levels of detail.
Table 9. Comparative evaluation of ARAP deformations w.r.t. the change of Laplacian term weight in terms of Accuracy (%).

D Morphing

In this section, we present an additional series of examples of morphing properties (Fig. 10). Each iteration of the optimization process gradually improves the quality of fit. With CAD-Deform, we can morph each part to imitate the structure of the target shape.
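A minimal sketch of how such a morphing sequence can be visualized is shown below, assuming frames generated by linearly blending vertex positions; in the actual figures, the frames come from intermediate optimization iterates.

```python
# Sketch: linear blend between the original and fully deformed vertex positions,
# yielding a morphing sequence like the one rendered in Fig. 10.
import numpy as np

def morph_frames(verts_src: np.ndarray, verts_dst: np.ndarray, n_frames: int = 8):
    """Yield (N, 3) vertex arrays interpolating source -> target."""
    for t in np.linspace(0.0, 1.0, n_frames):
        yield (1.0 - t) * verts_src + t * verts_dst
```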

E PartNet Annotation

This set of experiments shows how fitting quality depends on the labelling of mesh vertices. Labels can be assigned to a mesh in different ways depending on the level of the PartNet hierarchy [31]. We observe that fitting quality increases with a greater level of detail (Table 8). The examples presented in Fig. 11 were selected as the most distinguishable deformations across levels; the visual differences in deformation performance between part labelling levels are minor.
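One hypothetical way to derive per-vertex labels at a chosen hierarchy depth is sketched below: nodes below the cut inherit their ancestor's label. The node structure ("label", "verts", "children") is an assumption for illustration, not PartNet's actual schema.

```python
# Hypothetical sketch: per-vertex labels at a chosen PartNet hierarchy depth;
# nodes deeper than the cut inherit their ancestor's label.
from typing import Optional

def labels_at_level(node: dict, level: int,
                    inherited: Optional[str] = None) -> dict:
    label = node["label"] if level >= 0 else inherited  # cut hierarchy at depth 0
    out = {v: label for v in node.get("verts", [])}
    for child in node.get("children", []):
        out.update(labels_at_level(child, level - 1, label))
    return out
```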

F Fitting Accuracy Analysis

The CAD-Deform framework is sensitive to the Accuracy threshold \(\tau\) on the distance between mesh vertices and nearby scan points. Figure 12 presents the effect of varying \(\tau\); we selected \(\tau = 0.2\) m for the fitting Accuracy metric.
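As a hedged sketch of how such a metric can be computed (the function name and exact definition are ours, chosen to match the description above): the fraction of mesh vertices whose nearest scan point lies within \(\tau\).

```python
# Sketch of a fitting Accuracy metric: fraction of mesh vertices whose nearest
# scan point is within tau (meters); tau = 0.2 corresponds to the value above.
import numpy as np
from scipy.spatial import cKDTree

def fitting_accuracy(mesh_vertices: np.ndarray, scan_points: np.ndarray,
                     tau: float = 0.2) -> float:
    """mesh_vertices: (N, 3); scan_points: (M, 3)."""
    tree = cKDTree(scan_points)                # nearest-neighbor index on the scan
    dists, _ = tree.query(mesh_vertices, k=1)  # distance to closest scan point
    return float((dists <= tau).mean())        # fraction of vertices within tau
```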

G Perceptual Assessment and User Study Details

Having obtained a collection of deformed meshes, we aim to assess their visual quality in comparison to two baseline deformation methods, as-rigid-as-possible (ARAP) [38] and Harmonic deformation [7, 26], using a set of perceptual quality measures. The details of our user study design and visual assessment are provided in the supplementary material. To this end, we use the original and deformed meshes to compute DAME and reconstruction errors, as outlined in Sect. 6.1, and complement these with visual quality scores obtained through a user study (see below). These scores, presented in Table 3, demonstrate that shapes obtained using CAD-Deform have \(2\times\) higher surface quality, deviate only slightly from undeformed shapes as viewed by neural autoencoders, and receive \(2\times\) higher ratings in human assessment, while sacrificing only 1.1–4.5% accuracy compared to other deformation methods.

Table 10. Comparative evaluation of Harmonic deformations w.r.t. the change of Laplacian term weight in terms of Accuracy (%).
Fig. 6. Qualitative shape deformation results obtained using ARAP [38], Harmonic deformation [7, 26], and our CAD-Deform. Mesh surfaces are colored according to the value of the tMMD measure, with darker values corresponding to larger distances.

Fig. 7. Qualitative comparison of deformations obtained using ARAP [38], Harmonic deformation [7, 26], and our CAD-Deform, with shapes colored according to the value of the DAME measure [41]. Our approach results in drastic improvements in local surface quality, producing higher-quality surfaces compared to other deformations.

Fig. 8. Qualitative comparison of reconstructions of point clouds extracted from mesh vertices. The meshes are obtained using ARAP [38], Harmonic deformation [7, 26], and our CAD-Deform; the first column corresponds to the original undeformed meshes. The color of the reconstructed point clouds encodes the Earth-Mover's Distance between the reconstructed and original point clouds of mesh vertices.

Design of Our User Study. The users were asked to examine renders of shapes from four different sources: the original undeformed shapes, as well as shapes deformed using the ARAP, Harmonic, and CAD-Deform methods, and to score each shape according to the following perceptual aspects: surface quality and smoothness, mesh symmetry, visual similarity to real-world objects, and overall consistency. Ten random shapes from each of the four sources were rendered from eight different views and scored by 100 unique users on a scale from 1 (bad) to 10 (good). The resulting visual quality scores are computed by averaging over users and shapes in each category.
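A minimal sketch of this aggregation step follows; the column names and toy rows are ours, not the study's actual data.

```python
# Sketch of the score aggregation: each row is one (user, method, shape) rating
# on a 1-10 scale; averaging over users and shapes yields one score per method.
import pandas as pd

ratings = pd.DataFrame(
    [(0, "CAD-Deform", 0, 9), (0, "ARAP", 0, 4),
     (1, "CAD-Deform", 0, 8), (1, "Harmonic", 0, 5)],
    columns=["user", "method", "shape", "score"],
)
visual_quality = ratings.groupby("method")["score"].mean()
print(visual_quality)  # one aggregate visual quality score per method
```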

In Fig. 13, we present the distribution of user scores over the different deformation methods and shapes. Users clearly prefer our deformation results to the baselines in all cases, as is evident from the gap between the CAD-Deform histogram and the ARAP/Harmonic histograms. At the same time, shapes deformed by CAD-Deform are close to the undeformed ShapeNet shapes in terms of surface quality and smoothness, mesh symmetry, visual similarity to real-world objects, and overall consistency. In addition, Tables 9 and 10 evaluate ARAP and Harmonic deformations with respect to the change of the Laplacian term weight.

Fig. 9. Qualitative results of the ablation study using our deformation framework, with meshes colored according to the value of the tMMD measure.

Fig. 10. Qualitative shape morphing results, interpolating between the original mesh (left) and the target mesh (right).

Fig. 11. Deformation performance depending on the level of labelling from the PartNet dataset [31]. Deformed mesh surfaces are colored according to the value of the tMMD measure, with darker values corresponding to larger distances.

Fig. 12. Fitting Accuracy vs. the varying \(\tau\) threshold on the distance between mesh vertices and nearby scan points.

Fig. 13. Distribution of user scores averaged over ten shapes, for the original ShapeNet meshes [9] and meshes deformed with ARAP [38], Harmonic [7, 26], and CAD-Deform.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Ishimtsev, V. et al. (2020). CAD-Deform: Deformable Fitting of CAD Models to 3D Scans. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science(), vol 12358. Springer, Cham. https://doi.org/10.1007/978-3-030-58601-0_36

  • DOI: https://doi.org/10.1007/978-3-030-58601-0_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58600-3

  • Online ISBN: 978-3-030-58601-0
