CAD-Deform: Deformable Fitting of CAD Models to 3D Scans

Conference paper in Computer Vision – ECCV 2020 (ECCV 2020).

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12358).

Abstract

Shape retrieval and alignment are a promising avenue towards turning 3D scans into lightweight CAD representations that can be used for content creation such as mobile or AR/VR gaming scenarios. Unfortunately, CAD model retrieval is limited by the availability of models in standard 3D shape collections (e.g., ShapeNet). In this work, we address this shortcoming by introducing CAD-Deform (The code for the project: https://github.com/alexeybokhovkin/CAD-Deform), a method which obtains more accurate CAD-to-scan fits by non-rigidly deforming retrieved CAD models. Our key contribution is a new non-rigid deformation model incorporating smooth transformations and preservation of sharp features that simultaneously achieves very tight fits from CAD models to the 3D scan and maintains the clean, high-quality surface properties of hand-modeled CAD objects. A series of thorough experiments demonstrates that our method achieves significantly tighter scan-to-CAD fits, allowing a more accurate digital replica of the scanned real-world environment while preserving important geometric features present in synthetic CAD environments.

V. Ishimtsev and A. Bokhovkin—equal contribution.

A. Artemov—Technical lead.


References

  1. Achenbach, J., Zell, E., Botsch, M.: Accurate face reconstruction through anisotropic fitting and eye correction. In: VMV, pp. 1–8 (2015)

  2. Achlioptas, P., Diamanti, O., Mitliagkas, I., Guibas, L.: Learning representations and generative models for 3D point clouds. In: International Conference on Machine Learning, pp. 40–49 (2018)

  3. Amberg, B., Romdhani, S., Vetter, T.: Optimal step nonrigid ICP algorithms for surface registration. In: 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8 (2007)

  4. Aubry, M., Maturana, D., Efros, A., Russell, B., Sivic, J.: Seeing 3D chairs: exemplar part-based 2D–3D alignment using a large dataset of CAD models. In: CVPR (2014)

  5. Avetisyan, A., Dahnert, M., Dai, A., Savva, M., Chang, A.X., Nießner, M.: Scan2CAD: learning CAD model alignment in RGB-D scans. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2614–2623 (2019)

  6. Avetisyan, A., Dai, A., Nießner, M.: End-to-end CAD model retrieval and 9DoF alignment in 3D scans. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2551–2560 (2019)

  7. Botsch, M., Kobbelt, L.: An intuitive framework for real-time freeform modeling. ACM Trans. Graph. (TOG) 23(3), 630–634 (2004)

  8. Cagniart, C., Boyer, E., Ilic, S.: Iterative mesh deformation for dense surface tracking. In: 2009 IEEE 12th International Conference on Computer Vision Workshops (ICCV Workshops), pp. 1465–1472. IEEE (2009)

  9. Chang, A.X., et al.: ShapeNet: an information-rich 3D model repository. arXiv preprint arXiv:1512.03012 (2015)

  10. Curless, B., Levoy, M.: A volumetric method for building complex models from range images. In: Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH 1996, pp. 303–312. Association for Computing Machinery, New York (1996). https://doi.org/10.1145/237170.237269

  11. Dai, A., Chang, A.X., Savva, M., Halber, M., Funkhouser, T., Nießner, M.: ScanNet: richly-annotated 3D reconstructions of indoor scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5828–5839 (2017)

  12. Dai, A., Nießner, M., Zollhöfer, M., Izadi, S., Theobalt, C.: BundleFusion: real-time globally consistent 3D reconstruction using on-the-fly surface re-integration. ACM Trans. Graph. (TOG) (2017)

  13. Dai, A., Ruizhongtai Qi, C., Nießner, M.: Shape completion using 3D-encoder-predictor CNNs and shape synthesis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5868–5877 (2017)

  14. Deng, H., Birdal, T., Ilic, S.: 3D local features for direct pairwise registration. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019

  15. Dey, T.K., Fu, B., Wang, H., Wang, L.: Automatic posing of a meshed human model using point clouds. Comput. Graph. 46, 14–24 (2015)

  16. Drost, B., Ilic, S.: 3D object detection and localization using multimodal point pair features. In: 3DIMPVT, pp. 9–16. IEEE Computer Society (2012)

  17. Egiazarian, V., et al.: Latent-space Laplacian pyramids for adversarial representation learning with 3D point clouds (2019)

  18. Firman, M., Mac Aodha, O., Julier, S., Brostow, G.J.: Structured prediction of unobserved voxels from a single depth image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5431–5440 (2016)

  19. Fröhlich, S., Botsch, M.: Example-driven deformations based on discrete shells. Comput. Graph. Forum 30, 2246–2257 (2011). https://doi.org/10.1111/j.1467-8659.2011.01974.x

  20. Grinspun, E., Hirani, A.N., Desbrun, M., Schröder, P.: Discrete shells. In: Proceedings of the 2003 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, SCA 2003, pp. 62–67. Eurographics Association, Goslar (2003)

  21. Guo, R., Zou, C., Hoiem, D.: Predicting complete 3D models of indoor scenes. arXiv preprint arXiv:1504.02437 (2015)

  22. Gupta, S., Arbeláez, P., Girshick, R., Malik, J.: Aligning 3D models to RGB-D images of cluttered scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4731–4740 (2015)

  23. He, L., Schaefer, S.: Mesh denoising via L0 minimization. In: Proceedings of ACM SIGGRAPH, pp. 64:1–64:8 (2013)

  24. Huang, J., Su, H., Guibas, L.: Robust watertight manifold surface generation method for ShapeNet models. arXiv preprint arXiv:1802.01698 (2018)

  25. Izadi, S., et al.: KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera. In: Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, UIST 2011, pp. 559–568. ACM (2011)

  26. Jacobson, A., Tosun, E., Sorkine, O., Zorin, D.: Mixed finite elements for variational surface modeling. In: Computer Graphics Forum, vol. 29, pp. 1565–1574. Wiley Online Library (2010)

  27. Koch, S., et al.: ABC: a big CAD model dataset for geometric deep learning. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019

  28. Li, Y., Dai, A., Guibas, L., Nießner, M.: Database-assisted object retrieval for real-time 3D reconstruction. Comput. Graph. Forum 34(2), 435–446 (2015)

  29. Liao, M., Zhang, Q., Wang, H., Yang, R., Gong, M.: Modeling deformable objects from a single depth camera. In: IEEE International Conference on Computer Vision (ICCV), pp. 167–174 (2009). https://doi.org/10.1109/ICCV.2009.5459161

  30. Mattausch, O., Panozzo, D., Mura, C., Sorkine-Hornung, O., Pajarola, R.: Object detection and classification from large-scale cluttered indoor scans. In: Computer Graphics Forum, vol. 33, pp. 11–21. Wiley Online Library (2014)

  31. Mo, K., et al.: PartNet: a large-scale benchmark for fine-grained and hierarchical part-level 3D object understanding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 909–918 (2019)

  32. Newcombe, R.A., et al.: KinectFusion: real-time dense surface mapping and tracking. In: IEEE ISMAR, October 2011

  33. Nießner, M., Zollhöfer, M., Izadi, S., Stamminger, M.: Real-time 3D reconstruction at scale using voxel hashing. ACM Trans. Graph. (TOG) (2013)

  34. Park, S.I., Lim, S.J.: Template-based reconstruction of surface mesh animation from point cloud animation. ETRI J. 36(6), 1008–1015 (2014)

  35. Rusinkiewicz, S., Levoy, M.: Efficient variants of the ICP algorithm. In: Proceedings Third International Conference on 3-D Digital Imaging and Modeling, pp. 145–152. IEEE (2001)

  36. Salas-Moreno, R.F., Newcombe, R.A., Strasdat, H., Kelly, P.H.J., Davison, A.J.: SLAM++: simultaneous localisation and mapping at the level of objects. In: CVPR, pp. 1352–1359. IEEE Computer Society (2013)

  37. Song, S., Yu, F., Zeng, A., Chang, A.X., Savva, M., Funkhouser, T.: Semantic scene completion from a single depth image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1746–1754 (2017)

  38. Sorkine, O., Alexa, M.: As-rigid-as-possible surface modeling. In: Symposium on Geometry Processing, vol. 4, pp. 109–116 (2007)

  39. Stoll, C., Karni, Z., Rössl, C., Yamauchi, H., Seidel, H.P.: Template deformation for point cloud fitting. In: SPBG, pp. 27–35 (2006)

  40. Choi, S., Zhou, Q., Koltun, V.: Robust reconstruction of indoor scenes. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5556–5565 (2015). https://doi.org/10.1109/CVPR.2015.7299195

  41. Váša, L., Rus, J.: Dihedral angle mesh error: a fast perception correlated distortion measure for fixed connectivity triangle meshes. Comput. Graph. Forum 31(5), 1715–1724 (2012). https://doi.org/10.1111/j.1467-8659.2012.03176.x

  42. Whelan, T., Leutenegger, S., Salas-Moreno, R.F., Glocker, B., Davison, A.J.: ElasticFusion: dense SLAM without a pose graph. In: Robotics: Science and Systems (RSS), Rome, Italy, July 2015

  43. Zhou, X., Karpur, A., Gan, C., Luo, L., Huang, Q.: Unsupervised domain adaptation for 3D keypoint estimation via view consistency. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11216, pp. 141–157. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01258-8_9


Acknowledgements

The authors acknowledge the usage of the Skoltech CDISE HPC cluster Zhores for obtaining the results presented in this paper. The work was partially supported by the Russian Science Foundation under Grant 19-41-04109.

Author information


Correspondence to Alexey Bokhovkin.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 7601 KB)

Appendices

A Statistics on the Used Datasets

In Tables 4 and 5, we summarize statistical information on the number of instances and categories considered in our evaluation. As parts annotations are an important ingredient in our deformation, we only select instances in Scan2CAD [5] for which the associated parts annotation in PartNet [31] is available, resulting in a total of 9 categories (25%), 572 instances (18%), and 1979 annotated correspondences (14%). Note that the vast majority of cases remain within our consideration, keeping our evaluation comprehensive.

Table 4. Overall statistics on the numbers of categories, instances, and correspondences present in our datasets.
Table 5. The top 15 most frequent ShapeNet categories in the Scan2CAD dataset, including detailed information on those with available parts annotations.

We further select the six most well-represented shape categories as our core evaluation set, outlined in Table 5. Note that since our method is non-learnable, we can just as easily experiment with the remaining categories, at the cost of somewhat reduced statistical power.
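As a rough illustration of the instance-selection step described above, the sketch below filters Scan2CAD correspondences by PartNet availability. The file layout and field names (`aligned_models`, `id_cad`, `id_scan`) are assumptions for illustration, not a documented format.

```python
# Hypothetical sketch: keep only Scan2CAD correspondences whose ShapeNet model
# has a PartNet parts annotation (field names are assumptions for illustration).
import json

def select_annotated(scan2cad_path: str, partnet_ids: set) -> list:
    with open(scan2cad_path) as f:
        scenes = json.load(f)
    kept = []
    for scene in scenes:
        for model in scene.get("aligned_models", []):
            if model.get("id_cad") in partnet_ids:  # PartNet annotation exists
                kept.append((scene.get("id_scan"), model))
    return kept
```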

B Optimization Details

Our full experimental pipeline is a sequence of deformation stages with different optimization parameters, with the Hessian recomputed before each stage. Specifically, we perform one part-to-part optimization with parameters \(\alpha_{\text{shape}} = 1, \alpha_{\text{smooth}} = 0, \alpha_{\text{sharp}} = 0, \alpha_{\text{data}} = 5\times 10^{4}\) for 100 iterations, followed by 5 runs of nearest-neighbor deformation for 50 iterations each with parameters \(\alpha_{\text{shape}} = 1, \alpha_{\text{smooth}} = 10, \alpha_{\text{sharp}} = 10, \alpha_{\text{data}} = 10^3\). This number of iterations was sufficient to reach convergence, with energy changes below \(10^{-1}\), in our experiments. For a typical mesh of \(10^4\) vertices, the runtime of our method breaks down into cost computation (\(\sim 0.3\) s), the backward pass (\(\sim 0.2\) s), and optimization steps (\(\sim 1.2\) s), the latter containing the main bottleneck (sparse matrix-vector multiplication). All operations could be optimized further.
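For concreteness, here is a minimal sketch of this staged schedule. It assumes a hypothetical `energy_fn` combining the four weighted terms, and uses Adam only as a stand-in for the Hessian-based step; the actual solver is in the project repository (https://github.com/alexeybokhovkin/CAD-Deform).

```python
# A minimal sketch of the two-stage optimization schedule described above
# (not the reference implementation; `energy_fn` is a hypothetical callable).
import torch

STAGES = [
    (dict(shape=1.0, smooth=0.0, sharp=0.0, data=5e4), 100),  # part-to-part stage
] + [(dict(shape=1.0, smooth=10.0, sharp=10.0, data=1e3), 50)] * 5  # 5 NN stages

def deform(verts, energy_fn, lr=1e-3, tol=1e-1):
    verts = verts.clone().requires_grad_(True)
    for weights, n_iters in STAGES:
        opt = torch.optim.Adam([verts], lr=lr)  # stand-in for the Hessian-based step
        prev = float("inf")
        for _ in range(n_iters):
            opt.zero_grad()
            e = energy_fn(verts, weights)   # weighted sum of the four energy terms
            e.backward()
            opt.step()
            if abs(prev - e.item()) < tol:  # converged: energy change below 1e-1
                break
            prev = e.item()
    return verts.detach()
```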

C Qualitative Fitting Results

In Fig. 6, we display a series of qualitative results covering a variety of shape deformations across different instance classes. Compared to the baselines, our framework achieves an accurate fit while preserving perceptual quality.

Table 6. Quantitative results of local surface quality evaluation using the DAME measure [41] (the smaller, the better; normalized to a maximum score of 100), where our CAD-Deform compares favourably to the baselines across all considered categories. Note, however, how surface quality decreases significantly when the smoothness and sharp-feature terms are dropped.
Table 7. Results of LSLP-GAN reconstruction in terms of Earth-Mover’s Distance between reconstructed and original point clouds of mesh vertices.

Table 6 reports the results of surface quality evaluation for deformations obtained with our CAD-Deform vs. the baselines, category-wise. While our method outperforms the baselines across all categories, we find the smoothness and sharpness energy terms to be the crucial ingredients for maintaining high-quality meshes.
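As a rough illustration of what such a surface quality measure computes, the sketch below is a simplified stand-in for DAME, assuming both meshes share identical connectivity so adjacent-face pairs correspond one-to-one; the published measure [41] adds perceptual weighting, which we omit.

```python
# Simplified stand-in for the DAME measure: mean absolute difference of dihedral
# angles over corresponding edges of two meshes with identical connectivity.
import numpy as np
import trimesh

def dame_simplified(mesh_a: trimesh.Trimesh, mesh_b: trimesh.Trimesh) -> float:
    da = mesh_a.face_adjacency_angles  # dihedral angle per adjacent face pair
    db = mesh_b.face_adjacency_angles  # same edge ordering, since faces coincide
    return float(np.abs(da - db).mean())
```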

Figure 7 visualizes the deformation results for three distinct classes, highlighting differences in the surfaces obtained using the three methods.

Table 7 reports shape abnormality evaluation results across the six considered categories. The baselines show low reconstruction quality (Fig. 8), as evidenced by a larger number of black points; in other words, compared to CAD-Deform, the distance from these meshes to the undeformed ones is much larger.
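For reference, below is a hedged sketch of the Earth-Mover's Distance reported in Table 7: an exact optimal matching between two equally sized point sets, feasible for small clouds. The LSLP-GAN reconstruction pipeline itself is not reproduced here.

```python
# Sketch of Earth-Mover's Distance between two equally sized point clouds,
# computed via exact optimal one-to-one assignment (suitable for small N).
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def emd(points_a: np.ndarray, points_b: np.ndarray) -> float:
    """points_a, points_b: (N, 3) arrays of equal size."""
    cost = cdist(points_a, points_b)          # pairwise Euclidean distances
    rows, cols = linear_sum_assignment(cost)  # optimal one-to-one matching
    return float(cost[rows, cols].mean())     # mean matched distance
```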

In Fig. 9, we show a series of examples from the CAD-Deform ablation study. Perceptual quality degrades whenever any term is excluded from the energy.

Table 8. Comparative evaluation of our approach in terms of Accuracy on different levels of detail.
Table 9. Comparative evaluation of ARAP deformations w.r.t. the change of Laplacian term weight in terms of Accuracy (%).

D Morphing

In this section, we present an additional series of examples of morphing properties (Fig. 10). Each iteration of the optimization process gradually improves the quality of fit. With CAD-Deform, we can morph each part to imitate the structure of the target shape.
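A minimal sketch of how such a morphing sequence can be visualized is shown below, assuming frames generated by linearly blending vertex positions; in the actual figures, the frames come from intermediate optimization iterates.

```python
# Sketch: linear blend between the original and fully deformed vertex positions,
# yielding a morphing sequence like the one rendered in Fig. 10.
import numpy as np

def morph_frames(verts_src: np.ndarray, verts_dst: np.ndarray, n_frames: int = 8):
    """Yield (N, 3) vertex arrays interpolating source -> target."""
    for t in np.linspace(0.0, 1.0, n_frames):
        yield (1.0 - t) * verts_src + t * verts_dst
```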

E PartNet Annotation

This set of experiments shows how fitting quality depends on the labelling of mesh vertices. Labels can be assigned to a mesh in different ways depending on the level of the PartNet hierarchy [31]. We observe that fitting quality increases with a greater level of detail (Table 8). The examples presented in Fig. 11 were selected as the most distinguishable deformations across levels; the visual differences in deformation performance between part labelling levels are minor.
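One hypothetical way to derive per-vertex labels at a chosen hierarchy depth is sketched below: nodes below the cut inherit their ancestor's label. The node structure ("label", "verts", "children") is an assumption for illustration, not PartNet's actual schema.

```python
# Hypothetical sketch: per-vertex labels at a chosen PartNet hierarchy depth;
# nodes deeper than the cut inherit their ancestor's label.
from typing import Optional

def labels_at_level(node: dict, level: int,
                    inherited: Optional[str] = None) -> dict:
    label = node["label"] if level >= 0 else inherited  # cut hierarchy at depth 0
    out = {v: label for v in node.get("verts", [])}
    for child in node.get("children", []):
        out.update(labels_at_level(child, level - 1, label))
    return out
```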

F Fitting Accuracy Analysis

The CAD-Deform framework is sensitive to the Accuracy threshold \(\tau\) on the distance between mesh vertices and nearby scan points. Figure 12 presents the effect of varying \(\tau\); we selected \(\tau = 0.2\) m for the fitting Accuracy metric.
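As a hedged sketch of how such a metric can be computed (the function name and exact definition are ours, chosen to match the description above): the fraction of mesh vertices whose nearest scan point lies within \(\tau\).

```python
# Sketch of a fitting Accuracy metric: fraction of mesh vertices whose nearest
# scan point is within tau (meters); tau = 0.2 corresponds to the value above.
import numpy as np
from scipy.spatial import cKDTree

def fitting_accuracy(mesh_vertices: np.ndarray, scan_points: np.ndarray,
                     tau: float = 0.2) -> float:
    """mesh_vertices: (N, 3); scan_points: (M, 3)."""
    tree = cKDTree(scan_points)                # nearest-neighbor index on the scan
    dists, _ = tree.query(mesh_vertices, k=1)  # distance to closest scan point
    return float((dists <= tau).mean())        # fraction of vertices within tau
```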

G Perceptual Assessment and User Study Details

Having obtained a collection of deformed meshes, we aim to assess their visual quality in comparison to two baseline deformation methods, as-rigid-as-possible (ARAP) [38] and Harmonic deformation [7, 26], using a set of perceptual quality measures. The details of our user study design and visual assessment are provided in the supplementary material. To this end, we use the original and deformed meshes to compute DAME and reconstruction errors, as outlined in Sect. 6.1, and complement these with visual quality scores obtained through a user study (see below). These scores, presented in Table 3, demonstrate that shapes obtained using CAD-Deform have \(2\times\) higher surface quality, deviate only slightly from undeformed shapes as viewed by neural autoencoders, and receive \(2\times\) higher ratings in human assessment, while sacrificing only 1.1–4.5% accuracy compared to other deformation methods.

Table 10. Comparative evaluation of Harmonic deformations w.r.t. the change of Laplacian term weight in terms of Accuracy (%).
Fig. 6. Qualitative shape deformation results obtained using ARAP [38], Harmonic deformation [7, 26], and our CAD-Deform. Mesh surfaces are colored according to the value of the tMMD measure, with darker values corresponding to larger distances.

Fig. 7. Qualitative comparison of deformations obtained using ARAP [38], Harmonic deformation [7, 26], and our CAD-Deform, with shapes colored according to the value of the DAME measure [41]. Our approach results in drastic improvements in local surface quality, producing higher-quality surfaces compared to other deformations.

Fig. 8. Qualitative comparison of reconstructions of point clouds extracted from mesh vertices. The meshes are obtained using ARAP [38], Harmonic deformation [7, 26], and our CAD-Deform; the first column corresponds to the original undeformed meshes. The color of the reconstructed point clouds encodes the Earth-Mover's Distance between the reconstructed and original point clouds of mesh vertices.

Design of Our User Study. The users were asked to examine renders of shapes from four different sources: the original undeformed shapes, as well as shapes deformed using the ARAP, Harmonic, and CAD-Deform methods, and to score each shape according to the following perceptual aspects: surface quality and smoothness, mesh symmetry, visual similarity to real-world objects, and overall consistency. Ten random shapes from each of the four sources were rendered from eight different views and scored by 100 unique users on a scale from 1 (bad) to 10 (good). The resulting visual quality scores are computed by averaging over users and shapes in each category.
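A minimal sketch of this aggregation step follows; the column names and toy rows are ours, not the study's actual data.

```python
# Sketch of the score aggregation: each row is one (user, method, shape) rating
# on a 1-10 scale; averaging over users and shapes yields one score per method.
import pandas as pd

ratings = pd.DataFrame(
    [(0, "CAD-Deform", 0, 9), (0, "ARAP", 0, 4),
     (1, "CAD-Deform", 0, 8), (1, "Harmonic", 0, 5)],
    columns=["user", "method", "shape", "score"],
)
visual_quality = ratings.groupby("method")["score"].mean()
print(visual_quality)  # one aggregate visual quality score per method
```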

In Fig. 13, we present the distribution of user scores over the different deformation methods and shapes. Users clearly prefer our deformation results to the baselines in all cases, as is evident from the gap between the CAD-Deform histogram and the ARAP/Harmonic histograms. At the same time, shapes deformed by CAD-Deform are close to the undeformed ShapeNet shapes in terms of surface quality and smoothness, mesh symmetry, visual similarity to real-world objects, and overall consistency. In addition, Tables 9 and 10 evaluate ARAP and Harmonic deformations with respect to the change of the Laplacian term weight.

Fig. 9. Qualitative results of the ablation study using our deformation framework, with meshes colored according to the value of the tMMD measure.

Fig. 10. Qualitative shape morphing results, interpolating between the original mesh (left) and the target mesh (right).

Fig. 11. Deformation performance depending on the level of labelling from the PartNet dataset [31]. Deformed mesh surfaces are colored according to the value of the tMMD measure, with darker values corresponding to larger distances.

Fig. 12. Fitting Accuracy vs. the varying \(\tau\) threshold on the distance between mesh vertices and nearby scan points.

Fig. 13. Distribution of user scores averaged over ten shapes, for the original ShapeNet meshes [9] and meshes deformed with ARAP [38], Harmonic [7, 26], and CAD-Deform.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Ishimtsev, V. et al. (2020). CAD-Deform: Deformable Fitting of CAD Models to 3D Scans. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science(), vol 12358. Springer, Cham. https://doi.org/10.1007/978-3-030-58601-0_36

  • DOI: https://doi.org/10.1007/978-3-030-58601-0_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58600-3

  • Online ISBN: 978-3-030-58601-0
