
Instance-level 3D shape retrieval from a single image by hybrid-representation-assisted joint embedding

Original article, published in The Visual Computer.

Abstract

We present a novel and effective joint embedding approach for retrieving the most similar 3D shape for a single image query. Our approach builds upon hybrid 3D representations—the octree-based representation and the multi-view image representation—which characterize shape geometry in different ways. We first pre-train a 3D feature space by jointly embedding 3D shapes under both representations, and then introduce a transform layer and an image encoder to map shape codes and real images into a common space via a second joint embedding. The pre-training benefits from the hybrid representation of 3D shapes and yields a more discriminative 3D shape space than using either representation alone. The transform layer helps to bridge the gap between the 3D shape space and the real image space. We validate the efficacy of our method on the instance-level single-image 3D shape retrieval task and achieve significant improvements over existing methods.
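To make the two-stage design concrete, the sketch below illustrates the second joint embedding stage described above: pre-trained 3D shape codes pass through a transform layer while real images pass through an image encoder, and both are pulled into a shared metric space with a triplet-style objective. This is a minimal PyTorch sketch under assumed dimensions, module architectures, and loss; none of the layer sizes, names, or the toy backbone are taken from the paper.

```python
# Minimal sketch of a second-stage joint embedding (illustrative assumptions only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransformLayer(nn.Module):
    """Maps pre-trained 3D shape codes into the common image-shape space."""
    def __init__(self, shape_dim=512, embed_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(shape_dim, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, shape_code):
        return F.normalize(self.mlp(shape_code), dim=-1)

class ImageEncoder(nn.Module):
    """Encodes a single RGB image into the same common space."""
    def __init__(self, embed_dim=256):
        super().__init__()
        # A tiny CNN stands in for a real backbone (e.g., a ResNet).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, embed_dim)

    def forward(self, image):
        return F.normalize(self.head(self.backbone(image)), dim=-1)

def joint_embedding_loss(img_emb, pos_shape_emb, neg_shape_emb, margin=0.2):
    """Triplet-style loss: pull an image toward its matching shape,
    push it away from a non-matching shape."""
    pos = (img_emb - pos_shape_emb).pow(2).sum(dim=-1)
    neg = (img_emb - neg_shape_emb).pow(2).sum(dim=-1)
    return F.relu(pos - neg + margin).mean()

if __name__ == "__main__":
    transform, encoder = TransformLayer(), ImageEncoder()
    images = torch.randn(4, 3, 128, 128)   # query images
    pos_codes = torch.randn(4, 512)         # matching pre-trained shape codes
    neg_codes = torch.randn(4, 512)         # non-matching shape codes
    loss = joint_embedding_loss(encoder(images),
                                transform(pos_codes),
                                transform(neg_codes))
    print(loss.item())
```

In the paper's setting, the shape codes fed to the transform layer would come from the first-stage embedding learned over the hybrid octree-based and multi-view representations, not from random tensors as in this toy example.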


Author information

Corresponding author

Correspondence to Yang Liu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Zou, QF., Liu, L. & Liu, Y. Instance-level 3D shape retrieval from a single image by hybrid-representation-assisted joint embedding. Vis Comput 37, 1743–1756 (2021). https://doi.org/10.1007/s00371-020-01935-0

