Multimodal 3D Object Retrieval

Pegia, Maria; Jónsson, Björn Þór; Moumtzidou, Anastasia; Diplaris, Sotiris; Gialampoukidis, Ilias; Vrochidis, Stefanos; Kompatsiaris, Ioannis

doi:10.1007/978-3-031-53302-0_14

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14557)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

Three-dimensional (3D) retrieval of objects and models plays a crucial role in many application areas, such as industrial design, medical imaging, gaming and virtual and augmented reality. Such 3D retrieval involves storing and retrieving different representations of single objects, such as images, meshes or point clouds. Early approaches considered only one such representation modality, but recently the CMCL method has been proposed, which considers multimodal representations. Multimodal retrieval, meanwhile, has recently seen significant interest in the image retrieval domain. In this paper, we explore the application of state-of-the-art multimodal image representations to 3D retrieval, in comparison to existing 3D approaches. In a detailed study over two benchmark 3D datasets, we show that the MuseHash approach from the image domain outperforms other approaches, improving recall over the CMCL approach by about 11% for unimodal retrieval and 9% for multimodal retrieval.
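
The abstract reports its results as recall. As a rough illustration only, the sketch below shows how recall at a cutoff k could be computed for a hashing-style retriever that ranks database objects by Hamming distance to a query code. The abstract does not describe the paper's evaluation protocol, so the function names, the 4-bit codes, the class labels and the choice of cutoff are all hypothetical.

```python
# Illustrative sketch, not the paper's evaluation code.
# Assumption: each 3D object is represented by a short binary code
# and relevance means "same class label as the query".

def hamming(a, b):
    """Number of positions at which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))

def rank_by_hamming(query_code, db_codes):
    """Database indices ordered by increasing Hamming distance.

    sorted() is stable, so ties keep their original database order.
    """
    return sorted(range(len(db_codes)),
                  key=lambda i: hamming(query_code, db_codes[i]))

def recall_at_k(ranked_labels, query_label, k):
    """Share of all relevant items that appear in the top-k results."""
    total = sum(1 for lab in ranked_labels if lab == query_label)
    if total == 0:
        return 0.0
    hits = sum(1 for lab in ranked_labels[:k] if lab == query_label)
    return hits / total

# Toy database: six objects, two classes (all values hypothetical).
db_codes = [
    [0, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
]
db_labels = [0, 0, 0, 1, 1, 1]

query_code = [0, 0, 0, 0]
query_label = 0

order = rank_by_hamming(query_code, db_codes)
ranked_labels = [db_labels[i] for i in order]

recall_2 = recall_at_k(ranked_labels, query_label, k=2)
recall_3 = recall_at_k(ranked_labels, query_label, k=3)
print(order, recall_2, recall_3)
```

In this toy setup the query's class has three members, so retrieving two of them in the top two results gives a recall of 2/3, and the top three results recover all of them.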

Acknowledgment

This work was supported by the EU’s Horizon 2020 research and innovation programme under grant agreement H2020-101004152 XRECO.

Author information

Authors and Affiliations

Centre for Research and Technology Hellas, Information Technologies Institute, Thessaloniki, Greece
Maria Pegia, Anastasia Moumtzidou, Sotiris Diplaris, Ilias Gialampoukidis, Stefanos Vrochidis & Ioannis Kompatsiaris
Reykjavik University, Reykjavík, Iceland
Maria Pegia & Björn Þór Jónsson

Corresponding author

Correspondence to Maria Pegia.

Editor information

Editors and Affiliations

University of Amsterdam, Amsterdam, The Netherlands
Stevan Rudinac
Delft University of Technology, Delft, The Netherlands
Alan Hanjalic
Delft University of Technology, Delft, The Netherlands
Cynthia Liem
University of Amsterdam, Amsterdam, The Netherlands
Marcel Worring
Reykjavik University, Reykjavik, Iceland
Björn Þór Jónsson
Microsoft Research Lab – Asia, Beijing, China
Bei Liu
The University of Tokyo, Tokyo, Japan
Yoko Yamakata

About this paper

Cite this paper

Pegia, M. et al. (2024). Multimodal 3D Object Retrieval. In: Rudinac, S., et al. MultiMedia Modeling. MMM 2024. Lecture Notes in Computer Science, vol 14557. Springer, Cham. https://doi.org/10.1007/978-3-031-53302-0_14

DOI: https://doi.org/10.1007/978-3-031-53302-0_14
Published: 29 January 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53301-3
Online ISBN: 978-3-031-53302-0
eBook Packages: Computer Science, Computer Science (R0)
