
Visual information quantification for object recognition and retrieval

Article · Published in Science China Technological Sciences

Abstract

The rapid development of computer vision has produced an increasing amount of 3D data, such as multi-view images and point clouds, which are widely used in 3D object recognition and retrieval. Intuitively, the quality of 3D data is the most crucial factor directly affecting the performance of 3D applications. However, how to evaluate 3D data quality, especially multi-view data quality, remains an open question. To tackle this issue, we propose an entropy-based multi-view information quantification model (MV-Info model) to quantitatively evaluate multi-view data. The proposed MV-Info model consists of a hierarchical data module, a feature generation module, and a quantitative calculation module, and it draws on information entropy theory to produce more reasonable quantification results. Our method quantifies how much information can be observed from a group of views, which can in turn support 3D recognition and retrieval. We also design a series of experiments to evaluate the effectiveness of the proposed model; the experimental results demonstrate its rationality and validity.
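The abstract describes scoring a group of views by information entropy. The authors' MV-Info model is not reproduced here; the following is only a minimal sketch of the underlying idea, in which the feature normalisation, the function names, and the aggregation by averaging are all illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def shannon_entropy(p, eps=1e-12):
    """Shannon entropy H(p) = -sum_i p_i * log2(p_i) of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()                          # normalise to a probability distribution
    return float(-np.sum(p * np.log2(p + eps)))

def multi_view_information(view_features):
    """Score a group of views by the mean entropy of each view's normalised
    feature response (an assumed aggregation; higher ~ more observed information)."""
    scores = []
    for f in view_features:
        f = np.abs(np.asarray(f, dtype=float))  # treat magnitudes as responses
        scores.append(shannon_entropy(f))
    return float(np.mean(scores))

# Toy example: 3 views with 4-dimensional feature responses.
# A uniform response carries maximal entropy (2 bits for 4 bins);
# a peaked response carries less.
views = [[1, 1, 1, 1],
         [4, 1, 1, 1],
         [9, 0.5, 0.25, 0.25]]
print(multi_view_information(views))
```

Averaging per-view entropies ignores redundancy between views; a fuller treatment would also discount information shared across overlapping viewpoints.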



Author information


Corresponding author

Correspondence to Yue Gao.

Additional information

This work was supported by the SGCC Science and Technology Project (Grant No. 52020119000A).


About this article


Cite this article

Cheng, J., Bie, L., Zhao, X. et al. Visual information quantification for object recognition and retrieval. Sci. China Technol. Sci. 64, 2618–2626 (2021). https://doi.org/10.1007/s11431-021-1930-8

