Multimedia Tools and Applications

Volume 78, Issue 1, pp 555–572

Multi-view and multivariate Gaussian descriptor for 3D object retrieval

  • Zan Gao
  • Kai-Xin Xue
  • Hua Zhang


3D object retrieval is an active research topic in computer vision, and several feature descriptors, such as Zernike moments and HOG, have been proposed for it. However, information from multiple views is often ignored during feature extraction. Inspired by the multivariate Gaussian descriptor and by the latent relationships among multiple views, we propose a new feature descriptor for 3D object retrieval, the Multi-view and Multivariate Gaussian (MMG) descriptor. Specifically, the local statistics of an image are characterized by a multivariate Gaussian distribution, which is continuous and can effectively estimate statistics of different orders in the local neighborhood. Furthermore, images taken from different perspectives are jointly exploited when extracting the characteristics of an object. Extensive experiments on the ETH dataset and the 3D dataset show that: 1) the MMG descriptor is better suited to 3D object retrieval than Zernike moments and HOG, and performs much better than either of them; 2) performance improves further when the multi-view factor is taken into account; and 3) performance fluctuates depending on which viewing angles are selected and how many images are used.
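The abstract names the two ingredients of MMG, a multivariate Gaussian fitted to local image statistics and the combination of images taken from different angles, but does not spell out the construction. The Python sketch below is one possible reading under stated assumptions, not the authors' implementation. Everything specific in it is hypothetical: the function names, the per-pixel features `[I, |dI/dx|, |dI/dy|]`, the `grid` of local cells, the regularizer `eps`, the embedding of each Gaussian N(μ, Σ) as the SPD matrix [[Σ + μμᵀ, μ], [μᵀ, 1]] followed by a matrix logarithm (in the spirit of log-Euclidean Gaussian descriptors), mean pooling across views, and Euclidean ranking. A small cyclic Jacobi eigensolver keeps the example dependency-free.

```python
import math
from typing import Dict, List, Sequence, Tuple

Image = Sequence[Sequence[float]]  # grayscale intensities, e.g. in [0, 1]
Matrix = List[List[float]]

def pixel_features(img: Image) -> List[List[List[float]]]:
    """Hypothetical per-pixel features [I, |dI/dx|, |dI/dy|] (central differences, clamped at borders)."""
    h, w = len(img), len(img[0])
    return [[[img[y][x],
              abs(img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]),
              abs(img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x])]
             for x in range(w)] for y in range(h)]

def gaussian_stats(samples: List[List[float]], eps: float = 1e-3) -> Tuple[List[float], Matrix]:
    """Mean and population covariance; adding eps * I keeps the covariance positive definite."""
    n, k = len(samples), len(samples[0])
    mu = [sum(s[i] for s in samples) / n for i in range(k)]
    cov = [[sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in samples) / n + (eps if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    return mu, cov

def embed_gaussian(mu: List[float], cov: Matrix) -> Matrix:
    """Embed N(mu, cov) as the SPD matrix [[cov + mu mu^T, mu], [mu^T, 1]]."""
    k = len(mu)
    P = [[cov[i][j] + mu[i] * mu[j] for j in range(k)] + [mu[i]] for i in range(k)]
    P.append(list(mu) + [1.0])
    return P

def jacobi_eigh(A: Matrix, max_sweeps: int = 50) -> Tuple[List[float], Matrix]:
    """Cyclic Jacobi eigendecomposition of a small symmetric matrix; eigenvectors are the columns of V."""
    n = len(A)
    a = [list(row) for row in A]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-14:  # absolute threshold, assumes entries of order 1
                    continue
                rotated = True
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q), zeroing a[p][q]
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
        if not rotated:
            break
    return [a[i][i] for i in range(n)], v

def spd_log(A: Matrix) -> Matrix:
    """Matrix logarithm of an SPD matrix: V diag(log w) V^T."""
    w, v = jacobi_eigh(A)
    n = len(A)
    return [[sum(v[i][k] * math.log(w[k]) * v[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

def vec_upper(L: Matrix) -> List[float]:
    """Upper triangle, off-diagonals scaled by sqrt(2) so Euclidean distance equals the Frobenius norm."""
    n = len(L)
    return [L[i][j] * (1.0 if i == j else math.sqrt(2.0)) for i in range(n) for j in range(i, n)]

def mg_descriptor(img: Image, grid: int = 2, eps: float = 1e-3) -> List[float]:
    """Single-view descriptor: one log-embedded Gaussian per cell of a grid x grid layout, concatenated."""
    h, w = len(img), len(img[0])
    f = pixel_features(img)
    desc: List[float] = []
    for gy in range(grid):
        for gx in range(grid):
            cell = [f[y][x]
                    for y in range(gy * h // grid, (gy + 1) * h // grid)
                    for x in range(gx * w // grid, (gx + 1) * w // grid)]
            mu, cov = gaussian_stats(cell, eps)
            desc.extend(vec_upper(spd_log(embed_gaussian(mu, cov))))
    return desc

def mmg_descriptor(views: Sequence[Image], grid: int = 2) -> List[float]:
    """Multi-view descriptor: element-wise mean of the per-view descriptors (hypothetical fusion rule)."""
    descs = [mg_descriptor(v, grid) for v in views]
    return [sum(col) / len(descs) for col in zip(*descs)]

def retrieve(query_views: Sequence[Image], gallery: Dict[str, Sequence[Image]]) -> List[str]:
    """Rank gallery objects by Euclidean distance between multi-view descriptors, nearest first."""
    q = mmg_descriptor(query_views)
    dist = {name: math.dist(q, mmg_descriptor(views)) for name, views in gallery.items()}
    return sorted(dist, key=dist.get)
```

Calling `retrieve(query_views, gallery)` with a dictionary that maps object names to lists of view images returns the names ordered from nearest to farthest. Mean pooling is simply the most basic way to merge views; concatenating views in a fixed order or aggregating per-view distances are equally plausible, and the abstract does not say which rule MMG uses.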


3D object retrieval · Image descriptors · Multi-view · Multivariate Gaussian distribution



This work was supported in part by the National Natural Science Foundation of China (No. 61572357, No. 61202168), the Tianjin Research Program of Application Foundation and Advanced Technology (14JCZDJC31700 and 13JCQNJC0040), the Tianjin Municipal Natural Science Foundation (No. 13JCQNJC0040) and the China Scholarship Council (No. 201608120021).



Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. Key Laboratory of Computer Vision and System, Ministry of Education, Tianjin, China
  2. Tianjin Key Laboratory of Intelligence Computing and Novel Software Technology, Tianjin University of Technology, Tianjin, China
