Learning to Weight Color and Depth for RGB-D Visual Search

  • Alioscia Petrelli
  • Luigi Di Stefano
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10484)


Both color and depth information can be exploited for content-based search in RGB-D imagery. Previous works on global descriptors for RGB-D images advocate decision-level fusion, whereby independently computed color and depth representations are juxtaposed to perform similarity search. In contrast, this paper proposes a learning-to-rank paradigm that weights the two information channels according to the specific traits of the task and data at hand, thereby accommodating the potential diversity across applications. In particular, we propose a novel method, referred to as kNN-rank, which learns the regularities among the outputs yielded by similarity-based queries. A further novel contribution of this paper is the HyperRGBD framework, a set of tools conceived to enable seamless aggregation of existing RGB-D datasets in order to obtain new data with the desired characteristics and cardinality.
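
To make the idea of weighting the two channels concrete, the following is a minimal sketch and not the kNN-rank method itself, whose details are not given in this abstract. It assumes that each query comes with precomputed color and depth distances to every database item, fuses them as w * color + (1 - w) * depth, and picks w by a simple grid search on top-1 retrieval accuracy over labeled training queries. The function names, the grid search, and the toy data are all illustrative assumptions.

```python
from typing import List, Sequence, Tuple

# One training query: (color distances, depth distances, database labels, query label).
Query = Tuple[Sequence[float], Sequence[float], Sequence[str], str]


def fused_ranking(color_dist: Sequence[float],
                  depth_dist: Sequence[float],
                  w: float) -> List[int]:
    """Return database indices sorted by the fused distance (closest first)."""
    scores = [w * c + (1.0 - w) * d for c, d in zip(color_dist, depth_dist)]
    return sorted(range(len(scores)), key=lambda i: scores[i])


def top1_accuracy(queries: Sequence[Query], w: float) -> float:
    """Fraction of queries whose nearest fused neighbour has the query's label."""
    hits = 0
    for color_dist, depth_dist, db_labels, query_label in queries:
        best = fused_ranking(color_dist, depth_dist, w)[0]
        hits += int(db_labels[best] == query_label)
    return hits / len(queries)


def learn_weight(queries: Sequence[Query], steps: int = 10) -> float:
    """Grid-search the color weight w in [0, 1]; ties go to the smallest w."""
    candidates = [i / steps for i in range(steps + 1)]
    return max(candidates, key=lambda w: top1_accuracy(queries, w))


# Hypothetical toy data: in the first query depth is the reliable cue,
# in the second query color is, so only an intermediate weight suits both.
toy_queries: List[Query] = [
    ([0.2, 0.5, 0.6], [0.9, 0.1, 0.8], ["mug", "apple", "mug"], "apple"),
    ([0.1, 0.6, 0.5], [0.8, 0.2, 0.9], ["mug", "apple", "mug"], "mug"),
]

if __name__ == "__main__":
    w = learn_weight(toy_queries)
    print("learned color weight:", w)
    print("top-1 accuracy:", top1_accuracy(toy_queries, w))
```

A single scalar weight is the simplest possible choice; it is used here only to show how the balance between channels can be fitted to the data rather than fixed in advance.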


RGB-D image search; Compact descriptors; Learning-to-rank



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. University of Bologna, Bologna, Italy
