Composite Descriptors and Deep Features Based Visual Phrase for Image Retrieval

  • Yanhong Wang
  • Linna Zhang
  • Yigang Cen (corresponding author)
  • Ruizhen Zhao
  • Tingting Chai
  • Yi Cen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11068)


Local descriptors are effective features in the bag-of-visual-words (BoW) and vector of locally aggregated descriptors (VLAD) models for image retrieval. Different kinds of local descriptors represent different visual content, and spatial contextual information plays an important role in image matching, image retrieval and image recognition. Therefore, to explore efficient features, a new local composite descriptor is first proposed, which combines the advantages of SURF and color name (CN) information. Second, the VLAD method is used to encode the proposed composite descriptors into a vector. Third, local deep features are extracted and fused with the encoded vector in each image block. Finally, to implement an efficient retrieval system, a novel image retrieval framework is organized based on the proposed feature fusion strategies. The proposed methods are verified on three benchmark datasets, i.e., Holidays, Oxford5k and Ukbench. Experimental results show that our methods achieve good performance; in particular, the mAP and N-S score reach 0.8281 and 3.5498 on the Holidays and Ukbench datasets, respectively.
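The VLAD encoding step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a pre-trained k-means codebook and generic local descriptors (the paper's SURF+CN composite descriptors, deep-feature fusion, and block partitioning are omitted).

```python
import numpy as np

def vlad_encode(descriptors, centers):
    """VLAD: for each codebook center, sum the residuals of the local
    descriptors assigned to it, then normalize the concatenated vector.

    descriptors: (n, d) array of local descriptors (hypothetically,
                 the composite SURF+CN descriptors from the paper)
    centers:     (k, d) visual-word codebook, e.g. from k-means
    """
    k, d = centers.shape
    # assign each descriptor to its nearest visual word
    dists = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
    assign = np.argmin(dists, axis=1)
    vlad = np.zeros((k, d))
    for i in range(k):
        members = descriptors[assign == i]
        if len(members):
            # accumulate residuals (descriptor minus its center)
            vlad[i] = (members - centers[i]).sum(axis=0)
    vlad = vlad.ravel()
    # signed square-root (power) normalization, then global L2
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad

# toy usage with random data
rng = np.random.default_rng(0)
desc = rng.normal(size=(50, 8))   # 50 local descriptors, 8-D
cent = rng.normal(size=(4, 8))    # codebook of 4 visual words
v = vlad_encode(desc, cent)
print(v.shape)                    # (32,) = k * d
```

The resulting k×d-dimensional vector is the per-image (or, in the paper's framework, per-block) representation that is subsequently fused with deep features and compared by inner product or Euclidean distance.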


Keywords: Composite descriptors · Visual phrase · Vector of locally aggregated descriptors · Feature fusion · Deep feature · Image retrieval


  1. Jégou, H., Douze, M., Schmid, C., et al.: Aggregating local descriptors into a compact image representation. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 3304–3311. IEEE, San Francisco (2010)
  2. Spyromitros-Xioufis, E.: A comprehensive study over VLAD and product quantization in large-scale image retrieval. IEEE Trans. Multimed. 16(6), 1713–1728 (2014)
  3. Spyromitros-Xioufis, E., Papadopoulos, S., Kompatsiaris, I.Y., et al.: An empirical study on the combination of SURF features with VLAD vectors for image search. In: 13th International Workshop on Image Analysis for Multimedia Interactive Services, pp. 1–4. IEEE, Dublin (2012)
  4. Alzu'bi, A., Amira, A., Ramzan, N., Jaber, T.: Robust fusion of color and local descriptors for image retrieval and classification. In: 2015 International Conference on Systems, Signals and Image Processing (IWSSIP), pp. 253–256. IEEE, London (2015)
  5. Fan, P., Men, A., Chen, M., et al.: COLOR-SURF: a SURF descriptor with local kernel color histograms. In: IEEE International Conference on Network Infrastructure and Digital Content, pp. 726–730. IEEE, Beijing (2009)
  6. Weijer, J.V.D., Schmid, C., Verbeek, J., Larlus, D.: Learning color names for real-world applications. IEEE Trans. Image Process. 18(7), 1512–1523 (2009)
  7. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: 25th International Conference on Neural Information Processing Systems, pp. 1097–1105. Curran Associates Inc., Lake Tahoe (2012)
  8. Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8689, pp. 818–833. Springer, Cham (2014)
  9. Razavian, A.S., Azizpour, H., Sullivan, J., Carlsson, S.: CNN features off-the-shelf: an astounding baseline for recognition. In: IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 512–519. IEEE, Columbus (2014)
  10. Babenko, A., Slesarev, A., Chigorin, A., Lempitsky, V.: Neural codes for image retrieval. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8689, pp. 584–599. Springer, Cham (2014)
  11. Jiang, Y., Meng, J., Yuan, J.: Randomized visual phrases for object search. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3100–3104. IEEE, Providence (2012)
  12. Zheng, L., Wang, S., Liu, Z., Tian, Q.: Packing and padding: coupled multi-index for accurate image retrieval. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1947–1954. IEEE, Columbus (2014)
  13. Arandjelovic, R.: Three things everyone should know to improve object retrieval. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 2911–2918. IEEE, Providence (2012)
  14. Vigo, D.A.R., Khan, F.S., Weijer, J.V.D., Gevers, T.: The impact of color on bag-of-words based object recognition. In: 20th International Conference on Pattern Recognition, pp. 1549–1553. IEEE, Istanbul (2010)
  15. Khan, F.S.: Modulating shape features by color attention for object recognition. Int. J. Comput. Vis. 98(1), 49–64 (2012)
  16. Bagdanov, A.D.: Color attributes for object detection. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 3306–3313. IEEE, Providence (2012)
  17. Zhang, S.: Generating descriptive visual words and visual phrases for large-scale image applications. IEEE Trans. Image Process. 20(9), 2664–2677 (2011)
  18. Cour, T., Zhu, S., Han, T.X.: Contextual weighting for vocabulary tree based image retrieval. In: 2011 International Conference on Computer Vision, pp. 209–216. IEEE, Barcelona (2011)
  19. Liu, Z., Li, H., Zhou, W., Tian, Q.: Embedding spatial context information into inverted file for large-scale image retrieval. In: 20th ACM International Conference on Multimedia, pp. 199–208. ACM, Nara (2012)
  20. Perronnin, F., Liu, Y., Sánchez, J., Poirier, H.: Large-scale image retrieval with compressed Fisher vectors. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 3384–3391. IEEE, San Francisco (2010)
  21. Jégou, H., Chum, O.: Negative evidences and co-occurences in image retrieval: the benefit of PCA and whitening. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, pp. 774–787. Springer, Heidelberg (2012)
  22. Huiskes, M.J., Lew, M.S.: The MIR Flickr retrieval evaluation. In: 1st ACM International Conference on Multimedia Information Retrieval, pp. 39–43. ACM, Vancouver (2008)
  23. Chatfield, K., Simonyan, K., Vedaldi, A., Zisserman, A.: Return of the devil in the details: delving deep into convolutional nets (2014)
  24. Jégou, H., Zisserman, A.: Triangulation embedding and democratic aggregation for image search. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 3310–3317. IEEE, Columbus (2014)
  25. Gong, Y., Wang, L., Guo, R., Lazebnik, S.: Multi-scale orderless pooling of deep convolutional activation features. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8695, pp. 392–407. Springer, Cham (2014)
  26. Dong, J., Soatto, S.: Domain-size pooling in local descriptors: DSP-SIFT, pp. 5097–5106. eprint arXiv:1412.8556 (2014)

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Yanhong Wang (1, 2)
  • Linna Zhang (3)
  • Yigang Cen (1, 2), corresponding author
  • Ruizhen Zhao (1, 2)
  • Tingting Chai (1, 2)
  • Yi Cen (4)

  1. Institute of Information Science, Beijing Jiaotong University, Beijing, China
  2. Key Laboratory of Advanced Information Science and Network Technology of Beijing, Beijing, China
  3. College of Mechanical Engineering, Guizhou University, Guiyang, China
  4. School of Information Engineering, Minzu University of China, Beijing, China