MHEF-TripNet: Mixed Triplet Loss with Hard Example Feedback Network for Image Retrieval

  • Xuebin Yang
  • Shouhong Wan
  • Peiquan Jin
  • Chang Zou
  • Xingyue Li
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11903)


Image retrieval has advanced significantly, driven mainly by deep convolutional neural networks, but training these networks remains inefficient. Because easy examples greatly outnumber hard examples, networks receive little direct guidance from the hard ones. In this paper, we address this problem with an effective and efficient method called mixed triplet loss with hard example feedback network (MHEF-TripNet). Since hard examples make up only a small proportion of the data, we introduce a sample selection probability matrix to select them, which helps the network focus on enlarging the gap between confusing categories in the triplet loss. The matrix is adjusted according to the feedback of test results after each training iteration. Furthermore, we propose a mixed triplet loss function that combines triplet loss with category loss to exploit both the association information between images and their category information. Experiments on the UC Merced Land Use and Kdelab Airplane datasets confirm the effectiveness of MHEF-TripNet. Compared with previous image retrieval approaches, our approach obtains superior performance.
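The two ideas named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the exact loss formulation or the update rule for the selection matrix. The squared Euclidean triplet loss, the softmax cross-entropy used as the category loss, the `weight` that balances the two terms, and the confusion-matrix-based `update_selection_matrix` are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard hinge-style triplet loss on squared Euclidean distances."""
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    return np.mean(np.maximum(0.0, d_pos - d_neg + margin))

def category_loss(logits, labels):
    """Softmax cross-entropy (assumed form of the category loss)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.mean(log_probs[np.arange(len(labels)), labels])

def mixed_loss(anchor, positive, negative, logits, labels,
               margin=0.2, weight=1.0):
    """Triplet loss plus weighted category loss; `weight` is an assumption."""
    return (triplet_loss(anchor, positive, negative, margin)
            + weight * category_loss(logits, labels))

def update_selection_matrix(confusion, smoothing=1e-3):
    """Assumed feedback rule: row i gives the probability of drawing a
    negative from class j for an anchor of class i, proportional to how
    often class i was confused with class j in the latest test results."""
    probs = confusion.astype(float).copy()
    np.fill_diagonal(probs, 0.0)   # ignore correct predictions
    probs += smoothing             # keep every other class reachable
    np.fill_diagonal(probs, 0.0)   # never pick the anchor's own class
    return probs / probs.sum(axis=1, keepdims=True)

def sample_negative_class(selection, anchor_class, rng):
    """Draw a negative class for one anchor using the selection matrix."""
    return int(rng.choice(selection.shape[0], p=selection[anchor_class]))
```

Under these assumptions, a class pair that is confused more often in testing receives a larger selection probability, so later triplets draw their negatives mostly from the confusing categories.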


Keywords: Triplet loss · Probability matrix of sample selection · Image retrieval



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Xuebin Yang (1)
  • Shouhong Wan (1, 2) (corresponding author)
  • Peiquan Jin (1, 2)
  • Chang Zou (1)
  • Xingyue Li (1)
  1. School of Computer Science and Technology, University of Science and Technology of China, Hefei, China
  2. Key Laboratory of Electromagnetic Space Information, Chinese Academy of Sciences, Hefei, China
