MHEF-TripNet: Mixed Triplet Loss with Hard Example Feedback Network for Image Retrieval
Abstract
Image retrieval has advanced significantly, driven mainly by deep convolutional neural networks, but the training procedure of these networks remains inefficient: because easy examples greatly outnumber hard examples, networks receive little direct guidance from the hard ones. In this paper, we address this problem with an effective and efficient method called the mixed triplet loss with hard example feedback network (MHEF-TripNet). Since the proportion of hard examples is small, a sample selection probability matrix is introduced to select hard examples, which helps the network focus on enlarging the gap between confusing categories in the triplet loss; the matrix is adjusted according to feedback from test results after each training iteration. Furthermore, a mixed triplet loss function is proposed that combines triplet loss with category loss, exploiting both the association information between images and their category information. The effectiveness of MHEF-TripNet is confirmed by experiments on the UC Merced Land Use and Kdelab Airplane datasets. Compared with previous image retrieval approaches, our approach achieves superior performance.
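As a rough illustration of the two mechanisms summarized above, the following PyTorch-style sketch shows (1) a mixed loss that adds a category (classification) term to the triplet loss and (2) a per-class sampling probability matrix that is shifted toward classes the current model confuses, so that later triplets draw hard negatives from them. This is not the authors' implementation; the margin, the weight `lam`, the smoothing factor, and the function names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def mixed_triplet_loss(anchor, positive, negative, logits, labels,
                       margin=0.2, lam=0.5):
    """Triplet loss on embeddings plus cross-entropy on class logits.

    `margin` and the weighting factor `lam` are assumed values for
    illustration, not the paper's settings.
    """
    trip = F.triplet_margin_loss(anchor, positive, negative, margin=margin)
    cls = F.cross_entropy(logits, labels)
    return trip + lam * cls

def update_selection_matrix(prob_matrix, confusion, smooth=0.9):
    """Shift negative-sampling probabilities toward confused class pairs.

    prob_matrix: (C, C) row-stochastic matrix; row i gives the probability
                 of drawing the negative for a class-i anchor from each class.
    confusion:   (C, C) confusion counts from the latest evaluation pass.
    The exponential-smoothing update rule is an assumption standing in for
    the feedback adjustment described in the paper.
    """
    conf = confusion.clone().float()
    conf.fill_diagonal_(0)  # never draw the negative from the anchor's own class
    conf = conf / conf.sum(dim=1, keepdim=True).clamp(min=1e-8)
    return smooth * prob_matrix + (1 - smooth) * conf
```

In this reading, classes that the evaluation pass shows to be mutually confusing gradually receive higher sampling probability, so the triplet term spends more of its gradient budget on separating exactly those categories.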
Keywords
Triplet loss · Probability matrix of sample selection · Image retrieval