Logo detection using weakly supervised saliency map

Multimedia Tools and Applications

Abstract

Box-level annotation of a large number of logo images for training a typical deep learning architecture is highly challenging. A method that learns to detect logos without box-level annotations would therefore be helpful. In this paper, we present a logo detection method that uses weakly supervised learning of a Convolutional Neural Network (CNN) to generate a deep saliency map. The saliency map is generated from the back-propagated response of a CNN trained on the classification task, and it produces strong responses in the logo regions. The GrabCut segmentation method is then applied to obtain the bounding box corresponding to the logo class predicted by the CNN for a given image. The AlexNet, CaffeNet, and VGGNet deep architectures have been fine-tuned for classification, and the same framework is then used for detection through the back-propagated saliency map. The performance of the proposed methodology has been validated on the FlickrLogos-32 logo benchmark dataset. The proposed method outperforms the state-of-the-art fully supervised baseline methods, with a mean average precision (mAP) of 75.83%.
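The step from a saliency map to a bounding box can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the gradient of the predicted class score with respect to each input pixel has already been obtained by back-propagation, collapses the channels by maximum absolute value (one common way to form a gradient saliency map), and replaces GrabCut with a plain relative threshold so that the example stays dependency-free. The function names, the `rel_threshold` parameter, and the toy data are all hypothetical.

```python
from typing import List, Optional, Tuple

# Box is (x_min, y_min, x_max, y_max) with inclusive pixel indices.
Box = Tuple[int, int, int, int]


def saliency_from_gradients(grad: List[List[List[float]]]) -> List[List[float]]:
    """Collapse an H x W x C gradient image into an H x W saliency map
    by taking the maximum absolute gradient over channels at each pixel."""
    return [[max(abs(c) for c in pixel) for pixel in row] for row in grad]


def box_from_saliency(saliency: List[List[float]],
                      rel_threshold: float = 0.5) -> Optional[Box]:
    """Return the tightest box around pixels whose saliency is at least
    rel_threshold * peak. This is a simple stand-in for GrabCut (assumption)."""
    peak = max(max(row) for row in saliency)
    if peak <= 0:
        return None  # no response anywhere, so no box
    cutoff = rel_threshold * peak
    xs, ys = [], []
    for y, row in enumerate(saliency):
        for x, value in enumerate(row):
            if value >= cutoff:
                xs.append(x)
                ys.append(y)
    return (min(xs), min(ys), max(xs), max(ys))


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two inclusive-coordinate boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0]) + 1
    ih = min(a[3], b[3]) - max(a[1], b[1]) + 1
    inter = max(iw, 0) * max(ih, 0)
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)


# Toy 4 x 5 x 3 gradient: strong response in rows 1-2, columns 2-3.
background, logo = [0.1, 0.0, -0.05], [0.2, -0.9, 0.4]
toy_grad = [[logo if 1 <= y <= 2 and 2 <= x <= 3 else background
             for x in range(5)] for y in range(4)]

toy_saliency = saliency_from_gradients(toy_grad)
box = box_from_saliency(toy_saliency)
print(box)  # (2, 1, 3, 2)
```

In a fuller pipeline, the thresholded region would more plausibly seed a GrabCut segmentation than serve as the final box, and `iou` against a ground-truth box would decide whether a detection counts as correct when computing mAP.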



Acknowledgments

The authors would like to acknowledge the support of DST-SERB. The Project ID is SB/S3/EECE/099/2016.

Author information

Corresponding author

Correspondence to Gautam Kumar.

Ethics declarations

Conflict of interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kumar, G., Keserwani, P., Roy, P.P. et al. Logo detection using weakly supervised saliency map. Multimed Tools Appl 80, 4341–4365 (2021). https://doi.org/10.1007/s11042-020-09813-6

  • DOI: https://doi.org/10.1007/s11042-020-09813-6
