Abstract
Box-level annotation of a large number of logo images for training a typical deep learning architecture is highly challenging. A method that can learn to detect logos without box-level annotations is therefore helpful. In this paper, we present a logo detection method that uses weakly supervised learning of a Convolutional Neural Network (CNN) to generate a deep saliency map. The saliency map is generated from the back-propagated response of a CNN trained on the classification task, and it produces responses in the logo regions. The GrabCut segmentation method is then applied to obtain the bounding box corresponding to the logo class predicted by the CNN for a given image. The AlexNet, CaffeNet, and VGGNet deep architectures have been fine-tuned for classification, and the resulting framework is then used for detection through the back-propagated saliency map. The performance of the proposed methodology has been validated on the FlickrLogos-32 logo benchmark dataset. The proposed method outperforms the state-of-the-art fully supervised baseline methods, with a mean average precision (mAP) of 75.83%.
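The pipeline described above (class score, saliency map, bounding box) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation, and it rests on several assumptions. A toy score function stands in for the fine-tuned CNN. Central finite differences stand in for back-propagation. A simple relative threshold stands in for GrabCut. All names (`toy_score`, `saliency_map`, `bounding_box`, `rel_threshold`) are hypothetical.

```python
def toy_score(image):
    # Hypothetical class score: depends only on a "logo" window
    # (rows 2..4, columns 3..6). A real system would use a CNN logit.
    return sum(image[r][c] ** 2 for r in range(2, 5) for c in range(3, 7))


def saliency_map(score_fn, image, eps=1e-4):
    # Approximate |d score / d pixel| with central finite differences.
    # Assumption: this replaces the back-propagated gradient of a CNN.
    height, width = len(image), len(image[0])
    saliency = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            original = image[r][c]
            image[r][c] = original + eps
            up = score_fn(image)
            image[r][c] = original - eps
            down = score_fn(image)
            image[r][c] = original  # restore the pixel
            saliency[r][c] = abs(up - down) / (2 * eps)
    return saliency


def bounding_box(saliency, rel_threshold=0.5):
    # Keep pixels whose saliency is at least rel_threshold * peak and
    # return their extent as (x_min, y_min, x_max, y_max).
    # Assumption: thresholding is a crude stand-in for GrabCut.
    peak = max(max(row) for row in saliency)
    if peak <= 0:
        return None  # no salient region found
    cut = rel_threshold * peak
    rows, cols = [], []
    for r, row in enumerate(saliency):
        for c, value in enumerate(row):
            if value >= cut:
                rows.append(r)
                cols.append(c)
    return (min(cols), min(rows), max(cols), max(rows))


if __name__ == "__main__":
    image = [[1.0] * 8 for _ in range(8)]  # 8x8 grayscale toy image
    print(bounding_box(saliency_map(toy_score, image)))  # (3, 2, 6, 4)
```

Only the pixels that influence the class score receive non-zero saliency, so the box recovers the window without any box-level label. In practice the thresholded region would more likely serve as an initial foreground estimate for a segmentation step such as GrabCut than as the final box.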
Acknowledgments
The authors would like to acknowledge the support of DST-SERB. The Project ID is SB/S3/EECE/099/2016.
Ethics declarations
Conflict of interests
The authors declare that they have no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Kumar, G., Keserwani, P., Roy, P.P. et al. Logo detection using weakly supervised saliency map. Multimed Tools Appl 80, 4341–4365 (2021). https://doi.org/10.1007/s11042-020-09813-6
DOI: https://doi.org/10.1007/s11042-020-09813-6