Abstract
In this work we discuss the problem of automatically determining bounding box annotations for objects in images, where we assume only weak labeling in the form of global image labels. We are therefore given only a set of positive images, each containing at least one instance of the desired object, and a set of negative images that represent background. Our goal is to localize the object instances within the positive images by bounding boxes. We describe and analyze a method for automatic bounding box annotation that consists of two major steps. First, we apply a statistical model to determine visual features that are likely to be indicative of the respective object class, and based on these feature models we infer preliminary bounding box estimates. Second, we use a CCCP (concave-convex procedure) training algorithm for latent structural SVMs to improve the initial estimates, using them to initialize the latent variables that model the optimal bounding box positions. We evaluate our approach on three publicly available datasets.
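The first step (indicative features, then preliminary boxes) can be illustrated with a small sketch. It is not the authors' statistical model: it assumes each image is given as a list of local features `(x, y, visual word id)`, scores every visual word by a smoothed ratio of its relative frequency in the positive versus the negative set, and places a preliminary box by exhaustively searching for the axis-aligned window whose features have the largest summed weight `score - prior`. The names `word_scores`, `estimate_box`, `POS_IMAGES` and `NEG_IMAGES`, the parameters `eps` and `prior`, and the toy data are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical input format: one image = list of local features (x, y, visual word id).
Feature = Tuple[float, float, int]
Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def word_scores(pos_images: List[List[Feature]],
                neg_images: List[List[Feature]],
                eps: float = 1.0) -> Dict[int, float]:
    """Score each visual word by how much more often it occurs in positive images.

    Uses additively smoothed relative frequencies: a score near 1 means the word
    is seen mostly in positive images, a score near 0 means mostly background.
    """
    pos = Counter(w for img in pos_images for _, _, w in img)
    neg = Counter(w for img in neg_images for _, _, w in img)
    vocab = set(pos) | set(neg)
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    scores = {}
    for w in vocab:
        f_pos = (pos[w] + eps) / (n_pos + eps * len(vocab))
        f_neg = (neg[w] + eps) / (n_neg + eps * len(vocab))
        scores[w] = f_pos / (f_pos + f_neg)
    return scores


def estimate_box(features: List[Feature], scores: Dict[int, float],
                 prior: float = 0.5) -> Tuple[Optional[Box], float]:
    """Exhaustively find the axis-aligned box with the largest summed feature weight.

    Each feature is weighted by (score - prior): indicative words pull the box
    towards them, background words push it away. Brute force, O(n^5).
    """
    if not features:
        return None, 0.0
    weighted = [(x, y, scores.get(w, prior) - prior) for x, y, w in features]
    xs = sorted({x for x, _, _ in weighted})
    ys = sorted({y for _, y, _ in weighted})
    best, best_val = None, float("-inf")
    for i, x0 in enumerate(xs):
        for x1 in xs[i:]:
            for j, y0 in enumerate(ys):
                for y1 in ys[j:]:
                    val = sum(wt for x, y, wt in weighted
                              if x0 <= x <= x1 and y0 <= y <= y1)
                    if val > best_val:
                        best, best_val = (x0, y0, x1, y1), val
    # Shrink the winner to the extent of the features it contains; the set of
    # enclosed features, and hence the score, stays the same.
    x0, y0, x1, y1 = best
    inside = [(x, y) for x, y, _ in weighted if x0 <= x <= x1 and y0 <= y <= y1]
    if inside:
        best = (min(x for x, _ in inside), min(y for _, y in inside),
                max(x for x, _ in inside), max(y for _, y in inside))
    return best, best_val


# Toy data: words 1 and 2 belong to the object, words 3 and 4 are background.
POS_IMAGES = [
    [(5, 5, 1), (6, 6, 2), (7, 5, 1), (1, 9, 3), (9, 1, 4)],
    [(2, 2, 1), (3, 3, 2), (8, 8, 3), (0, 5, 4)],
]
NEG_IMAGES = [
    [(1, 1, 3), (4, 4, 4), (7, 7, 3)],
    [(2, 8, 4), (8, 2, 3), (5, 5, 4)],
]

toy_scores = word_scores(POS_IMAGES, NEG_IMAGES)
for image in POS_IMAGES:
    print(estimate_box(image, toy_scores))  # boxes (5, 5, 7, 6) and (2, 2, 3, 3)
```

The exhaustive search is only practical for toy inputs; a branch-and-bound method such as efficient subwindow search can find the optimal window for this kind of additive score much faster. In the described method, preliminary boxes like these serve only as initial values for the latent variables of the structural SVM, which CCCP then refines by alternating between re-imputing the best box in each positive image under the current model and solving the resulting convex training problem with those boxes fixed.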
Cite this article
Ries, C.X., Richter, F. & Lienhart, R. Towards automatic bounding box annotations from weakly labeled images. Multimed Tools Appl 75, 6091–6118 (2016). https://doi.org/10.1007/s11042-014-2434-z