
Towards automatic bounding box annotations from weakly labeled images

Multimedia Tools and Applications

An Erratum to this article was published on 04 July 2015

Abstract

In this work we discuss the problem of automatically determining bounding box annotations for objects in images, assuming only weak labeling in the form of global image labels. That is, we are given only a set of positive images, each containing at least one instance of the desired object, and a set of negative images that represent background. Our goal is to localize the object instances within the positive images with bounding boxes. We describe and analyze a method for automatic bounding box annotation that consists of two major steps. First, we apply a statistical model to determine visual features that are likely to be indicative of the respective object class, and from these feature models we infer preliminary bounding box estimates. Second, we use a CCCP (concave-convex procedure) training algorithm for a latent structured SVM to improve these initial estimates, using them to initialize the latent variables that model the optimal bounding box positions. We evaluate our approach on three publicly available datasets.
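The first step can be illustrated with a minimal sketch. The abstract does not specify the statistical model, so the code below assumes a bag-of-visual-words representation in which each image is a list of keypoints `(x, y, word)`. It scores each visual word by a smoothed log-likelihood ratio of its frequency in positive versus negative images. A preliminary box is then taken as the grid-aligned rectangle with the largest summed score, found by an exhaustive 2D maximum-sum subarray search; this is a simple stand-in for faster box-search methods. The function names `word_scores` and `best_box` and the parameters `alpha` and `cell` are hypothetical. This is not the authors' implementation.

```python
import math
from collections import Counter


def word_scores(positives, negatives, vocab_size, alpha=1.0):
    """Score each visual word by a smoothed log-likelihood ratio.

    positives / negatives: lists of images; each image is a list of
    keypoints (x, y, word) with word in range(vocab_size) (assumed layout).
    Positive scores mark words that are more frequent in object images
    than in background images.
    """
    pos = Counter(w for img in positives for _, _, w in img)
    neg = Counter(w for img in negatives for _, _, w in img)
    # Add-alpha (Laplace) smoothing so unseen words do not yield log(0).
    n_pos = sum(pos.values()) + alpha * vocab_size
    n_neg = sum(neg.values()) + alpha * vocab_size
    return [
        math.log((pos[w] + alpha) / n_pos) - math.log((neg[w] + alpha) / n_neg)
        for w in range(vocab_size)
    ]


def best_box(image, scores, width, height, cell=1):
    """Return (score, (x0, y0, x1, y1)) for the grid-aligned box whose
    keypoints have the largest summed word score; x1 and y1 are exclusive.

    Exhaustive 2D maximum-sum subarray search, O(rows^2 * cols), over a
    grid of cell x cell pixel bins. Assumes 0 <= x < width, 0 <= y < height.
    """
    rows, cols = math.ceil(height / cell), math.ceil(width / cell)
    grid = [[0.0] * cols for _ in range(rows)]
    for x, y, w in image:
        grid[y // cell][x // cell] += scores[w]

    best_score, best = float("-inf"), None
    for top in range(rows):
        col_sums = [0.0] * cols  # per-column sums over rows top..bottom
        for bottom in range(top, rows):
            for c in range(cols):
                col_sums[c] += grid[bottom][c]
            # Kadane's algorithm over the column sums for this row band.
            run, start = 0.0, 0
            for c in range(cols):
                if run <= 0:
                    run, start = col_sums[c], c
                else:
                    run += col_sums[c]
                if run > best_score:
                    best_score = run
                    best = (start * cell, top * cell,
                            min((c + 1) * cell, width),
                            min((bottom + 1) * cell, height))
    return best_score, best
```

In the method described above, a box like the one returned by `best_box` would serve only as an initial value for the latent bounding box variables, which CCCP training of the latent structured SVM then refines.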

Author information

Correspondence to Christian X. Ries.

About this article

Cite this article

Ries, C.X., Richter, F. & Lienhart, R. Towards automatic bounding box annotations from weakly labeled images. Multimed Tools Appl 75, 6091–6118 (2016). https://doi.org/10.1007/s11042-014-2434-z

  • DOI: https://doi.org/10.1007/s11042-014-2434-z
