Towards automatic bounding box annotations from weakly labeled images

Ries, Christian X.; Richter, Fabian; Lienhart, Rainer

doi:10.1007/s11042-014-2434-z

Published: 23 January 2015

Multimedia Tools and Applications, volume 75, pages 6091–6118 (2016)

An Erratum to this article was published on 04 July 2015

Abstract

In this work we discuss the problem of automatically determining bounding box annotations for objects in images when only weak labels in the form of global image labels are available. That is, we are given only a set of positive images, each containing at least one instance of the desired object, and a set of negative images that represent background. Our goal is to determine the locations of the object instances within the positive images in the form of bounding boxes. We describe and analyze a method for automatic bounding box annotation that consists of two major steps. First, we apply a statistical model to determine visual features that are likely to be indicative of the respective object class. Based on these feature models, we infer preliminary bounding box estimates. Second, we use a CCCP (concave-convex procedure) training algorithm for latent structured SVMs to improve the initial estimates, using them as initializations for the latent variables that model the optimal bounding box positions. We evaluate our approach on three publicly available datasets.
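
To make the first step more concrete, the following Python sketch shows one simple way such a feature model could be used to obtain an initial box. It is only an illustration under stated assumptions, not the method of the article: it assumes that local features have already been quantized into visual words, scores each word by a smoothed log ratio of its frequency in positive versus negative images, and takes the tightest box around the positively scored features. The names `word_scores` and `initial_box`, the smoothing constant `alpha`, and the `threshold` are hypothetical. The second step (latent structured SVM training with CCCP) is not shown.

```python
from collections import Counter
from math import log


def word_scores(pos_words, neg_words, vocab_size, alpha=1.0):
    """Score each visual word by a smoothed log frequency ratio.

    pos_words / neg_words: lists of visual word ids observed in the
    positive and negative image sets (assumed input, already quantized).
    A score above 0 means the word is relatively more frequent in the
    positive set than in the negative set.
    """
    pos, neg = Counter(pos_words), Counter(neg_words)
    # Additive smoothing keeps the ratio finite for unseen words.
    n_pos = len(pos_words) + alpha * vocab_size
    n_neg = len(neg_words) + alpha * vocab_size
    return [
        log((pos[w] + alpha) / n_pos) - log((neg[w] + alpha) / n_neg)
        for w in range(vocab_size)
    ]


def initial_box(features, scores, threshold=0.0):
    """Return a preliminary box (x_min, y_min, x_max, y_max) for one image.

    features: list of (x, y, word_id) tuples for a positive image.
    Only features whose word score exceeds the threshold are kept;
    returns None if no feature qualifies.
    """
    points = [(x, y) for x, y, w in features if scores[w] > threshold]
    if not points:
        return None
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))
```

In a pipeline like the one described in the abstract, a box obtained this way would serve only as the starting value of the latent bounding box variable, which the second training step is then meant to refine.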

Author information

Authors and Affiliations

Universität Augsburg, Universitätsstr. 6a, 86650, Augsburg, Germany
Christian X. Ries, Fabian Richter & Rainer Lienhart

Corresponding author

Correspondence to Christian X. Ries.

About this article

Cite this article

Ries, C.X., Richter, F. & Lienhart, R. Towards automatic bounding box annotations from weakly labeled images. Multimed Tools Appl 75, 6091–6118 (2016). https://doi.org/10.1007/s11042-014-2434-z

Received: 24 March 2014
Revised: 05 November 2014
Accepted: 21 December 2014
Published: 23 January 2015
Issue Date: June 2016
DOI: https://doi.org/10.1007/s11042-014-2434-z
