Multimedia Tools and Applications, Volume 75, Issue 11, pp 6091–6118

Towards automatic bounding box annotations from weakly labeled images

  • Christian X. Ries (Email author)
  • Fabian Richter
  • Rainer Lienhart

Abstract

In this work we discuss the problem of automatically determining bounding box annotations for objects in images when only weak labels, in the form of global image labels, are available. That is, we are given only a set of positive images, each containing at least one instance of the desired object, and a set of negative images that represent background. Our goal is to localize the object instances within the positive images by bounding boxes. We describe and analyze a method for automatic bounding box annotation that consists of two major steps. First, we apply a statistical model to determine visual features that are likely to be indicative of the respective object class, and from these feature models we infer preliminary bounding box estimates. Second, we use a CCCP (concave-convex procedure) training algorithm for latent structured SVMs to refine these initial estimates, using them to initialize the latent variables that model the optimal bounding box positions. We evaluate our approach on three publicly available datasets.
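
The first of the two steps can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that every image is a list of (x, y, word) tuples of quantized visual features, that the statistical model is a Laplace-smoothed log-likelihood ratio between the positive and the negative set, and that the preliminary box is found by brute-force search. The names word_scores, initial_box and alpha are hypothetical, and the CCCP refinement step is not shown.

```python
import math
from collections import Counter


def word_scores(pos_images, neg_images, vocab_size, alpha=1.0):
    """Score each visual word by log P(word | positive) / P(word | negative).

    Assumption: a smoothed likelihood ratio stands in for the paper's
    statistical feature model. alpha is the Laplace smoothing constant.
    """
    pos = Counter(w for img in pos_images for _, _, w in img)
    neg = Counter(w for img in neg_images for _, _, w in img)
    n_pos = sum(pos.values()) + alpha * vocab_size
    n_neg = sum(neg.values()) + alpha * vocab_size
    return [
        math.log((pos[w] + alpha) / n_pos) - math.log((neg[w] + alpha) / n_neg)
        for w in range(vocab_size)
    ]


def initial_box(features, scores):
    """Return the box (x0, y0, x1, y1) with the highest summed word score.

    Brute force over all boxes whose edges lie on feature coordinates;
    ties are broken in favor of the smaller box. Only meant for tiny inputs.
    """
    xs = sorted({x for x, _, _ in features})
    ys = sorted({y for _, y, _ in features})
    best_key, best_box = None, None
    for i, x0 in enumerate(xs):
        for x1 in xs[i:]:
            for j, y0 in enumerate(ys):
                for y1 in ys[j:]:
                    total = sum(
                        scores[w]
                        for x, y, w in features
                        if x0 <= x <= x1 and y0 <= y <= y1
                    )
                    area = (x1 - x0) * (y1 - y0)
                    key = (total, -area)  # higher score first, then smaller area
                    if best_key is None or key > best_key:
                        best_key, best_box = key, (x0, y0, x1, y1)
    return best_box
```

Words that occur mostly in positive images receive positive scores, so the selected box tends to enclose them while excluding background words with negative scores. In the approach described above, such a box would only serve as the initialization of a latent variable that the second step then refines.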

Keywords

Automatic annotation; Weakly labeled data; Statistical feature model; Visual features; Image analysis


Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Christian X. Ries (1), Email author
  • Fabian Richter (1)
  • Rainer Lienhart (1)
  1. Universität Augsburg, Augsburg, Germany
