Spatial division networks for weakly supervised detection

  • Original Article
  • Neural Computing and Applications

Abstract

With only global image-level annotations, weakly supervised learning of deep convolutional neural networks has shown sufficient capacity for classification and localization, but it lacks the ability to produce detections explicitly. In this work, we propose a novel spatial division network that detects bounding boxes using only weak supervision. At the core of our model are two differentiable modules, a determination network and a parameterized division, which perform the spatial division on the feature maps of classification networks. After training, the learned parameters of the spatial division correspond to a set of predicted bounding box coordinates. To demonstrate the effectiveness of our model for multi-label classification and weakly supervised detection, we conduct extensive experiments on the multi-MNIST dataset. Experimental results show that our spatial division networks can (1) improve the accuracy of multi-label classification, (2) be trained end-to-end with only image-level annotations, and (3) output accurate bounding box coordinates, thereby achieving multi-digit detection.
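The abstract does not specify how the parameterized division is computed, so the following is only a minimal sketch of one way a box-parameterized, differentiable division of a feature map could look. It assumes a soft mask built from products of sigmoids and mask-weighted average pooling; the names soft_box_mask and divide_features, the sharpness parameter, and the normalized (x0, y0, x1, y1) box convention are all hypothetical and are not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_box_mask(height, width, box, sharpness=10.0):
    """Soft mask in [0, 1] that is close to 1 inside the box.

    box = (x0, y0, x1, y1) in normalized [0, 1] coordinates (assumed convention).
    The mask is smooth in the box coordinates, so gradients could flow to them.
    """
    x0, y0, x1, y1 = box
    ys = (np.arange(height) + 0.5) / height  # pixel-centre rows
    xs = (np.arange(width) + 0.5) / width    # pixel-centre columns
    row = sigmoid(sharpness * (ys - y0)) * sigmoid(sharpness * (y1 - ys))
    col = sigmoid(sharpness * (xs - x0)) * sigmoid(sharpness * (x1 - xs))
    return np.outer(row, col)                # shape (height, width)

def divide_features(features, boxes, sharpness=10.0):
    """Pool a (C, H, W) feature map into one C-vector per box.

    Each region is a mask-weighted average of the feature map, a stand-in
    for whatever pooling the actual model applies to each division.
    """
    _, h, w = features.shape
    pooled = []
    for box in boxes:
        mask = soft_box_mask(h, w, box, sharpness)
        pooled.append((features * mask).sum(axis=(1, 2)) / (mask.sum() + 1e-8))
    return np.stack(pooled)                  # shape (len(boxes), C)

# Usage: two hypothetical regions on a random 16-channel 8x8 feature map.
features = np.random.rand(16, 8, 8)
regions = divide_features(features, [(0.0, 0.0, 0.5, 0.5), (0.5, 0.5, 1.0, 1.0)])
print(regions.shape)
```

Under this assumption, each pooled vector would feed a per-region classifier trained with image-level labels, and the box coordinates that shape the masks would be read out as detections after training.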

Acknowledgements

This work was supported by the National Key R&D Program of China under Grant 2018YFC0808304, and in part by the National Science Foundation of China under Grant 61976043 and Grant 61573081.

Author information

Corresponding author

Correspondence to Hong Qu.

Ethics declarations

Conflict of interest

The authors confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Liu, Y., Chen, W., Qu, H. et al. Spatial division networks for weakly supervised detection. Neural Comput & Applic 33, 4965–4978 (2021). https://doi.org/10.1007/s00521-020-05257-z

  • DOI: https://doi.org/10.1007/s00521-020-05257-z
