Weakly Supervised Group Mask Network for Object Detection

Song, Lingyun; Liu, Jun; Sun, Mingxuan; Shang, Xuequn

doi:10.1007/s11263-020-01397-w

Published: 09 November 2020

Volume 129, pages 681–702, (2021)

International Journal of Computer Vision

Lingyun Song (ORCID: orcid.org/0000-0002-7892-2617)^1,2,3,
Jun Liu^3,
Mingxuan Sun^4 &
Xuequn Shang^1,2

Abstract

Learning object detectors from weak image annotations is an important yet challenging problem. Many weakly supervised approaches formulate the task as a multiple instance learning (MIL) problem, in which each image is represented as a bag of instances. To predict the score of each object that occurs in an image, existing MIL-based approaches tend to select the instance that responds most strongly to a specific class, which, however, overlooks contextual information. In addition, objects often exhibit dramatic variations such as scaling and other transformations, which makes them hard to detect. In this paper, we propose the weakly supervised group mask network (WSGMN), which has two distinctive properties: (i) it exploits the relations among regions to generate community instances, which contain context information and are robust to object variations; and (ii) it generates a mask for each label group and uses these masks to dynamically select the feature information of the most useful community instances for recognizing specific objects. Extensive experiments on several benchmark datasets demonstrate the effectiveness of WSGMN for weakly supervised object detection.
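To make the MIL formulation above concrete, the following is a minimal sketch of the standard baseline that the abstract criticizes, not of WSGMN itself. It assumes each image's bag of region proposals has already been scored into a matrix of per-instance class scores in [0, 1]; the array layout and the names `mil_image_scores`, `selected_instances`, and `mil_loss` are illustrative assumptions. For each class, the image-level score comes from the single instance that responds most strongly, and the loss compares these scores with the image-level labels, which are the only supervision available.

```python
import numpy as np


def mil_image_scores(instance_scores: np.ndarray) -> np.ndarray:
    """Image-level class scores from a bag of instances.

    instance_scores: shape (num_instances, num_classes), per-region class
    scores in [0, 1] (an assumed input format).
    Returns shape (num_classes,): for each class, the score of the instance
    that responds most strongly (max pooling over the bag).
    """
    return instance_scores.max(axis=0)


def selected_instances(instance_scores: np.ndarray) -> np.ndarray:
    """Index of the highest-scoring instance for each class."""
    return instance_scores.argmax(axis=0)


def mil_loss(instance_scores: np.ndarray, image_labels: np.ndarray,
             eps: float = 1e-7) -> float:
    """Binary cross-entropy between image-level scores and image labels.

    image_labels: shape (num_classes,), 1 if the class occurs in the image
    and 0 otherwise; no bounding-box annotations are used.
    """
    p = np.clip(mil_image_scores(instance_scores), eps, 1.0 - eps)
    y = image_labels.astype(float)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


# Usage: a bag of three region proposals scored for two classes.
scores = np.array([[0.1, 0.7],
                   [0.9, 0.2],
                   [0.3, 0.4]])
labels = np.array([1, 0])
print(mil_image_scores(scores))    # [0.9 0.7]
print(selected_instances(scores))  # [1 0]
print(round(mil_loss(scores, labels), 4))  # 0.6547
```

Because each class score depends on only one instance, the surrounding regions never contribute to it; this is the loss of contextual information that the abstract points out, and which WSGMN's community instances and group masks are proposed to address.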

Similar content being viewed by others

Active Learning Strategies for Weakly-Supervised Object Detection

Diverse Learner: Exploring Diverse Supervision for Semi-supervised Object Detection

Forget and Diversify: Regularized Refinement for Weakly Supervised Object Detection

Notes

In our experiments, we set \(Z = 10\) for the PASCAL VOC datasets, \(Z = 24\) for MS-COCO, and \(Z = 26\) for the ImageNet detection dataset.

Author information

Authors and Affiliations

School of Computer Science, Northwestern Polytechnical University, Xi’an, 710129, China
Lingyun Song & Xuequn Shang
Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi’an, 710129, China
Lingyun Song & Xuequn Shang
SPKLSTN Lab, Department of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, 710049, China
Lingyun Song & Jun Liu
Division of Computer Science and Engineering, School of Electrical Engineering and Computer Science, Louisiana State University, Baton Rouge, LA, 70803, USA
Mingxuan Sun

Corresponding author

Correspondence to Lingyun Song.

Additional information

Communicated by Antonio Torralba.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The research was supported in part by the National Key Research and Development Program of China under Grant No. 2018YFB1004500, the National Natural Science Foundation of China under Grant Nos. 61772426, 61672419, 61672418, 61532004, 61502377, 61532015, and 61721002, the Joint Funds of the National Natural Science Foundation of China under Grant No. U1811262, the Innovation Research Team of the Ministry of Education under Grant No. IRT_17R86, the Fundamental Research Funds for the Central Universities under Grant No. D5000200146, and the China Postdoctoral Science Foundation under Grant No. 2020M673487.

About this article

Cite this article

Song, L., Liu, J., Sun, M. et al. Weakly Supervised Group Mask Network for Object Detection. Int J Comput Vis 129, 681–702 (2021). https://doi.org/10.1007/s11263-020-01397-w

Received: 09 August 2018
Accepted: 30 October 2020
Published: 09 November 2020
Issue Date: March 2021
DOI: https://doi.org/10.1007/s11263-020-01397-w
