MIDCN: A Multiple Instance Deep Convolutional Network for Image Classification

  • Conference paper
PRICAI 2019: Trends in Artificial Intelligence (PRICAI 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11670)

Images collected in the wild usually contain multiple objects rather than a single dominant one. Moreover, the image label is not explicitly associated with an object region, i.e., the image is weakly annotated. In this paper, we propose a novel deep convolutional network for image classification under this weakly supervised condition. The proposed method, MIDCN, formulates the problem as Multiple Instance Learning (MIL), where each image is a bag that contains multiple instances (objects). Unlike previous deep MIL methods, which predict the label of each bag (i.e., image) by simply applying a pooling/voting strategy over its instance (i.e., region) predictions, MIDCN predicts the label of a bag directly from bag features learned by measuring the similarities between instance features and a set of learned informative prototypes. Specifically, the prototypes are obtained by a newly proposed Global Contrast Pooling (GCP) layer, which leverages instances not only from the current bag but also from the other bags. The learned bag features therefore also contain global information from all the training bags, making them more robust to noise. We conducted extensive experiments on two real-world image datasets, a natural image dataset (PASCAL VOC 07) and a pathological lung cancer image dataset, and show that the proposed MIDCN consistently outperforms state-of-the-art methods.
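The contrast drawn in the abstract, between pooling instance predictions and building a bag feature from instance-to-prototype similarities, can be illustrated with a minimal sketch. This is not the paper's GCP layer: the abstract does not specify how prototypes are learned or how similarities are measured, so the cosine similarity, the max aggregation over instances, the fixed toy prototypes, and the function names `cosine`, `bag_feature`, and `max_pool_prediction` are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def bag_feature(instances, prototypes):
    """Embed a bag as one value per prototype.

    Each entry is the highest similarity between that prototype and any
    instance in the bag. The max aggregation is an assumption, not the
    paper's formulation.
    """
    return [max(cosine(x, p) for x in instances) for p in prototypes]


def max_pool_prediction(instance_scores):
    """Baseline style described in the abstract: pool instance predictions."""
    return max(instance_scores)


# Toy data: two hypothetical prototypes and a bag with two instances.
# In the paper the prototypes are learned from instances across all
# training bags; here they are fixed by hand.
prototypes = [[1.0, 0.0], [0.0, 1.0]]
bag = [[2.0, 0.0], [1.0, 1.0]]

feature = bag_feature(bag, prototypes)
baseline = max_pool_prediction([0.2, 0.9])
```

In this sketch the bag feature has one dimension per prototype, so a standard classifier can be trained on it to predict the bag label directly, whereas the pooling baseline collapses the per-instance scores into a single bag score.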


This work was supported in part by the National Key Research and Development Program of China (2017YFB0702601), the National Natural Science Foundation of China (Grant Nos. 61673203, 61806092), Jiangsu Natural Science Foundation (BK20180326), and the Fundamental Research Funds for the Central Universities (14380056).

Author information

Corresponding author

Correspondence to Yang Gao.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

He, K., Huo, J., Shi, Y., Gao, Y., Shen, D. (2019). MIDCN: A Multiple Instance Deep Convolutional Network for Image Classification. In: Nayak, A., Sharma, A. (eds) PRICAI 2019: Trends in Artificial Intelligence. PRICAI 2019. Lecture Notes in Computer Science, vol 11670. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-29907-1

  • Online ISBN: 978-3-030-29908-8

  • eBook Packages: Computer Science, Computer Science (R0)
