
Multi-instance learning by maximizing the area under receiver operating characteristic curve


Abstract

The purpose of this study is to solve the multi-instance classification problem by maximizing the area under the Receiver Operating Characteristic (ROC) curve obtained for witness instances. We derive a mixed integer linear programming model that chooses witnesses and produces the best possible ROC curve using a linear ranking function for multi-instance classification. The formulation is solved using a commercial mathematical optimization solver as well as a fast metaheuristic approach. When the data is not linearly separable, we illustrate how new features can be generated to tackle the problem. We present a comprehensive computational study comparing our methods against state-of-the-art approaches from the literature. Through cross validation, this study reveals the success of an optimal linear ranking function on several benchmark datasets.
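
For intuition on the objective, the AUC of a linear ranking function evaluated on witnesses equals the Wilcoxon-Mann-Whitney statistic: the fraction of (positive-bag witness, negative-bag witness) pairs that the function scores in the correct order, with ties counted as one half. The sketch below is illustrative only; it assumes the witnesses have already been selected, and the weight vector `w` and toy data are placeholders rather than the paper's formulation.

```python
import numpy as np

def witness_auc(pos_witnesses, neg_witnesses, w):
    """AUC of the linear ranking w^T x over witnesses, i.e. the fraction of
    (positive, negative) witness pairs ranked correctly (ties count as half)."""
    sp = pos_witnesses @ w          # scores of positive-bag witnesses
    sn = neg_witnesses @ w          # scores of negative-bag witnesses
    correct = (sp[:, None] > sn[None, :]).sum()
    ties = (sp[:, None] == sn[None, :]).sum()
    return (correct + 0.5 * ties) / (len(sp) * len(sn))

# Toy example: two positive-bag witnesses, three negative-bag witnesses in R^2,
# scored with an arbitrary illustrative weight vector.
pos = np.array([[2.0, 1.0], [1.5, 0.5]])
neg = np.array([[0.5, 1.0], [0.2, 0.3], [1.0, 0.0]])
w = np.array([1.0, -0.5])
print(witness_auc(pos, neg, w))
```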


Notes

  1. In [eMI-BR], the witness selection variable is relaxed, which might technically allow more than one instance in the same bag to have a nonzero selection value. However, as shown later in the proof, any one of these instances can be chosen as the witness under the standard assumption.

  2. One exception, as explained later, is the case where we add features for nonlinear classification; we make this explicit and compare against a study that uses the same approach.


Acknowledgements

The authors would like to thank Gizem Atasoy, who, while preferring not to be a co-author of this paper, experimented with an initial version of our mathematical model on several data instances during her MSc thesis study. The authors are also grateful to Nima Manafzadeh Dizbin, who helped implement the deep-learning methods used for benchmarking.

Author information


Corresponding author

Correspondence to O. Erhun Kundakcioglu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix

Flowchart of the particle swarm optimization (PSO) algorithm

See Appendix Fig. 3.

Fig. 3 Flowchart of the Particle Swarm Optimization (PSO) algorithm
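
For readers who cannot view the figure, the loop it summarizes can be sketched as follows. This is a generic PSO skeleton, not the authors' implementation: the swarm size, inertia weight, and acceleration coefficients are illustrative defaults rather than the parameter sets evaluated in Tables 12 and 13, and `objective` stands in for the training-AUC evaluation of a candidate weight vector.

```python
import numpy as np

def pso_maximize(objective, dim, n_particles=30, n_iter=100,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimization sketch for maximizing `objective`."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=(n_particles, dim))   # particle positions
    v = np.zeros_like(x)                                   # particle velocities
    pbest = x.copy()                                       # personal best positions
    pbest_val = np.array([objective(p) for p in x])
    g = int(np.argmax(pbest_val))
    gbest, gbest_val = pbest[g].copy(), float(pbest_val[g])  # global best
    for _ in range(n_iter):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        # Velocity update: inertia + pull toward personal and global bests.
        v = inertia * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        vals = np.array([objective(p) for p in x])
        improved = vals > pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        if vals.max() > gbest_val:
            gbest, gbest_val = x[int(np.argmax(vals))].copy(), float(vals.max())
    return gbest, gbest_val

# Illustrative use: maximize a toy concave function over R^3.
best_x, best_val = pso_maximize(lambda p: -np.sum((p - 0.5) ** 2), dim=3)
```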

Training and test performance for PSO with different parameters

See Appendix Tables 12 and 13.

Table 12 Training Performance — Average training AUC and average time spent in seconds (in parentheses) for PSO using three parameter sets after five repetitions of 10-fold cross validation
Table 13 Test Performance — Average AUC (top) and accuracy (bottom) for PSO using three parameter sets after five repetitions of 10-fold cross validation
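
The averaging behind these tables follows the usual repeated cross-validation protocol: five independent repetitions of 10-fold cross validation, with the held-out AUC averaged over all fifty folds. A minimal bookkeeping sketch is given below; `train_and_score` is a hypothetical placeholder for fitting the ranking function on the training bags and returning the AUC on the held-out bags, not a routine from the paper.

```python
import numpy as np

def repeated_kfold_auc(n_bags, train_and_score, n_splits=10, n_repeats=5, seed=0):
    """Average held-out AUC over `n_repeats` repetitions of `n_splits`-fold CV.
    Folds are formed at the bag level; `train_and_score(train_idx, test_idx)`
    trains on the training bags and returns the AUC on the test bags."""
    rng = np.random.default_rng(seed)
    aucs = []
    for _ in range(n_repeats):
        order = rng.permutation(n_bags)
        folds = np.array_split(order, n_splits)
        for k in range(n_splits):
            test_idx = folds[k]
            train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
            aucs.append(train_and_score(train_idx, test_idx))
    return float(np.mean(aucs))
```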

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.



About this article


Cite this article

Sakarya, I.E., Kundakcioglu, O.E. Multi-instance learning by maximizing the area under receiver operating characteristic curve. J Glob Optim 85, 351–375 (2023). https://doi.org/10.1007/s10898-022-01219-y

