Multi-instance learning by maximizing the area under receiver operating characteristic curve

Sakarya, I. Edhem; Kundakcioglu, O. Erhun

doi:10.1007/s10898-022-01219-y

Published: 12 August 2022

Journal of Global Optimization, volume 85, pages 351–375 (2023)

Abstract

The purpose of this study is to solve the multi-instance classification problem by maximizing the area under the Receiver Operating Characteristic (ROC) curve obtained for witness instances. We derive a mixed integer linear programming model that chooses witnesses and produces the best possible ROC curve using a linear ranking function for multi-instance classification. The formulation is solved using a commercial mathematical optimization solver as well as a fast metaheuristic approach. When the data is not linearly separable, we illustrate how new features can be generated to tackle the problem. We present a comprehensive computational study to compare our methods against the state-of-the-art approaches in the literature. Our study reveals the success of an optimal linear ranking function through cross validation for several benchmark instances.
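
As a rough illustration of the objective described in the abstract, the following Python sketch scores instances with a linear ranking function, takes the highest-scoring instance of each bag as its witness, and estimates the AUC with the Wilcoxon-Mann-Whitney statistic over pairs of positive and negative bags. It is not the authors' mixed integer linear programming formulation: the max-score witness rule, the bag-level pairing, the half credit for ties, the function names, and the toy data are all assumptions made for this example.

```python
from typing import List, Sequence

Instance = Sequence[float]
Bag = List[Instance]


def score(w: Sequence[float], x: Instance) -> float:
    """Linear ranking function f(x) = w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def witness(w: Sequence[float], bag: Bag) -> Instance:
    """Assumption: the witness of a bag is its highest-scoring instance."""
    return max(bag, key=lambda x: score(w, x))


def bag_auc(w: Sequence[float], positive_bags: List[Bag],
            negative_bags: List[Bag]) -> float:
    """Wilcoxon-Mann-Whitney estimate of the AUC at bag level.

    Each bag is represented by the score of its witness. The AUC is the
    fraction of (positive bag, negative bag) pairs in which the positive
    bag is ranked higher; ties receive half credit (an assumption).
    """
    pos_scores = [score(w, witness(w, b)) for b in positive_bags]
    neg_scores = [score(w, witness(w, b)) for b in negative_bags]
    correct = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos_scores) * len(neg_scores))


# Hypothetical toy data: two positive bags and two negative bags in 2-D.
positive_bags = [[(1.0, 0.0), (3.0, 1.0)], [(0.5, 0.5), (2.0, 2.0)]]
negative_bags = [[(0.0, 0.0), (1.0, 1.0)], [(2.5, 0.0)]]

if __name__ == "__main__":
    for w in [(1.0, 1.0), (1.0, -1.0)]:
        print(w, bag_auc(w, positive_bags, negative_bags))
```

On this toy data the weight vector (1, 1) ranks every positive bag above every negative bag, while (1, -1) does not, which is the kind of difference an AUC-maximizing model exploits when choosing the ranking function.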

Notes

In [eMI-BR], the witness selection variable is relaxed, so more than one variable in the same bag might technically take a nonzero value. However, as shown later in the proof, any one of these instances can be chosen as the witness under the standard assumption.
One exception to this, as explained later, is where we add features for nonlinear classification; but we make it explicit and compare against a study that uses the same approach.

Acknowledgements

The authors would like to thank Gizem Atasoy, who, while preferring not to contribute to this paper as a co-author, experimented with the initial version of our mathematical model on some data instances during her MSc thesis study. The authors are also grateful to Nima Manafzadeh Dizbin, who helped with the implementation of the deep-learning methods used for benchmarking.

Author information

Authors and Affiliations

Department of Industrial Engineering and Innovation Sciences, Eindhoven University of Technology, Eindhoven, Netherlands
I. Edhem Sakarya
Department of Industrial Engineering, Ozyegin University, Istanbul, Turkey
O. Erhun Kundakcioglu

Corresponding author

Correspondence to O. Erhun Kundakcioglu.

Appendix

Flowchart of the particle swarm optimization (PSO) algorithm

See Appendix Fig. 3.
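
Since the flowchart itself is not reproduced here, a generic inertia-weight PSO loop is sketched below in Python as one way to search for the weights of a linear ranking function. This is a textbook variant, not the authors' implementation: the parameter values (`inertia`, `c1`, `c2`, swarm size, iteration count), the initialization range, and the toy instance-level AUC objective are assumptions chosen only for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy data: labelled 2-D instances.
POS = [(2.0, 0.1), (1.5, -0.3), (3.0, 0.4)]
NEG = [(0.0, 0.2), (0.5, -0.1), (-1.0, 0.3)]


def toy_auc(w: Sequence[float]) -> float:
    """Pairwise (Wilcoxon-Mann-Whitney) AUC of the linear scorer w . x."""
    def s(x: Sequence[float]) -> float:
        return sum(wi * xi for wi, xi in zip(w, x))
    pairs = [(s(p), s(n)) for p in POS for n in NEG]
    return sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a, b in pairs) / len(pairs)


def pso_maximize(objective: Callable[[Sequence[float]], float], dim: int,
                 n_particles: int = 20, n_iters: int = 50,
                 inertia: float = 0.7, c1: float = 1.5, c2: float = 1.5,
                 seed: int = 0) -> Tuple[List[float], float]:
    """Generic inertia-weight PSO; returns (best position, best value)."""
    rng = random.Random(seed)
    # Random initial positions in [-1, 1]^dim and zero initial velocities.
    pos = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # personal best positions
    best_val = [objective(p) for p in pos]    # personal best values
    g = max(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # global best so far
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val > best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val > g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


if __name__ == "__main__":
    weights, auc = pso_maximize(toy_auc, dim=2)
    print("weights:", weights, "AUC:", auc)
```

A gradient-free search of this kind is a natural fit because the AUC is piecewise constant in the weights, so it offers no useful gradient to follow.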

Training and test performance for PSO with different parameters

See Appendix Tables 12 and 13.

Table 12 Training performance: average training AUC and average time spent in seconds (in parentheses) for PSO using three parameter sets after five repetitions of 10-fold cross-validation

Table 13 Test performance: average AUC (top) and accuracy (bottom) for PSO using three parameter sets after five repetitions of 10-fold cross-validation

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sakarya, I.E., Kundakcioglu, O.E. Multi-instance learning by maximizing the area under receiver operating characteristic curve. J Glob Optim 85, 351–375 (2023). https://doi.org/10.1007/s10898-022-01219-y

Received: 04 November 2021
Accepted: 24 July 2022
Published: 12 August 2022
Issue Date: February 2023
DOI: https://doi.org/10.1007/s10898-022-01219-y
