Abstract
The purpose of this study is to solve the multi-instance classification problem by maximizing the area under the Receiver Operating Characteristic (ROC) curve obtained for witness instances. We derive a mixed integer linear programming model that chooses witnesses and produces the best possible ROC curve using a linear ranking function for multi-instance classification. The formulation is solved using a commercial mathematical optimization solver as well as a fast metaheuristic approach. When the data is not linearly separable, we illustrate how new features can be generated to tackle the problem. We present a comprehensive computational study to compare our methods against the state-of-the-art approaches in the literature. Our study reveals the success of an optimal linear ranking function through cross validation for several benchmark instances.
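To make the objective concrete, the following is a minimal sketch of how a fixed linear ranking function could be evaluated by its AUC over witnesses. It is not the mixed integer linear programming model itself. It assumes that each bag is represented by its highest-scoring instance (taken here as the witness) and that AUC is the fraction of correctly ordered (positive bag, negative bag) pairs, with ties counted as one half. The function names, the weight vector `w`, and the toy data are hypothetical.

```python
# Illustrative only: evaluates a given linear ranking function; it does not
# solve the optimization model described in the abstract.

def score(w, x):
    """Linear ranking score w . x for a single instance."""
    return sum(wi * xi for wi, xi in zip(w, x))

def bag_score(w, bag):
    """Assumption: a bag is represented by its highest-scoring instance."""
    return max(score(w, x) for x in bag)

def witness_auc(w, pos_bags, neg_bags):
    """Fraction of (positive, negative) bag pairs ranked correctly (ties = 0.5)."""
    pos = [bag_score(w, b) for b in pos_bags]
    neg = [bag_score(w, b) for b in neg_bags]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: two positive bags and two negative bags in 2-D.
w = [1.0, -1.0]
pos_bags = [[[2.0, 0.0], [0.0, 3.0]], [[1.0, 1.0], [0.5, 0.0]]]
neg_bags = [[[0.0, 1.0], [0.2, 0.0]], [[1.0, 0.0]]]
auc = witness_auc(w, pos_bags, neg_bags)
print(auc)
```

In this toy case three of the four bag pairs are ordered correctly. An optimization approach of the kind described above would search over `w` and the witness choices rather than fixing them in advance.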
Notes
In [eMI-BR], the witness selection variable is relaxed, which might technically lead to more than one variable in the same bag having a nonzero value. However, as shown later in the proof, any one of these instances can be chosen as a witness under the standard assumption.
One exception to this, as explained later, is where we add features for nonlinear classification; but we make it explicit and compare against a study that uses the same approach.
Acknowledgements
The authors would like to thank Gizem Atasoy, who, while preferring not to be a co-author of this paper, experimented with the initial version of our mathematical model on some data instances during her MSc thesis study. The authors are also grateful to Nima Manafzadeh Dizbin, who helped with the implementation of the deep-learning methods used for benchmarking.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sakarya, I.E., Kundakcioglu, O.E. Multi-instance learning by maximizing the area under receiver operating characteristic curve. J Glob Optim 85, 351–375 (2023). https://doi.org/10.1007/s10898-022-01219-y
DOI: https://doi.org/10.1007/s10898-022-01219-y