Abstract
Motivated by the binary classification problem in machine learning, we study a class of decision problems in which the decision maker has a list of goals and aims to attain as many of them as possible. In binary classification, this essentially means seeking a prediction rule that achieves the lowest probability of misclassification; computationally, it involves minimizing a difficult, non-convex 0–1 loss function. To address this intractability, previous methods minimize the cumulative loss, that is, the sum of convex surrogates of the 0–1 loss of each goal. We revisit this paradigm and instead develop an axiomatic framework: we propose a set of salient properties for goal-scoring functions and then propose the coherent loss approach, which is a tractable upper bound of the loss over the entire set of goals. We show that the proposed approach yields a strictly tighter approximation to the total loss (i.e., the number of missed goals) than any convex cumulative loss approach while preserving the convexity of the underlying optimization problem. Moreover, this approach, applied to binary classification, also has a robustness interpretation that builds a connection to robust SVMs.
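As a rough illustration of the cumulative-loss baseline that the abstract contrasts against (not the coherent loss itself, whose definition is not given here), the sketch below compares the number of missed goals with the sum of hinge surrogates. It assumes each goal is summarized by a hypothetical margin value, with a goal counted as missed when its margin is not positive; the function names and sample margins are illustrative only.

```python
def missed_goals(margins):
    """0-1 loss: count goals whose margin is not positive (assumed convention)."""
    return sum(1 for m in margins if m <= 0)


def cumulative_hinge_loss(margins):
    """Cumulative loss: sum of hinge surrogates max(0, 1 - m), one per goal."""
    return sum(max(0.0, 1.0 - m) for m in margins)


# Hypothetical margins, e.g. y_i * f(x_i) for five training samples.
margins = [2.0, 0.5, -0.3, 1.2, -1.5]

missed = missed_goals(margins)
surrogate = cumulative_hinge_loss(margins)

# The hinge surrogate is at least 1 on every missed goal and never negative,
# so the cumulative loss always upper-bounds the number of missed goals.
print(missed, surrogate)
```

The gap between the two printed values is the slack of this particular convex surrogate; per the abstract, the coherent loss is claimed to give a strictly tighter convex upper bound on the count of missed goals.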
Notes
The margin \(a\) is introduced to ensure that Theorem 5 holds. Notice that the hinge-loss approximation, with or without the margin, leads to the same formulation of the standard SVM.
Cite this article
Yang, W., Sim, M. & Xu, H. Goal scoring, coherent loss and applications to machine learning. Math. Program. 182, 103–140 (2020). https://doi.org/10.1007/s10107-019-01387-y
DOI: https://doi.org/10.1007/s10107-019-01387-y