
On support vector machines under a multiple-cost scenario


Abstract

The support vector machine (SVM) is a powerful tool for binary classification, known to attain excellent misclassification rates. However, many real-world classification problems, such as those found in medical diagnosis, churn or fraud prediction, involve misclassification costs which may differ between the two classes. Moreover, it may be hard for the user to provide precise values for such misclassification costs, whereas it may be much easier to identify acceptable misclassification rates. In this paper we propose a novel SVM model in which misclassification costs are considered by incorporating performance constraints in the problem formulation. Specifically, our aim is to seek the hyperplane with maximal margin yielding misclassification rates below given threshold values. Such a maximal-margin hyperplane is obtained by solving a convex quadratic problem with linear constraints and integer variables. The reported numerical experience shows that our model gives the user control over the misclassification rate in one class (possibly at the expense of an increased misclassification rate in the other class) and is feasible in terms of running times.



Acknowledgements

This research is supported by Fundación BBVA, and by Projects FQM329 and P11-FQM-7603 (Junta de Andalucía, Spain) and MTM2015-65915-R (Ministerio de Economía y Competitividad, Spain), the last three co-funded with EU ERDF funds. The authors are grateful for this support.

Author information

Corresponding author

Correspondence to Sandra Benítez-Peña.

Appendix A: Derivation of the CSVM

This section details the steps needed to build the CSVM formulation. Suppose we are given the mixed-integer quadratic model

$$\begin{aligned} \min _{\omega , \beta , \xi ,z}&\omega ^\top \omega + {C_+\sum \limits _{i \in I : y_i =+1} \xi _i + C_-\sum \limits _{i \in I : y_i =-1} \xi _i }&\\ \text {s.t.}&y_i(\omega ^\top x_i + \beta ) \ge 1 - \xi _i,&i \in I \\&\xi _i \ge 0&i \in I\\&y_j(\omega ^\top x_j + \beta ) \ge 1 - M_1(1-z_j),&j \in J \\&z_j \in \{ 0,1\}&j \in J\\&{\hat{p}_\ell \ge {p_{0}^*}_{\ell }}&\ell \in L. \end{aligned}$$

Hence, the problem above can be rewritten as

$$\begin{aligned} \begin{array}{lllllll} \min _{{z}} &{} &{} &{} {\min _{{\omega },\beta ,{\xi }} }&{} {{\omega }^\top {\omega } + {C_+\sum \limits _{i \in I : y_i =+1} \xi _i + C_-\sum \limits _{i \in I : y_i =-1} \xi _i } }\\ \text{ s.t. } &{} z_j \in \{0,1\} &{} j \in J &{} {\text{ s.t. }} &{} {y_i \left( {\omega }^\top {x}_i + \beta \right) \ge 1 - \xi _i} &{} {i \in I}\\ &{} {\hat{p}_\ell \ge {p_{0}^*}_{\ell }} &{} \ell \in L &{} &{}y_j\left( \omega ^\top x_j + \beta \right) \ge 1 - M_1(1-z_j),&{} j \in J \\ &{} &{} &{} &{} { \xi _i \ge 0} &{} {i \in I }. \end{array} \end{aligned}$$
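Before deriving the dual, the following minimal sketch illustrates how the primal mixed-integer quadratic model above could be set up with an off-the-shelf MIQP solver. It is not the authors' implementation: the function name csvm_miqp, the choice of Gurobi's Python interface, and the concrete form assumed for the performance constraints (per-class lower bounds on the fraction of points of \(J\) that are correctly classified, expressed through the variables \(z_j\)) are illustrative assumptions only.

```python
import numpy as np
import gurobipy as gp
from gurobipy import GRB

def csvm_miqp(X_I, y_I, X_J, y_J, C_plus, C_minus, M1, p0_plus, p0_minus):
    """Sketch of the primal MIQP; (X_I, y_I) are the soft-margin observations (set I),
    (X_J, y_J) the observations entering the performance constraints (set J)."""
    n_I, d = X_I.shape
    n_J = X_J.shape[0]
    m = gp.Model("csvm_sketch")

    w = m.addVars(d, lb=-GRB.INFINITY, name="w")       # hyperplane coefficients omega
    beta = m.addVar(lb=-GRB.INFINITY, name="beta")     # intercept
    xi = m.addVars(n_I, lb=0.0, name="xi")             # slack variables, i in I
    z = m.addVars(n_J, vtype=GRB.BINARY, name="z")     # z_j = 1 forces x_j to be well classified

    # Objective: omega' omega plus class-dependent costs on the slacks
    m.setObjective(
        gp.quicksum(w[k] * w[k] for k in range(d))
        + C_plus * gp.quicksum(xi[i] for i in range(n_I) if y_I[i] == +1)
        + C_minus * gp.quicksum(xi[i] for i in range(n_I) if y_I[i] == -1),
        GRB.MINIMIZE)

    # Soft-margin constraints on I
    for i in range(n_I):
        m.addConstr(float(y_I[i]) * (gp.quicksum(w[k] * X_I[i, k] for k in range(d)) + beta)
                    >= 1 - xi[i])

    # Big-M constraints on J
    for j in range(n_J):
        m.addConstr(float(y_J[j]) * (gp.quicksum(w[k] * X_J[j, k] for k in range(d)) + beta)
                    >= 1 - M1 * (1 - z[j]))

    # Assumed form of the performance constraints: the per-class fraction of
    # correctly classified points of J must reach the thresholds p0_plus, p0_minus
    J_plus = [j for j in range(n_J) if y_J[j] == +1]
    J_minus = [j for j in range(n_J) if y_J[j] == -1]
    if J_plus:
        m.addConstr(gp.quicksum(z[j] for j in J_plus) >= p0_plus * len(J_plus))
    if J_minus:
        m.addConstr(gp.quicksum(z[j] for j in J_minus) >= p0_minus * len(J_minus))

    m.optimize()
    return np.array([w[k].X for k in range(d)]), beta.X
```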

We first derive the dual for the linear case and then show how the kernel trick applies. As a preliminary step, we regard the variables z as fixed. With z fixed, the inner problem can be rewritten as:

$$\begin{aligned} \begin{array}{llll} {\min _{{\omega },\beta ,{\xi }} }&{} {{\omega }^\top {\omega } + {C_+\sum \limits _{i \in I : y_i =+1} \xi _i + C_-\sum \limits _{i \in I : y_i =-1} \xi _i } }\\ {\text{ s.t. }} &{} {y_i \left( {\omega }^\top {x}_i + \beta \right) \ge 1 - \xi _i} &{} {i \in I}\\ &{}y_j\left( \omega ^\top x_j + \beta \right) \ge 1,&{} j \in J : z_j =1 \\ &{}y_j\left( \omega ^\top x_j + \beta \right) \ge 1 - M_1,&{} j \in J : z_j = 0\\ &{} { \xi _i \ge 0} &{} {i \in I }. \end{array} \end{aligned}$$

Since \(M_1\) is a large number, the fourth set of constraints is always satisfied and can therefore be removed. Denoting \(\{j \in J : z_j =1\}\) by \(J(z)\), we obtain

$$\begin{aligned} \begin{array}{llll} {\min _{{\omega },\beta ,{\xi }} }&{} {{\omega }^\top {\omega } + {C_+\sum \limits _{i \in I : y_i =+1} \xi _i + C_-\sum \limits _{i \in I : y_i =-1} \xi _i } }\\ {\text{ s.t. }} &{} {y_i \left( {\omega }^\top {x}_i + \beta \right) \ge 1 - \xi _i} &{} {i \in I}\\ &{}y_j\left( \omega ^\top x_j + \beta \right) \ge 1,&{} j \in J(z) \\ &{} { \xi _i \ge 0} &{} {i \in I }. \end{array} \end{aligned}$$

Hence, we can build the Lagrangian

$$\begin{aligned} \mathcal {L}(\omega ,\beta ,\xi )= & {} {{\omega }^\top {\omega } + {C_+\sum \limits _{i \in I : y_i =+1} \xi _i + C_-\sum \limits _{i \in I : y_i =-1} \xi _i }} \\&- \sum \limits _{s\in I} \lambda _s (y_s(\omega ^\top x_s+\beta ) -1 + \xi _s) \\&- \sum \limits _{t\in J(z)} \mu _t (y_t(\omega ^\top x_t+\beta ) -1) - \sum _{i' \in I} \delta _{i'} \xi _{i'} \end{aligned}$$

The KKT conditions are, therefore

$$\begin{aligned} \begin{array}{llllll} \dfrac{\partial \mathcal {L}}{\partial \omega } = 0 &{} \Rightarrow &{} {\omega } &{} = &{} \sum \limits _{s \in I} (\lambda _s/2) y_s {x}_s+ \sum \limits _{t \in J(z)} (\mu _t/2) y_t {x}_t \\ \dfrac{\partial \mathcal {L}}{\partial \beta } = 0 &{} \Rightarrow &{} 0 &{} = &{} \sum \limits _{s \in I} \lambda _s y_s + \sum \limits _{t \in J(z)} \mu _t y_t \\ \dfrac{\partial \mathcal {L}}{\partial \xi _i} = 0 &{} \Rightarrow &{} 0 &{} = &{} -\lambda _i -\delta _i + C_+ &{} i \in I:y_i =+1\\ \dfrac{\partial \mathcal {L}}{\partial \xi _i} = 0 &{} \Rightarrow &{} 0 &{} = &{} -\lambda _i -\delta _i + C_- &{} i \in I:y_i =-1\\ &{} &{} 0 &{} \le &{} \lambda _{i} &{} i \in I\\ &{} &{} 0 &{} \le &{} \mu _t &{} t \in J(z)\\ &{} &{} 0 &{} \le &{} \delta _{i} &{} i \in I \end{array} \end{aligned}$$

Note that, without loss of generality, we can rescale the multipliers and replace \(\lambda _s/2\) and \(\mu _t/2\) by \(\lambda _s\) and \(\mu _t\), respectively. The condition \({\partial \mathcal {L}}/{\partial \beta } = 0\) then reads

$$\begin{aligned} 0 = \sum \limits _{s \in I} 2\lambda _s y_s + \sum \limits _{t \in J(z)} 2\mu _t y_t, \end{aligned}$$

which can be simplified to

$$\begin{aligned} 0 = \sum \limits _{s \in I} \lambda _s y_s + \sum \limits _{t \in J(z)} \mu _t y_t, \end{aligned}$$

as stated. Similarly, the condition \({\partial \mathcal {L}}/{\partial \xi _i} = 0\) becomes

$$\begin{aligned} 0 = -2\lambda _i - \delta _i + C_+, \quad i \in I:y_i=+1 \end{aligned}$$

and

$$\begin{aligned} 0 = -2\lambda _i - \delta _i + C_-, \quad i \in I:y_i=-1. \end{aligned}$$

Furthermore, since these results must be equivalent to those obtained had the previously removed constraints been kept, we have \(\mu _t = 0\) when \(z_t=0\) and \(\mu _t \ge 0\) when \(z_t=1\), for \(t \in J\). This can be summarized as \(0 \le \mu _t \le M_2 z_t,\quad t \in J\), where \(M_2\) is a sufficiently large constant. Also, as usual, \(\delta _i\) can be eliminated by adding

$$\begin{aligned} 0 \le \lambda _i \le C_+/2, \quad i \in I:y_i=+1 \end{aligned}$$

and

$$\begin{aligned} 0 \le \lambda _i \le C_-/2, \quad i \in I:y_i=-1, \end{aligned}$$

which follow since \(\delta _i \ge 0\). The KKT conditions therefore become:

$$\begin{aligned} \begin{array}{llll} {\omega } &{} = &{} \sum \limits _{s \in I} \lambda _s y_s {x}_s+ \sum \limits _{t \in J} \mu _t y_t {x}_t \\ 0 &{} = &{} \sum \limits _{s \in I} \lambda _s y_s + \sum \limits _{t \in J} \mu _t y_t \\ 0 &{} \le &{} \lambda _s \le C_+/2 &{} s\in I:y_s=+1 \\ 0 &{} \le &{} \lambda _s \le C_-/2 &{} s\in I:y_s=-1 \\ 0 &{} \le &{} \mu _t \le {M_2} z_t&{} t \in J. \end{array} \end{aligned}$$

Note that \(J(z)\) has been replaced by \(J\) throughout, by virtue of the previous observation.

Thus, substituting the previous expressions into the second optimization problem, its partial dual can be computed, yielding

$$\begin{aligned} \begin{array}{lllll} \min \limits _{{z}} &{} &{} &{} {\min \limits _{{\lambda },{\mu }, \beta , {\xi }} } &{} \left( \sum \limits _{s \in I} \lambda _s y_s {x}_s+ \sum \limits _{t \in J} \mu _t y_t {x}_t \right) ^\top \left( \sum \limits _{s \in I} \lambda _s y_s {x}_s+ \sum \limits _{t \in J} \mu _t y_t {x}_t \right) \\ &{} &{} &{} &{} {+ \, \, C_+\sum \limits _{i \in I : y_i =+1} \xi _i + C_-\sum \limits _{i \in I : y_i =-1} \xi _i } \\ \text{ s.t. } &{} z_j \in \{0,1\} &{} j \in J &{} {\text{ s.t. }} &{} {y_i \left( \left( \sum \limits _{s \in I} \lambda _s y_s {x}_s+ \sum \limits _{t \in J} \mu _t y_t {x}_t\right) ^\top {x}_i + \beta \right) }\\ &{}&{}&{}&{}{\ge 1 - \xi _i} \quad {i \in I}\\ &{} {\hat{p}_\ell \ge {p_{0}^*}_{\ell }} &{} \ell \in L &{} &{} { y_j\left( \left( \sum \limits _{s \in I} \lambda _s y_s {x}_s+ \sum \limits _{t \in J} \mu _t y_t {x}_t\right) ^\top {x}_j + \beta \right) }\\ &{}&{}&{}&{} {\ge 1 -{M_1}(1-z_j)} \quad {j \in J} \\ &{} &{} &{} &{} { \xi _i \ge 0} \quad {i \in I }\\ &{} &{} &{} &{} {\sum \limits _{i \in I} \lambda _i y_i + \sum \limits _{j \in J} \mu _j y_j = 0}\\ &{} &{} &{} &{} {{ 0 \le \lambda _i \le C_+/2} \quad {i \in I:y_i=+1 }}\\ &{} &{} &{} &{} {{ 0 \le \lambda _i \le C_-/2} \quad {i \in I:y_i=-1 }}\\ &{} &{} &{} &{} { 0 \le \mu _j \le {M_2}z_j} \quad {j \in J}. \end{array} \end{aligned}$$

Finally, since this problem depends on the observations only through inner products, the kernel trick can be applied and Problem (CSVM) is obtained.
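As a small illustration of this last step, the decision rule implied by the dual representation above can be evaluated using kernel values only. The sketch below assumes an RBF kernel and takes the multipliers \(\lambda\), \(\mu\) and the intercept \(\beta\) as given; the function names and arguments are illustrative and not taken from the paper.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # K(a, b) = exp(-gamma * ||a - b||^2)
    return np.exp(-gamma * np.sum((a - b) ** 2))

def csvm_decision(x, X_I, y_I, lam, X_J, y_J, mu, beta, gamma=1.0):
    # f(x) = sum_s lam_s y_s K(x_s, x) + sum_t mu_t y_t K(x_t, x) + beta
    s_I = sum(lam[s] * y_I[s] * rbf_kernel(X_I[s], x, gamma) for s in range(len(y_I)))
    s_J = sum(mu[t] * y_J[t] * rbf_kernel(X_J[t], x, gamma) for t in range(len(y_J)))
    return s_I + s_J + beta

# Predicted label: the sign of the decision value, e.g.
# y_hat = np.sign(csvm_decision(x_new, X_I, y_I, lam, X_J, y_J, mu, beta))
```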


Cite this article

Benítez-Peña, S., Blanquero, R., Carrizosa, E. et al. On support vector machines under a multiple-cost scenario. Adv Data Anal Classif 13, 663–682 (2019). https://doi.org/10.1007/s11634-018-0330-5
