Abstract
Kernel-based methods (KBMs) such as support vector machines (SVMs) are popular data mining tools for solving classification and regression problems. Due to their high prediction accuracy, KBMs have been successfully used in various fields. However, KBMs have three major drawbacks. First, it is not easy to obtain an explicit description of the discrimination (or regression) function in the original input space or to make variable selection decisions in the input space. Second, depending on the magnitude and numeric range of the given data points, the resulting kernel matrices may be ill-conditioned, so the learning algorithms may suffer from numerical instability. Although data scaling can generally be applied to deal with this problem and related issues, it may not always be effective. Third, the selection of an appropriate kernel type and its parameters can be a complex undertaking, and the choice greatly affects the performance of the resulting functions. To overcome these drawbacks, we present here the sparse signomial classification and regression (SSCR) model. SSCR seeks a sparse signomial function by solving a linear program that minimizes the weighted sum of the ℓ1-norm of the coefficient vector of the function and the ℓ1-norm of the violation (or loss) caused by the function. SSCR employs a signomial function in the original variables and can therefore capture nonlinearity in the data. SSCR is also less sensitive to the numerical values or numeric ranges of the given data and yields a sparse, explicit description of the resulting function in the original input space, which is useful for interpretation: it indicates which original input variables and/or interaction terms are more meaningful than others. We also present column generation techniques to select important signomial terms in the classification and regression processes and explore a number of theoretical properties of the proposed formulation.
Computational studies demonstrate that SSCR is at least competitive with, and can even outperform, other widely used learning methods for classification and regression.
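The ℓ1-penalized linear program at the heart of SSCR can be illustrated with a minimal sketch. Everything below is an assumption for illustration: a fixed toy dictionary of signomial-style terms, hinge-type violations for binary classification, an arbitrary trade-off weight `lam`, and SciPy's LP solver. The paper's actual formulation, loss definition, and column-generation scheme for selecting terms differ in detail.

```python
# Hypothetical, simplified SSCR-style LP:
#   minimize  lam * ||w||_1 + sum_i xi_i
#   s.t.      y_i * (phi(x_i) . w + b) >= 1 - xi_i,   xi_i >= 0,
# where phi() maps inputs to a fixed, small dictionary of signomial-style
# terms. The l1-norm of w is linearized via the split w = w_plus - w_minus.
import numpy as np
from scipy.optimize import linprog

def signomial_features(X):
    # toy dictionary (an assumed choice): x1, x2, x1*x2, x1^2, x2^2
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1 * x2, x1**2, x2**2])

def fit_sscr_lp(X, y, lam=0.1):
    Phi = signomial_features(X)
    m, p = Phi.shape
    # decision vars: [w+ (p), w- (p), b+, b-, xi (m)], all nonnegative
    c = np.concatenate([lam * np.ones(2 * p), np.zeros(2), np.ones(m)])
    # margin constraint rewritten as:
    #   -y_i*(Phi_i (w+ - w-) + b+ - b-) - xi_i <= -1
    Y = y[:, None]
    A_ub = np.hstack([-Y * Phi, Y * Phi, -Y, Y, -np.eye(m)])
    b_ub = -np.ones(m)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * (2 * p + 2 + m))
    z = res.x
    w = z[:p] - z[p:2 * p]
    b = z[2 * p] - z[2 * p + 1]
    return w, b, res

# tiny separable toy data (illustrative only)
X = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -1.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b, res = fit_sscr_lp(X, y)
pred = np.sign(signomial_features(X) @ w + b)
```

Because both the objective and the constraints are linear in the split variables, an off-the-shelf LP solver suffices, and the ℓ1 penalty drives many coefficients of `w` to zero, which is what gives the explicit, sparse description in the original input space.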
References
Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1984). Classification and regression trees. Belmont: Wadsworth International.
Burges, C. J. C. (1998). A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2, 121–167.
Chang, C. C., & Lin, C. J. (2001). LIBSVM: a library for support vector machines. http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Chapelle, O. (2007). Training a support vector machine in the primal. Neural Computation, 19(5), 1155–1178.
Chen, S.-H., Sun, J., Dimitrov, L., Turner, A. R., Adams, T. S., Meyers, D. A., Chang, B.-L., Zheng, S. L., Gronberg, H., Xu, J., & Hsu, F.-C. (2008). A support vector machine approach for detecting gene-gene interaction. Genetic Epidemiology, 32, 152–167.
Chou, P.-H., Wu, M.-J., & Chen, K.-K. (2010). Integrating support vector machine and genetic algorithm to implement dynamic wafer quality prediction system. Expert Systems with Applications, 37, 4413–4424.
Chvátal, V. (1983). Linear programming. New York: Freeman.
Fang, Y., Park, J. I., Jeong, Y. S., Jeong, M. K., Baek, S., & Cho, H. (2010). Enhanced predictions of wood properties using hybrid models of PCR and PLS with high-dimensional NIR spectra data. Annals of Operations Research, 190, 3–15.
Frank, M., & Wolfe, P. (1956). An algorithm for quadratic programming. Naval Research Logistics Quarterly, 3, 95–110.
Friedman, J. (1991). Multivariate adaptive regression splines. Annals of Statistics, 19, 1–67.
Glasmachers, T., & Igel, C. (2010). Maximum likelihood model selection for 1-norm soft margin SVMs with multiple parameters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(8), 1522–1528.
Gunn, S. R. (1998). Support vector machines for classification and regression. Technical report. School of Electronics and Computer Science, University of Southampton.
Hosmer, D., & Lemeshow, S. (2000). Applied logistic regression (2nd ed.). New York: Wiley.
Huang, K., Zheng, D., King, I., & Lyu, M. R. (2009). Arbitrary norm support vector machines. Neural Computation, 21, 560–582.
Kang, P., Lee, H., Cho, S., Kim, D., Park, J., Park, C.-K., & Doh, S. (2009). A virtual metrology system for semiconductor manufacturing. Expert Systems with Applications, 36, 12554–12561.
Kim, H., & Loh, W. Y. (2001). Classification tree with unbiased multiway splits. Journal of American Statistical Association, 96, 598–604.
Mangasarian, O. L. (1999). Arbitrary-norm separating plane. Operations Research Letters, 24, 15–23.
Mangasarian, O. L. (2006). Exact 1-norm support vector machines via unconstrained convex differentiable minimization. Journal of Machine Learning Research, 7, 1517–1530.
Mangasarian, O. L., & Thomson, M. E. (2008). Chunking for massive nonlinear kernel classification. Optimization Methods and Software, 23, 265–274.
MATLAB Statistics Toolbox (2008). http://www.mathworks.com.
Mixture Flexible Discriminant Analysis Package (2009). http://cran.r-project.org/web/packages/mda.
Montgomery, D. C., Peck, E. A., & Vining, G. G. (2006). Introduction to linear regression analysis (4th ed.). New York: Wiley.
Murphy, P. M., & Aha, D. W. (1992). UCI machine learning repository. www.ics.uci.edu/~mlearn/MLRepository.html.
Nemhauser, G. L., & Wolsey, L. A. (1988). Integer and combinatorial optimization. New York: Wiley.
Smola, A. J., & Schölkopf, B. (2004). A tutorial on support vector regression. Statistics and Computing, 14, 199–222.
SOCR body density data. http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_BMI_Regression.
Tax, D. M. J., & Duin, R. P. W. (2004). Support vector data description. Machine Learning, 54, 45–66.
Vapnik, V. N. (1995). The nature of statistical learning theory. New York: Springer.
Veenman, C. J., & Tax, D. M. J. (2005). LESS: a model-based classifier for sparse subspaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27, 1496–1500.
Wang, S., Jiang, W., & Tsui, K.-L. (2010). Adjusted support vector machines based on a new loss function. Annals of Operations Research, 174, 83–101.
Weston, J., Elisseeff, A., Schölkopf, B., & Tipping, M. (2003). Use of zero-norm with linear models and kernel methods. Journal of Machine Learning Research, 3, 1439–1461.
Xpress (2009). http://www.fico.com.
Acknowledgement
The authors thank the editor and anonymous referees for their constructive and helpful comments.
Additional information
This research for the first author was supported by Hankuk University of Foreign Studies Research Fund.
Appendix
Proof of Theorem 1
The result can be established by showing that the well-known maximum independent set problem, which is known to be \(\mathcal{NP}\)-hard (Nemhauser and Wolsey 1988), is a special case of CGP. For a given undirected simple graph G=(V,E), an independent set of G is a subset U of V such that \((i,j) \notin E\) for every pair of nodes \(i,j \in U\). The problem is to find a maximum cardinality independent set of G.
For each node \(v \in V\), we define an n-dimensional positive vector \(\mathbf{x}_v\) with components \(x_{vk}\) for all \(k \in V\), such that \(x_{vk}=1\) for \(k \neq v\) and \(x_{vv}=1/n^3\), where \(n=|V|\). We also define, for each edge \(e \in E\), an n-dimensional vector \(\mathbf{y}_e\) with components \(y_{ek}\) for all \(k \in V\), such that \(y_{ek}=1/n^6\) if \(k=i\) or \(k=j\), and \(y_{ek}=1\) otherwise, where \(e=(i,j)\). Let \(S_1=\{\mathbf{x}_v \mid v \in V\}\) and \(S_2=\{\mathbf{y}_e \mid e \in E\}\). We specify the four parameters of D as \(d_{\min}=0\), \(d_{\max}=1\), \(T=1\), and \(L=n\); the set D is then the set of all n-dimensional binary vectors. We now set \(a_1(\mathbf{x}_v)=1\) for each \(v \in V\) and set each element of \(a_2(\mathbf{y}_e)\) to n for each \(e \in E\), and define a special case of CGP with \(S_1\) and \(S_2\), along with \(a_1\), \(a_2\), and the four parameters of D. This special case of CGP is the problem of maximizing the function \(z(\mathbf{d}) = \sum_{i \in V} (1/n^{3})^{d_{i}} - n\sum_{(i,j) \in E} (1/n^{6})^{d_{i}} (1/n^{6})^{d_{j}}\) over all \(\mathbf{d} \in \mathbb{B}^{n}\).
Now, we show that a maximum independent set U with \(|U|=K\) exists for some positive integer \(K<n\) if and only if \(K \leq \hat{z} < K+1\), where \(\hat{z} = \max\{z(\mathbf{d}) \mid \mathbf{d} \in \mathbb{B}^{n}\}\). Suppose that we have a maximum independent set U of G such that \(|U|=K\). Then, for the vector \(\mathbf{d}\) such that \(d_i=0\) if \(i \in U\) and \(d_j=1\) otherwise, we have \(z(\mathbf{d}) \geq K + (n-K)/n^3 - n|E|/n^6 \geq K\) and \(z(\mathbf{d}) \leq K + (n-K)/n^3 < K+1\). Now, suppose that \(K \leq \hat{z} < K+1\) for some positive integer \(K<n\), and let \(\mathbf{d}\) be a binary vector attaining \(\hat{z}\). Then at least one of \(d_i\) and \(d_j\) must equal 1 for every \((i,j) \in E\); otherwise \(\hat{z} \leq 0\). Moreover, the number of zero elements of \(\mathbf{d}\) equals K. If we define \(U=\{i \in V \mid d_i=0\}\), then U is an independent set of cardinality K. Therefore, the result follows. □
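The correspondence between the maximum independent set size K and \(\hat{z}\) can be checked numerically on a small instance. The sketch below brute-forces both quantities on a 4-node path graph; the graph and the exhaustive enumeration are illustrative only and play no role in the proof itself.

```python
# Brute-force check of the Theorem 1 reduction on a small graph:
#   z(d) = sum_i (1/n^3)^{d_i} - n * sum_{(i,j) in E} (1/n^6)^{d_i} (1/n^6)^{d_j}
# and floor(max z) should equal the maximum independent set size K (K < n).
from itertools import combinations, product

V = [0, 1, 2, 3]
E = [(0, 1), (1, 2), (2, 3)]   # a 4-node path (illustrative instance)
n = len(V)

def z(d):
    node_part = sum((1.0 / n**3) ** di for di in d)
    edge_part = sum((1.0 / n**6) ** d[i] * (1.0 / n**6) ** d[j]
                    for i, j in E)
    return node_part - n * edge_part

# maximize z over all binary vectors d
z_hat = max(z(d) for d in product([0, 1], repeat=n))

def is_independent(U):
    return all((i, j) not in E and (j, i) not in E
               for i, j in combinations(U, 2))

# maximum independent set size by enumeration
K = max(len(U) for r in range(n + 1)
        for U in combinations(V, r) if is_independent(U))
```

On this instance the maximum independent sets have size K = 2 (for example {0, 3}), and \(\hat{z}\) lands in the interval [K, K+1), as the proof asserts.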
Proof of Theorem 2
The result can also be established by showing that the maximum independent set problem is a special case of CGP with \(S_1=S_2\). For any given instance of the maximum independent set problem, we define the vectors \(\mathbf{x}_v\), \(v \in V\), and \(\mathbf{y}_e\), \(e \in E\), as in the proof of Theorem 1. Let \(S_1=S_2=\{\mathbf{x}_v \mid v \in V\} \cup \{\mathbf{y}_e \mid e \in E\}\). In addition to \(a_1(\mathbf{x}_v)\) for each \(v \in V\) and \(a_2(\mathbf{y}_e)\) for each \(e \in E\) as defined in the proof of Theorem 1, we define \(a_1(\mathbf{y}_e)=0\) for each \(e \in E\) and \(a_2(\mathbf{x}_v)=0\) for each \(v \in V\). We can then define a special case of CGP with \(S_1\), \(S_2\), \(a_1\), and \(a_2\), along with the four parameters of D specified as in the proof of Theorem 1. Observe that this special case of CGP with \(S_1=S_2\) is the same problem as that discussed in the proof of Theorem 1. Therefore, the result follows. □
Cite this article
Lee, K., Kim, N. & Jeong, M.K. The sparse signomial classification and regression model. Ann Oper Res 216, 257–286 (2014). https://doi.org/10.1007/s10479-012-1198-y