
The sparse signomial classification and regression model


Abstract

Kernel-based methods (KBMs) such as support vector machines (SVMs) are popular data mining tools for solving classification and regression problems. Due to their high prediction accuracy, KBMs have been successfully used in various fields. However, KBMs have three major drawbacks. First, it is not easy to obtain an explicit description of the discrimination (or regression) function in the original input space, or to make variable selection decisions in that space. Second, depending on the magnitude and numeric range of the given data points, the resulting kernel matrices may be ill-conditioned, so the learning algorithms may suffer from numerical instability. Although data scaling can generally be applied to deal with this problem and related issues, it is not always effective. Third, selecting an appropriate kernel type and its parameters can be a complex undertaking, and the choice greatly affects the performance of the resulting functions. To overcome these drawbacks, we present the sparse signomial classification and regression (SSCR) model. SSCR seeks a sparse signomial function by solving a linear program that minimizes the weighted sum of the 1-norm of the coefficient vector of the function and the 1-norm of the violation (or loss) caused by the function. SSCR employs a signomial function in the original variables and can therefore capture nonlinearity in the data. SSCR is also less sensitive to the numerical values or numeric ranges of the given data and gives a sparse, explicit description of the resulting function in the original input space, which is useful for interpretation: it indicates which original input variables and/or interaction terms are more meaningful than others. We also present column generation techniques to select important signomial terms in the classification and regression processes and explore a number of theoretical properties of the proposed formulation. Computational studies demonstrate that SSCR is at least competitive with, and in some cases outperforms, other widely used learning methods for classification and regression.
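To illustrate the kind of linear program the abstract describes, the following is a minimal sketch, not the paper's exact formulation: a 1-norm-regularized LP classifier over a small, fixed set of candidate signomial terms, solved with scipy.optimize.linprog. The exponent set, the regularization weight lam, and the hinge-style violation variables are illustrative assumptions; SSCR itself selects signomial terms by column generation rather than fixing them in advance.

```python
import numpy as np
from scipy.optimize import linprog

# Toy positive-valued data with labels in {-1, +1}.
rng = np.random.default_rng(0)
X = rng.uniform(0.5, 2.0, size=(40, 2))
y = np.where(X[:, 0] * X[:, 1] > 1.5, 1.0, -1.0)

# Candidate signomial terms x1^a * x2^b for a few (a, b) pairs
# (a hypothetical, fixed candidate set used only for this sketch).
exponents = [(1, 0), (0, 1), (1, 1), (2, 0), (0, 2), (0.5, 0.5)]
Phi = np.column_stack([X[:, 0] ** a * X[:, 1] ** b for a, b in exponents])

n, m = Phi.shape
lam = 0.1  # weight on the 1-norm of the coefficient vector

# Variables: [w_plus (m), w_minus (m), b_plus, b_minus, xi (n)], all >= 0.
# Objective: lam * ||w||_1 + sum of violations xi.
c = np.concatenate([lam * np.ones(2 * m), [0.0, 0.0], np.ones(n)])

# Margin constraints y_i (Phi_i @ (w+ - w-) + b) >= 1 - xi_i,
# rewritten as A_ub @ z <= b_ub for linprog.
A_ub = np.hstack([-y[:, None] * Phi, y[:, None] * Phi,
                  -y[:, None], y[:, None], -np.eye(n)])
b_ub = -np.ones(n)

res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, None)] * (2 * m + 2 + n), method="highs")
w = res.x[:m] - res.x[m:2 * m]
b = res.x[2 * m] - res.x[2 * m + 1]
print("selected terms:", [exponents[j] for j in np.nonzero(np.abs(w) > 1e-6)[0]])
print("coefficients:", np.round(w, 3), "intercept:", round(b, 3))
```

Because the objective penalizes the 1-norm of the coefficients, many entries of w come out exactly zero, which is the sparsity and interpretability property the abstract emphasizes.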


References

  • Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1984). Classification and regression trees. Belmont: Wadsworth International.
  • Burges, C. J. C. (1998). A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2, 121–167.
  • Chang, C. C., & Lin, C. J. (2001). LIBSVM: a library for support vector machines. http://www.csie.ntu.edu.tw/~cjlin/libsvm.
  • Chapelle, O. (2007). Training a support vector machine in the primal. Neural Computation, 19(5), 1155–1178.
  • Chen, S.-H., Sun, J., Dimitrov, L., Turner, A. R., Adams, T. S., Meyers, D. A., Chang, B.-L., Zheng, S. L., Gronberg, H., Xu, J., & Hsu, F.-C. (2008). A support vector machine approach for detecting gene-gene interaction. Genetic Epidemiology, 32, 152–167.
  • Chou, P.-H., Wu, M.-J., & Chen, K.-K. (2010). Integrating support vector machine and genetic algorithm to implement dynamic wafer quality prediction system. Expert Systems with Applications, 37, 4413–4424.
  • Chvátal, V. (1983). Linear programming. New York: Freeman.
  • Fang, Y., Park, J. I., Jeong, Y. S., Jeong, M. K., Baek, S., & Cho, H. (2010). Enhanced predictions of wood properties using hybrid models of PCR and PLS with high-dimensional NIR spectra data. Annals of Operations Research, 190, 3–15.
  • Frank, M., & Wolfe, P. (1956). An algorithm for quadratic programming. Naval Research Logistics Quarterly, 3, 95–110.
  • Friedman, J. (1991). Multivariate adaptive regression splines. Annals of Statistics, 19, 1–67.
  • Glasmachers, T., & Igel, C. (2010). Maximum likelihood model selection for 1-norm soft margin SVMs with multiple parameters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(8), 1522–1528.
  • Gunn, S. R. (1998). Support vector machines for classification and regression. Technical report. School of Electronics and Computer Science, University of Southampton.
  • Hosmer, D., & Lemeshow, S. (2000). Applied logistic regression (2nd ed.). New York: Wiley.
  • Huang, K., Zheng, D., King, I., & Lyu, M. R. (2009). Arbitrary norm support vector machines. Neural Computation, 21, 560–582.
  • Kang, P., Lee, H., Cho, S., Kim, D., Park, J., Park, C.-K., & Doh, S. (2009). A virtual metrology system for semiconductor manufacturing. Expert Systems with Applications, 36, 12554–12561.
  • Kim, H., & Loh, W. Y. (2001). Classification trees with unbiased multiway splits. Journal of the American Statistical Association, 96, 598–604.
  • Mangasarian, O. L. (1999). Arbitrary-norm separating plane. Operations Research Letters, 24, 15–23.
  • Mangasarian, O. L. (2006). Exact 1-norm support vector machines via unconstrained convex differentiable minimization. Journal of Machine Learning Research, 7, 1517–1530.
  • Mangasarian, O. L., & Thomson, M. E. (2008). Chunking for massive nonlinear kernel classification. Optimization Methods and Software, 23, 265–274.
  • MATLAB Statistics Toolbox (2008). http://www.mathworks.com.
  • Mixture and Flexible Discriminant Analysis Package (2009). http://cran.r-project.org/web/packages/mda.
  • Montgomery, D. C., Peck, E. A., & Vining, G. G. (2006). Introduction to linear regression analysis (4th ed.). New York: Wiley.
  • Murphy, P. M., & Aha, D. W. (1992). UCI machine learning repository. www.ics.uci.edu/~mlearn/MLRepository.html.
  • Nemhauser, G. L., & Wolsey, L. A. (1988). Integer and combinatorial optimization. New York: Wiley.
  • Smola, A. J., & Schölkopf, B. (2004). A tutorial on support vector regression. Statistics and Computing, 14, 199–222.
  • SOCR body density data. http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_BMI_Regression.
  • Tax, D. M. J., & Duin, R. P. W. (2004). Support vector data description. Machine Learning, 54, 45–66.
  • Vapnik, V. N. (1995). The nature of statistical learning theory. New York: Springer.
  • Veenman, C. J., & Tax, D. M. J. (2005). LESS: a model-based classifier for sparse subspaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27, 1496–1500.
  • Wang, S., Jiang, W., & Tsui, K.-L. (2010). Adjusted support vector machines based on a new loss function. Annals of Operations Research, 174, 83–101.
  • Weston, J., Elisseeff, A., Schölkopf, B., & Tipping, M. (2003). Use of zero-norm with linear models and kernel methods. Journal of Machine Learning Research, 3, 1439–1461.
  • Xpress (2009). http://www.fico.com.


Acknowledgement

The authors thank the editor and anonymous referees for their constructive and helpful comments.

Author information


Correspondence to Norman Kim or Myong K. Jeong.

Additional information

The research of the first author was supported by the Hankuk University of Foreign Studies Research Fund.

Appendix

Proof of Theorem 1

The result can be established by showing that the well-known maximum independent set problem, which is known to be \(\mathcal{NP}\)-hard (Nemhauser and Wolsey 1988), is a special case of CGP. For a given undirected simple graph \(G=(V,E)\), an independent set of \(G\) is a subset \(U\) of \(V\) such that \((i,j)\notin E\) for every pair of nodes \(i,j\in U\). The problem is to find a maximum cardinality independent set of \(G\).

For each node \(v\in V\), we define an \(n\)-dimensional positive vector \(\mathbf{x}_v\) with components \(x_{vk}\) for all \(k\in V\), such that \(x_{vk}=1\) for \(k\neq v\) and \(x_{vk}=1/n^{3}\) otherwise, where \(n=|V|\). We also define, for each edge \(e\in E\), an \(n\)-dimensional vector \(\mathbf{y}_e\) with components \(y_{ek}\) for all \(k\in V\), such that \(y_{ek}=1/n^{6}\) if \(k=i\) or \(k=j\) and \(y_{ek}=1\) otherwise, where \(e=(i,j)\). Let \(S_1=\{\mathbf{x}_v \mid v\in V\}\) and let \(S_2=\{\mathbf{y}_e \mid e\in E\}\). We specify the four parameters of \(D\) as \(d_{\min}=0\), \(d_{\max}=1\), \(T=1\), and \(L=n\). The set \(D\) is then the set of all \(n\)-dimensional binary vectors. We now set \(a_1(\mathbf{x}_v)\) for each \(v\in V\) to be equal to 1, set each element of \(a_2(\mathbf{y}_e)\) for each \(e\in E\) to be \(n\), and define a special case of CGP with \(S_1\) and \(S_2\) along with \(a_1(\mathbf{x}_v)\) for each \(v\in V\), \(a_2(\mathbf{y}_e)\) for each \(e\in E\), and the four parameters of \(D\). This special case of CGP is the problem of maximizing the function \(z(\mathbf{d}) = \sum_{i \in V} (1/n^{3})^{d_{i}} - n\sum_{(i,j) \in E} (1/n^{6})^{d_{i}} (1/n^{6})^{d_{j}}\) over all \(\mathbf{d} \in \mathbb{B}^{n}\).

Now, we show that a maximum independent set \(U\) with \(|U|=K\) exists for some positive integer \(K<n\) if and only if \(K \leq \hat{z} < K+1\), where \(\hat{z} = \max\{z(\mathbf{d}) \mid \mathbf{d} \in \mathbb{B}^{n}\}\). Suppose that we have a maximum independent set \(U\) of \(G\) such that \(|U|=K\). Then, for the vector \(\mathbf{d}\) with \(d_i=0\) if \(i\in U\) and \(d_i=1\) otherwise, we have \(z(\mathbf{d})\geq K+(n-K)/n^{3}-n|E|/n^{6}\geq K\) and \(z(\mathbf{d})\leq K+(n-K)/n^{3}<K+1\). Now, suppose that we have a binary vector \(\mathbf{d}\) such that \(K \leq \hat{z} < K+1\) for some positive integer \(K<n\). This means that at least one of \(d_i\) and \(d_j\) must be equal to 1 whenever \((i,j)\in E\); otherwise, \(\hat{z} \leq 0\). Moreover, the number of zero elements of \(\mathbf{d}\) is equal to \(K\). If we define \(U=\{i\in V \mid d_i=0\}\), then \(U\) is an independent set of cardinality \(K\). Therefore, the result follows. □
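As a concrete check of this construction (a sketch for illustration, not part of the paper), one can enumerate all binary vectors \(\mathbf{d}\) on a tiny graph and verify that the maximizer of \(z(\mathbf{d})\) places its zero entries on a maximum independent set and that the optimal value lies in \([K, K+1)\). The graph below is a 4-cycle, for which \(K=2\).

```python
import itertools

# Brute-force check of the Theorem 1 reduction on a 4-cycle (K = 2).
V = [0, 1, 2, 3]
E = [(0, 1), (1, 2), (2, 3), (3, 0)]
n = len(V)

def z(d):
    # z(d) = sum_i (1/n^3)^{d_i} - n * sum_{(i,j) in E} (1/n^6)^{d_i} (1/n^6)^{d_j}
    first = sum((1.0 / n**3) ** d[i] for i in V)
    second = n * sum((1.0 / n**6) ** d[i] * (1.0 / n**6) ** d[j] for i, j in E)
    return first - second

best = max(itertools.product([0, 1], repeat=n), key=z)
z_hat = z(best)
U = [i for i in V if best[i] == 0]  # zero entries of the maximizer
print("maximizer d =", best, " z_hat =", round(z_hat, 6), " U =", U)
# Expected: U is an independent set of size K = 2 and 2 <= z_hat < 3.
```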

Proof of Theorem 2

The result can also be established by showing that the maximum independent set problem is a special case of CGP with \(S_1=S_2\). For any given instance of the maximum independent set problem, we define the vectors \(\mathbf{x}_v\), \(v\in V\), and \(\mathbf{y}_e\), \(e\in E\), as in the proof of Theorem 1. Let \(S_1=S_2=\{\mathbf{x}_v \mid v\in V\}\cup\{\mathbf{y}_e \mid e\in E\}\). In addition to \(a_1(\mathbf{x}_v)\) for each \(v\in V\) and \(a_2(\mathbf{y}_e)\) for each \(e\in E\) defined in the proof of Theorem 1, we define \(a_1(\mathbf{y}_e)=0\) for each \(e\in E\) and \(a_2(\mathbf{x}_v)=0\) for each \(v\in V\). We can then define a special case of CGP with \(S_1\), \(S_2\), \(a_1\), and \(a_2\), along with the four parameters of \(D\) specified as in the proof of Theorem 1. Observe that this special case of CGP with \(S_1=S_2\) is the same problem as that discussed in the proof of Theorem 1. Therefore, the result follows. □


Cite this article

Lee, K., Kim, N. & Jeong, M.K. The sparse signomial classification and regression model. Ann Oper Res 216, 257–286 (2014). https://doi.org/10.1007/s10479-012-1198-y
