A Cost Based Reweighted Scheme of Principal Support Vector Machine

Part of the Springer Proceedings in Mathematics & Statistics book series (PROMS, volume 74)

Abstract

Principal Support Vector Machine (PSVM) is a recently proposed method that uses Support Vector Machines to achieve linear and nonlinear sufficient dimension reduction under a unified framework. In this work, a cost-based reweighted scheme is used to improve the performance of the algorithm. We present basic theoretical results and demonstrate the effectiveness of the reweighted algorithm through simulations and a real data application.
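To fix ideas, the following is a minimal sketch of a linear PSVM estimator with cost-based reweighting, written in Python. The weighting rule (misclassification costs inversely proportional to the slice class sizes, in the spirit of Veropoulos et al. [15]), the slicing grid, the use of scikit-learn's `LinearSVC`, and the function names are illustrative assumptions, not the exact scheme of the paper.

```python
import numpy as np
from numpy.linalg import eigh
from sklearn.svm import LinearSVC


def _inv_sqrt(S):
    """Symmetric inverse square root of a covariance matrix."""
    vals, vecs = eigh(S)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def reweighted_psvm(X, y, d=1, n_slices=5, C=1.0):
    """Illustrative cost-based reweighted linear PSVM (sketch only)."""
    n, p = X.shape
    Sigma_inv_sqrt = _inv_sqrt(np.cov(X, rowvar=False))
    Z = (X - X.mean(axis=0)) @ Sigma_inv_sqrt        # standardized predictors

    M = np.zeros((p, p))
    for q in np.quantile(y, np.linspace(0.1, 0.9, n_slices)):
        ytilde = np.where(y > q, 1, -1)              # binary label for this slice
        n_pos, n_neg = (ytilde == 1).sum(), (ytilde == -1).sum()
        if n_pos == 0 or n_neg == 0:
            continue
        # Cost-based reweighting (assumption): misclassification cost
        # inversely proportional to the class size, so an imbalanced
        # slice does not let the majority class dominate the hyperplane.
        weights = {1: n / (2.0 * n_pos), -1: n / (2.0 * n_neg)}
        svm = LinearSVC(C=C, loss="hinge", dual=True,
                        class_weight=weights, max_iter=20000)
        svm.fit(Z, ytilde)
        zeta = svm.coef_.ravel()                     # normal vector for this slice
        M += np.outer(zeta, zeta)

    # Leading eigenvectors of the pooled matrix, mapped back to the X scale.
    _, vecs = eigh(M)
    return Sigma_inv_sqrt @ vecs[:, ::-1][:, :d]
```

For example, under a single-index model such as \(Y = (\boldsymbol{\beta}^{\mathsf{T}}\boldsymbol{X})^{2} + \varepsilon\), `reweighted_psvm(X, y, d=1)` returns a \(p\times 1\) matrix whose column should be close, up to sign and scale, to \(\boldsymbol{\beta}\).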

Keywords

  • Support vector machine
  • Sufficient dimension reduction
  • Inverse regression
  • Misclassification penalty
  • Imbalanced data


References

  1. Bache, K., Lichman, M.: UCI Machine Learning Repository. University of California, School of Information and Computer Science, Irvine (2013). http://archive.ics.uci.edu/ml

  2. Cook, R.D.: Principal Hessian directions revisited (with discussion). J. Am. Stat. Assoc. 93, 84–100 (1998a)

  3. Cook, R.D.: Regression Graphics: Ideas for Studying Regressions through Graphics. Wiley, New York (1998b)

  4. Cook, R.D., Weisberg, S.: Discussion of “Sliced inverse regression for dimension reduction”. J. Am. Stat. Assoc. 86, 316–342 (1991)

  5. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)

  6. Ein-Dor, P., Feldmesser, J.: Attributes of the performance of central processing units: a relative performance prediction model. Commun. ACM 30(4), 308–317 (1987)

  7. Fukumizu, K., Bach, F.R., Jordan, M.I.: Kernel dimension reduction in regression. Ann. Stat. 37(4), 1871–1905 (2009)

  8. Lee, K.K., Gunn, S.R., Harris, C.J., Reed, P.A.S.: Classification of imbalanced data with transparent kernels. In: Proceedings of International Joint Conference on Neural Networks (IJCNN ’01), vol. 4, pp. 2410–2415, Washington, D.C. (2001)

  9. Li, K.-C.: Sliced inverse regression for dimension reduction (with discussion). J. Am. Stat. Assoc. 86, 316–342 (1991)

  10. Li, K.-C.: On principal Hessian directions for data visualization and dimension reduction: another application of Stein’s lemma. J. Am. Stat. Assoc. 87, 1025–1039 (1992)

  11. Li, B., Wang, S.: On directional regression for dimension reduction. J. Am. Stat. Assoc. 102, 997–1008 (2007)

  12. Li, B., Zha, H., Chiaromonte, F.: Contour regression: a general approach to dimension reduction. Ann. Stat. 33, 1580–1616 (2005)

  13. Li, B., Artemiou, A., Li, L.: Principal support vector machine for linear and nonlinear sufficient dimension reduction. Ann. Stat. 39, 3182–3210 (2011)

  14. Vapnik, V.: Statistical Learning Theory. Wiley, New York (1998)

  15. Veropoulos, K., Campbell, C., Cristianini, N.: Controlling the sensitivity of support vector machines. In: Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (IJCAI ’99), Workshop ML3, Stockholm, pp. 55–60 (1999)

  16. Weisberg, S.: Dimension reduction regression in R. J. Stat. Softw. 7(1) (2002) (Online)

  17. Wu, H.M.: Kernel sliced inverse regression with applications on classification. J. Comput. Graph. Stat. 17, 590–610 (2008)

  18. Yeh, Y.-R., Huang, S.-Y., Lee, Y.-Y.: Nonlinear dimension reduction with Kernel sliced inverse regression. IEEE Trans. Knowl. Data Eng. 21, 1590–1603 (2009)

  19. Zhu, L.X., Miao, B., Peng, H.: On sliced inverse regression with large dimensional covariates. J. Am. Stat. Assoc. 101, 630–643 (2006)

Acknowledgements

Andreas Artemiou is supported in part by NSF grant DMS-12-07651. The authors would like to thank the editors and the referees for their valuable comments.

Author information

Correspondence to Andreas Artemiou.
Appendix

Proof of Theorem 1.

Without loss of generality, assume that \(E\boldsymbol{X}=\boldsymbol{0}\). First we note that for \(i=1,-1\)

$$\begin{array}{rcl}
E\left(\lambda_{\tilde{Y}}[1-\tilde{Y}(\boldsymbol{\psi}^{\mathsf{T}}\boldsymbol{X}-t)]^{+}\right) &=& E\{E[\lambda_{\tilde{Y}}[1-\tilde{Y}(\boldsymbol{\psi}^{\mathsf{T}}\boldsymbol{X}-t)]^{+}\mid Y,\boldsymbol{\beta}^{\mathsf{T}}\boldsymbol{X}]\}\\
&=& E\{E[(\lambda_{\tilde{Y}}[1-\tilde{Y}(\boldsymbol{\psi}^{\mathsf{T}}\boldsymbol{X}-t)])^{+}\mid Y,\boldsymbol{\beta}^{\mathsf{T}}\boldsymbol{X}]\}
\end{array}$$

where the last equality holds because \(\lambda_{\tilde{Y}}\) is positive. Since the function \(a\mapsto a^{+}\) is convex, by Jensen’s inequality we have

$$\begin{array}{rcl}
E[(\lambda_{\tilde{Y}}[1-\tilde{Y}(\boldsymbol{\psi}^{\mathsf{T}}\boldsymbol{X}-t)])^{+}\mid Y,\boldsymbol{\beta}^{\mathsf{T}}\boldsymbol{X}] &\geq& \{E[\lambda_{\tilde{Y}}[1-\tilde{Y}(\boldsymbol{\psi}^{\mathsf{T}}\boldsymbol{X}-t)]\mid Y,\boldsymbol{\beta}^{\mathsf{T}}\boldsymbol{X}]\}^{+}\\
&=& \{\lambda_{\tilde{Y}}[1-\tilde{Y}(E(\boldsymbol{\psi}^{\mathsf{T}}\boldsymbol{X}\mid\boldsymbol{\beta}^{\mathsf{T}}\boldsymbol{X})-t)]\}^{+}
\end{array}$$

where the equality follows from the model assumption (1.1). Thus

$$E\left(\lambda_{\tilde{Y}}[1-\tilde{Y}(\boldsymbol{\psi}^{\mathsf{T}}\boldsymbol{X}-t)]^{+}\right) \geq E\{\lambda_{\tilde{Y}}[1-\tilde{Y}(E(\boldsymbol{\psi}^{\mathsf{T}}\boldsymbol{X}\mid\boldsymbol{\beta}^{\mathsf{T}}\boldsymbol{X})-t)]\}^{+}.\tag{1.12}$$

For the first term, \(\boldsymbol{\psi}^{\mathsf{T}}\boldsymbol{\varSigma}\boldsymbol{\psi}\), we have:

$$\begin{array}{rcl}
\boldsymbol{\psi}^{\mathsf{T}}\boldsymbol{\varSigma}\boldsymbol{\psi} = \mathrm{var}(\boldsymbol{\psi}^{\mathsf{T}}\boldsymbol{X}) &=& \mathrm{var}[E(\boldsymbol{\psi}^{\mathsf{T}}\boldsymbol{X}\mid\boldsymbol{\beta}^{\mathsf{T}}\boldsymbol{X})] + E[\mathrm{var}(\boldsymbol{\psi}^{\mathsf{T}}\boldsymbol{X}\mid\boldsymbol{\beta}^{\mathsf{T}}\boldsymbol{X})]\\
&\geq& \mathrm{var}[E(\boldsymbol{\psi}^{\mathsf{T}}\boldsymbol{X}\mid\boldsymbol{\beta}^{\mathsf{T}}\boldsymbol{X})].
\end{array}\tag{1.13}$$

Combining (1.12) and (1.13),

$$L_{R}(\boldsymbol{\psi},t) \geq \mathrm{var}[E(\boldsymbol{\psi}^{\mathsf{T}}\boldsymbol{X}\mid\boldsymbol{\beta}^{\mathsf{T}}\boldsymbol{X})] + E\{\lambda_{\tilde{Y}}[1-\tilde{Y}(E(\boldsymbol{\psi}^{\mathsf{T}}\boldsymbol{X}\mid\boldsymbol{\beta}^{\mathsf{T}}\boldsymbol{X})-t)]\}^{+}.$$

By the linearity condition stated in the theorem, \(E(\boldsymbol{\psi}^{\mathsf{T}}\boldsymbol{X}\mid\boldsymbol{\beta}^{\mathsf{T}}\boldsymbol{X}) = \boldsymbol{\psi}^{\mathsf{T}}\boldsymbol{P}_{\boldsymbol{\beta}}^{\mathsf{T}}(\boldsymbol{\varSigma})\boldsymbol{X}\), and therefore the right-hand side of the inequality equals \(L_{R}(\boldsymbol{P}_{\boldsymbol{\beta}}(\boldsymbol{\varSigma})\boldsymbol{\psi},t)\). If \(\boldsymbol{\psi}\) is not in the central dimension reduction subspace (CDRS), then the inequality is strict, which implies that \(\boldsymbol{\psi}\) is not the minimizer. □

Proof of Theorem 2.

Following the same argument as in Vapnik [14], it can be shown that minimizing (1.9) is equivalent to

$$\begin{array}{ll}
\mbox{minimizing} & \boldsymbol{\zeta}^{\mathsf{T}}\boldsymbol{\zeta} + \frac{1}{n}(\boldsymbol{\lambda}^{*})^{\mathsf{T}}\boldsymbol{\xi}\ \ \mbox{over}\ \ (\boldsymbol{\zeta},t,\boldsymbol{\xi})\\
\mbox{subject to} & \boldsymbol{\xi}\geq\boldsymbol{0},\ \ \tilde{y}\odot(\boldsymbol{\zeta}^{\mathsf{T}}\boldsymbol{Z}-t\boldsymbol{1})\geq\boldsymbol{1}-\boldsymbol{\xi}.
\end{array}\tag{1.14}$$

The Lagrangian function of this problem is

$$L(\boldsymbol{\zeta},t,\boldsymbol{\xi},\boldsymbol{\alpha},\boldsymbol{\beta}) = \boldsymbol{\zeta}^{\mathsf{T}}\boldsymbol{\zeta} + \frac{1}{n}(\boldsymbol{\lambda}^{*})^{\mathsf{T}}\boldsymbol{\xi} - \boldsymbol{\alpha}^{\mathsf{T}}[\tilde{y}\odot(\boldsymbol{\zeta}^{\mathsf{T}}\boldsymbol{Z}-t\boldsymbol{1})-\boldsymbol{1}+\boldsymbol{\xi}] - \boldsymbol{\beta}^{\mathsf{T}}\boldsymbol{\xi},\tag{1.15}$$

where \(\boldsymbol{\xi}=(\xi_{1},\ldots,\xi_{n})\). Let \((\boldsymbol{\zeta}^{*},\boldsymbol{\xi}^{*},t^{*})\) be a solution to problem (1.14). Using the Kuhn–Tucker theorem, one can show that minimizing (1.15) over \((\boldsymbol{\zeta},t,\boldsymbol{\xi})\) can equivalently be carried out by maximizing over \((\boldsymbol{\alpha},\boldsymbol{\beta})\). Differentiating with respect to \(\boldsymbol{\zeta}\), \(t\), and \(\boldsymbol{\xi}\) gives the system of equations:

$$\left\{\begin{array}{l}
\partial L/\partial\boldsymbol{\zeta} = 2\boldsymbol{\zeta} - \boldsymbol{Z}^{\mathsf{T}}(\boldsymbol{\alpha}\odot\tilde{y}) = \boldsymbol{0}\\
\partial L/\partial t = \boldsymbol{\alpha}^{\mathsf{T}}\tilde{y} = 0\\
\partial L/\partial\boldsymbol{\xi} = \frac{1}{n}\boldsymbol{\lambda}^{*} - \boldsymbol{\alpha} - \boldsymbol{\beta} = \boldsymbol{0}.
\end{array}\right.\tag{1.16}$$

Substitute the last two equations above into (1.15) to obtain

$$\boldsymbol{\zeta}^{\mathsf{T}}\boldsymbol{\zeta} - \boldsymbol{\alpha}^{\mathsf{T}}[\tilde{y}\odot(\boldsymbol{\zeta}^{\mathsf{T}}\boldsymbol{Z}) - \boldsymbol{1}].\tag{1.17}$$

Now substitute the first equation in (1.16), \(\boldsymbol{\zeta}=\frac{1}{2}\boldsymbol{Z}^{\mathsf{T}}(\boldsymbol{\alpha}\odot\tilde{y})\), into the above:

$$\boldsymbol{1}^{\mathsf{T}}\boldsymbol{\alpha} - \frac{1}{4}(\boldsymbol{\alpha}\odot\tilde{y})^{\mathsf{T}}\boldsymbol{Z}\boldsymbol{Z}^{\mathsf{T}}(\boldsymbol{\alpha}\odot\tilde{y}).\tag{1.18}$$
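Here the two quadratic terms combine because \(\boldsymbol{\zeta}^{\mathsf{T}}\boldsymbol{\zeta} = \frac{1}{4}(\boldsymbol{\alpha}\odot\tilde{y})^{\mathsf{T}}\boldsymbol{Z}\boldsymbol{Z}^{\mathsf{T}}(\boldsymbol{\alpha}\odot\tilde{y})\) and \(\boldsymbol{\alpha}^{\mathsf{T}}[\tilde{y}\odot(\boldsymbol{\zeta}^{\mathsf{T}}\boldsymbol{Z})] = (\boldsymbol{\alpha}\odot\tilde{y})^{\mathsf{T}}\boldsymbol{Z}\boldsymbol{\zeta} = \frac{1}{2}(\boldsymbol{\alpha}\odot\tilde{y})^{\mathsf{T}}\boldsymbol{Z}\boldsymbol{Z}^{\mathsf{T}}(\boldsymbol{\alpha}\odot\tilde{y})\), so that

$$\boldsymbol{\zeta}^{\mathsf{T}}\boldsymbol{\zeta} - \boldsymbol{\alpha}^{\mathsf{T}}[\tilde{y}\odot(\boldsymbol{\zeta}^{\mathsf{T}}\boldsymbol{Z})] = \left(\tfrac{1}{4}-\tfrac{1}{2}\right)(\boldsymbol{\alpha}\odot\tilde{y})^{\mathsf{T}}\boldsymbol{Z}\boldsymbol{Z}^{\mathsf{T}}(\boldsymbol{\alpha}\odot\tilde{y}) = -\tfrac{1}{4}(\boldsymbol{\alpha}\odot\tilde{y})^{\mathsf{T}}\boldsymbol{Z}\boldsymbol{Z}^{\mathsf{T}}(\boldsymbol{\alpha}\odot\tilde{y}).$$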

Thus, to minimize (1.15) we need to maximize (1.18) subject to the constraints

$$\left\{\begin{array}{l}
\boldsymbol{\alpha}^{\mathsf{T}}\tilde{y} = 0\\
\frac{1}{n}\boldsymbol{\lambda}^{*} - \boldsymbol{\alpha} - \boldsymbol{\beta} = \boldsymbol{0},
\end{array}\right.\tag{1.19}$$

which are equivalent to the constraints in (1.10). □ 
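For a numerical check, the dual problem (1.18)–(1.19) can be handed to an off-the-shelf quadratic-programming solver. Below is a minimal sketch using cvxopt. It assumes that \(\boldsymbol{\lambda}^{*}\) stacks the per-observation costs \(\lambda_{\tilde{y}_{i}}\) and that, combined with the nonnegativity of the multipliers \(\boldsymbol{\alpha}\) and \(\boldsymbol{\beta}\), the constraints in (1.19) reduce to the box constraints \(\boldsymbol{0}\leq\boldsymbol{\alpha}\leq\boldsymbol{\lambda}^{*}/n\); the constraints of (1.10) are not reproduced in this preview, and the function name is illustrative.

```python
import numpy as np
from cvxopt import matrix, solvers


def solve_reweighted_dual(Z, ytilde, lam_star):
    """Numerically solve the dual (1.18)-(1.19) for one slice (sketch).

    Z        : (n, p) array of standardized predictors
    ytilde   : (n,) array of labels in {+1, -1}
    lam_star : (n,) array of observation-specific costs (lambda* in the text)
    """
    n = Z.shape[0]
    D = np.diag(ytilde.astype(float))
    K = Z @ Z.T

    # Maximizing 1'a - (1/4)(a*y)' Z Z' (a*y) is the same as minimizing
    # (1/2) a' P a + q' a with P = (1/2) D K D and q = -1.
    P = matrix(0.5 * D @ K @ D + 1e-8 * np.eye(n))   # small ridge for stability
    q = matrix(-np.ones(n))
    # Box constraints 0 <= alpha <= lam_star / n (from (1.19) with beta >= 0).
    G = matrix(np.vstack([-np.eye(n), np.eye(n)]))
    h = matrix(np.hstack([np.zeros(n), lam_star / n]))
    # Equality constraint alpha' ytilde = 0.
    A = matrix(ytilde.astype(float).reshape(1, n))
    b = matrix(np.zeros(1))

    solvers.options["show_progress"] = False
    alpha = np.array(solvers.qp(P, q, G, h, A, b)["x"]).ravel()

    # Normal vector from the first stationarity equation in (1.16).
    zeta = 0.5 * Z.T @ (alpha * ytilde)
    return zeta, alpha
```

The recovered \(\boldsymbol{\zeta}\) is the quantity pooled across slices when estimating the dimension reduction directions.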

Copyright information

© 2014 Springer Science+Business Media New York

Cite this paper

Artemiou, A., Shu, M. (2014). A Cost Based Reweighted Scheme of Principal Support Vector Machine. In: Akritas, M., Lahiri, S., Politis, D. (eds) Topics in Nonparametric Statistics. Springer Proceedings in Mathematics & Statistics, vol 74. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-0569-0_1
