Part of the book series: Applied Optimization ((APOP,volume 50))

Abstract

We examine mathematical models for semi-supervised support vector machines (S3VM). Given a training set of labeled data and a working set of unlabeled data, S3VM constructs a support vector machine using both the training and working sets. We use S3VM to solve the transductive inference problem posed by Vapnik. In transduction, the task is to estimate the value of a classification function at the given points of the working set; this contrasts with inductive inference, which estimates the classification function at all possible values. We propose a general S3VM model that minimizes both the misclassification error and the function capacity based on all the available data. Depending on how poorly estimated unlabeled data are penalized, different mathematical models result. We examine several practical algorithms for solving these models. The first approach converts the S3VM model for 1-norm linear support vector machines into a mixed-integer program (MIP); a global solution of the MIP is found using a commercial integer-programming solver. The second approach uses a nonconvex quadratic program, for which variations of block-coordinate-descent algorithms find local solutions. Using the MIP within a local learning algorithm produced the best results. Our experimental study of these statistical learning methods indicates that incorporating working data can improve generalization.
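To make the mixed-integer view concrete, the following is a minimal illustrative sketch, not the chapter's exact formulation: each point in the working set receives a binary class-choice variable, and a global search over all assignments minimizes a hinge loss plus a 1-norm capacity term. On this tiny one-dimensional toy, brute-force enumeration and a grid search stand in for the commercial integer-programming solver; all function names and data below are hypothetical.

```python
from itertools import product

def obj(w, b, data, lam=0.1):
    # 1-norm-SVM-style objective: hinge loss over all points
    # plus a capacity (regularization) term on the weight.
    return sum(max(0.0, 1 - y * (w * x + b)) for x, y in data) + lam * abs(w)

def solve_tiny_s3vm(labeled, unlabeled, grid):
    # One binary variable per unlabeled point (its class in {-1, +1}),
    # as in the MIP; brute force replaces the integer-programming solver.
    best = None
    for ys in product((-1, 1), repeat=len(unlabeled)):
        data = labeled + list(zip(unlabeled, ys))
        for w, b in grid:
            val = obj(w, b, data)
            if best is None or val < best[0]:
                best = (val, w, b, ys)
    return best

# Coarse grid over (w, b) in [-2, 2] x [-2, 2], step 0.1.
grid = [(w / 10, b / 10) for w in range(-20, 21) for b in range(-20, 21)]
labeled = [(-2.0, -1), (2.0, 1)]   # one labeled point per class
unlabeled = [-1.6, 1.6]            # the working set
val, w, b, ys = solve_tiny_s3vm(labeled, unlabeled, grid)
print(ys)  # → (-1, 1): each unlabeled point joins the nearby class
```

The global search is what distinguishes this formulation from local methods: even a working-set labeling that looks poor for the current classifier is evaluated on its own merits.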
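The block-coordinate-descent idea can be sketched as alternating between (a) fitting a margin classifier on the labeled data plus the currently guessed working-set labels and (b) re-guessing those labels from the current classifier. The toy below uses subgradient descent on a regularized hinge loss for a one-dimensional linear classifier; it illustrates the alternation only, not the chapter's exact nonconvex quadratic program, and all names and data are hypothetical.

```python
def train_hinge(points, labels, epochs=200, lr=0.05, lam=0.01):
    """Subgradient descent on a regularized hinge loss for a
    one-dimensional linear classifier f(x) = w*x + b."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            if y * (w * x + b) < 1:          # inside the margin: hinge active
                w += lr * (y * x - 2 * lam * w)
                b += lr * y
            else:                             # outside: regularizer only
                w -= lr * 2 * lam * w
    return w, b

def s3vm_coordinate_descent(labeled, unlabeled, iters=5):
    xs = [x for x, _ in labeled]
    ys = [y for _, y in labeled]
    w, b = train_hinge(xs, ys)                # warm start on labeled data only
    for _ in range(iters):
        # Block 1: fix (w, b), choose working-set labels by classifier sign.
        guessed = [1 if w * x + b >= 0 else -1 for x in unlabeled]
        # Block 2: fix the labels, refit the classifier on all data.
        w, b = train_hinge(xs + unlabeled, ys + guessed)
    return w, b

# Two labeled clusters around -2 (class -1) and +2 (class +1),
# plus unlabeled points near each cluster.
labeled = [(-2.0, -1), (-1.5, -1), (2.0, 1), (1.5, 1)]
unlabeled = [-1.8, -2.2, 1.8, 2.2]
w, b = s3vm_coordinate_descent(labeled, unlabeled)
print(w * 2.0 + b > 0, w * -2.0 + b < 0)  # → True True
```

Each block-update can only decrease the joint objective, so the alternation converges, but only to a local solution; this is why the chapter pairs such local methods with the globally solved MIP.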


Copyright information

© 2001 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Demiriz, A., Bennett, K.P. (2001). Optimization Approaches to Semi-Supervised Learning. In: Ferris, M.C., Mangasarian, O.L., Pang, J.S. (eds) Complementarity: Applications, Algorithms and Extensions. Applied Optimization, vol 50. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-3279-5_6

  • DOI: https://doi.org/10.1007/978-1-4757-3279-5_6

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-4847-2

  • Online ISBN: 978-1-4757-3279-5

  • eBook Packages: Springer Book Archive
