Abstract
We examine mathematical models for semi-supervised support vector machines (S3VM). Given a training set of labeled data and a working set of unlabeled data, S3VM constructs a support vector machine using both the training and working sets. We use S3VM to solve the transductive inference problem posed by Vapnik: the task is to estimate the value of a classification function at the given points of the working set. This contrasts with inductive inference, which estimates the classification function at all possible points. We propose a general S3VM model that minimizes both the misclassification error and the function capacity based on all the available data. Depending on how poorly estimated unlabeled data are penalized, different mathematical models result. We examine several practical algorithms for solving these models. The first approach converts the S3VM model for 1-norm linear support vector machines into a mixed-integer program (MIP), which is solved to global optimality with a commercial integer programming solver. The second approach formulates the problem as a nonconvex quadratic program and finds local solutions with variations of block-coordinate-descent algorithms. Embedding the MIP within a local learning algorithm produced the best results. Our experimental study of these statistical learning methods indicates that incorporating working data can improve generalization.
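For concreteness, the mixed-integer formulation mentioned above can be sketched as follows, along the lines of the Bennett and Demiriz reference below. The notation is ours rather than a verbatim statement of the chapter's model: labeled points $(x_i, y_i)$, $i = 1, \dots, \ell$, unlabeled working points $x_j$, $j = \ell+1, \dots, \ell+k$, a sufficiently large constant $M$, a misclassification penalty $C > 0$, and a binary variable $d_j$ that assigns each working point to a class:

\[
\begin{aligned}
\min_{w,\, b,\, \xi,\, z,\, t,\, d} \quad
  & C \Big[ \sum_{i=1}^{\ell} \xi_i
      + \sum_{j=\ell+1}^{\ell+k} (z_j + t_j) \Big] + \|w\|_1 \\
\text{subject to} \quad
  & y_i \, (w \cdot x_i - b) + \xi_i \ge 1,
    \qquad \xi_i \ge 0, \qquad i = 1, \dots, \ell, \\
  & w \cdot x_j - b + z_j + M(1 - d_j) \ge 1,
    \qquad z_j \ge 0, \\
  & -(w \cdot x_j - b) + t_j + M d_j \ge 1,
    \qquad t_j \ge 0, \\
  & d_j \in \{0, 1\}, \qquad j = \ell+1, \dots, \ell+k.
\end{aligned}
\]

When $d_j = 1$, the $M(1 - d_j)$ term vanishes and $x_j$ incurs only the class $+1$ slack $z_j$, while the class $-1$ constraint is relaxed; when $d_j = 0$ the roles reverse. Each unlabeled point is thus charged for the cheaper of its two possible labelings, and minimizing $\|w\|_1$ controls capacity, so the MIP simultaneously penalizes poorly estimated working data and limits function capacity as described above.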
References
C. G. Atkeson, A. W. Moore, and S. Schaal. Locally weighted learning. Artificial Intelligence Review, 11:11–73, 1997.
K. P. Bennett. Global tree optimization: a non-greedy decision tree algorithm. Computing Science and Statistics, 26:156–160, 1994.
K. P. Bennett. Combining support vector and mathematical programming methods for classification. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods: Support Vector Learning, pages 307–326, Cambridge, MA, 1999. MIT Press.
K. P. Bennett and E. J. Bredensteiner. Geometry in learning. Web manuscript, Rensselaer Polytechnic Institute, http://www.rpi.edu/~bennek/geometry2.ps, 1996. Accepted for publication in Geometry at Work, C. Gorini et al., editors, MAA Press.
K. P. Bennett and A. Demiriz. Semi-supervised support vector machines. In M. Kearns, S. Solla, and D. Cohn, editors, Advances in Neural Information Processing Systems 11, pages 368–374, Cambridge, MA, 1999. MIT Press.
K. P. Bennett and O. L. Mangasarian. Robust linear programming discrimination of two linearly inseparable sets. Optimization Methods and Software, 1:23–34, 1992.
K. P. Bennett and O. L. Mangasarian. Bilinear separation in n-space. Computational Optimization and Applications, 4(4):207–227, 1993.
D. P. Bertsekas. Nonlinear Programming. Athena Scientific, Belmont, MA, 1996.
J. Blue. A hybrid of tabu search and local descent algorithms with applications in artificial intelligence. PhD thesis, Rensselaer Polytechnic Institute, Troy, NY, 1998.
A. Blum and T. Mitchell. Combining labeled and unlabeled data with co-training. In Proceedings of the 1998 Conference on Computational Learning Theory, Madison, WI, 1998. ACM.
E. J. Bredensteiner and K. P. Bennett. Feature minimization within decision trees. Computational Optimization and Applications, 10:110–126, 1997.
V. Castelli and T. M. Cover. On the exponential value of labeled samples. Pattern Recognition Letters, 16:105–111, 1995.
Z. Cataltepe and M. Magdon-Ismail. Incorporating test inputs into learning. In Advances in Neural Information Processing Systems 10, Cambridge, MA, 1997. MIT Press.
C. Cortes and V. N. Vapnik. Support-vector networks. Machine Learning, 20:273–297, 1995.
CPLEX Optimization Incorporated, Incline Village, Nevada. Using the CPLEX Callable Library, 1994.
R. Fourer, D. Gay, and B. Kernighan. AMPL: A Modeling Language for Mathematical Programming. Boyd & Fraser, Danvers, MA, 1993.
T. Hastie and R. Tibshirani. Discriminant adaptive nearest neighbor classification. IEEE PAMI, 18:607–616, 1996.
T. Joachims. Text categorization with support vector machines: Learning with many relevant features. In European Conference on Machine Learning (ECML), 1998.
T. Joachims. Transductive inference for text classification using support vector machines. In International Conference on Machine Learning, 1999.
S. Lawrence, A. C. Tsoi, and A. D. Back. Function approximation with neural networks and local methods: Bias, variance and smoothness. In Peter Bartlett, Anthony Burkitt, and Robert Williamson, editors, Australian Conference on Neural Networks, ACNN 96, pages 16–21. Australian National University, 1996.
O. L. Mangasarian. Arbitrary norm separating plane. Operations Research Letters, 24(1–2), 1999.
O. L. Mangasarian. Generalized support vector machines. In A. Smola, P. Bartlett, B. Schölkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers, pages 135–146, Cambridge, MA, 2000. MIT Press. ftp://ftp.cs.wisc.edu/math-prog/tech-reports/98-14.ps.
A. McCallum and K. Nigam. Employing EM and pool-based active learning for text classification. In Proceedings of the 15th International Conference on Machine Learning (ICML-98), 1998.
P.M. Murphy and D.W. Aha. UCI repository of machine learning databases. Department of Information and Computer Science, University of California, Irvine, California, 1992.
D. R. Musser and A. Saini. STL Tutorial and Reference Guide: C++ Programming with the Standard Template Library. Addison-Wesley, 1996.
K. Nigam, A. McCallum, S. Thrun, and T. Mitchell. Learning to classify text from labeled and unlabeled documents. In Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98), 1998.
S. Odewahn, E. Stockwell, R. Pennington, R. Humphreys, and W. Zumach. Automated star/galaxy discrimination with neural networks. Astronomical Journal, 103(1):318–331, 1992.
V. N. Vapnik. Estimation of Dependences Based on Empirical Data. Springer, New York, 1982. English translation; Russian version 1979.
V. N. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, New York, 1995.
V. N. Vapnik. Statistical Learning Theory. Wiley-Interscience, New York, 1998.
V. N. Vapnik and A. Ja. Chervonenkis. Theory of Pattern Recognition. Nauka, Moscow, 1974. In Russian.
Copyright information
© 2001 Springer Science+Business Media Dordrecht
Cite this chapter
Demiriz, A., Bennett, K.P. (2001). Optimization Approaches to Semi-Supervised Learning. In: Ferris, M.C., Mangasarian, O.L., Pang, JS. (eds) Complementarity: Applications, Algorithms and Extensions. Applied Optimization, vol 50. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-3279-5_6
Print ISBN: 978-1-4419-4847-2
Online ISBN: 978-1-4757-3279-5