Abstract
The goal of this article is to develop a framework for large margin classification in metric spaces. We seek a generalization of linear decision functions to metric spaces, together with a corresponding notion of margin, such that the decision function separates the training points with a large margin. It turns out that if Lipschitz functions are used as decision functions, the inverse of the Lipschitz constant can be interpreted as the size of the margin. To obtain a clean mathematical setup, we isometrically embed the given metric space into a Banach space and the space of Lipschitz functions into its dual space. This approach leads to a general large margin algorithm for classification in metric spaces. To analyze this algorithm, we first prove a representer theorem: there exists a solution which can be expressed as a linear combination of distances to sets of training points. We then analyze the Rademacher complexity of certain Lipschitz function classes. The generality of the Lipschitz approach is illustrated by the fact that several well-known algorithms are special cases of the Lipschitz algorithm, among them the support vector machine, the linear programming machine, and the 1-nearest neighbor classifier.
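As a concrete illustration of the abstract's claim that the 1-nearest-neighbor classifier is a special case of the Lipschitz approach, consider the decision function f(x) = d(x, S−) − d(x, S+), where S+ and S− are the sets of positive and negative training points. Since x ↦ d(x, S) is 1-Lipschitz for any set S, f is Lipschitz, and sign(f(x)) reproduces the 1-NN rule. The sketch below (a hypothetical illustration, not code from the paper; the Euclidean metric on the plane and the names `lipschitz_nn_decision`, `dist` are our own) makes this explicit:

```python
import math

def dist(a, b):
    # Euclidean metric on the plane; any metric d(., .) would work here
    return math.dist(a, b)

def lipschitz_nn_decision(x, pos, neg):
    """Decision function f(x) = d(x, S-) - d(x, S+).

    f is a difference of two 1-Lipschitz distance functions, hence
    Lipschitz, and sign(f) coincides with the 1-nearest-neighbor rule:
    f(x) > 0 exactly when the nearest training point to x is positive.
    """
    d_pos = min(dist(x, p) for p in pos)  # distance to positive class
    d_neg = min(dist(x, n) for n in neg)  # distance to negative class
    return d_neg - d_pos

# Toy data: two well-separated classes
pos = [(0.0, 0.0), (1.0, 0.0)]
neg = [(5.0, 5.0), (6.0, 5.0)]
print(lipschitz_nn_decision((0.5, 0.2), pos, neg) > 0)  # True: closer to pos
print(lipschitz_nn_decision((5.5, 4.8), pos, neg) > 0)  # False: closer to neg
```

The general algorithm of the paper optimizes over richer combinations of such distance functions while controlling the Lipschitz constant, which plays the role of the inverse margin.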
© 2003 Springer-Verlag Berlin Heidelberg
von Luxburg, U., Bousquet, O. (2003). Distance-Based Classification with Lipschitz Functions. In: Schölkopf, B., Warmuth, M.K. (eds) Learning Theory and Kernel Machines. Lecture Notes in Computer Science(), vol 2777. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45167-9_24
Print ISBN: 978-3-540-40720-1
Online ISBN: 978-3-540-45167-9