Abstract
Kernel selection is critical to kernel methods. Approximate kernel selection is an emerging approach that alleviates the computational burden of kernel selection by introducing kernel matrix approximation. The theoretical problems faced by approximate kernel selection are how kernel matrix approximation impacts kernel selection and whether this impact can be ignored when the number of examples is large enough. In this paper, we introduce the notion of approximate consistency for kernel matrix approximation algorithms to tackle these problems and establish preliminary foundations for approximate kernel selection. By analyzing the approximate consistency of kernel matrix approximation algorithms, we can answer the question of under what conditions, and how, the approximate kernel selection criterion converges to the accurate one. Taking two kernel selection criteria as examples, we analyze the approximate consistency of the Nyström approximation and the multilevel circulant matrix approximation. Finally, we empirically verify our theoretical findings.
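To make the setting concrete, the following is a minimal NumPy sketch (not the authors' code) of the idea the abstract describes: build an RBF kernel matrix, form its Nyström approximation from m sampled landmark columns, and compare an accurate kernel selection criterion with its approximate counterpart as m grows. The particular criterion used here (the regularized least-squares training objective) and all function and variable names are illustrative assumptions, not the criteria analyzed in the paper.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gaussian (RBF) kernel matrix: K[i, j] = exp(-gamma * ||x_i - y_j||^2).
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def nystrom(X, m, gamma, rng):
    # Standard Nystrom approximation: sample m landmarks uniformly, then
    # K ~ C W^+ C^T with C = K(X, landmarks) and W = K(landmarks, landmarks).
    idx = rng.choice(len(X), size=m, replace=False)
    C = rbf_kernel(X, X[idx], gamma)
    W = rbf_kernel(X[idx], X[idx], gamma)
    return C @ np.linalg.pinv(W) @ C.T

def criterion(K, y, lam=1e-2):
    # An illustrative selection criterion (assumption, not the paper's):
    # the kernel ridge regression training objective for kernel matrix K.
    n = len(y)
    alpha = np.linalg.solve(K + lam * n * np.eye(n), y)
    resid = y - K @ alpha
    return resid @ resid / n + lam * alpha @ K @ alpha

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(500)

K = rbf_kernel(X, X, gamma=0.5)        # accurate kernel matrix
exact = criterion(K, y)                # accurate criterion value
for m in (20, 50, 100, 200):
    approx = criterion(nystrom(X, m, 0.5, rng), y)
    print(f"m={m:3d}  |approx - exact| = {abs(approx - exact):.3e}")
```

In this sketch, approximate consistency would manifest as the gap between the approximate and accurate criterion values shrinking as the approximation quality improves; the paper's analysis characterizes when and how such convergence holds.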
Keywords
- Least Squares Support Vector Machine
- Kernel Matrix
- Reproducing Kernel Hilbert Space
- Circulant Matrix
- Machine Learning Research
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ding, L., Liao, S. (2014). Approximate Consistency: Towards Foundations of Approximate Kernel Selection. In: Calders, T., Esposito, F., Hüllermeier, E., Meo, R. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2014. Lecture Notes in Computer Science, vol 8724. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44848-9_23
DOI: https://doi.org/10.1007/978-3-662-44848-9_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44847-2
Online ISBN: 978-3-662-44848-9