Continuous Kernel Learning

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9852)


Kernel learning is the problem of determining the best kernel (either from a dictionary of fixed kernels, or from a smooth space of kernel representations) for a given task. In this paper, we describe a new approach to kernel learning that establishes connections between the Fourier-analytic representation of kernels arising out of Bochner’s theorem and a specific kind of feed-forward network using cosine activations. We analyze the complexity of this space of hypotheses and demonstrate empirically that our approach provides scalable kernel learning superior in quality to prior approaches.
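The connection the abstract invokes is the one behind random Fourier features (Rahimi and Recht, 2007): by Bochner's theorem, a shift-invariant kernel is the Fourier transform of a non-negative spectral measure, so sampling frequencies from that measure yields a cosine feature map whose inner products approximate the kernel — exactly a one-layer network with cosine activations. The sketch below illustrates that standard construction for the RBF kernel; it is background for the paper's setup, not the authors' learning algorithm, and the function name and parameters are illustrative.

```python
import numpy as np

def random_fourier_features(X, n_features=256, gamma=1.0, rng=None):
    """Cosine feature map z(x) with z(x)·z(y) ≈ exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    # Bochner's theorem: frequencies are drawn from the kernel's spectral
    # density; for this RBF parameterization that is N(0, 2*gamma*I).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    # A single "layer" of cosine activations over random projections.
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# Compare the feature-map approximation against the exact RBF kernel.
X = np.random.default_rng(0).normal(size=(5, 3))
Z = random_fourier_features(X, n_features=2000, gamma=0.5, rng=1)
K_approx = Z @ Z.T
K_exact = np.exp(-0.5 * np.sum((X[:, None] - X[None]) ** 2, axis=-1))
```

Kernel learning in this view becomes learning the spectral distribution itself (here fixed to a Gaussian) rather than sampling from a hand-chosen one.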


Convolutional Neural Network · Multiple Kernel Learning · Stochastic Gradient Descent · Kernel Learning · Convolutional Layer
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


References

  1. Aiolli, F., Donini, M.: EasyMKL: a scalable multiple kernel learning algorithm. Neurocomputing 169, 215–224 (2015)
  2. Aslan, O., Zhang, X., Schuurmans, D.: Convex deep learning via normalized kernels. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) NIPS, pp. 3275–3283. Curran Associates, Inc. (2014)
  3. Bach, F.R., Lanckriet, G.R.G., Jordan, M.I.: Multiple kernel learning, conic duality, and the SMO algorithm. In: ICML, Banff, Canada (2004)
  4. Bartlett, P.L., Mendelson, S.: Rademacher and Gaussian complexities: risk bounds and structural results. JMLR 3, 463–482 (2003)
  5. Bell, A.J., Sejnowski, T.J.: Edges are the ‘Independent Components’ of natural scenes. In: Mozer, M.C., Jordan, M.I., Petsche, T. (eds.) NIPS, pp. 831–837. MIT Press (1997)
  6. Bertin-Mahieux, T., Ellis, D.P.W., Whitman, B., Lamere, P.: The million song dataset. In: ISMIR (2011)
  7. Bo, L., Lai, K., Ren, X., Fox, D.: Object recognition with hierarchical kernel descriptors. In: CVPR, pp. 1729–1736, June 2011
  8. Bo, L., Ren, X., Fox, D.: Kernel descriptors for visual recognition. In: Lafferty, J.D., Williams, C.K.I., Shawe-Taylor, J., Zemel, R.S., Culotta, A. (eds.) NIPS, pp. 244–252. Curran Associates, Inc. (2010)
  9. Bochner, S.: Lectures on Fourier Integrals. Annals of Mathematics Studies, vol. 42. Princeton University Press, Princeton (1959)
  10. Băzăvan, E.G., Li, F., Sminchisescu, C.: Fourier kernel learning. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part II. LNCS, vol. 7573, pp. 459–473. Springer, Heidelberg (2012)
  11. Cho, Y., Saul, L.K.: Kernel methods for deep learning. In: Bengio, Y., Schuurmans, D., Lafferty, J.D., Williams, C.K.I., Culotta, A. (eds.) NIPS, pp. 342–350. Curran Associates, Inc. (2009)
  12. Coates, A., Ng, A.Y., Lee, H.: An analysis of single-layer networks in unsupervised feature learning. In: AIStats, pp. 215–223 (2011)
  13. Cortes, C.: Invited talk: can learning kernels help performance? In: ICML, Montreal, Canada (2009)
  14. Cortes, C., Mohri, M., Rostamizadeh, A.: Learning non-linear combinations of kernels. In: NIPS, Vancouver, Canada (2009)
  15. Gallant, A., White, H.: There exists a neural network that does not make avoidable mistakes. In: ICNN, vol. 1, pp. 657–664, July 1988
  16. Globerson, A., Livni, R.: Learning infinite-layer networks: beyond the kernel trick. arXiv:1606.05316 [cs], June 2016
  17. Gönen, M., Alpaydın, E.: Localized multiple kernel learning. In: ICML, Helsinki, Finland (2008)
  18. Gönen, M., Alpaydın, E.: Localized algorithms for multiple kernel learning. Pattern Recogn. 46(3), 795–807 (2013)
  19. Goodfellow, I.J., Warde-Farley, D., Lamblin, P., Dumoulin, V., Mirza, M., Pascanu, R., Bergstra, J., Bastien, F., Bengio, Y.: Pylearn2: a machine learning research library. arXiv:1308.4214 [cs, stat], August 2013
  20. Har-Peled, S.: Geometric Approximation Algorithms. American Mathematical Society, Boston (2011)
  21. Hazan, T., Jaakkola, T.: Steps toward deep kernel methods from infinite neural networks. arXiv:1508.05133 [cs], August 2015
  22. Jain, A., Vishwanathan, S.V.N., Varma, M.: SPG-GMKL: generalized multiple kernel learning with a million kernels. In: KDD, pp. 750–758 (2012)
  23. Jiu, M., Sahbi, H.: Deep kernel map networks for image annotation. In: ICASSP, pp. 1571–1575, March 2016
  24. Jiu, M., Sahbi, H.: Laplacian deep kernel learning for image annotation. In: ICASSP, pp. 1551–1555, March 2016
  25. Kloft, M., Brefeld, U., Sonnenburg, S., Laskov, P., Müller, K.R., Zien, A.: Efficient and accurate Lp-norm multiple kernel learning. In: NIPS, Vancouver, Canada (2009)
  26. Kloft, M., Brefeld, U., Sonnenburg, S., Zien, A.: Lp-norm multiple kernel learning. JMLR 12, 953–997 (2011)
  27. Krizhevsky, A.: Learning Multiple Layers of Features from Tiny Images. Citeseer (2009)
  28. Lanckriet, G.R.G., Cristianini, N., Bartlett, P., Ghaoui, L.E., Jordan, M.I.: Learning the kernel matrix with semidefinite programming. JMLR 5, 27–72 (2004)
  29. Le, Q., Sarlos, T., Smola, A.: Fastfood - computing Hilbert space expansions in loglinear time. In: ICML, pp. 244–252 (2013)
  30. LeCun, Y.: Generalization and network design strategies. In: Pfeifer, R., Schreter, Z., Fogelman, F., Steels, L. (eds.) Connectionism in Perspective. Elsevier, Zurich (1989). An extended version was published as a technical report of the University of Toronto
  31. Lu, Z., May, A., Liu, K., Garakani, A.B., Guo, D., Bellet, A., Fan, L., Collins, M., Kingsbury, B., Picheny, M., Sha, F.: How to scale up kernel methods to be as good as deep neural nets. arXiv:1411.4000 [cs, stat], November 2014
  32. Mairal, J., Koniusz, P., Harchaoui, Z., Schmid, C.: Convolutional kernel networks. In: NIPS, pp. 2627–2635 (2014)
  33. Micchelli, C.A., Pontil, M.: Learning the kernel function via regularization. JMLR 6, 1099–1125 (2005)
  34. Moeller, J., Raman, P., Venkatasubramanian, S., Saha, A.: A geometric algorithm for scalable multiple kernel learning. In: AIStats, pp. 633–642 (2014)
  35. Motwani, R., Raghavan, P.: Randomized Algorithms. Cambridge University Press, Cambridge (1995)
  36. Neal, R.M.: Priors for infinite networks. In: Bayesian Learning for Neural Networks. Lecture Notes in Statistics, vol. 118, pp. 29–53. Springer, New York (1996)
  37. Oliva, J., Dubey, A., Poczos, B., Schneider, J., Xing, E.P.: Bayesian nonparametric kernel-learning. arXiv:1506.08776 [stat], June 2015
  38. Ong, C.S., Smola, A.J., Williamson, R.C.: Learning the kernel with hyperkernels. JMLR 6, 1043–1071 (2005)
  39. Orabona, F., Luo, J.: Ultra-fast optimization algorithm for sparse multi kernel learning. In: ICML, Bellevue, USA (2011)
  40. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. JMLR 12, 2825–2830 (2011)
  41. Rahimi, A., Recht, B.: Random features for large-scale kernel machines. In: NIPS, pp. 1177–1184 (2007)
  42. Rakotomamonjy, A., Bach, F., Canu, S., Grandvalet, Y.: More efficiency in multiple kernel learning. In: ICML, Corvallis, USA (2007)
  43. Sonnenburg, S., Rätsch, G., Schäfer, C., Schölkopf, B.: Large scale multiple kernel learning. JMLR 7, 1531–1565 (2006)
  44. Varma, M., Babu, B.R.: More generality in efficient multiple kernel learning. In: ICML, Montreal, Canada (2009)
  45. Vishwanathan, S.V.N., Sun, Z., Ampornpunt, N., Varma, M.: Multiple kernel learning and the SMO algorithm. In: NIPS, Vancouver, Canada (2010)
  46. Wilson, A., Adams, R.: Gaussian process kernels for pattern discovery and extrapolation. In: ICML, pp. 1067–1075 (2013)
  47. Wilson, A.G., Hu, Z., Salakhutdinov, R., Xing, E.P.: Deep kernel learning. arXiv:1511.02222 [cs, stat], November 2015
  48. Xu, Z., Jin, R., King, I., Lyu, M.R.: An extended level method for efficient multiple kernel learning. In: NIPS, Vancouver, Canada (2008)
  49. Xu, Z., Jin, R., Yang, H., King, I., Lyu, M.R.: Simple and efficient multiple kernel learning by group lasso. In: ICML, Haifa, Israel (2010)
  50. Yang, Z., Moczulski, M., Denil, M., de Freitas, N., Smola, A., Song, L., Wang, Z.: Deep fried convnets. arXiv:1412.7149, December 2014
  51. Yang, Z., Wilson, A., Smola, A., Song, L.: À la carte - learning fast kernels. In: AIStats, pp. 1098–1106 (2015)
  52. Yu, F.X., Kumar, S., Rowley, H., Chang, S.F.: Compact nonlinear maps and circulant extensions. arXiv:1503.03893 [cs, stat], March 2015
  53. Zhuang, J., Tsang, I.W., Hoi, S.: Two-layer multiple kernel learning. In: AIStats, pp. 909–917 (2011)

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. School of Computing, University of Utah, Salt Lake City, USA
