Kernel Feature Maps from Arbitrary Distance Metrics

  • Markus Schneider
  • Wolfgang Ertel
  • Günther Palm
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9324)


The approximation of kernel functions using explicit feature maps has gained a lot of attention in recent years due to the tremendous speed-up in training and learning time of kernel-based algorithms, making them applicable to very large-scale problems. For example, approximations based on random Fourier features are an efficient way to create feature maps for a certain class of shift-invariant kernel functions. However, there are still many kernels for which no algorithm exists to derive such maps. In this work we propose the Distance-Based Feature Map (DBFM), an efficient method to create approximate feature maps from an arbitrary distance metric using pseudo line projections. We show that our approximation does not depend on the input dataset size or the dimension of the input space. We experimentally evaluate our approach on two real datasets using two metric distance functions and one non-metric distance function.
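
The abstract does not spell out how a pseudo line projection is computed. As a rough illustration only, the sketch below uses the classic FastMap-style projection, which places a point on the line through two pivot objects using nothing but pairwise distances (law of cosines). This construction is assumed here and is not necessarily the authors' DBFM; the names `line_projection`, `feature_map`, and `pivot_pairs`, and the choice of pivots, are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]
Distance = Callable[[Point, Point], float]


def euclidean(x: Point, y: Point) -> float:
    """Plain Euclidean distance, used only as an example metric."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def line_projection(x: Point, a: Point, b: Point, dist: Distance) -> float:
    """Coordinate of x along the pseudo line through pivots a and b.

    Only distances are used (law of cosines), so `dist` may be any
    distance function, not just a Euclidean one.
    """
    d_ab = dist(a, b)
    if d_ab == 0:
        raise ValueError("pivots must be distinct under the given distance")
    return (dist(x, a) ** 2 + d_ab ** 2 - dist(x, b) ** 2) / (2.0 * d_ab)


def feature_map(
    x: Point,
    pivot_pairs: List[Tuple[Point, Point]],
    dist: Distance,
) -> List[float]:
    """Stack one projection per pivot pair into an explicit feature vector.

    Assumption: pivot pairs are supplied by the caller (for example,
    sampled at random from the data); the abstract does not say how
    they are chosen.
    """
    return [line_projection(x, a, b, dist) for a, b in pivot_pairs]


if __name__ == "__main__":
    pivots = [((0.0, 0.0), (4.0, 0.0)), ((0.0, 0.0), (0.0, 2.0))]
    print(feature_map((1.0, 3.0), pivots, euclidean))
```

With the Euclidean distance the result coincides with the ordinary orthogonal projection onto the line through the two pivots, so the example point (1, 3) maps to approximately (1, 3) for the axis-aligned pivots above. The length of the feature vector is set by the number of pivot pairs rather than by the dataset size or input dimension.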


Support Vector Machine, Reproducing Kernel Hilbert Space, Random Projection, Scene Category, Hellinger Distance
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Markus Schneider (1, 2)
  • Wolfgang Ertel (2)
  • Günther Palm (1)
  1. Institute of Neural Information Processing, University of Ulm, Ulm, Germany
  2. Institute for Artificial Intelligence, Ravensburg-Weingarten University of Applied Sciences, Weingarten, Germany
