Finding Small Sets of Random Fourier Features for Shift-Invariant Kernel Approximation

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9896)

Abstract

Kernel-based learning is very popular in machine learning, but many classical methods have at least quadratic runtime complexity. Random Fourier features are very effective for approximating shift-invariant kernels by an explicit kernel expansion, which permits the use of efficient linear models with much lower runtime complexity. As a key approach to kernelizing algorithms with linear models, they are successfully used in a variety of methods. However, the number of features needed to approximate the kernel is in general still quite large, with substantial memory and runtime costs. Here, we propose a simple test with linear costs that identifies a small set of random Fourier features, substantially reducing the number of generated features for low-rank kernel matrices while largely preserving the representation accuracy. We also provide generalization bounds for the proposed approach.
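
To make the idea concrete, the following is a minimal sketch (assuming a Python/NumPy environment) of the standard random Fourier feature map for the Gaussian kernel: inner products of the explicit features approximate the kernel values, so a linear model trained on them can emulate the kernel machine. The function name and parameters are illustrative only, and the feature-selection test proposed in the paper is not reproduced here.

    import numpy as np

    def random_fourier_features(X, n_features=500, gamma=1.0, seed=0):
        """Map X (n_samples x d) to explicit features whose inner products
        approximate the Gaussian kernel exp(-gamma * ||x - y||^2)."""
        rng = np.random.default_rng(seed)
        d = X.shape[1]
        # Frequencies sampled from the spectral density of the Gaussian kernel.
        W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
        b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
        return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

    # Usage: Z = random_fourier_features(X); Z @ Z.T approximates the kernel
    # matrix K, so a linear model on Z replaces the kernelized model on K.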

Notes

Acknowledgment

Support by a Marie Curie Intra-European Fellowship (IEF): FP7-PEOPLE-2012-IEF (FP7-327791-ProMoS) is gratefully acknowledged.


Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. School of Computer Science, University of Applied Sciences Würzburg-Schweinfurt, Würzburg, Germany
  2. Computational Intelligence Group, University of Applied Sciences Mittweida, Mittweida, Germany
  3. School of Computer Science, University of Birmingham, Edgbaston, UK
