Learning with infinitely many features
We propose a principled framework for learning with infinitely many features, a situation that typically arises with continuously parametrized feature extraction methods. Such cases occur, for instance, when considering Gabor-based features in computer vision problems or Fourier features for kernel approximation. We cast the problem as that of finding a finite subset of features that minimizes a regularized empirical risk. After analyzing the optimality conditions of this problem, we propose a simple algorithm with the flavour of a column-generation technique. We also show that, using Fourier-based features, it is possible to perform approximate infinite kernel learning. Our experimental results on several datasets show the benefits of the proposed approach in several situations, including texture classification and large-scale kernelized problems involving about 100,000 examples.
Keywords: Infinite features · Column generation · Gabor features · Kernels
This work was supported in part by the IST Program of the European Community under the PASCAL2 Network of Excellence (IST-216886). This publication reflects only the authors' views. This work was also partly supported by grants from the ANR projects ASAP (ANR-09-EMER-001), JCJC-12 Lemon, and Blanc-12 Greta.
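The abstract's route to approximate infinite kernel learning rests on random Fourier features in the spirit of Rahimi and Recht (2007). Below is a minimal sketch of that building block, assuming an RBF kernel; the function name, dimensions, and parameter values are illustrative assumptions and not taken from the paper's implementation.

```python
# Minimal sketch of random Fourier features for the RBF kernel
# k(x, y) = exp(-gamma * ||x - y||^2). Illustrative only.
import numpy as np

def random_fourier_features(X, D=500, gamma=1.0, rng=None):
    """Map X (n x d) to D cosine features whose inner products
    approximate the RBF kernel."""
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    # Frequencies drawn from the kernel's spectral density:
    # for the RBF kernel, a Gaussian with variance 2 * gamma.
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

# The Gram matrix of the explicit features concentrates around the
# exact kernel matrix as D grows, so a linear model trained on Z
# stands in for the corresponding kernel machine.
X = np.random.randn(100, 5)
Z = random_fourier_features(X, D=2000, gamma=0.5, rng=0)
K_approx = Z @ Z.T
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_exact = np.exp(-0.5 * sq_dists)
print(np.abs(K_approx - K_exact).max())  # shrinks as D increases
```

Loosely speaking, each sampled frequency here plays the role of one continuously parametrized feature; in the framework summarized above, a column-generation step would instead search this parameter space for the feature that most violates the current optimality conditions, rather than sampling frequencies blindly.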