Artificial Neural Networks, pp. 245–270
Learning as Constraint Reactions
Abstract
A theory of learning is proposed which naturally extends the classic regularization framework of kernel machines to the case in which the agent interacts with a richer environment, compactly described by the notion of constraint. Variational calculus is exploited to derive general representer theorems that describe the structure of the solution to the learning problem. It is shown that such a solution can be represented in terms of constraint reactions, which recall the corresponding notion in analytic mechanics. In particular, the derived representer theorems clearly show the extension of the classic kernel expansion on support vectors to an expansion on support constraints. As an application of the proposed theory, three examples are given which illustrate the dimensional collapse to a finite-dimensional space of parameters. The constraint reactions are calculated for the classic collection of supervised examples, for the case of box constraints, and for the case of hard holonomic linear constraints mixed with supervised examples. Interestingly, this leads to representer theorems for which the kernel machine mathematical and algorithmic apparatus can be reused.
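For illustration, the classic representer theorem that the abstract extends can be sketched as follows; the loss \(V\), the weight \(\lambda > 0\), the kernel \(k\), and the reaction \(\omega\) are illustrative symbols, not the chapter's notation. Minimizing the regularized risk over a reproducing kernel Hilbert space \(\mathcal{H}\),

\[ \min_{f \in \mathcal{H}} \ \sum_{i=1}^{\ell} V\!\left(y_i, f(x_i)\right) + \lambda \,\|f\|_{\mathcal{H}}^{2}, \]

yields a solution with a finite expansion on the supervised examples,

\[ f^{\star}(x) = \sum_{i=1}^{\ell} \alpha_i \, k(x, x_i). \]

According to the abstract, when the supervised pairs are replaced by general constraints the solution is instead expanded on the constraint reactions; schematically (not the chapter's exact statement),

\[ f^{\star}(x) = \int_{\mathcal{X}} \omega(x')\, k(x, x')\, dx', \]

where the reaction \(\omega\) reduces to a sum of point masses \(\sum_{i=1}^{\ell} \alpha_i \delta_{x_i}\) in the purely supervised case, recovering the expansion above.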
Keywords
Soft Constraint · Hard Constraint · Unilateral Constraint · Holonomic Constraint · Kernel Machine