Abstract
We consider a two-layer network algorithm. The first layer consists of an uncountable number of linear units, each of which is an LMS algorithm whose inputs are first "kernelized," and each of which is indexed by the value of a parameter of a parameterized reproducing kernel. The outputs of the first layer are fed to an exponential weights algorithm that combines them to produce the final prediction. We give loss bounds for this algorithm, with specific applications to prediction relative to the best convex combination of kernels and to the best width of a Gaussian kernel. The algorithm's predictions require the computation of an expectation that is a quotient of two integrals, a computational problem that also arises in a variety of Bayesian inference settings. Such problems are typically tackled with MCMC, importance sampling, and other sampling techniques, for which there are few polynomial-time guarantees on the quality of the approximation in general and none for our problem specifically. We develop a novel deterministic polynomial-time approximation scheme for the computation of the expectations considered in this paper.
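To make the two-layer architecture concrete, the following minimal Python sketch shows a discretized analog of the scheme the abstract describes: a finite grid of Gaussian kernel widths stands in for the uncountable family of units, each unit runs kernelized LMS, and an exponential-weights layer mixes their predictions. The mixture prediction is then a weighted sum divided by a normalizer, the discrete counterpart of the quotient-of-integrals expectation. All class names, the width grid, and the step sizes are illustrative assumptions; this is not the paper's algorithm or its deterministic approximation scheme.

```python
import numpy as np

def gaussian_kernel(x, z, sigma):
    """Gaussian (RBF) kernel with width sigma."""
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

class KernelLMSUnit:
    """Online LMS on kernelized inputs, for one fixed kernel width."""
    def __init__(self, sigma, step=0.5):
        self.sigma, self.step = sigma, step
        self.centers, self.alphas = [], []   # dual (kernel) expansion

    def predict(self, x):
        return sum(a * gaussian_kernel(c, x, self.sigma)
                   for c, a in zip(self.centers, self.alphas))

    def update(self, x, y):
        # LMS step in the kernel-induced feature space: store the new
        # example with a coefficient proportional to the prediction error.
        err = y - self.predict(x)
        self.centers.append(x)
        self.alphas.append(self.step * err)

class KLMSNetSketch:
    """Exponential-weights mixture over a grid of kernel widths."""
    def __init__(self, sigmas, eta=1.0):
        self.units = [KernelLMSUnit(s) for s in sigmas]
        self.log_w = np.zeros(len(self.units))  # log weights, for stability
        self.eta = eta

    def predict(self, x):
        # Discrete analog of the quotient-of-integrals expectation:
        # a normalized, weighted average of the unit predictions.
        w = np.exp(self.log_w - self.log_w.max())
        preds = np.array([u.predict(x) for u in self.units])
        return float(w @ preds / w.sum())

    def update(self, x, y):
        # Exponential-weights update on each unit's squared loss,
        # then pass the example down to every first-layer unit.
        for i, u in enumerate(self.units):
            self.log_w[i] -= self.eta * (y - u.predict(x)) ** 2
            u.update(x, y)

# Illustrative usage on a toy 1-d regression stream.
net = KLMSNetSketch(sigmas=[0.1, 0.5, 1.0, 2.0])
for t in range(50):
    x = np.array([np.random.uniform(-1, 1)])
    y = np.sin(3 * x[0])
    y_hat = net.predict(x)   # predict before seeing the label
    net.update(x, y)
```

In the paper the first layer is a continuum of units, so the sums over the width grid become integrals over the kernel parameter; the contribution of the paper is a deterministic polynomial-time scheme for approximating that quotient of integrals, rather than the naive discretization used above.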
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
Cite this paper
Herbster, M. (2004). Relative Loss Bounds and Polynomial-Time Predictions for the k-lms-net Algorithm. In: Ben-David, S., Case, J., Maruoka, A. (eds.) Algorithmic Learning Theory (ALT 2004). Lecture Notes in Computer Science, vol. 3244. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30215-5_24
Print ISBN: 978-3-540-23356-5
Online ISBN: 978-3-540-30215-5