Online learning versus offline learning
We present an off-line variant of the mistake-bound model of learning. As in the well-studied on-line model, a learner in the off-line model has to learn an unknown concept from a sequence of elements of the instance space, on which it makes “guess and test” trials. In both models, the aim of the learner is to make as few mistakes as possible. The difference between the models is that in the on-line model only the set of possible elements is known, whereas in the off-line model the sequence of elements (i.e., the identity of the elements as well as the order in which they are to be presented) is known to the learner in advance.
We give a combinatorial characterization of the number of mistakes in the off-line model. We apply this characterization to solve several natural questions that arise for the new model. First, we compare the mistake bounds of an off-line learner to those of a learner learning the same concept classes in the on-line scenario. We show that the number of mistakes in on-line learning is at most a factor of log n larger than in off-line learning, where n is the length of the sequence. In addition, we show that if there is an off-line algorithm that makes at most a constant number of mistakes on each sequence, then there is an on-line algorithm that also makes at most a constant number of mistakes.
The second issue we address is the effect of the ordering of the elements on the number of mistakes of an off-line learner. It turns out that there are sequences on which an off-line learner can guarantee at most one mistake, yet a permutation of the same sequence forces the learner to err on many elements. We prove, however, that the gap between the off-line mistake bounds on permutations of the same sequence of n elements cannot be larger than a multiplicative factor of log n, and we present examples that attain such a gap.
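The effect of ordering can be illustrated on a small finite class. The sketch below is not taken from the paper: it assumes a deterministic learner and a worst-case choice of target concept, and it computes the off-line mistake bound by a brute-force minimax recursion rather than by the combinatorial characterization mentioned above. The function names and the threshold class used as the example are illustrative assumptions.

```python
from functools import lru_cache


def offline_mistake_bound(concepts, sequence):
    """Worst-case mistakes of the best deterministic learner that knows
    `sequence` in advance (brute-force minimax; illustrative sketch only).

    Each concept is given as the set of elements it labels positive.
    """
    sequence = tuple(sequence)
    all_concepts = frozenset(frozenset(c) for c in concepts)

    @lru_cache(maxsize=None)
    def value(i, version_space):
        if i == len(sequence):
            return 0
        x = sequence[i]
        pos = frozenset(c for c in version_space if x in c)
        neg = version_space - pos
        # If all consistent concepts agree on x, no mistake can be forced.
        if not pos:
            return value(i + 1, neg)
        if not neg:
            return value(i + 1, pos)
        # Otherwise the learner picks a prediction and the worst-case
        # target picks the label; a wrong prediction costs one mistake.
        predict_1 = max(value(i + 1, pos), 1 + value(i + 1, neg))
        predict_0 = max(1 + value(i + 1, pos), value(i + 1, neg))
        return min(predict_1, predict_0)

    return value(0, all_concepts)


def thresholds(n):
    """Hypothetical example class: c_t labels x positive iff x >= t, t = 1..n+1."""
    return [frozenset(range(t, n + 1)) for t in range(1, n + 2)]


sorted_order = [1, 2, 3, 4, 5, 6, 7]
bisecting_order = [4, 2, 6, 1, 3, 5, 7]  # a permutation of the same elements

print(offline_mistake_bound(thresholds(7), sorted_order))     # 1
print(offline_mistake_bound(thresholds(7), bisecting_order))  # 3
```

On the sorted order the learner can predict 0 until its first mistake and 1 afterwards, so one mistake suffices. On the bisecting order each undetermined element splits the remaining thresholds in half, so three mistakes can be forced on these seven elements.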
Keywords: Online Algorithm, Concept Class, Complete Binary Tree, Equivalence Query, Instance Space