Online learning versus offline learning

  • Shai Ben-David
  • Eyal Kushilevitz
  • Yishay Mansour
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 904)


We present an off-line variant of the mistake-bound model of learning. Just as in the well-studied on-line model, a learner in the off-line model has to learn an unknown concept from a sequence of elements of the instance space on which he makes “guess and test” trials. In both models, the aim of the learner is to make as few mistakes as possible. The difference between the models is that, while in the on-line model only the set of possible elements is known, in the off-line model the sequence of elements (i.e., the identity of the elements as well as the order in which they are to be presented) is known to the learner in advance.

We give a combinatorial characterization of the number of mistakes in the off-line model. We apply this characterization to solve several natural questions that arise for the new model. First, we compare the mistake bounds of an off-line learner to those of a learner learning the same concept classes in the on-line scenario. We show that the number of mistakes in on-line learning is at most a log n factor more than in off-line learning, where n is the length of the sequence. In addition, we show that if there is an off-line algorithm that makes no more than a constant number of mistakes on each sequence, then there is an on-line algorithm that also makes no more than a constant number of mistakes.
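
As a hedged illustration of this comparison, the following Python sketch brute-forces both quantities for a tiny finite class. It is a plain minimax computation, not the combinatorial characterization from the paper. The threshold class and the names `thresholds`, `split`, `offline_mistakes`, and `online_mistakes` are assumptions introduced only for this example.

```python
from functools import lru_cache

def thresholds(n):
    """Hypothetical example class over points 1..n: c_t(x) = 1 iff x >= t, t = 1..n+1.
    Each concept is stored as its tuple of labels on the n points."""
    return frozenset(tuple(int(x >= t) for x in range(1, n + 1))
                     for t in range(1, n + 2))

def split(concepts, i):
    """Partition the surviving concepts by their label on point index i."""
    zeros = frozenset(c for c in concepts if c[i] == 0)
    return zeros, concepts - zeros

@lru_cache(maxsize=None)
def offline_mistakes(concepts, seq):
    """Worst-case mistakes of the best learner that knows `seq`
    (a tuple of point indices, in presentation order) in advance."""
    if not seq:
        return 0
    zeros, ones = split(concepts, seq[0])
    rest = seq[1:]
    if not zeros or not ones:
        # All surviving concepts agree on this point: no mistake is possible.
        return offline_mistakes(zeros or ones, rest)
    m0 = offline_mistakes(zeros, rest)
    m1 = offline_mistakes(ones, rest)
    # The learner picks a prediction; the adversary then picks the true label.
    return min(max(m0, 1 + m1), max(m1, 1 + m0))

@lru_cache(maxsize=None)
def online_mistakes(concepts, n_points):
    """Worst case when the adversary also chooses which point comes next."""
    best = 0
    for i in range(n_points):
        zeros, ones = split(concepts, i)
        if zeros and ones:  # only points that still separate concepts matter
            m0 = online_mistakes(zeros, n_points)
            m1 = online_mistakes(ones, n_points)
            best = max(best, min(max(m0, 1 + m1), max(m1, 1 + m0)))
    return best

C = thresholds(7)
print(offline_mistakes(C, tuple(range(7))))  # points presented in increasing order
print(online_mistakes(C, 7))
```

For this assumed class with 7 points, the increasing order lets the off-line learner get away with a single mistake, while the on-line value is 3, which is consistent with a gap of at most a log n factor.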

The second issue we address is the effect of the ordering of the elements on the number of mistakes of an off-line learner. It turns out that there are sequences on which an off-line learner can guarantee at most one mistake, yet a permutation of the same sequence forces him to err on many elements. We prove, however, that the gap between the off-line mistake bounds on permutations of the same sequence of n elements cannot be larger than a multiplicative factor of log n, and we present examples that attain such a gap.
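
The effect of ordering can be seen on a small instance. The sketch below is an assumption-laden illustration rather than the construction used in the paper: it takes a hypothetical class of thresholds over 7 points, evaluates the off-line minimax value (`offline_value`, a name chosen here) on every permutation of the points, and compares the increasing order with a "binary search" order.

```python
from functools import lru_cache
from itertools import permutations

@lru_cache(maxsize=None)
def offline_value(concepts, seq):
    """Minimax number of mistakes for a learner that sees `seq` in advance.
    `concepts` is a frozenset of label tuples; `seq` is a tuple of point indices."""
    if not seq:
        return 0
    i, rest = seq[0], seq[1:]
    zeros = frozenset(c for c in concepts if c[i] == 0)
    ones = concepts - zeros
    if not zeros or not ones:
        # The label of this point is already determined: no mistake here.
        return offline_value(zeros or ones, rest)
    m0, m1 = offline_value(zeros, rest), offline_value(ones, rest)
    # Learner chooses the prediction, adversary chooses the label.
    return min(max(m0, 1 + m1), max(m1, 1 + m0))

N = 7
# Assumed example class: c_t(x) = 1 iff x >= t, for t = 1..N+1, on points 1..N.
THRESHOLDS = frozenset(tuple(int(x >= t) for x in range(1, N + 1))
                       for t in range(1, N + 2))

values = {p: offline_value(THRESHOLDS, p) for p in permutations(range(N))}
sorted_order = tuple(range(N))         # points 1, 2, ..., 7
bisect_order = (3, 1, 5, 0, 2, 4, 6)   # points 4, 2, 6, 1, 3, 5, 7

print(values[sorted_order], values[bisect_order])    # 1 3
print(min(values.values()), max(values.values()))    # 1 3
```

On this toy class the same seven elements cost one mistake in increasing order and three in the binary search order, a ratio in line with the log n factor stated above.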


Keywords: Online Algorithm; Concept Class; Complete Binary Tree; Equivalence Query; Instance Space
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 1995

Authors and Affiliations

  • Shai Ben-David (1)
  • Eyal Kushilevitz (1)
  • Yishay Mansour (2)
  1. Computer Science Dept., Technion, Israel
  2. Computer Science Dept., Tel-Aviv University, Israel
