The Shortest Path Problem Under Partial Monitoring

  • András György
  • Tamás Linder
  • György Ottucsák
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4005)


The on-line shortest path problem is considered under partial monitoring scenarios. At each round, a decision maker has to choose a path between two distinguished vertices of a weighted directed acyclic graph whose edge weights can change in an arbitrary (adversarial) way. The goal is to keep the loss of the chosen path, defined as the sum of the weights of its edges, small. In the multi-armed bandit setting, after choosing a path, the decision maker learns only the weights of those edges that belong to the chosen path. For this scenario, an algorithm is given whose average cumulative loss over \(n\) rounds exceeds that of the best path, matched off-line to the entire sequence of edge weights, by a quantity that is proportional to \(1/\sqrt{n}\) and depends only polynomially on the number of edges of the graph. The algorithm can be implemented with complexity linear in the number of rounds \(n\) and in the number of edges. This result improves on earlier bandit algorithms, whose performance bounds either depend exponentially on the number of edges or converge to zero at a rate slower than \(O(1/\sqrt{n})\). An extension to the so-called label efficient setting is also given, in which the decision maker is informed about the weight of the chosen path only with probability \(\epsilon < 1\). Applications to routing in packet-switched networks are presented along with simulation results.
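
The bandit protocol described above can be sketched in Python as follows. This is a naive illustration and not the algorithm of the paper: it enumerates every path explicitly, so its cost grows with the number of paths, which is exactly what the paper's algorithm avoids. The function names, the exponential-weights update, the learning rate `eta`, the uniform exploration rate `gamma`, and the toy graph are all assumptions made for this example. Edge weights are assumed to lie in [0, 1].

```python
import math
import random


def enumerate_paths(edges, source, target):
    """Return every source-to-target path of a DAG as a tuple of edge indices."""
    outgoing = {}
    for idx, (u, v) in enumerate(edges):
        outgoing.setdefault(u, []).append((idx, v))
    paths = []

    def walk(node, prefix):
        if node == target:
            paths.append(tuple(prefix))
            return
        for idx, nxt in outgoing.get(node, []):
            walk(nxt, prefix + [idx])

    walk(source, [])
    return paths


def run_bandit(edges, paths, loss_fn, n_rounds, eta, gamma, rng):
    """Exponential weights over paths with importance-weighted edge estimates.

    loss_fn(t) returns the (possibly adversarial) edge weights of round t;
    the learner reads only the entries on the path it chose.
    Returns the cumulative loss and the per-edge loss estimates.
    """
    est = [0.0] * len(edges)  # cumulative estimated loss of each edge
    total = 0.0
    for t in range(n_rounds):
        # Estimated cumulative loss of each path is the sum over its edges.
        scores = [sum(est[e] for e in p) for p in paths]
        best = min(scores)  # subtract the minimum for numerical stability
        weights = [math.exp(-eta * (s - best)) for s in scores]
        z = sum(weights)
        # Mix with uniform exploration so every path keeps positive probability.
        probs = [(1 - gamma) * w / z + gamma / len(paths) for w in weights]
        chosen = paths[rng.choices(range(len(paths)), weights=probs)[0]]

        losses = loss_fn(t)
        total += sum(losses[e] for e in chosen)
        # Bandit feedback: update only the edges on the chosen path, dividing
        # by the probability that the edge was observed (unbiased estimate).
        for e in chosen:
            q = sum(pr for pr, p in zip(probs, paths) if e in p)
            est[e] += losses[e] / q
    return total, est


if __name__ == "__main__":
    # Hypothetical toy graph: s -> a -> t, s -> b -> t, and a -> b.
    toy_edges = [("s", "a"), ("s", "b"), ("a", "t"), ("b", "t"), ("a", "b")]
    toy_paths = enumerate_paths(toy_edges, "s", "t")
    fixed = [0.9, 0.1, 0.9, 0.1, 0.9]  # constant weights, for illustration only
    n = 2000
    loss, _ = run_bandit(toy_edges, toy_paths, lambda t: fixed, n,
                         eta=0.05, gamma=0.05, rng=random.Random(0))
    print("average loss per round:", loss / n)
```

Estimating losses per edge rather than per path mirrors the feedback model of the abstract, in which the weights of the individual edges on the chosen path are revealed. The explicit path enumeration is kept only to make the sketch short.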


Decision Maker; Time Slot; Edge Weight; Short Path Problem; Bandit Problem





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • András György (1)
  • Tamás Linder (1, 2)
  • György Ottucsák (3)
  1. Informatics Laboratory, Computer and Automation Research Institute of the Hungarian Academy of Sciences, Budapest, Hungary
  2. Department of Mathematics and Statistics, Queen’s University, Kingston, Canada
  3. Department of Computer Science and Information Theory, Budapest University of Technology and Economics, Budapest, Hungary
