Applied Intelligence, Volume 44, Issue 2, pp 282–294

A formal proof of the 𝜖-optimality of discretized pursuit algorithms

  • Xuan Zhang
  • B. John Oommen
  • Ole-Christoffer Granmo
  • Lei Jiao


Learning Automata (LA) can be regarded as the founding algorithms on which the field of Reinforcement Learning has been built. Among the families of LA, Estimator Algorithms (EAs) are certainly the fastest, and of these, the family of discretized algorithms is proven to converge even faster than its continuous counterparts. However, it has recently been reported that the proofs of 𝜖-optimality given over the past three decades for all the reported algorithms are flawed. We applaud the researchers who discovered this flaw, and who further proceeded to rectify the proof for the Continuous Pursuit Algorithm (CPA). The latter proof examines the monotonicity property of the probability of selecting the optimal action, and requires the learning parameter to be continuously changing. In this paper, we provide a new method to prove the 𝜖-optimality of the Discretized Pursuit Algorithm (DPA) which does not require this constraint, by virtue of the fact that the DPA has, in and of itself, absorbing barriers to which the LA can jump in a discretized manner. Unlike the proof given in [3] (Zhang et al., Appl Intell 41:974–985, 2014) for an absorbing version of the CPA, which utilizes the single-action Hoeffding’s inequality, the current proof invokes what we shall refer to as the “multi-action” version of Hoeffding’s inequality. We believe that our proof is both unique and pioneering. It can also form the basis for formally showing the 𝜖-optimality of the other EAs that possess absorbing states.
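To make the "absorbing barriers" and "discretized jumps" mentioned above concrete, the following is a minimal Python sketch of a discretized pursuit scheme. It is not taken from the paper: it assumes the commonly described reward-inaction form with step size 1/(rN) for r actions and resolution N, and the names `run_dpa`, `reward_probs`, `resolution`, and `init_samples` are hypothetical. Probabilities are stored as integer counts of the step size, so reaching 0 or 1 is exact.

```python
import random


def run_dpa(reward_probs, resolution, init_samples=10, max_steps=100_000, rng=None):
    """Illustrative discretized pursuit sketch (assumed reward-inaction form).

    reward_probs: per-action reward probabilities of a simulated environment.
    resolution:   N, so that the smallest probability step is 1 / (r * N).
    Returns the final action-probability vector and the number of steps used.
    """
    rng = rng or random.Random()
    r = len(reward_probs)
    total = r * resolution          # number of probability "units" in all
    units = [resolution] * r        # p_i = units[i] / total, initially 1 / r
    rewards = [0] * r               # rewards received per action
    pulls = [0] * r                 # times each action was tried

    def pull(i):
        rewarded = rng.random() < reward_probs[i]
        pulls[i] += 1
        rewards[i] += rewarded
        return rewarded

    # Initialise the reward estimates by sampling every action a few times.
    for i in range(r):
        for _ in range(init_samples):
            pull(i)

    steps = 0
    while steps < max_steps and max(units) < total:
        steps += 1
        chosen = rng.choices(range(r), weights=units)[0]
        rewarded = pull(chosen)
        estimates = [rewards[j] / pulls[j] for j in range(r)]
        best = max(range(r), key=estimates.__getitem__)
        if rewarded:
            # Pursue the currently best-estimated action: every other action
            # loses one unit (never below 0) and the best action gets the rest.
            for j in range(r):
                if j != best:
                    units[j] = max(units[j] - 1, 0)
            units[best] = total - sum(units[j] for j in range(r) if j != best)
        # On a penalty the probability vector is left unchanged.

    return [u / total for u in units], steps
```

Once one entry of the returned vector equals 1, the loop stops, since no further update can move the vector away from that unit vector; this is the absorbing behaviour the abstract relies on. A call such as `run_dpa([0.8, 0.6], resolution=100)` would simulate a two-action environment under these assumptions.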


Keywords: Machine learning; Learning automata; Pursuit algorithms; DPA; Convergence; 𝜖-optimality


  1. Zhang X, Oommen BJ, Granmo O-C, Jiao L (2014) Using the theory of regular functions to formally prove the 𝜖-optimality of discretized pursuit learning algorithms. In: Proceedings of IEA-AIE. Springer, Kaohsiung, Taiwan, pp 379–388
  2. Rajaraman K, Sastry PS (1996) Finite time analysis of the pursuit algorithm for learning automata. IEEE Trans Syst Man Cybern B: Cybern 26:590–598
  3. Zhang X, Granmo O-C, Oommen BJ, Jiao L (2014) A formal proof of the 𝜖-optimality of absorbing continuous pursuit algorithms using the theory of regular functions. Appl Intell 41:974–985
  4. Narendra KS, Thathachar MAL (1989) Learning automata: an introduction. Prentice Hall
  5. Oommen BJ (1986) Absorbing and ergodic discretized two-action learning automata. IEEE Trans Syst Man Cybern 16:282–296
  6. Thathachar MAL, Sastry PS (1986) Estimator algorithms for learning automata. In: Proceedings of the Platinum Jubilee Conference on Systems and Signal Processing, Bangalore, India, pp 29–32
  7. Agache M, Oommen BJ (2002) Generalized pursuit learning schemes: new families of continuous and discretized learning automata. IEEE Trans Syst Man Cybern B: Cybern 32(6):738–749
  8. Zhang X, Granmo O-C, Oommen BJ (2011) The Bayesian pursuit algorithm: A new family of estimator learning automata. In: Proceedings of IEA-AIE 2011. Springer, New York, USA, pp 608–620
  9. Zhang X, Granmo O-C, Oommen BJ (2013) On incorporating the paradigms of discretization and Bayesian estimation to create a new family of pursuit learning automata. Appl Intell 39:782–792
  10. Oommen BJ, Lanctôt JK (1990) Discretized pursuit learning automata. IEEE Trans Syst Man Cybern 20:931–938
  11. Lanctôt JK, Oommen BJ (1991) On discretizing estimator-based learning algorithms. IEEE Trans Syst Man Cybern B: Cybern 2:1417–1422
  12. Lanctôt JK, Oommen BJ (1992) Discretized estimator learning automata. IEEE Trans Syst Man Cybern B: Cybern 22(6):1473–1483
  13. Oommen BJ, Agache M (2001) Continuous and discretized pursuit learning schemes: various algorithms and their comparison. IEEE Trans Syst Man Cybern B: Cybern 31(3):277–287
  14. Zhang X, Granmo O-C, Oommen BJ (2012) Discretized Bayesian pursuit - a new scheme for reinforcement learning. In: Proceedings of IEA-AIE 2012, Dalian, China, pp 784–793
  15. Oommen BJ, Granmo O-C, Pedersen A (2007) Using stochastic AI techniques to achieve unbounded resolution in finite player Goore Games and its applications. In: Proceedings of IEEE Symposium on Computational Intelligence and Games, Honolulu, HI, pp 161–167
  16. Beigy H, Meybodi MR (2000) Adaptation of parameters of BP algorithm using learning automata. In: Proceedings of Sixth Brazilian Symposium on Neural Networks, JR, Brazil, pp 24–31
  17. Granmo O-C, Oommen BJ, Myrer S-A, Olsen MG (2007) Learning automata-based solutions to the nonlinear fractional knapsack problem with applications to optimal resource allocation. IEEE Trans Syst Man Cybern B 37(1):166–175
  18. Unsal C, Kachroo P, Bay JS (1999) Multiple stochastic learning automata for vehicle path control in an automated highway system. IEEE Trans Syst Man Cybern A 29:120–128
  19. Oommen BJ, Roberts TD (2000) Continuous learning automata solutions to the capacity assignment problem. IEEE Trans Comput 49:608–620
  20. Granmo O-C (2010) Solving stochastic nonlinear resource allocation problems using a hierarchy of twofold resource allocation automata. IEEE Trans Comput 59(4):545–560
  21. Oommen BJ, Croix TDS (1997) String taxonomy using learning automata. IEEE Trans Syst Man Cybern 27:354–365
  22. Oommen BJ, de St. Croix EV (1996) Graph partitioning using learning automata. IEEE Trans Comput 45:195–208
  23. Dean T, Angluin D, Basye K, Engelson S, Kaelbling L, Maron O (1995) Inferring finite automata with stochastic output functions and an application to map learning. Mach Learn 18:81–108
  24. Song Y, Fang Y, Zhang Y (2007) Stochastic channel selection in cognitive radio networks. In: Proceedings of IEEE Global Telecommunications Conference, Washington DC, USA, pp 4878–4882
  25. Ryan M, Omkar T (2012) On 𝜖-optimality of the pursuit learning algorithm. J Appl Probab 49(3):795–805
  26. Zhang X, Granmo O-C, Oommen BJ, Jiao L (2013) On using the theory of regular functions to prove the 𝜖-optimality of the continuous pursuit learning automaton. In: Proceedings of IEA-AIE 2013. Springer, Amsterdam, Holland, pp 262–271
  27. Hoeffding W (1963) Probability inequalities for sums of bounded random variables. J Am Stat Assoc 58:13–30

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Xuan Zhang (1)
  • B. John Oommen (1, 2)
  • Ole-Christoffer Granmo (1)
  • Lei Jiao (1)

  1. Department of ICT, University of Agder, Grimstad, Norway
  2. School of Computer Science, Carleton University, Ottawa, Canada
