Applied Intelligence, Volume 41, Issue 3, pp 974–985

A formal proof of the ε-optimality of absorbing continuous pursuit algorithms using the theory of regular functions

  • Xuan Zhang
  • Ole-Christoffer Granmo
  • B. John Oommen
  • Lei Jiao

Abstract

The most difficult part in the design and analysis of Learning Automata (LA) is the formal proof of their convergence accuracies. The mathematical techniques used for the different families (Fixed Structure, Variable Structure, Discretized, etc.) are quite distinct. Among the families of LA, Estimator Algorithms (EAs) are certainly the fastest, and within this family, the set of Pursuit algorithms has been considered to be the pioneering schemes. Informally, if the environment is stationary, their ε-optimality is defined as their ability to converge to the optimal action with an arbitrarily large probability, provided the learning parameter is sufficiently small (or, for discretized schemes, the resolution parameter is sufficiently large). The existing proofs of all the reported EAs follow the same fundamental principles, and to clarify this, in the interest of simplicity, we shall concentrate on the family of Pursuit algorithms. Recently, it has been reported by Ryan and Omkar (J Appl Probab 49(3):795–805, 2012) that the previous proofs for the ε-optimality of all the reported EAs share a common flaw. The flaw lies in the condition which apparently supports the so-called “monotonicity” property of the probability of selecting the optimal action, which states that after some time instant t0, the reward probability estimates will be ordered correctly forever. The authors of the various proofs have, rather, offered a proof that the reward probability estimates are ordered correctly at a single point of time after t0, which, in turn, does not guarantee the ordering forever, rendering the previous proofs incorrect. While Ryan and Omkar (J Appl Probab 49(3):795–805, 2012) presented a rectified proof of the ε-optimality of the Continuous Pursuit Algorithm (CPA), the pioneering EA, in this paper a new proof is provided for the Absorbing CPA (ACPA), i.e., an algorithm which follows the CPA paradigm but which is artificially endowed with absorbing states whenever any action probability is arbitrarily close to unity. Unlike the previous flawed proofs, instead of examining the monotonicity property of the action probabilities, our proof examines their submartingale property, and then, unlike the traditional approach, invokes the theory of Regular functions to prove that the probability of converging to the optimal action can be made arbitrarily close to unity. We believe that the proof is both unique and pioneering, and that it adds insights into the convergence of different EAs. It can also form the basis for formally demonstrating the ε-optimality of other Estimator algorithms which are artificially rendered absorbing.
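
For concreteness, the pursuit paradigm analyzed here can be illustrated in a few lines of Python. The following is a minimal sketch, not the paper's formal scheme: the parameter names lam and delta, the incremental maximum-likelihood reward estimates, and the unconditional pursuit update at every step are assumptions made for this illustration; the ingredient that makes the scheme an ACPA is the artificial absorbing barrier in the final step.

    import random

    def acpa(reward_probs, lam=0.01, delta=0.001, max_steps=200000, seed=0):
        # Sketch of an Absorbing Continuous Pursuit Automaton (ACPA)
        # in a stationary Bernoulli (P-model) environment.
        # reward_probs: reward probability of each action (unknown to
        #               the learner; it defines the environment).
        # lam:   learning parameter; smaller values give higher accuracy
        #        at the cost of slower convergence.
        # delta: absorbing threshold; the automaton absorbs once some
        #        action probability is within delta of unity.
        rng = random.Random(seed)
        r = len(reward_probs)
        p = [1.0 / r] * r      # action probability vector p(t)
        est = [0.0] * r        # reward probability estimates d_hat
        pulls = [0] * r        # sample counts behind the estimates

        for _ in range(max_steps):
            # 1. Select an action according to p(t).
            i = rng.choices(range(r), weights=p)[0]
            # 2. Observe the environment's random response.
            reward = 1 if rng.random() < reward_probs[i] else 0
            # 3. Update the maximum-likelihood estimate of action i.
            pulls[i] += 1
            est[i] += (reward - est[i]) / pulls[i]
            # 4. Pursue the action currently believed to be the best:
            #    p(t+1) = (1 - lam) * p(t) + lam * e_m.
            m = max(range(r), key=lambda j: est[j])
            p = [(1.0 - lam) * pj for pj in p]
            p[m] += lam
            # 5. Artificial absorbing barrier: once any component of p
            #    is arbitrarily close to unity, jump to the corresponding
            #    unit vector and stop learning.
            j = max(range(r), key=lambda k: p[k])
            if p[j] >= 1.0 - delta:
                return j
        return max(range(r), key=lambda k: p[k])

    # With lam small enough, the automaton converges to action 0
    # (the optimal one here) with probability arbitrarily close to 1.
    print(acpa([0.8, 0.6, 0.4]))

Roughly speaking, the proof tracks p_m(t), the probability of selecting the optimal action: once the estimates are correctly ordered, E[p_m(t+1)] ≥ p_m(t), i.e., p_m(t) is a submartingale, and the theory of Regular functions then shows that shrinking lam makes the probability of absorbing at the optimal action's barrier arbitrarily close to unity.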

Keywords

Pursuit algorithms · CPA · Absorbing CPA · ε-optimality


Acknowledgments

This work was partially supported by NSERC, the Natural Sciences and Engineering Research Council of Canada. A preliminary version of some of the results of this paper was presented at IEA/AIE 2013, the 26th International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, Amsterdam, The Netherlands, in June 2013 [1]. We also record our gratitude to the Associate Editor and the anonymous referees of the original version of this paper for their painstaking reviews. The changes that they requested certainly improved the quality of this paper.

References

  1. Zhang X, Granmo O-C, Oommen B J, Jiao L (2013) On using the theory of regular functions to prove the ε-optimality of the continuous pursuit learning automaton. In: Proceedings of IEA/AIE 2013. Springer, Amsterdam, pp 262–271
  2. Ryan M, Omkar T (2012) On ε-optimality of the pursuit learning algorithm. J Appl Probab 49(3):795–805
  3. Oommen B J, Granmo O-C, Pedersen A (2007) Using stochastic AI techniques to achieve unbounded resolution in finite player Goore Games and its applications. In: Proceedings of the IEEE symposium on computational intelligence and games. Honolulu, pp 161–167
  4. Granmo O-C, Glimsdal S (2013) Accelerated Bayesian learning for decentralized two-armed bandit based decision making with applications to the Goore Game. Appl Intell 38:479–488
  5. Beigy H, Meybodi M R (2000) Adaptation of parameters of BP algorithm using learning automata. In: Proceedings of the 6th Brazilian symposium on neural networks. RJ, Brazil, pp 24–31
  6. Granmo O-C, Oommen B J, Myrer S-A, Olsen M G (2007) Learning automata-based solutions to the nonlinear fractional knapsack problem with applications to optimal resource allocation. IEEE Trans Syst Man Cybern B 37(1):166–175
  7. Unsal C, Kachroo P, Bay J S (1999) Multiple stochastic learning automata for vehicle path control in an automated highway system. IEEE Trans Syst Man Cybern 29:120–128
  8. Oommen B J, Roberts T D (2000) Continuous learning automata solutions to the capacity assignment problem. IEEE Trans Comput 49:608–620
  9. Granmo O-C, Oommen B J (2006) On allocating limited sampling resources using a learning automata-based solution to the fractional knapsack problem. In: Proceedings of the 2006 international intelligent information processing and web mining conference, Advances in Soft Computing, vol 35. Ustron, Poland, pp 263–272
  10. Granmo O-C, Oommen B J (2010) Optimal sampling for estimation with constrained resources using a learning automaton-based solution for the nonlinear fractional knapsack problem. Appl Intell 33(1):3–20
  11. Granmo O-C (2010) Solving stochastic nonlinear resource allocation problems using a hierarchy of twofold resource allocation automata. IEEE Trans Comput 59(4):545–560
  12. Oommen B J, Croix T D S (1997) String taxonomy using learning automata. IEEE Trans Syst Man Cybern 27:354–365
  13. Oommen B J, Croix T D S (1996) Graph partitioning using learning automata. IEEE Trans Comput 45:195–208
  14. Dean T, Angluin D, Basye K, Engelson S, Kaelbling L, Maron O (1995) Inferring finite automata with stochastic output functions and an application to map learning. Mach Learn 18:81–108
  15. Yazidi A, Granmo O-C, Oommen B J (2012) Service selection in stochastic environments: a learning-automaton based solution. Appl Intell 36:617–637
  16. Vafashoar R, Meybodi M R, Momeni A A H (2012) CLA-DE: a hybrid model based on cellular learning automata for numerical optimization. Appl Intell 36:735–748
  17. Torkestani J A (2012) An adaptive focused web crawling algorithm based on learning automata. Appl Intell 37:586–601
  18. Li J, Li Z, Chen J (2011) Microassembly path planning using reinforcement learning for improving positioning accuracy of a 1 cm³ omni-directional mobile microrobot. Appl Intell 34:211–225
  19. Erus G, Polat F (2007) A layered approach to learning coordination knowledge in multiagent environments. Appl Intell 27:249–267
  20. Hong J, Prabhu V V (2004) Distributed reinforcement learning control for batch sequencing and sizing in just-in-time manufacturing systems. Appl Intell 20:71–87
  21. Narendra K S, Thathachar M A L (1989) Learning automata: an introduction. Prentice Hall
  22. Thathachar M A L, Sastry P S (1986) Estimator algorithms for learning automata. In: Proceedings of the platinum jubilee conference on systems and signal processing. Bangalore, India, pp 29–32
  23. Oommen B J, Lanctot J K (1990) Discretized pursuit learning automata. IEEE Trans Syst Man Cybern 20:931–938
  24. Lanctot J K, Oommen B J (1991) On discretizing estimator-based learning algorithms. IEEE Trans Syst Man Cybern B Cybern 2:1417–1422
  25. Lanctot J K, Oommen B J (1992) Discretized estimator learning automata. IEEE Trans Syst Man Cybern B Cybern 22(6):1473–1483
  26. Rajaraman K, Sastry P S (1996) Finite time analysis of the pursuit algorithm for learning automata. IEEE Trans Syst Man Cybern B Cybern 26:590–598
  27. Oommen B J, Agache M (2001) Continuous and discretized pursuit learning schemes: various algorithms and their comparison. IEEE Trans Syst Man Cybern B Cybern 31(3):277–287
  28. Oommen B J (1986) Absorbing and ergodic discretized two-action learning automata. IEEE Trans Syst Man Cybern 16:282–296
  29. Hoeffding W (1963) Probability inequalities for sums of bounded random variables. J Am Stat Assoc 58:13–30
  30. Zhang X, Granmo O-C, Oommen B J (2013) On incorporating the paradigms of discretization and Bayesian estimation to create a new family of pursuit learning automata. Appl Intell 39:782–792

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Xuan Zhang (1)
  • Ole-Christoffer Granmo (1)
  • B. John Oommen (1, 2)
  • Lei Jiao (1)

  1. Department of ICT, University of Agder, Grimstad, Norway
  2. School of Computer Science, Carleton University, Ottawa, Canada
