On incorporating the paradigms of discretization and Bayesian estimation to create a new family of pursuit learning automata

Zhang, Xuan; Granmo, Ole-Christoffer; Oommen, B. John

doi:10.1007/s10489-013-0424-x

Published: 02 February 2013

Volume 39, pages 782–792 (2013)

Applied Intelligence

Xuan Zhang¹, Ole-Christoffer Granmo¹ & B. John Oommen²,³

Abstract

There are currently two fundamental paradigms that have been used to enhance the convergence speed of Learning Automata (LA). The first involves utilizing estimates of the reward probabilities, while the second involves discretizing the probability space in which the LA operates. This paper demonstrates how both of these can be utilized simultaneously, in particular by using the family of Bayesian estimates, which have been proven to have distinct advantages over their maximum likelihood counterparts. The success of LA-based estimator algorithms over the classical, Linear Reward-Inaction (L_RI)-like schemes can be explained by their ability to pursue the actions with the highest reward probability estimates. Without access to reward probability estimates, it makes sense for schemes like the L_RI to first make large exploratory steps, and then to gradually turn exploration into exploitation by making progressively smaller learning steps. However, this behavior becomes counter-intuitive when pursuing actions based on their estimated reward probabilities. Learning should then ideally proceed in progressively larger steps, as the reward probability estimates become more accurate. This paper introduces a new estimator algorithm, the Discretized Bayesian Pursuit Algorithm (DBPA), that achieves this by incorporating both of the above paradigms. The DBPA is implemented by linearly discretizing the action probability space of the Bayesian Pursuit Algorithm (BPA) (Zhang et al. in IEA-AIE 2011, Springer, New York, pp. 608–620, 2011). The key innovation of this paper is that the linear discrete updating rules mitigate the counter-intuitive behavior of the corresponding linear continuous updating rules by augmenting them with the reward probability estimates. Extensive experimental results show the superiority of the DBPA over previous estimator algorithms. Indeed, the DBPA is probably the fastest reported LA to date.
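The contrast between continuous and discretized updating can be illustrated numerically. The sketch below is an illustration, not the authors' code. It tracks how much probability a single non-pursued action loses at each rewarded iteration under a linear continuous rule, p_j ← (1 − λ)p_j, and under a linear discretized rule, p_j ← max(p_j − Δ, 0). Here Δ = 1/(rN) for r actions and resolution N, a step-size form taken from the discretized pursuit literature. The parameter values and function names are arbitrary and hypothetical.

```python
"""Per-step probability loss of one non-pursued action: continuous vs. discretized."""


def continuous_steps(p0: float, lam: float, n: int) -> list[float]:
    """Decrease of p_j at each rewarded step under p_j <- (1 - lam) * p_j."""
    steps, p = [], p0
    for _ in range(n):
        step = lam * p          # proportional to p_j, so it shrinks over time
        steps.append(step)
        p -= step
    return steps


def discretized_steps(units0: int, delta: float, n: int) -> list[float]:
    """Decrease of p_j at each rewarded step under p_j <- max(p_j - delta, 0).

    p_j is stored as an integer number of delta-sized units (p_j = units * delta),
    which keeps the discretized probabilities exact.
    """
    steps, units = [], units0
    for _ in range(n):
        if units > 0:
            steps.append(delta)  # constant step while p_j > 0
            units -= 1
        else:
            steps.append(0.0)    # p_j has reached zero and stays there
    return steps


if __name__ == "__main__":
    r, N = 10, 20                    # hypothetical: 10 actions, resolution N = 20
    delta = 1.0 / (r * N)            # smallest probability step
    cont = continuous_steps(1.0 / r, lam=0.05, n=25)
    disc = discretized_steps(N, delta, n=25)   # N * delta = 1/r initially
    for t in (0, 10, 19, 20, 24):
        print(f"step {t:2d}: continuous {cont[t]:.5f}  discretized {disc[t]:.5f}")
```

With these values, the discretized rule removes the action's entire initial probability of 1/r in exactly N = 20 rewarded steps. The continuous rule, by contrast, shrinks its absolute step geometrically and never drives p_j exactly to zero. Measured against the probability that remains, the discretized steps therefore grow over time.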
Apart from the rigorous experimental demonstration of the strength of the DBPA, the paper also briefly records the proofs of why the BPA and the DBPA are ϵ-optimal in stationary environments.
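The two ingredients can be combined into a hedged sketch of a DBPA-style loop. This is not the authors' implementation, and the function names are ours. The sketch makes the following assumptions:

- Each action's reward probability is tracked with a Beta posterior under a uniform Beta(1, 1) prior.
- The posterior is summarized by its 95% upper percentile. Here that percentile is computed with a normal approximation to stay within the Python standard library; an exact value would need a Beta quantile function.
- Updates follow the reward-inaction discretized pursuit rule. On a reward, every non-pursued action loses Δ = 1/(rN); on a penalty, nothing changes.
- A run stops once one action holds all the probability mass.

Informally, ε-optimality means that the probability of converging to the best action can be made arbitrarily close to 1 by choosing a large enough resolution N. Repeating runs only estimates that probability for one environment and is no substitute for the proofs.

```python
import math
import random

Z95 = 1.6448536269514722  # standard-normal quantile at probability 0.95


def bayesian_estimate(alpha: int, beta: int) -> float:
    """Approximate 95% upper percentile of a Beta(alpha, beta) posterior.

    Uses mean + Z95 * standard deviation (normal approximation, clipped to 1);
    this approximation is an assumption of this sketch.
    """
    n = alpha + beta
    mean = alpha / n
    sd = math.sqrt(alpha * beta / (n * n * (n + 1)))
    return min(1.0, mean + Z95 * sd)


def dbpa(reward_probs, resolution, max_steps=100_000, rng=None):
    """Simulate one DBPA-style automaton; return the index of the chosen action."""
    if rng is None:
        rng = random.Random()
    r = len(reward_probs)
    total = r * resolution          # p_j = units[j] / total, i.e. delta = 1/(r*N)
    units = [resolution] * r        # uniform start: p_j = 1/r
    alpha = [1] * r                 # 1 + number of rewards (Beta(1, 1) prior)
    beta = [1] * r                  # 1 + number of penalties
    for _ in range(max_steps):
        i = rng.choices(range(r), weights=units)[0]   # sample an action from P(t)
        rewarded = rng.random() < reward_probs[i]     # Bernoulli environment
        if rewarded:
            alpha[i] += 1
        else:
            beta[i] += 1
        est = [bayesian_estimate(alpha[j], beta[j]) for j in range(r)]
        top = max(est)
        m = rng.choice([j for j in range(r) if est[j] == top])  # random tie-break
        if rewarded:                # reward-inaction: P(t) is unchanged on a penalty
            for j in range(r):
                if j != m and units[j] > 0:
                    units[j] -= 1   # each non-pursued action loses one delta
            units[m] = total - (sum(units) - units[m])  # pursued action takes the rest
        if max(units) == total:     # all mass on a single action: stop
            return units.index(total)
    return max(range(r), key=units.__getitem__)


if __name__ == "__main__":
    env = [0.2, 0.9, 0.1]           # hypothetical environment; action 1 is best
    rng = random.Random(2013)
    runs = 100
    hits = sum(dbpa(env, resolution=50, rng=rng) == 1 for _ in range(runs))
    print(f"converged to the best action in {hits}/{runs} runs")
```

Storing probabilities as integer multiples of Δ keeps the discretized probability vector exact and summing to 1. Raising `resolution` should, under the ε-optimality result, push the fraction of correct runs toward 1, at the cost of slower convergence.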

Notes

We refer readers to [5], and to the various papers on the families of estimator algorithms, for the fundamentals of the convergence analysis of LA.

References

[1] Zhang X, Granmo O-C, Oommen BJ (2012) Discretized Bayesian pursuit: a new scheme for reinforcement learning. In: IEA-AIE 2012, Dalian, China, Jun 2012, pp 784–793
[2] Zhang X, Granmo O-C, Oommen BJ (2011) The Bayesian pursuit algorithm: a new family of estimator learning automata. In: IEA-AIE 2011. Springer, New York, pp 608–620
[3] Thathachar M, Sastry P (1986) Estimator algorithms for learning automata. In: The platinum jubilee conference on systems and signal processing, Bangalore, India, Dec 1986, pp 29–32
[4] Tsetlin M (1963) Finite automata and the modeling of the simplest forms of behavior. Usp Mat Nauk 8:1–26
[5] Narendra KS, Thathachar MAL (1989) Learning automata: an introduction. Prentice Hall, New York
[6] Thathachar M, Arvind M (1997) Solution of Goore game using models of stochastic learning automata. J Indian Inst Sci 76:47–61
[7] Oommen BJ, Granmo O-C, Pedersen A (2006) Empirical verification of a strategy for unbounded resolution in finite player Goore games. In: The 19th Australian joint conference on artificial intelligence, Hobart, Tasmania, Dec 2006, pp 1252–1258
[8] Oommen BJ, Granmo O-C, Pedersen A (2007) Using stochastic AI techniques to achieve unbounded resolution in finite player Goore games and its applications. In: IEEE symposium on computational intelligence and games, Honolulu, HI, Apr 2007
[9] Granmo O-C, Glimsdal S (2012, to appear) Accelerated Bayesian learning for decentralized two-armed bandit based decision making with applications to the Goore game. Appl Intell
[10] Granmo O-C, Oommen BJ, Pedersen A (2012) Achieving unbounded resolution in finite player Goore games using stochastic automata, and its applications. Seq Anal 31:190–218
[11] Narendra KS, Thathachar MAL (1987) Learning automata. Prentice-Hall, Englewood Cliffs
[12] Beigy H, Meybodi MR (2000) Adaptation of parameters of BP algorithm using learning automata. In: Sixth Brazilian symposium on neural networks, JR, Brazil, Nov 2000
[13] Song Y, Fang Y, Zhang Y (2007) Stochastic channel selection in cognitive radio networks. In: IEEE global telecommunications conference, Washington, DC, USA, Nov 2007, pp 4878–4882
[14] Oommen BJ, Roberts TD (2000) Continuous learning automata solutions to the capacity assignment problem. IEEE Trans Comput 49:608–620
[15] Granmo O-C, Oommen BJ, Myrer S-A, Olsen MG (2007) Learning automata-based solutions to the nonlinear fractional knapsack problem with applications to optimal resource allocation. IEEE Trans Syst Man Cybern, Part B, Cybern 37(1):166–175
[16] Granmo O-C, Oommen BJ, Myrer S-A, Olsen MG (2006) Determining optimal polling frequency using a learning automata-based solution to the fractional knapsack problem. In: The 2006 IEEE international conferences on cybernetics and intelligent systems (CIS) and robotics, automation and mechatronics (RAM), Bangkok, Thailand, Jun 2006, pp 1–7
[17] Granmo O-C, Oommen BJ (2011) Learning automata-based solutions to the optimal web polling problem modeled as a nonlinear fractional knapsack problem. Eng Appl Artif Intell 24(7):1238–1251
[18] Granmo O-C, Oommen BJ (2006) On allocating limited sampling resources using a learning automata-based solution to the fractional knapsack problem. In: The 2006 international intelligent information processing and web mining conference, advances in soft computing, vol 35. Ustron, Poland, Jun 2006, pp 263–272
[19] Granmo O-C, Oommen BJ (2010) Optimal sampling for estimation with constrained resources using a learning automaton-based solution for the nonlinear fractional knapsack problem. Appl Intell 33(1):3–20
[20] Yazidi A, Granmo O-C, Oommen BJ (2012) Service selection in stochastic environments: a learning-automaton based solution. Appl Intell 36:617–637
[21] Vafashoar R, Meybodi MR, Momeni AAH (2012) CLA-DE: a hybrid model based on cellular learning automata for numerical optimization. Appl Intell 36:735–748
[22] Torkestani JA (2012) An adaptive focused web crawling algorithm based on learning automata. Appl Intell 37:586–601
[23] Li J, Li Z, Chen J (2011) Microassembly path planning using reinforcement learning for improving positioning accuracy of a 1 cm³ omni-directional mobile microrobot. Appl Intell 34:211–225
[24] Erus G, Polat F (2007) A layered approach to learning coordination knowledge in multiagent environments. Appl Intell 27:249–267
[25] Hong J, Prabhu VV (2004) Distributed reinforcement learning control for batch sequencing and sizing in just-in-time manufacturing systems. Appl Intell 20:71–87
[26] Kim CO, Kwon I-H, Baek J-G (2008) Asynchronous action-reward learning for nonstationary serial supply chain inventory control. Appl Intell 28:1–16
[27] Lakshmivarahan S (1981) Learning algorithms theory and applications. Springer, New York
[28] Narendra KS, Thathachar MAL (1974) Learning automata: a survey. IEEE Trans Syst Man Cybern 4:323–334
[29] Thathachar MAL, Sastry PS (1985) A class of rapidly converging algorithms for learning automata. IEEE Trans Syst Man Cybern SMC-15:168–175
[30] Sastry PS (1985) Systems of learning automata: Estimator algorithms applications. PhD thesis, Dept Elec Eng, Indian Institute of Science
[31] Thathachar MAL, Sastry PS (1984) A new approach to designing reinforcement schemes for learning automata. In: IEEE int conf cybern syst, Bombay, India, Jan 1984, pp 1–7
[32] Granmo O-C (2010) Solving two-armed Bernoulli bandit problems using a Bayesian learning automaton. Int J Intell Comput Cybern 3(2):207–234
[33] Thompson WR (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25:285–294
[34] Thathachar MAL, Oommen BJ (1979) Discretized reward-inaction learning automata. J Cybern Inf Sci, 24–29
[35] Oommen BJ, Lanctot JK (1990) Discretized pursuit learning automata. IEEE Trans Syst Man Cybern 20:931–938
[36] Oommen BJ, Agache M (2001) Continuous and discretized pursuit learning schemes: various algorithms and their comparison. IEEE Trans Syst Man Cybern, Part B, Cybern 31(3):277–287
[37] Oommen BJ (1990) Absorbing and ergodic discretized two-action learning automata. IEEE Trans Syst Man Cybern SMC-16:282–296
[38] Rajaraman K, Sastry PS (1996) Finite time analysis of the pursuit algorithm for learning automata. IEEE Trans Syst Man Cybern, Part B, Cybern 26:590–598
Article Google Scholar

Author information

Authors and Affiliations

Department of ICT, University of Agder, Grimstad, Norway
Xuan Zhang & Ole-Christoffer Granmo
School of Computer Science, Carleton University, Ottawa, Canada, K1S 5B6
B. John Oommen
University of Agder, Grimstad, Norway
B. John Oommen

Corresponding author

Correspondence to Xuan Zhang.

Appendix

Table 5 Notations used in DBPA

About this article

Cite this article

Zhang, X., Granmo, OC. & Oommen, B.J. On incorporating the paradigms of discretization and Bayesian estimation to create a new family of pursuit learning automata. Appl Intell 39, 782–792 (2013). https://doi.org/10.1007/s10489-013-0424-x

Published: 02 February 2013
Issue Date: December 2013
DOI: https://doi.org/10.1007/s10489-013-0424-x
