
Improving learning ability of learning automata using chaos theory

Abstract

A learning automaton (LA) can be considered an abstract system with a finite set of actions. The LA operates by choosing an action from this set and applying it to a stochastic environment. The environment evaluates the chosen action, and the automaton uses the environment's response to update its decision-making method for selecting the next action. This process repeats until the optimal action is found. The learning algorithm (learning scheme) determines how the environment's response is used to update the decision-making method. In this paper, chaos theory is incorporated into the LA and a new type of LA, the chaotic LA (cLA), is introduced. In the cLA, chaotic numbers are used instead of random numbers when choosing an action. The experimental results show that in most cases, using chaotic numbers leads to a significant improvement in the learning ability of the LA. Among the chaotic maps investigated in this paper, the Tent map performs best: on average, it increases the convergence rate by 91.4% to 264.4% and decreases the convergence time by 29.6% to 69.1%. Furthermore, the chaotic LA is more scalable than the standard LA, and its performance does not degrade significantly as the problem size (number of actions) grows.
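The idea described in the abstract can be sketched in code. The following is a minimal illustration, not the authors' implementation: a linear reward-inaction (L_RI) automaton in which the uniform random draw used for action selection is replaced by a Tent-map sequence. The class name `ChaoticLA`, the step size `a`, the initial state `x0`, and the map parameter `mu` are illustrative assumptions; the paper's experiments may use different learning schemes and parameters.

```python
import random

def tent_map(x, mu=1.99):
    """One iteration of the Tent map on [0, 1].
    mu is kept slightly below 2.0: with mu = 2.0 exactly, double-precision
    iteration collapses to 0 after ~50 steps because each step is an exact
    bit shift of the mantissa."""
    return mu * x if x < 0.5 else mu * (1.0 - x)

class ChaoticLA:
    """Minimal linear reward-inaction (L_RI) automaton whose action-selection
    numbers come from a Tent-map sequence instead of a PRNG (illustrative)."""

    def __init__(self, n_actions, a=0.1, x0=0.37):
        self.p = [1.0 / n_actions] * n_actions  # action probability vector
        self.a = a                              # reward step size
        self.x = x0                             # chaotic state (avoid 0, 0.5, 1)

    def select_action(self):
        self.x = tent_map(self.x)               # next chaotic number in [0, 1]
        cum = 0.0
        for i, pi in enumerate(self.p):
            cum += pi
            if self.x <= cum:
                return i
        return len(self.p) - 1                  # guard against rounding

    def reward(self, i):
        # L_RI update: move probability mass toward the rewarded action;
        # on penalty, L_RI leaves the probabilities unchanged.
        for j in range(len(self.p)):
            if j == i:
                self.p[j] += self.a * (1.0 - self.p[j])
            else:
                self.p[j] *= (1.0 - self.a)

# Toy stochastic environment: d[i] is the reward probability of action i.
d = [0.2, 0.8, 0.4]
la = ChaoticLA(len(d))
env_rng = random.Random(0)  # randomness only in the environment's response
for _ in range(2000):
    i = la.select_action()
    if env_rng.random() < d[i]:  # environment rewards action i
        la.reward(i)
best = max(range(len(d)), key=lambda j: la.p[j])
print("estimated best action:", best,
      "probabilities:", [round(q, 3) for q in la.p])
```

With these (assumed) settings the automaton typically concentrates its probability mass on the action with the highest reward probability; only the environment's feedback is random, while the automaton's own draws are deterministic chaotic numbers.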




Corresponding author

Correspondence to Mohammad Reza Meybodi.


Appendix 1


In this appendix, the results of Tables 2 and 3 are presented as charts for further analysis. In each chart, the red horizontal dashed line indicates the convergence rate/convergence time of the standard LA, and the bars indicate those of the chaotic LAs. In the blue charts (Figs. 13, 14, 15, 16, 17), which illustrate the convergence rate, chaotic LAs whose bars lie above the red dashed line outperform the standard LA. In the green charts (Figs. 18, 19, 20, 21, 22), which illustrate the convergence time, chaotic LAs whose bars lie below the red dashed line outperform the standard LA. These charts show that the Tent chaotic LA outperforms both the other chaotic LAs and the standard LA in terms of both criteria, i.e., convergence rate and convergence time.

Fig. 13 Convergence rate in the problem \(P_{\text{I}}\) for different modes of action selection

Fig. 14 Convergence rate in the problem \(P_{\text{II}}\) for different modes of action selection

Fig. 15 Convergence rate in the problem \(P_{\text{III}}\) for different modes of action selection

Fig. 16 Convergence rate in the problem \(P_{\text{IV}}\) for different modes of action selection

Fig. 17 Convergence rate in the problem \(P_{\text{V}}\) for different modes of action selection

Fig. 18 Convergence time in the problem \(P_{\text{I}}\) for different modes of action selection

Fig. 19 Convergence time in the problem \(P_{\text{II}}\) for different modes of action selection

Fig. 20 Convergence time in the problem \(P_{\text{III}}\) for different modes of action selection

Fig. 21 Convergence time in the problem \(P_{\text{IV}}\) for different modes of action selection

Fig. 22 Convergence time in the problem \(P_{\text{V}}\) for different modes of action selection


Cite this article

Zarei, B., Meybodi, M.R. Improving learning ability of learning automata using chaos theory. J Supercomput 77, 652–678 (2021). https://doi.org/10.1007/s11227-020-03293-z


Keywords

  • Reinforcement learning
  • Learning automata
  • Chaos theory
  • Chaotic map
  • Chaotic learning automata