AdQL – Anomaly Detection Q-Learning in Control Multi-queue Systems with QoS Constraints

  • Michal Stanek
  • Halina Kwasnicka
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6071)

Abstract

Reinforcement Learning is an optimal adaptive optimization method for stationary environments. In non-stationary environments, where the transition function and reward structure change over time, the traditional algorithms seem unable to keep up with environmental changes. In this paper we propose the Anomaly Detection Q-learning algorithm, which increases the learning abilities of the standard Q-learning algorithm by applying Chauvenet's criterion to detect anomalies.
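
The abstract does not spell out how detected anomalies feed back into learning, so the sketch below is one plausible reading rather than the authors' exact method: recent temporal-difference errors are kept in a sliding window, Chauvenet's criterion flags a new error as anomalous when the expected number of equally extreme observations drops below 0.5, and an anomaly temporarily boosts the learning rate so the agent adapts faster to the changed environment. All names and parameters here (AdQLAgent, boosted_alpha, window) are illustrative assumptions, not taken from the paper.

    import math
    import random
    from collections import deque, defaultdict

    def chauvenet_outlier(sample, history):
        """Chauvenet's criterion: flag `sample` as anomalous when the
        expected number of equally extreme values in `history` is < 0.5."""
        n = len(history)
        if n < 2:
            return False
        mean = sum(history) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in history) / n)
        if std == 0.0:
            return sample != mean
        # Two-tailed normal probability of a deviation at least this large.
        prob = math.erfc(abs(sample - mean) / (std * math.sqrt(2)))
        return n * prob < 0.5

    class AdQLAgent:
        """Hypothetical Q-learning agent with Chauvenet-based anomaly detection."""

        def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1,
                     boosted_alpha=0.5, window=50):
            self.q = defaultdict(float)         # Q[(state, action)], defaults to 0
            self.actions = actions
            self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
            self.boosted_alpha = boosted_alpha  # learning rate used after an anomaly
            self.td_errors = deque(maxlen=window)

        def act(self, state):
            # Epsilon-greedy action selection.
            if random.random() < self.epsilon:
                return random.choice(self.actions)
            return max(self.actions, key=lambda a: self.q[(state, a)])

        def update(self, state, action, reward, next_state):
            best_next = max(self.q[(next_state, a)] for a in self.actions)
            td_error = reward + self.gamma * best_next - self.q[(state, action)]
            # An anomalous TD error suggests the environment has changed,
            # so learn more aggressively from this transition.
            lr = (self.boosted_alpha
                  if chauvenet_outlier(td_error, self.td_errors)
                  else self.alpha)
            self.q[(state, action)] += lr * td_error
            self.td_errors.append(td_error)

An equally plausible response to an anomaly would be to reset the affected Q-values or to raise the exploration rate; the boosted learning rate is simply the smallest change to standard Q-learning consistent with the abstract's claim.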

Keywords

Reinforcement Learning · Discount Factor · Anomaly Detection · Markov Decision Process · Polling System



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Michal Stanek (1)
  • Halina Kwasnicka (1)

  1. Institute of Informatics, Wroclaw University of Technology
