Two-Objective Optimization Reinforcement Learning Used in Single-Phase Rectifier Control

  • Ande Zhou
  • Bin Liu
  • Yunxin Fan
  • Libing Fan
Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 482)


We model single-phase rectifier control as a Markov decision process (MDP) with a continuous state space and a discrete action space, introduce a new two-objective optimization reinforcement learning framework, and propose a genetic algorithm to train the learning agent so as to jointly optimize the power factor and the output DC voltage. We analyze the convergence of the new algorithm and present favorable numerical simulation results.
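The abstract's approach (a discrete-action policy over a continuous state, trained by a genetic algorithm against two objectives) can be sketched as below. This is a minimal illustrative sketch, not the paper's implementation: the toy environment dynamics, the linear policy parameterization, and all function names (`rollout`, `evolve`) are assumptions made for demonstration, and Pareto non-domination stands in for whatever multi-objective selection the authors use.

```python
import random

# Hypothetical stand-in for the rectifier environment: the state is
# continuous (line phase, DC-voltage error) and the action is a discrete
# switching command in {-1, 0, +1}. The dynamics are placeholders.
ACTIONS = (-1, 0, 1)

def rollout(weights, steps=50, seed=0):
    """Run one episode; return the two objectives (both maximized):
    an average power-factor score and a negative DC-voltage error."""
    rng = random.Random(seed)
    v_dc, pf_acc, v_acc = 0.0, 0.0, 0.0
    for t in range(steps):
        phase = (t % 20) / 20.0                  # normalized line phase
        state = (phase, 1.0 - v_dc)              # continuous state features
        # Discrete action = argmax of linear scores, one weight row per action.
        scores = [sum(w * s for w, s in zip(row, state)) for row in weights]
        a = ACTIONS[scores.index(max(scores))]
        v_dc += 0.1 * a - 0.02 * v_dc + 0.01 * (rng.random() - 0.5)
        pf_acc += 1.0 - abs(a - (1 if phase < 0.5 else -1)) / 2.0
        v_acc += -abs(1.0 - v_dc)
    return pf_acc / steps, v_acc / steps

def dominates(a, b):
    """Pareto dominance: a is at least as good everywhere, better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def evolve(pop_size=20, gens=15, seed=1):
    """Genetic algorithm: keep the non-dominated policies, refill by mutation."""
    rng = random.Random(seed)

    def new_policy():
        return [[rng.uniform(-1, 1) for _ in range(2)] for _ in ACTIONS]

    pop = [new_policy() for _ in range(pop_size)]
    for _ in range(gens):
        fits = [rollout(ind) for ind in pop]
        # Selection: individuals not dominated by any other survive.
        front = [ind for ind, f in zip(pop, fits)
                 if not any(dominates(g, f) for g in fits if g != f)]
        pop = front[:]
        while len(pop) < pop_size:          # refill by Gaussian mutation
            parent = rng.choice(front)
            pop.append([[w + rng.gauss(0, 0.1) for w in row] for row in parent])
    return pop

pareto_policies = evolve()
```

Because selection here is purely Pareto-based, the result is a set of trade-off policies between power factor and voltage regulation rather than a single optimum, which matches the two-objective framing of the abstract.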


Keywords: Single-phase rectifier · Markov decision process · Reinforcement learning · Genetic algorithm



Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  1. EMU Development Department, Zhuzhou Electric Locomotive Company Ltd., CRRC, Zhuzhou, China
