Effect of an Ensemble Algorithm in Reinforcement Learning for Garbage-Collection Sailing

  • Conference paper
  • Robotic Sailing 2016 (WRSC/IRSC 2016)

Abstract

A robot sailor can acquire its behaviour autonomously through reinforcement learning. Reinforcement learning, however, suffers from the curse of dimensionality: as state variables are added, the number of states needed to realize fine control grows exponentially. This paper introduces an ensemble algorithm into Q-learning so that a robot sailor can collect garbage while sailing, and discusses the effect of that ensemble algorithm. In particular, it investigates whether ensemble decision-making lets the robot reach the target position faster while keeping both the number of state variables and the number of states small. Numerical experiments show a statistically significant improvement from the proposed ensemble decision-making algorithm across diverse numbers of agents, state variables, and learning parameters.

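The paper itself ships no source code, but the kind of ensemble decision-making the abstract describes can be sketched concisely. The Python sketch below is a minimal, assumption-laden illustration: several independent tabular Q-learning agents, each with its own learning parameters, vote on the greedy action by simple majority. All names, signatures, and parameter values here are invented for illustration; they are not the authors' implementation.

    import random
    from collections import defaultdict, Counter

    class QLearner:
        # One tabular Q-learning agent. Each ensemble member may use its own
        # learning rate and exploration rate, and (in the paper's setting)
        # its own coarse discretization of the boat's state.
        def __init__(self, n_actions, alpha=0.1, gamma=0.95, epsilon=0.1):
            self.q = defaultdict(float)   # maps (state, action) -> value
            self.n_actions = n_actions
            self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

        def greedy(self, state):
            # Action with the highest Q-value for this agent's view of the state.
            return max(range(self.n_actions), key=lambda a: self.q[(state, a)])

        def act(self, state):
            # Epsilon-greedy exploration while learning.
            if random.random() < self.epsilon:
                return random.randrange(self.n_actions)
            return self.greedy(state)

        def update(self, s, a, r, s_next):
            # Standard one-step Q-learning update.
            best_next = max(self.q[(s_next, a2)] for a2 in range(self.n_actions))
            self.q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.q[(s, a)])

    def ensemble_action(agents, state):
        # Majority vote over the agents' greedy actions; ties broken at random.
        votes = Counter(agent.greedy(state) for agent in agents)
        top = max(votes.values())
        return random.choice([a for a, v in votes.items() if v == top])

    # Hypothetical usage: three agents with diverse learning parameters.
    agents = [QLearner(n_actions=5, alpha=a, epsilon=e)
              for a, e in [(0.05, 0.05), (0.1, 0.1), (0.2, 0.15)]]
    # action = ensemble_action(agents, current_state)

In the experiments the abstract summarizes, diversity could equally come from giving each learner a different state discretization rather than different parameters; either way, the voting step is what allows each agent's state space to stay small while the joint decision is sharpened.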

Author information

Correspondence to Kanta Tachibana.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Tachibana, K., Fukazawa, R. (2017). Effect of an Ensemble Algorithm in Reinforcement Learning for Garbage-Collection Sailing. In: Alves, J., Cruz, N. (eds) Robotic Sailing 2016. WRSC/IRSC 2016. Springer, Cham. https://doi.org/10.1007/978-3-319-45453-5_7

  • DOI: https://doi.org/10.1007/978-3-319-45453-5_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45452-8

  • Online ISBN: 978-3-319-45453-5

  • eBook Packages: Engineering (R0)
