KIcker: An Industrial Drive and Control Foosball System automated with Deep Reinforcement Learning

  • Short Paper
  • Journal of Intelligent & Robotic Systems

Abstract

The majority of efforts in the field of sim-to-real Deep Reinforcement Learning focus on robot manipulators, which is justified by their importance for modern production plants. However, there are only a few studies addressing its broader use in manufacturing processes. In this paper, we contribute to this by automating a complex manufacturing-like process using simulation-based Deep Reinforcement Learning. The setup and workflow presented here are designed to mimic the characteristics of real manufacturing processes and prove that Deep Reinforcement Learning can be applied to physical systems built from industrial drive and control components by transferring policies learned in simulation to the real machine. Training in a virtual environment, aided by domain randomization, is crucial because it accelerates training and enables safe Reinforcement Learning. Our key contribution is to demonstrate the applicability of simulation-based Deep Reinforcement Learning in industrial automation technology. We introduce an industrial drive and control system, based on the classic pub game Foosball, from both an engineering and a simulation perspective, describing the strategies applied to increase transfer robustness. Our approach allowed a self-learning agent to independently acquire successful control policies for demanding Foosball tasks from sparse reward signals. The promising results show that state-of-the-art Deep Reinforcement Learning algorithms can produce models, trained entirely in simulation, that successfully control industrial use cases without prior training on the actual system.


Availability of data and materials

Not applicable.


Funding

No funding was received for conducting this study.

Author information


Contributions

SDB, SK, AM, RR, FS initialized and conceived the project. SK, AM, RR, TZ designed the kinematic control simulation with environment. RR ran the training of the agents. SDB, SK, RR conceived and planned the experiments. SK prepared the models for deployment. SDB deployed the model on the system, carried out the experiments and analyzed the data. SDB, SK, RR discussed the results and wrote the manuscript.

Corresponding author

Correspondence to Stefano De Blasi.

Ethics declarations

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for Publication

Not applicable.

Competing interests

There are no conflicts of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Derivation of a Simple Policy Gradient Loss


Starting from Eq. 2, we introduce a short notation for the cumulative discounted reward obtained for a given trajectory: \(R(\tau) = \sum_{k=0}^{\infty} \gamma^{k} r(s_{k},a_{k})\). This allows us to derive a basic policy gradient algorithm from the following loss function, which we want to optimize:

$$ L(\theta) = \underset{\tau\sim\pi_{\theta}}{\mathbb{E}} R(\tau). $$
(13)

To expand the expectation, one integrates the cumulative return over all trajectories, weighted by the probability of each trajectory:

$$ L(\theta) = {\int}_{\tau} p_{\theta}(\tau) R(\tau) d\tau. $$
(14)

Since only the trajectory probability \(p_{\theta}(\tau)\) depends on \(\theta\), we can take the derivative of \(L(\theta)\) with respect to \(\theta\). Using the log-derivative identity \(\nabla_{\theta} p_{\theta}(\tau) = p_{\theta}(\tau) \nabla_{\theta} \log p_{\theta}(\tau)\), this can be reformulated as:

$$ \begin{array}{@{}rcl@{}} \nabla_{\theta} L(\theta) & =& {\int}_{\tau} \nabla_{\theta} p_{\theta}(\tau) R(\tau) d\tau \\ & =& {\int}_{\tau} p_{\theta}(\tau) \nabla_{\theta} \log p_{\theta}(\tau) R(\tau) d\tau \\ & =& \underset{\tau\sim\pi_{\theta}}{\mathbb{E}} \nabla_{\theta} \log p_{\theta}(\tau) R(\tau) \end{array} $$
(15)

The term \(\nabla_{\theta} \log p_{\theta}(\tau)\) is evaluated by decomposing the trajectory probability \(p_{\theta}(\tau)\) into the initial state distribution, the transition model, and the policy:

$$ \begin{array}{@{}rcl@{}} \nabla_{\theta} \log p_{\theta}(\tau) &=& \nabla_{\theta} \log \left( p(s_{0}) \prod\limits_{k=0}^{\infty} T(s_{k+1}|s_{k},a_{k}) \, \pi_{\theta}(a_{k}|s_{k})\right) \\ &=& \nabla_{\theta} \log p(s_{0}) + \sum\limits_{k=0}^{\infty} \nabla_{\theta} \log T(s_{k+1}|s_{k},a_{k}) + \\ && {} \sum\limits_{k=0}^{\infty} \nabla_{\theta} \log \pi_{\theta}(a_{k}|s_{k}) \\ &=& \sum\limits_{k=0}^{\infty} \nabla_{\theta} \log \pi_{\theta}(a_{k}|s_{k}) \end{array} $$
(16)

Since neither the initial state distribution \(p(s_{0})\) nor the transition model \(T\) depends on \(\theta\), their gradients vanish and only the policy terms remain. Overall, the gradient of the loss function can thus be formulated as:

$$ \nabla_{\theta} L(\theta) = \underset{\tau\sim\pi_{\theta}}{\mathbb{E}} \left( \sum\limits_{k=0}^{\infty} \nabla_{\theta} \log \pi_{\theta}(a_{k}|s_{k})\right) \left( \sum\limits_{k=0}^{\infty} \gamma^{k} r(s_{k},a_{k})\right) $$
(17)
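
In practice, this expectation cannot be evaluated exactly. A common approach, sketched here in generic form rather than as the specific estimator used in this work, is a Monte Carlo approximation over a batch of \(N\) trajectories sampled from the current policy and truncated after a finite horizon \(K\):

$$ \nabla_{\theta} L(\theta) \approx \frac{1}{N} \sum\limits_{i=1}^{N} \left( \sum\limits_{k=0}^{K} \nabla_{\theta} \log \pi_{\theta}\left(a_{k}^{(i)}|s_{k}^{(i)}\right)\right) \left( \sum\limits_{k=0}^{K} \gamma^{k} r\left(s_{k}^{(i)},a_{k}^{(i)}\right)\right), $$

where \(\left(s_{k}^{(i)}, a_{k}^{(i)}\right)\) denotes the \(k\)-th state-action pair of the \(i\)-th sampled trajectory.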

Expression (17) closely resembles the maximum likelihood gradient used in supervised learning; the only difference is the second factor, the cumulative discounted return, which acts as a weight on the log-likelihood gradient terms. With this, policy updates

$$ \theta^{\prime} \leftarrow \theta + \alpha \nabla_{\theta} L(\theta) $$
(18)

can be performed iteratively.
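
For illustration only, the following minimal Python sketch implements this vanilla policy gradient update (Eqs. 17 and 18) with PyTorch and a Gymnasium environment. The CartPole task, the network architecture, and all hyperparameters are placeholder assumptions for the sketch and do not correspond to the KIcker system or the training setup described in this paper.

```python
# Minimal REINFORCE sketch of Eqs. 17-18 (illustrative only; not the KIcker setup).
# Assumes a Gymnasium environment with a discrete action space.
import gymnasium as gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")            # placeholder task, not the Foosball system
obs_dim = env.observation_space.shape[0]
n_actions = env.action_space.n

policy = nn.Sequential(                  # small stochastic policy network pi_theta
    nn.Linear(obs_dim, 64), nn.Tanh(),
    nn.Linear(64, n_actions),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)
gamma = 0.99

def run_episode():
    """Sample one trajectory tau ~ pi_theta; return per-step log-probs and rewards."""
    log_probs, rewards = [], []
    obs, _ = env.reset()
    done = False
    while not done:
        logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))        # log pi_theta(a_k | s_k)
        obs, reward, terminated, truncated, _ = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated
    return log_probs, rewards

for update in range(200):
    batch_loss = []
    for _ in range(8):                                 # N sampled trajectories
        log_probs, rewards = run_episode()
        # cumulative discounted return R(tau) = sum_k gamma^k r(s_k, a_k)
        ret = sum(gamma ** k * r for k, r in enumerate(rewards))
        # negative sign: the optimizer minimizes, while Eq. 18 ascends the objective
        batch_loss.append(-torch.stack(log_probs).sum() * ret)
    optimizer.zero_grad()
    torch.stack(batch_loss).mean().backward()          # Monte Carlo estimate of grad L
    optimizer.step()                                   # theta <- theta + alpha * grad
```

This plain estimator has high variance; actor-critic methods and algorithms such as PPO build on the same gradient but add baselines and surrogate objectives to stabilize training, which is why they are preferred for demanding control tasks.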

About this article


Cite this article

De Blasi, S., Klöser, S., Müller, A. et al. KIcker: An Industrial Drive and Control Foosball System automated with Deep Reinforcement Learning. J Intell Robot Syst 102, 20 (2021). https://doi.org/10.1007/s10846-021-01389-z
