Abstract
The majority of efforts in the field of sim-to-real Deep Reinforcement Learning focus on robot manipulators, which is justified by their importance for modern production plants. However, there are only a few studies on broader use in manufacturing processes. In this paper, we contribute to closing this gap by automating a complex manufacturing-like process using simulation-based Deep Reinforcement Learning. The setup and workflow presented here are designed to mimic the characteristics of real manufacturing processes and show that Deep Reinforcement Learning can be applied to physical systems built from industrial drive and control components by transferring policies learned in simulation to the real machine. Training in a virtual environment, aided by domain randomization, is crucial because it both accelerates training and enables safe Reinforcement Learning. Our key contribution is to demonstrate the applicability of simulation-based Deep Reinforcement Learning in industrial automation technology. We introduce an industrial drive and control system based on the classic pub game Foosball, from both an engineering and a simulation perspective, and describe the strategies applied to increase transfer robustness. Our approach allowed us to train a self-learning agent that independently learns successful control policies for demanding Foosball tasks from sparse reward signals. The promising results show that state-of-the-art Deep Reinforcement Learning algorithms can produce models trained in simulation that successfully control industrial use cases without using the actual system for training beforehand.
Availability of data and materials
Not applicable.
Funding
No funding was received for conducting this study.
Author information
Authors and Affiliations
Contributions
SDB, SK, AM, RR, FS initialized and conceived the project. SK, AM, RR, TZ designed the kinematic control simulation with environment. RR ran the training of the agents. SDB, SK, RR conceived and planned the experiments. SK prepared the models for deployment. SDB deployed the model on the system, carried out the experiments and analyzed the data. SDB, SK, RR discussed the results and wrote the manuscript.
Corresponding author
Ethics declarations
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for Publication
Not applicable.
Competing interests
There are no conflicts of interest.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix: Derivation of a Simple Policy Gradient Loss
Starting from Eq. 2, we introduce a short notation for the cumulative discounted reward obtained along a given trajectory: \(R(\tau ) = {\sum }_{k=0}^{\infty } \gamma ^{k} r(s_{k},a_{k})\). This allows us to derive a basic policy gradient algorithm from the following loss function, which we want to optimize:

\[ J(\theta ) = \mathbb {E}_{\tau \sim p_{\theta }(\tau )}\left [ R(\tau ) \right ] \]
To expand the expectation value, one integrates the cumulative return over all trajectories, weighted by the probability of each trajectory:

\[ J(\theta ) = \int p_{\theta }(\tau )\, R(\tau )\, d\tau \]
Since only the trajectory probability depends on \(\theta \), we can take the derivative of \(J(\theta )\) with respect to \(\theta \). Using the log-derivative identity \(\nabla _{\theta } p_{\theta }(\tau ) = p_{\theta }(\tau )\, \nabla _{\theta } \log p_{\theta }(\tau )\), this can be reformulated as:

\[ \nabla _{\theta } J(\theta ) = \int p_{\theta }(\tau )\, \nabla _{\theta } \log p_{\theta }(\tau )\, R(\tau )\, d\tau = \mathbb {E}_{\tau \sim p_{\theta }(\tau )}\left [ \nabla _{\theta } \log p_{\theta }(\tau )\, R(\tau ) \right ] \]
The term \(\nabla _{\theta } \log p_{\theta }(\tau )\) is evaluated by decomposing \(p_{\theta }(\tau )\) into the initial-state distribution, the policy, and the transition dynamics:

\[ p_{\theta }(\tau ) = p(s_{0}) \prod _{k=0}^{\infty } \pi _{\theta }(a_{k} \mid s_{k})\, p(s_{k+1} \mid s_{k}, a_{k}) \]

Since neither \(p(s_{0})\) nor the transition dynamics depend on \(\theta \), only the policy terms survive the gradient:

\[ \nabla _{\theta } \log p_{\theta }(\tau ) = \sum _{k=0}^{\infty } \nabla _{\theta } \log \pi _{\theta }(a_{k} \mid s_{k}) \]
So, overall, the gradient of the loss function can be formulated as:

\[ \nabla _{\theta } J(\theta ) = \mathbb {E}_{\tau \sim p_{\theta }(\tau )}\left [ \left ( \sum _{k=0}^{\infty } \nabla _{\theta } \log \pi _{\theta }(a_{k} \mid s_{k}) \right ) R(\tau ) \right ] \]
This expression is very similar to the standard maximum likelihood gradient used in supervised learning. The only difference is the cumulative-reward term \(R(\tau )\), which acts as a weight on the log-likelihood gradients. With this, policy updates

\[ \theta \leftarrow \theta + \alpha \, \nabla _{\theta } J(\theta ) \]

can be performed iteratively, where \(\alpha \) denotes the learning rate.
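The derivation above can be sketched in a few lines of code. The following is a minimal illustration, not the implementation used in this work: it applies the policy gradient update to a toy two-armed bandit with a one-step trajectory, using a softmax policy whose log-probability gradient has the well-known closed form one-hot(a) − probs. The environment, the parameterization, and the learning rate are all illustrative assumptions.

```python
import numpy as np

# Minimal REINFORCE sketch of the derived gradient on a toy two-armed
# bandit (a hypothetical stand-in for a full trajectory environment).
# The policy is a softmax over per-action preferences theta.

rng = np.random.default_rng(0)

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def reward(action):
    # Sparse-reward flavour: only action 1 is rewarded.
    return 1.0 if action == 1 else 0.0

theta = np.zeros(2)   # policy parameters
alpha = 0.1           # learning rate (illustrative choice)

for _ in range(500):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)
    R = reward(a)                 # R(tau) for this one-step trajectory
    # grad of log pi(a) for a softmax policy: one-hot(a) - probs
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    # theta <- theta + alpha * grad_log_pi * R(tau)
    theta += alpha * grad_log_pi * R

print(softmax(theta))  # probability mass concentrates on the rewarded action
```

Because unrewarded trajectories contribute a zero-weighted gradient, only the rewarded action's log-likelihood is pushed up, so the policy converges toward the better arm, which mirrors how the cumulative-reward weight steers the maximum-likelihood-style update in the derivation.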
About this article
Cite this article
De Blasi, S., Klöser, S., Müller, A. et al. KIcker: An Industrial Drive and Control Foosball System automated with Deep Reinforcement Learning. J Intell Robot Syst 102, 20 (2021). https://doi.org/10.1007/s10846-021-01389-z