Abstract
The majority of efforts in the field of sim-to-real Deep Reinforcement Learning focus on robot manipulators, which is justified by their importance for modern production plants. However, there are only a few studies on broader use in manufacturing processes. In this paper, we contribute to closing this gap by automating a complex manufacturing-like process using simulation-based Deep Reinforcement Learning. The setup and workflow presented here are designed to mimic the characteristics of real manufacturing processes and show that Deep Reinforcement Learning can be applied to physical systems built from industrial drive and control components by transferring policies learned in simulation to the real machine. Training in a virtual environment, aided by domain randomization, is crucial because it both accelerates training and enables safe Reinforcement Learning. Our key contribution is to demonstrate the applicability of simulation-based Deep Reinforcement Learning in industrial automation technology. We introduce an industrial drive and control system based on the classic pub game Foosball, from both an engineering and a simulation perspective, and describe the strategies applied to increase transfer robustness. Our approach allowed us to train a self-learning agent that independently learns successful control policies for demanding Foosball tasks from sparse reward signals. The promising results show that state-of-the-art Deep Reinforcement Learning algorithms can produce models trained in simulation that successfully control industrial use cases without using the actual system for training beforehand.
Availability of data and materials
Not applicable.
Funding
No funding was received for conducting this study.
Author information
Authors and Affiliations
Contributions
SDB, SK, AM, RR, FS initialized and conceived the project. SK, AM, RR, TZ designed the kinematic control simulation with environment. RR ran the training of the agents. SDB, SK, RR conceived and planned the experiments. SK prepared the models for deployment. SDB deployed the model on the system, carried out the experiments and analyzed the data. SDB, SK, RR discussed the results and wrote the manuscript.
Corresponding author
Ethics declarations
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for Publication
Not applicable.
Competing interests
There are no conflicts of interest.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix: Derivation of a Simple Policy Gradient Loss
Starting from Eq. 2, we introduce a short notation for the cumulative discounted reward obtained along a given trajectory: \(R(\tau ) = {\sum }_{k=0}^{\infty } \gamma ^{k} r(s_{k},a_{k})\). This allows us to derive a basic policy gradient algorithm from the following loss function, which we want to optimize:

\[ J(\theta ) = \mathbb {E}_{\tau \sim p_{\theta }(\tau )}\left [ R(\tau ) \right ] \]
To expand the expectation value, one integrates the cumulative return over all trajectories, weighted by the probability of each trajectory:

\[ J(\theta ) = \int p_{\theta }(\tau )\, R(\tau )\, d\tau \]
Since only the trajectory probability depends on \(\theta \), we can take the derivative of \(J(\theta )\) with respect to \(\theta \). Using the log-derivative identity \(\nabla _{\theta } p_{\theta }(\tau ) = p_{\theta }(\tau )\, \nabla _{\theta } \log p_{\theta }(\tau )\), this can be reformulated as:

\[ \nabla _{\theta } J(\theta ) = \int p_{\theta }(\tau )\, \nabla _{\theta } \log p_{\theta }(\tau )\, R(\tau )\, d\tau = \mathbb {E}_{\tau \sim p_{\theta }(\tau )}\left [ \nabla _{\theta } \log p_{\theta }(\tau )\, R(\tau ) \right ] \]
The term \(\nabla _{\theta } \log p_{\theta }(\tau )\) is evaluated by decomposing \(p_{\theta }(\tau )\) into the initial-state distribution, the policy, and the transition dynamics:

\[ p_{\theta }(\tau ) = p(s_{0}) \prod _{k=0}^{\infty } \pi _{\theta }(a_{k} \mid s_{k})\, p(s_{k+1} \mid s_{k}, a_{k}) \]

Since neither \(p(s_{0})\) nor the transition dynamics depend on \(\theta \), only the policy terms survive the gradient:

\[ \nabla _{\theta } \log p_{\theta }(\tau ) = \sum _{k=0}^{\infty } \nabla _{\theta } \log \pi _{\theta }(a_{k} \mid s_{k}) \]
So, overall, the gradient of the loss function can be formulated as:

\[ \nabla _{\theta } J(\theta ) = \mathbb {E}_{\tau \sim p_{\theta }(\tau )}\left [ \left ( \sum _{k=0}^{\infty } \nabla _{\theta } \log \pi _{\theta }(a_{k} \mid s_{k}) \right ) R(\tau ) \right ] \]
This expression is very similar to the standard maximum likelihood gradient used in supervised learning. The only difference is the cumulative-reward term \(R(\tau )\), which acts as a weight on the log-likelihood gradients. With this, policy updates

\[ \theta \leftarrow \theta + \alpha \, \nabla _{\theta } J(\theta ) \]

can be performed iteratively, where \(\alpha \) denotes the learning rate.
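The derivation above can be sketched in a few lines of code. The following is a minimal illustration, not the implementation used in this work: it applies the policy gradient update to a toy two-armed bandit with a one-step trajectory, using a softmax policy whose log-probability gradient has the well-known closed form one-hot(a) − probs. The environment, the parameterization, and the learning rate are all illustrative assumptions.

```python
import numpy as np

# Minimal REINFORCE sketch of the derived gradient on a toy two-armed
# bandit (a hypothetical stand-in for a full trajectory environment).
# The policy is a softmax over per-action preferences theta.

rng = np.random.default_rng(0)

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def reward(action):
    # Sparse-reward flavour: only action 1 is rewarded.
    return 1.0 if action == 1 else 0.0

theta = np.zeros(2)   # policy parameters
alpha = 0.1           # learning rate (illustrative choice)

for _ in range(500):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)
    R = reward(a)                 # R(tau) for this one-step trajectory
    # grad of log pi(a) for a softmax policy: one-hot(a) - probs
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    # theta <- theta + alpha * grad_log_pi * R(tau)
    theta += alpha * grad_log_pi * R

print(softmax(theta))  # probability mass concentrates on the rewarded action
```

Because unrewarded trajectories contribute a zero-weighted gradient, only the rewarded action's log-likelihood is pushed up, so the policy converges toward the better arm, which mirrors how the cumulative-reward weight steers the maximum-likelihood-style update in the derivation.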
About this article
Cite this article
De Blasi, S., Klöser, S., Müller, A. et al. KIcker: An Industrial Drive and Control Foosball System automated with Deep Reinforcement Learning. J Intell Robot Syst 102, 20 (2021). https://doi.org/10.1007/s10846-021-01389-z