Learning to Transform Service Instructions into Actions with Reinforcement Learning and Knowledge Base
To improve the learning ability of robots, we present a reinforcement learning approach that uses a knowledge base to map natural language instructions to executable action sequences. A simulated platform with a physics engine serves as the interactive environment. Based on the knowledge base, we design a reward function that combines immediate and delayed rewards to address the sparse reward problem. In addition, a list of object states is retrieved from the knowledge base and used as the standard for judging the quality of action sequences. Experimental results demonstrate that our approach achieves good accuracy in generating action sequences.
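As a rough illustration of the reward design described above, the following minimal sketch combines a per-step (immediate) reward with an end-of-episode (delayed) reward, both scored against goal object states looked up in a knowledge base. Everything here is an assumption made for illustration: the `KNOWLEDGE_BASE` contents, the function names, and the reward magnitudes are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

State = Dict[str, str]

# Hypothetical knowledge base: instruction -> goal object states.
KNOWLEDGE_BASE: Dict[str, State] = {
    "bring me a cup of water": {"cup": "filled", "cup_location": "user"},
}


def goal_states(instruction: str) -> State:
    """Retrieve the list of target object states for an instruction."""
    return KNOWLEDGE_BASE[instruction]


def count_matches(state: State, goal: State) -> int:
    """Number of goal object states already satisfied in `state`."""
    return sum(1 for key, value in goal.items() if state.get(key) == value)


def immediate_reward(prev: State, curr: State, goal: State,
                     step_bonus: float = 1.0, step_cost: float = -0.1) -> float:
    """Dense per-step signal: progress toward the goal states minus a step cost.

    The bonus and cost values are illustrative assumptions.
    """
    gained = count_matches(curr, goal) - count_matches(prev, goal)
    return step_bonus * gained + step_cost


def delayed_reward(final: State, goal: State, success_bonus: float = 10.0) -> float:
    """Sparse end-of-episode signal: paid only if every goal state holds."""
    return success_bonus if count_matches(final, goal) == len(goal) else 0.0


def episode_return(trajectory: List[State], goal: State) -> float:
    """Sum of immediate rewards along the trajectory plus the delayed reward."""
    dense = sum(immediate_reward(p, c, goal)
                for p, c in zip(trajectory, trajectory[1:]))
    return dense + delayed_reward(trajectory[-1], goal)


if __name__ == "__main__":
    goal = goal_states("bring me a cup of water")
    trajectory = [
        {"cup": "empty", "cup_location": "table"},
        {"cup": "filled", "cup_location": "table"},
        {"cup": "filled", "cup_location": "user"},
    ]
    print(episode_return(trajectory, goal))
```

In this sketch the immediate term gives the agent feedback after every action, so learning does not depend solely on the rare success signal at the end of an episode, while the delayed term still rewards only complete action sequences.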
Keywords: Natural language, robot, knowledge base, reinforcement learning, object state
This work was supported by the National Natural Science Foundation of China (No. 61773239) and the Shenzhen Future Industry Special Fund (No. JCYJ20160331174814755).