
Critic Guided Segmentation of Rewarding Objects in First-Person Views

  • Conference paper
KI 2021: Advances in Artificial Intelligence (KI 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12873)

Abstract

This work presents a learning approach for masking rewarding objects in images using sparse reward signals from an imitation learning dataset. To this end, we train an Hourglass network using only feedback from a critic model. The Hourglass network learns to produce a mask such that swapping the masked areas between a high-score image and a low-score image decreases the critic’s score of the high-score image and increases the critic’s score of the low-score image. We trained the model on an imitation learning dataset from the NeurIPS 2020 MineRL Competition Track, where it learned to mask rewarding objects in a complex interactive 3D environment with a sparse reward signal. This approach was part of the 1st place winning solution in this competition. Video demonstration and code: https://rebrand.ly/critic-guided-segmentation.
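
The mask-swapping idea described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. The function names (`swap_masked_areas`, `swap_objective`, `toy_critic`) are hypothetical, the critic is a stand-in that scores the mean red-channel intensity instead of a trained value model, and the objective (critic score of the swapped high-score image minus that of the swapped low-score image) is only one plausible way to express "decrease one score, increase the other". In the paper's setting the mask would come from the Hourglass network and be trained by gradient descent.

```python
import numpy as np


def swap_masked_areas(high_img, low_img, mask):
    """Exchange the masked region between two images of identical shape.

    high_img, low_img: float arrays of shape (H, W, C)
    mask: float array of shape (H, W) with values in [0, 1]
    """
    m = mask[..., None]  # add a channel axis so the mask broadcasts over C
    high_swapped = (1.0 - m) * high_img + m * low_img
    low_swapped = (1.0 - m) * low_img + m * high_img
    return high_swapped, low_swapped


def swap_objective(critic, high_img, low_img, mask):
    """Lower is better: the high-score image should lose value after the
    swap and the low-score image should gain it (assumed objective)."""
    high_swapped, low_swapped = swap_masked_areas(high_img, low_img, mask)
    return critic(high_swapped) - critic(low_swapped)


def toy_critic(img):
    """Hypothetical critic: mean intensity of the red channel."""
    return float(img[..., 0].mean())


# A 4x4 "high-score" image with a 2x2 red object and an empty "low-score" image.
high_img = np.zeros((4, 4, 3))
high_img[1:3, 1:3, 0] = 1.0
low_img = np.zeros((4, 4, 3))

# A mask that covers the object versus a mask that covers nothing.
object_mask = np.zeros((4, 4))
object_mask[1:3, 1:3] = 1.0
empty_mask = np.zeros((4, 4))

good = swap_objective(toy_critic, high_img, low_img, object_mask)
empty = swap_objective(toy_critic, high_img, low_img, empty_mask)
print(good, empty)
```

In this toy setup the mask that covers the object yields a lower objective than the empty mask, because swapping moves the rewarding object from the high-score image into the low-score one. A practical objective would likely also need a term that discourages trivially masking the whole image. That term is an assumption here and is not stated in the abstract.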

A. Melnik and A. Harter—Shared first authorship.



Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Melnik, A., Harter, A., Limberg, C., Rana, K., Sünderhauf, N., Ritter, H. (2021). Critic Guided Segmentation of Rewarding Objects in First-Person Views. In: Edelkamp, S., Möller, R., Rueckert, E. (eds) KI 2021: Advances in Artificial Intelligence. KI 2021. Lecture Notes in Computer Science, vol 12873. Springer, Cham. https://doi.org/10.1007/978-3-030-87626-5_25

  • DOI: https://doi.org/10.1007/978-3-030-87626-5_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-87625-8

  • Online ISBN: 978-3-030-87626-5

  • eBook Packages: Computer Science, Computer Science (R0)
