Spatiotemporal Attacks for Embodied Agents

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12362)


Adversarial attacks are valuable for providing insights into the blind spots of deep learning models and for helping to improve their robustness. Existing work on adversarial attacks has mainly focused on static scenes; it remains unclear whether such attacks are effective against embodied agents, which navigate and interact with a dynamic environment. In this work, we take the first step toward studying adversarial attacks for embodied agents. In particular, we generate spatiotemporal perturbations to form 3D adversarial examples, which exploit the interaction history in both the temporal and spatial dimensions. Along the temporal dimension, since agents make predictions based on historical observations, we develop a trajectory attention module to assess the contribution of each scene view, which further helps localize the 3D objects that appear with the highest stimuli. Guided by these temporal clues, along the spatial dimension we adversarially perturb the physical properties (e.g., texture and 3D shape) of the contextual objects that appear in the most important scene views. Extensive experiments on the EQA-v1 dataset for several embodied tasks, in both the white-box and black-box settings, demonstrate that our perturbations have strong attack and generalization abilities. (Our code can be found at
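The trajectory attention idea described above can be illustrated with a minimal sketch: score each historical scene view against the agent's current state, normalize the scores into attention weights, and select the views with the highest weights as candidates whose contextual objects will be perturbed. The function names (`view_attention`, `top_views`) and the dot-product scoring are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def view_attention(view_feats, query):
    """Score each historical scene view (rows of view_feats) by
    dot-product attention against a query state vector, returning
    normalized attention weights over the trajectory."""
    scores = view_feats @ query          # shape: (T,)
    return softmax(scores)

def top_views(weights, k=2):
    """Indices of the k most influential scene views."""
    return np.argsort(weights)[::-1][:k]

# Toy trajectory: T views, each a D-dimensional feature vector.
rng = np.random.default_rng(0)
T, D = 8, 16
views = rng.normal(size=(T, D))
query = views[-1]                        # e.g. the latest observation
w = view_attention(views, query)
important = top_views(w, k=3)            # candidate views to perturb
```

In the paper's setting, the objects visible in these high-attention views would then be perturbed in texture or shape; here the selection step alone is shown.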


Keywords: Embodied agents · Spatiotemporal perturbations · 3D adversarial examples



This work was supported by the National Natural Science Foundation of China (61872021, 61690202), the Beijing Nova Program of Science and Technology (Z191100001119050), the Fundamental Research Funds for the Central Universities (YWF-20-BJ-J-646), and ARC FL-170100117.

Supplementary material 1 (zip, 14.3 MB)



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. State Key Laboratory of Software Development Environment, Beihang University, Beijing, China
  2. Beijing Advanced Innovation Center for Big Data-Based Precision Medicine, Beihang University, Beijing, China
  3. UC Berkeley, USA
  4. Birkbeck, University of London, London, UK
  5. UBTECH Sydney AI Centre, School of Computer Science, Faculty of Engineering, The University of Sydney, Sydney, Australia
