
Towards Automated Testing and Robustification by Semantic Adversarial Data Generation

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12347)

Abstract

Widespread application of computer vision systems in real-world tasks is currently hindered by their unexpected behavior on unseen examples. This occurs due to the limitations of empirical testing on finite test sets and the lack of systematic methods to identify the breaking points of a trained model. In this work we propose semantic adversarial editing, a method to synthesize plausible but difficult data points on which our target model breaks down. We achieve this with a differentiable object synthesizer that can change an object's appearance while retaining its pose. Constrained adversarial optimization of object appearance through this synthesizer produces rare or difficult versions of an object that fool the target object detector. Experiments show that our approach effectively synthesizes difficult test data, dropping the performance of a YoloV3 detector by more than 20 mAP points by changing the appearance of a single object, and discovering failure modes of the model. The generated semantic adversarial data can also be used to robustify the detector through data augmentation, consistently improving its performance on both standard and out-of-dataset-distribution test sets across three different datasets.
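
As a rough illustration of the optimization loop the abstract describes, the PyTorch sketch below performs constrained adversarial optimization of an appearance latent through a frozen, differentiable synthesizer. The `synthesizer`, `detector`, and latent interfaces are hypothetical stand-ins rather than the paper's actual components; only the pattern follows the abstract: edit the object's appearance while keeping its pose fixed, search for appearances that lower the detector's confidence, and constrain the edit so the result stays plausible.

```python
# Minimal sketch of semantic adversarial editing (assumed interfaces).
# `synthesizer`, `detector`, `scene`, and `pose` are hypothetical stand-ins;
# only the optimization pattern is taken from the abstract.
import torch

def semantic_adversarial_edit(synthesizer, detector, scene, pose, z_init,
                              steps=100, lr=0.05, radius=1.0):
    """Optimize the appearance latent z so the re-synthesized object
    fools the (frozen) detector, while staying inside an L2 ball
    around the original latent to keep the edit plausible."""
    z = z_init.clone().detach().requires_grad_(True)
    optimizer = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        # Re-render the object with a new appearance but the original pose,
        # composited back into the scene (the synthesizer is differentiable,
        # so gradients flow from the detector back to z).
        edited = synthesizer(scene, pose=pose, appearance=z)
        # Adversarial objective: minimize the detector's confidence
        # that the object is still present in the edited image.
        confidence = detector(edited)   # e.g., objectness score
        loss = confidence.mean()        # lower is "more adversarial"
        loss.backward()
        optimizer.step()
        # Constraint: project z back onto an L2 ball around z_init so the
        # appearance remains a plausible version of the object.
        with torch.no_grad():
            delta = z - z_init
            norm = delta.norm()
            if norm > radius:
                z.copy_(z_init + delta * (radius / norm))
    return z.detach()
```

In this sketch the plausibility constraint is a simple L2 projection around the original latent; the paper's actual constraints, losses, and synthesizer architecture may differ.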

Supplementary material

Supplementary material 1: 504434_1_En_29_MOESM1_ESM.pdf (13.1 MB)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany
  2. CISPA Helmholtz Center for Information Security, Saarbrücken, Germany
