Multitask Learning Strengthens Adversarial Robustness

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12347)


Although deep networks achieve strong accuracy on a range of computer vision benchmarks, they remain vulnerable to adversarial attacks, where imperceptible input perturbations fool the network. We present both theoretical and empirical analyses that connect a model's adversarial robustness to the number of tasks it is trained on. Experiments on two datasets show that attack difficulty increases as the number of target tasks increases. Moreover, our results suggest that models trained on multiple tasks at once become more robust to adversarial attacks on individual tasks. While adversarial defense remains an open challenge, our results suggest that deep networks are vulnerable partly because they are trained on too few tasks.
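The setup the abstract describes can be illustrated with a minimal sketch. This is not the authors' implementation: the two-task linear model, the head names, and the FGSM-style attack step are all illustrative assumptions, chosen only to show what "attacking one task vs. attacking the joint multitask loss" means concretely.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shared encoder with two linear task heads (names are illustrative).
W_shared = rng.normal(size=(8, 4))   # input dim 8 -> shared feature dim 4
W_task1 = rng.normal(size=(4,))      # task 1: scalar regression head
W_task2 = rng.normal(size=(4,))      # task 2: scalar regression head

def forward(x):
    """Shared features feed both task heads."""
    h = x @ W_shared
    return h @ W_task1, h @ W_task2

def fgsm(x, y1, y2, eps, attack_both=True):
    """One FGSM-style step on a squared-error loss.

    With attack_both=True the perturbation follows the gradient of the
    joint (multitask) loss; otherwise only task 1's loss is attacked.
    """
    p1, p2 = forward(x)
    # dL/dx for L = (p1 - y1)^2 [+ (p2 - y2)^2], by the chain rule.
    g = 2.0 * (p1 - y1) * (W_shared @ W_task1)
    if attack_both:
        g += 2.0 * (p2 - y2) * (W_shared @ W_task2)
    return x + eps * np.sign(g)

x = rng.normal(size=8)
y1, y2 = forward(x)                    # treat clean outputs as targets
x_adv = fgsm(x, y1 + 1.0, y2 + 1.0, eps=0.05)
```

The perturbation is bounded by `eps` in each coordinate, matching the "imperceptible perturbation" setting; the paper's claim concerns how much harder it becomes to degrade all tasks simultaneously as the number of heads grows.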


Keywords: Multitask learning · Adversarial robustness



This work was in part supported by a JP Morgan Faculty Research Award; a DiDi Faculty Research Award; a Google Cloud grant; an Amazon Web Services grant; an Amazon Research Award; NSF grant CNS-15-64055; NSF-CCF 1845893; NSF-IIS 1850069; ONR grants N00014-16-1-2263 and N00014-17-1-2788. The authors thank Vaggelis Atlidakis, Augustine Cha, Dídac Surís, Lovish Chum, Justin Wong, and Shunhua Jiang for valuable comments.

Supplementary material

Supplementary material 1 (PDF, 199 KB)



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

Columbia University, New York, USA
