Dynamic Task Prioritization for Multitask Learning

  • Michelle Guo
  • Albert Haque
  • De-An Huang
  • Serena Yeung
  • Li Fei-Fei
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11220)


We propose dynamic task prioritization for multitask learning. This allows a model to dynamically prioritize difficult tasks during training, where difficulty is inversely proportional to performance and changes over time. In contrast to curriculum learning, where easy tasks are prioritized above difficult tasks, we present several studies showing the importance of prioritizing difficult tasks first. We observe that imbalances in task difficulty can lead to unnecessary emphasis on easier tasks, thereby neglecting difficult tasks and slowing progress on them. Motivated by this finding, we introduce a notion of dynamic task prioritization that automatically prioritizes more difficult tasks by adaptively adjusting the mixing weight of each task’s loss objective. Additional ablation studies show the impact of the task hierarchy, or task ordering, when it is explicitly encoded in the network architecture. Our method outperforms existing multitask methods and achieves results competitive with modern single-task models on the COCO and MPII datasets.
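
The abstract describes re-weighting each task's loss according to how difficult the task currently is, with difficulty rising as performance falls. The following is a minimal Python sketch under stated assumptions, not the paper's implementation: it assumes each task reports a performance metric in [0, 1] (for example, accuracy), smooths it with an exponential moving average, and maps the smoothed value k to a weight with a focal-loss-style function, -(1 - k)^gamma * log(k). The class `TaskPrioritizer`, the smoothing factor `alpha`, the focusing parameter `gamma`, and the default weight of 1 for unobserved tasks are all hypothetical choices made for illustration.

```python
import math
from typing import Dict, Iterable, Optional


def focal_weight(kpi: float, gamma: float = 2.0, eps: float = 1e-7) -> float:
    """Map a performance metric in [0, 1] to a non-negative loss weight.

    Assumed focal-loss-style form: -(1 - kpi)**gamma * log(kpi). The weight
    shrinks toward 0 as kpi -> 1 (an "easy" task) and grows as kpi -> 0
    (a "hard" task).
    """
    k = min(max(kpi, eps), 1.0 - eps)  # clamp so log() stays finite
    return -((1.0 - k) ** gamma) * math.log(k)


class TaskPrioritizer:
    """Hypothetical helper: tracks a smoothed metric per task and mixes losses."""

    def __init__(self, tasks: Iterable[str], alpha: float = 0.1, gamma: float = 2.0):
        self.alpha = alpha  # share of the newest observation in the moving average
        self.gamma = gamma  # larger gamma focuses more sharply on hard tasks
        self.kpi: Dict[str, Optional[float]] = {t: None for t in tasks}

    def update(self, task: str, value: float) -> None:
        """Fold a new metric observation into the task's moving average."""
        prev = self.kpi[task]
        if prev is None:
            self.kpi[task] = value  # first observation initializes the average
        else:
            self.kpi[task] = self.alpha * value + (1.0 - self.alpha) * prev

    def weights(self) -> Dict[str, float]:
        """Current per-task weights; tasks with no observation yet get weight 1."""
        return {
            t: 1.0 if k is None else focal_weight(k, self.gamma)
            for t, k in self.kpi.items()
        }

    def combined_loss(self, losses: Dict[str, float]) -> float:
        """Weighted sum of per-task losses."""
        w = self.weights()
        return sum(w[t] * loss for t, loss in losses.items())
```

For example, after `update("detection", 0.9)` and `update("pose", 0.3)`, the pose weight is far larger than the detection weight, so the poorly performing task dominates the combined objective. Because the weights are plain Python floats, multiplying them with framework loss tensors treats them as constants, so no gradient flows through the weighting itself.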

Supplementary material

Supplementary material 1 (pdf 646 KB)

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Department of Computer Science, Stanford University, Stanford, USA
