Boosting binary masks for multi-domain learning through affine transformations


In this work, we present a new algorithm for multi-domain learning. Given a pretrained architecture and a set of visual domains received sequentially, the goal of multi-domain learning is to produce a single model that performs a task in all the domains together. Recent works have shown that this problem can be addressed by masking the internal weights of a given original convnet through learned binary variables. We provide a general formulation of binary mask-based models for multi-domain learning based on affine transformations of the original network parameters. Our formulation obtains significantly higher levels of adaptation to new domains, achieving performance comparable to domain-specific models while requiring slightly more than 1 bit per network parameter per additional domain. Experiments on two popular benchmarks showcase the power of our approach, which achieves performance close to state-of-the-art methods on the Visual Decathlon Challenge.
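To make the idea of mask-based adaptation concrete, the following is a minimal sketch of how a binary mask could be combined with frozen pretrained weights through an element-wise affine transformation. The abstract does not spell out the exact functional form, so the four scalar coefficients `k0` to `k3`, the thresholding rule in `binarize`, and all function names are assumptions made for illustration only.

```python
from typing import List

Matrix = List[List[float]]


def binarize(scores: Matrix, threshold: float = 0.0) -> List[List[int]]:
    """Hard-threshold real-valued mask scores into a {0, 1} mask."""
    return [[1 if s > threshold else 0 for s in row] for row in scores]


def affine_masked_weights(
    weights: Matrix,
    mask: List[List[int]],
    k0: float,
    k1: float,
    k2: float,
    k3: float,
) -> Matrix:
    """Element-wise affine combination of frozen weights W and binary mask M.

    Assumed (hypothetical) form: W' = k0*W + k1 + k2*M + k3*(W*M).
    The coefficients are illustrative scalars, not taken from the source.
    """
    return [
        [k0 * w + k1 + k2 * m + k3 * w * m for w, m in zip(w_row, m_row)]
        for w_row, m_row in zip(weights, mask)
    ]


# Frozen weights of the pretrained network (toy values).
pretrained = [[0.5, -2.0], [1.5, 0.25]]

# Domain-specific mask: only this 1-bit-per-weight tensor is stored per domain.
mask = binarize([[0.3, -0.1], [-0.7, 0.9]])  # [[1, 0], [0, 1]]

# Special case: pure multiplicative masking (weights are kept or zeroed).
plain = affine_masked_weights(pretrained, mask, 0.0, 0.0, 0.0, 1.0)

# A more general affine combination with a few extra scalars.
general = affine_masked_weights(pretrained, mask, 1.0, 0.0, 0.5, -0.5)
```

Under these assumptions, setting `k0 = k1 = k2 = 0` and `k3 = 1` recovers plain multiplicative masking, while the extra scalars account for the storage cost being slightly above 1 bit per parameter per domain.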



  1. We focus on classification tasks, but the proposed method also applies to other tasks.

  2. Fully connected layers are a special case.

  3. If the base architecture contains \(N_p\) parameters and the additional bits introduced per domain are \(A_p\), then \(\#\,\text{Params} = 1 + \frac{A_p \cdot (T-1)}{32 \cdot N_p}\), where \(T\) denotes the number of domains (including the one used for pretraining the network) and the factor of 32 comes from the number of bits required to store each real number. The classifiers are not included in the computation.
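The parameter-count formula in the last note can be evaluated directly. The sketch below is only an illustration: the function name, argument names, and the example numbers are hypothetical, and the default of 32 bits per real number follows the note.

```python
def relative_params(
    base_params: int,
    extra_bits_per_domain: float,
    num_domains: int,
    bits_per_real: int = 32,
) -> float:
    """Relative model size: 1 + A_p * (T - 1) / (32 * N_p).

    base_params           -- N_p, parameters of the base architecture
    extra_bits_per_domain -- A_p, additional bits stored per domain
    num_domains           -- T, including the pretraining domain
    Classifier parameters are not counted, as in the note.
    """
    if base_params <= 0 or num_domains < 1:
        raise ValueError("base_params must be positive and num_domains >= 1")
    extra_bits = extra_bits_per_domain * (num_domains - 1)
    return 1 + extra_bits / (bits_per_real * base_params)


# Hypothetical example: exactly 1 extra bit per base parameter, 10 domains.
n_p = 1000
size = relative_params(base_params=n_p, extra_bits_per_domain=n_p, num_domains=10)
print(size)  # 1.28125
```

With exactly one extra bit per parameter, each additional domain adds 1/32 of the base model size, so nine extra domains add 9/32 on top of the single pretrained network.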




We acknowledge financial support from ERC Grant 637076—RoboExNovo and Project DIGIMAP, Grant 860375, funded by the Austrian Research Promotion Agency (FFG).

Author information



Corresponding author

Correspondence to Massimiliano Mancini.


Cite this article

Mancini, M., Ricci, E., Caputo, B. et al. Boosting binary masks for multi-domain learning through affine transformations. Machine Vision and Applications 31, 42 (2020).



  • Multi-domain learning
  • Multi-task learning
  • Quantized neural networks