
Towards data-free gating of heterogeneous pre-trained neural networks


Abstract

The combination and aggregation of knowledge from multiple neural networks is most commonly seen in the form of mixtures of experts. However, such combinations usually involve networks trained on the same tasks, with little mention of combining heterogeneous pre-trained networks, especially in the data-free regime. The problem of combining pre-trained models in the absence of relevant datasets is likely to become increasingly important as machine learning continues to dominate the AI landscape and the number of useful but specialized models explodes. This paper proposes multiple data-free methods for the combination of heterogeneous neural networks, ranging from the use of simple output-logit statistics to the training of specialized gating networks. The gating networks decide whether specific inputs belong to specific networks based on the nature of the expert activations generated. The experiments revealed that the gating networks, including the universal gating approach, were the most accurate of the proposed methods, and therefore represent a pragmatic step towards applications with heterogeneous mixtures of experts in a data-free regime. The code for this project is hosted on GitHub at https://github.com/cwkang1998/network-merging.
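As a rough illustration of the logit-statistics end of this spectrum, the sketch below (in PyTorch) routes each input to whichever pre-trained expert produces the most confident softmax output. This is a minimal example under assumed names and shapes: the function `route_by_confidence` and the choice of maximum softmax probability as the routing statistic are illustrative assumptions, not the paper's exact procedure.

```python
# Minimal sketch (not the authors' exact method): routing inputs between
# heterogeneous pre-trained experts using a simple output-logit statistic,
# here the maximum softmax probability. The most confident expert "claims"
# each input and its class prediction is returned.
import torch
import torch.nn.functional as F

@torch.no_grad()
def route_by_confidence(x, experts):
    """Per input, pick the expert with the highest max softmax probability.

    x:       batch of inputs, shape (B, ...)
    experts: list of pre-trained classifiers (possibly trained on different
             tasks, so their logit dimensions may differ)
    returns: (index of chosen expert, predicted class within that expert)
    """
    confidences, predictions = [], []
    for expert in experts:
        logits = expert(x)                    # (B, C_k); C_k may differ per expert
        probs = F.softmax(logits, dim=1)
        conf, pred = probs.max(dim=1)         # per-input confidence and class
        confidences.append(conf)
        predictions.append(pred)

    confidences = torch.stack(confidences, dim=1)  # (B, K)
    predictions = torch.stack(predictions, dim=1)  # (B, K)
    chosen = confidences.argmax(dim=1)             # (B,) winning expert index
    final_pred = predictions.gather(1, chosen.unsqueeze(1)).squeeze(1)
    return chosen, final_pred
```

In place of the maximum-probability rule, other logit statistics (e.g., entropy) could be substituted, or, in line with the paper's stronger results, a small gating network trained on the experts' activations could make the routing decision instead.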



Funding

Not applicable.

Author information


Corresponding author

Correspondence to Chen Wen Kang.

Ethics declarations

Conflicts of interest/Competing interests

Not applicable.

Additional information

Availability of data and material

All datasets used are in the public domain.

Code availability

https://github.com/cwkang1998/network-merging

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Kang, C.W., Hong, C.M. & Maul, T. Towards data-free gating of heterogeneous pre-trained neural networks. Appl Intell 51, 8045–8056 (2021). https://doi.org/10.1007/s10489-021-02301-w

