Abstract
The combination and aggregation of knowledge from multiple neural networks is commonly seen in the form of mixtures of experts. However, such combinations usually involve networks trained on the same tasks; the combination of heterogeneous pre-trained networks, especially in the data-free regime, has received little attention. The problem of combining pre-trained models in the absence of relevant datasets is likely to become increasingly important as machine learning continues to dominate the AI landscape and the number of useful but specialized models explodes. This paper proposes multiple data-free methods for combining heterogeneous neural networks, ranging from the use of simple output logit statistics to the training of specialized gating networks. The gating networks decide whether a given input belongs to a given expert based on the nature of the activations that the expert generates. Our experiments revealed that the gating networks, including the universal gating approach, were the most accurate, and therefore represent a pragmatic step towards applications with heterogeneous mixtures of experts in a data-free regime. The code for this project is hosted on GitHub at https://github.com/cwkang1998/network-merging.
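To make the two families of methods described above concrete, the following is a minimal PyTorch sketch of (a) a logit-statistic baseline that routes each input to the expert whose softmax output is most confident, and (b) a small gating network that predicts expert ownership from the concatenated expert activations. This is an illustration of the general ideas only, not the authors' implementation (see the linked repository for that); the names `Expert`, `route_by_confidence`, and `GatingNet` are hypothetical.

```python
# Illustrative sketch of data-free expert routing; names are hypothetical,
# not taken from the authors' repository.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Expert(nn.Module):
    """Stand-in for a frozen pre-trained classifier over its own label set."""
    def __init__(self, in_dim: int, num_classes: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                 nn.Linear(64, num_classes))

    def forward(self, x):
        return self.net(x)  # raw logits

def route_by_confidence(experts, x):
    """Logit-statistic baseline: pick, per input, the expert whose softmax
    distribution has the highest maximum probability."""
    with torch.no_grad():
        logits = [e(x) for e in experts]                        # list of (B, C_i)
        conf = torch.stack([F.softmax(l, dim=1).max(dim=1).values
                            for l in logits], dim=1)            # (B, num_experts)
        return conf.argmax(dim=1), logits                       # (B,), expert outputs

class GatingNet(nn.Module):
    """Learned gate: predicts which expert 'owns' an input from the
    concatenated expert activations (here, their output logits)."""
    def __init__(self, total_logit_dim: int, num_experts: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(total_logit_dim, 32), nn.ReLU(),
                                  nn.Linear(32, num_experts))

    def forward(self, expert_logits):
        return self.gate(torch.cat(expert_logits, dim=1))       # (B, num_experts)

if __name__ == "__main__":
    # e.g., two disjoint-task experts (digits 0-4 vs. 5-9) over flattened inputs
    experts = [Expert(784, 5), Expert(784, 5)]
    for e in experts:
        e.eval()
    x = torch.randn(4, 784)
    choice, logits = route_by_confidence(experts, x)
    gate = GatingNet(sum(l.shape[1] for l in logits), len(experts))
    print(choice, gate([l.detach() for l in logits]).argmax(dim=1))
```

In a genuinely data-free setting the gate cannot be trained on the experts' original datasets; the training signal must come from elsewhere, for example surrogate or out-of-distribution inputs, which is where the activation statistics the abstract mentions become relevant.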
Funding
Not applicable.
Ethics declarations
Conflicts of interest/Competing interests
Not applicable.
Additional information
Availability of data and material
All datasets used are in the public domain.
Code availability
https://github.com/cwkang1998/network-merging
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Kang, C.W., Hong, C.M. & Maul, T. Towards data-free gating of heterogeneous pre-trained neural networks. Appl Intell 51, 8045–8056 (2021). https://doi.org/10.1007/s10489-021-02301-w