Abstract
The combination and aggregation of knowledge from multiple neural networks is commonly seen in the form of mixtures of experts. However, such combinations usually involve networks trained on the same tasks; the combination of heterogeneous pre-trained networks, especially in the data-free regime, has received little attention. The problem of combining pre-trained models in the absence of relevant datasets is likely to become increasingly important as machine learning continues to dominate the AI landscape and the number of useful but specialized models explodes. This paper proposes multiple data-free methods for combining heterogeneous neural networks, ranging from the use of simple output logit statistics to the training of specialized gating networks. The gating networks decide whether a given input belongs to a given expert based on the nature of the activations that the expert generates. Our experiments revealed that the gating networks, including the universal gating approach, were the most accurate, and therefore represent a pragmatic step towards applications with heterogeneous mixtures of experts in a data-free regime. The code for this project is hosted on GitHub at https://github.com/cwkang1998/network-merging.
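To make the two families of methods described above concrete, the following is a minimal PyTorch sketch of (a) a logit-statistic baseline that routes each input to the expert whose softmax output is most confident, and (b) a small gating network that predicts expert ownership from the concatenated expert activations. This is an illustration of the general ideas only, not the authors' implementation (see the linked repository for that); the names `Expert`, `route_by_confidence`, and `GatingNet` are hypothetical.

```python
# Illustrative sketch of data-free expert routing; names are hypothetical,
# not taken from the authors' repository.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Expert(nn.Module):
    """Stand-in for a frozen pre-trained classifier over its own label set."""
    def __init__(self, in_dim: int, num_classes: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                 nn.Linear(64, num_classes))

    def forward(self, x):
        return self.net(x)  # raw logits

def route_by_confidence(experts, x):
    """Logit-statistic baseline: pick, per input, the expert whose softmax
    distribution has the highest maximum probability."""
    with torch.no_grad():
        logits = [e(x) for e in experts]                        # list of (B, C_i)
        conf = torch.stack([F.softmax(l, dim=1).max(dim=1).values
                            for l in logits], dim=1)            # (B, num_experts)
        return conf.argmax(dim=1), logits                       # (B,), expert outputs

class GatingNet(nn.Module):
    """Learned gate: predicts which expert 'owns' an input from the
    concatenated expert activations (here, their output logits)."""
    def __init__(self, total_logit_dim: int, num_experts: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(total_logit_dim, 32), nn.ReLU(),
                                  nn.Linear(32, num_experts))

    def forward(self, expert_logits):
        return self.gate(torch.cat(expert_logits, dim=1))       # (B, num_experts)

if __name__ == "__main__":
    # e.g., two disjoint-task experts (digits 0-4 vs. 5-9) over flattened inputs
    experts = [Expert(784, 5), Expert(784, 5)]
    for e in experts:
        e.eval()
    x = torch.randn(4, 784)
    choice, logits = route_by_confidence(experts, x)
    gate = GatingNet(sum(l.shape[1] for l in logits), len(experts))
    print(choice, gate([l.detach() for l in logits]).argmax(dim=1))
```

In a genuinely data-free setting the gate cannot be trained on the experts' original datasets; the training signal must come from elsewhere, for example surrogate or out-of-distribution inputs, which is where the activation statistics the abstract mentions become relevant.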
Funding
Not applicable.
Ethics declarations
Conflicts of interest/Competing interests
Not applicable.
Additional information
Availability of data and material
All datasets used are in the public domain.
Code availability
https://github.com/cwkang1998/network-merging
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Kang, C.W., Hong, C.M. & Maul, T. Towards data-free gating of heterogeneous pre-trained neural networks. Appl Intell 51, 8045–8056 (2021). https://doi.org/10.1007/s10489-021-02301-w