Abstract
Artificial neural networks are well-known computational models that have successfully demonstrated various human cognitive capabilities. Nevertheless, unlike the human brain, neural networks usually must learn each new task from scratch. Furthermore, in contrast to human abilities, retraining a network on a new task does not necessarily preserve previously learned information and may lead to catastrophic forgetting. A well-established method for transferring knowledge between neural networks could alleviate these issues. In this paper, we propose a method to fuse the knowledge contained in separately trained networks. The method is non-iterative and requires neither the original training data nor additional training sessions. We present the theoretical basis of the model, grounded in a probabilistic approach, and test its performance with feedforward neural networks on classification tasks over several publicly available data sets.
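The specific fusion rule is developed in the body of the article and is not reproduced in this abstract. Purely as a loose illustration of what a non-iterative, probabilistic fusion of trained classifiers can look like, the sketch below combines the softmax outputs of two independently trained feedforward networks via a renormalized elementwise product (a product-of-experts style rule). Every name and design choice in it is an assumption for illustration, not the authors' algorithm.

# Minimal, hypothetical sketch: non-iterative fusion of two trained
# feedforward classifiers by multiplying and renormalizing their output
# class distributions. This is NOT the paper's method; it only illustrates
# fusion without retraining or access to the original training data.

import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def forward(x, weights, biases):
    """Forward pass of a simple feedforward net with tanh hidden layers."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.tanh(h @ W + b)
    return softmax(h @ weights[-1] + biases[-1])   # class probabilities

def fuse_predictions(prob_a, prob_b):
    """Combine two class distributions without any further training:
    renormalized elementwise product (product-of-experts style)."""
    fused = prob_a * prob_b
    return fused / fused.sum(axis=-1, keepdims=True)

# Usage with randomly initialized stand-in networks (5 samples, 8 features,
# 3 classes); in practice the weights would come from two trained models.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
net_a = ([rng.normal(size=(8, 16)), rng.normal(size=(16, 3))],
         [np.zeros(16), np.zeros(3)])
net_b = ([rng.normal(size=(8, 16)), rng.normal(size=(16, 3))],
         [np.zeros(16), np.zeros(3)])
p = fuse_predictions(forward(x, *net_a), forward(x, *net_b))
print(p.sum(axis=-1))   # each row sums to 1

The multiplicative combination sharpens the fused distribution where the two networks agree and is one common probabilistically motivated alternative to simple output averaging.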