Knowledge Fusion in Feedforward Artificial Neural Networks

Abstract

Artificial neural networks are well-known computational models that have successfully demonstrated various human cognitive capabilities. Nevertheless, unlike the human brain, neural networks usually must learn each new task from scratch. Furthermore, in contrast to human abilities, re-training a network on a new task does not necessarily preserve previously learned information and can lead to catastrophic forgetting. A well-established method for transferring knowledge between neural networks could alleviate these issues. In this paper, we propose a method to fuse the knowledge contained in separately trained networks. The method is non-iterative and requires neither the original training data nor additional training sessions. We present the theoretical basis of the method, which rests on a probabilistic approach, and test its performance with feedforward neural networks on classification tasks for several publicly available data sets.
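
As a concrete point of reference, one simple way to fuse the knowledge of two already-trained classifiers without any retraining is to combine their output distributions, for instance as a normalized product of the two softmax posteriors (a product-of-experts style rule). The sketch below is a minimal illustration of such non-iterative, output-level fusion under our own assumptions; it is not the specific probabilistic fusion rule derived in this paper, and the function fuse_predictions and the example posteriors are hypothetical.

    import numpy as np

    def fuse_predictions(p_a, p_b, eps=1e-12):
        """Fuse two class-probability vectors (or batches of them) by a
        normalized product: p(c) proportional to p_a(c) * p_b(c).

        Generic product-of-experts combination, shown only to illustrate
        training-free fusion at the output level; not the paper's method.
        """
        p_a = np.asarray(p_a, dtype=float)
        p_b = np.asarray(p_b, dtype=float)
        fused = p_a * p_b + eps  # elementwise product of the two posteriors
        return fused / fused.sum(axis=-1, keepdims=True)  # renormalize to a distribution

    # Hypothetical usage: softmax outputs of two separately trained
    # networks on the same input, over three classes.
    p_net_a = [0.70, 0.20, 0.10]
    p_net_b = [0.60, 0.30, 0.10]
    print(fuse_predictions(p_net_a, p_net_b))  # fused posterior sums to 1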

Author information

Corresponding author

Correspondence to Sergey V. Sukhov.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 102 KB)


Cite this article

Akhlaghi, M.I., Sukhov, S.V. Knowledge Fusion in Feedforward Artificial Neural Networks. Neural Process Lett 48, 257–272 (2018). https://doi.org/10.1007/s11063-017-9712-5
