Comparing the performance of Hebbian against backpropagation learning using convolutional neural networks

Abstract

In this paper, we investigate Hebbian learning strategies applied to Convolutional Neural Network (CNN) training. We consider two unsupervised learning approaches, Hebbian Winner-Takes-All (HWTA) and Hebbian Principal Component Analysis (HPCA). The Hebbian learning rules are used to train the layers of a CNN in order to extract features that are then used for classification, without requiring backpropagation (backprop). Experimental comparisons are made with state-of-the-art unsupervised (but backprop-based) Variational Auto-Encoder (VAE) training. For completeness, we consider two supervised Hebbian learning variants (Supervised Hebbian Classifiers, SHC, and Contrastive Hebbian Learning, CHL) for training the final classification layer, which are compared to Stochastic Gradient Descent (SGD) training. We also investigate hybrid learning methodologies, where some network layers are trained following the Hebbian approach and others are trained by backprop. We tested our approaches on the MNIST, CIFAR10, and CIFAR100 datasets. Our results suggest that Hebbian learning is generally suitable for training early feature extraction layers, or for retraining higher network layers in fewer training epochs than backprop. Moreover, our experiments show that Hebbian learning outperforms VAE training, with HPCA performing generally better than HWTA.
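To make the two unsupervised rules named above more concrete, the following is a minimal sketch of one update step for each, written for a single fully connected layer in plain Python. It assumes the textbook forms of the rules: a competitive update in which only the best-matching neuron moves toward the input (for HWTA), and Sanger's generalized Hebbian rule (for HPCA). The exact formulations, the convolutional handling, and the hyperparameters used in the paper are not given in this section, so the function names, learning rates, and toy data below are illustrative assumptions only.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def hwta_step(weights, x, lr=0.1):
    """Winner-takes-all: only the neuron most similar to x moves toward x."""
    # Similarity is assumed to be the dot product between weights and input.
    winner = max(range(len(weights)), key=lambda i: dot(weights[i], x))
    weights[winner] = [w + lr * (xi - w) for w, xi in zip(weights[winner], x)]
    return winner


def hpca_step(weights, x, lr=0.01):
    """Sanger-style rule: dw_i = lr * y_i * (x - sum_{j<=i} y_j * w_j)."""
    outputs = [dot(w, x) for w in weights]
    residual = list(x)
    updated = []
    for w, y in zip(weights, outputs):
        # Remove what neurons 0..i already explain before updating neuron i.
        residual = [r - y * wj for r, wj in zip(residual, w)]
        updated.append([wj + lr * y * r for wj, r in zip(w, residual)])
    weights[:] = updated
    return outputs


def train_hwta_demo(steps=2000, seed=0):
    """Toy data: two noisy clusters near (1, 0) and (0, 1)."""
    rng = random.Random(seed)
    weights = [[0.6, 0.4], [0.4, 0.6]]
    for _ in range(steps):
        cx, cy = rng.choice([(1.0, 0.0), (0.0, 1.0)])
        hwta_step(weights, [cx + rng.gauss(0, 0.1), cy + rng.gauss(0, 0.1)])
    return weights


def train_hpca_demo(steps=5000, seed=0):
    """Toy data: most of the variance lies along (1, 1) / sqrt(2)."""
    rng = random.Random(seed)
    s = 1.0 / math.sqrt(2.0)
    weights = [[0.3, 0.1], [0.1, -0.3]]
    for _ in range(steps):
        a, b = rng.gauss(0, 1.0), rng.gauss(0, 0.3)
        hpca_step(weights, [s * (a + b), s * (a - b)])
    return weights


if __name__ == "__main__":
    print("HWTA weights:", train_hwta_demo())
    print("HPCA weights:", train_hpca_demo())
```

On the toy data, each HWTA weight vector should settle near one cluster centre, while the first HPCA weight vector should align with the direction of largest variance. Neither rule needs a loss gradient from later layers, which is the sense in which the abstract describes training "without requiring backpropagation".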

Notes

  1. The code to reproduce the experiments is available at: github.com/GabrieleLagani/HebbianPCA/tree/hebbpca.

References

  1. Amato G, Carrara F, Falchi F, Gennaro C, Lagani G (2019) Hebbian learning meets deep convolutional neural networks. In: International conference on image analysis and processing. Springer, pp 324–334

  2. Bahroun Y, Soltoggio A (2017) Online representation learning with single and multi-layer hebbian networks for image classification. In: International conference on artificial neural networks. Springer, pp 354–363

  3. Becker S, Plumbley M (1996) Unsupervised neural network learning procedures for feature extraction and classification. Appl Intell 6(3):185–203

  4. Diehl PU, Cook M (2015) Unsupervised learning of digit recognition using spike-timing-dependent plasticity. Front Comput Neurosci 9:99

  5. Ferré P, Mamalet F, Thorpe SJ (2018) Unsupervised feature learning with winner-takes-all based stdp. Front Comput Neurosci 12:24

  6. Földiak P (1989) Adaptive network for optimal linear feature extraction. In: Proceedings of IEEE/INNS international joint conference on neural networks, vol 1, pp 401–405

  7. Grossberg S (1976) Adaptive pattern classification and universal recoding: I. Parallel development and coding of neural feature detectors. Biol Cybern 23(3):121–134

  8. Haykin S (2009) Neural networks and learning machines, 3rd edn. Pearson

  9. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778

  10. He T, Zhang Z, Zhang H, Zhang Z, Xie J, Li M (2019) Bag of tricks for image classification with convolutional neural networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 558–567

  11. Higgins I, Matthey L, Pal A, Burgess C, Glorot X, Botvinick M, Mohamed S, Lerchner A (2016) beta-vae: Learning basic visual concepts with a constrained variational framework

  12. Hyvarinen A, Karhunen J, Oja E (2002) Independent component analysis. Stud Inf Control 11(2):205–207

  13. Karhunen J, Joutsensalo J (1995) Generalizations of principal component analysis, optimization problems, and neural networks. Neural Netw 8(4):549–562

  14. Kingma DP, Welling M (2013) Auto-encoding variational bayes

  15. Kohonen T (1982) Self-organized formation of topologically correct feature maps. Biol Cybern 43(1):59–69

  16. Kolda TG, Lewis RM, Torczon V (2003) Optimization by direct search: new perspectives on some classical and modern methods. SIAM Rev 45(3):385–482

  17. Krizhevsky A, Hinton G (2009) Learning multiple layers of features from tiny images

  18. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems

  19. Lagani G (2019) Hebbian learning algorithms for training convolutional neural networks. Master’s thesis, School of Engineering, University of Pisa, Italy. https://etd.adm.unipi.it/theses/available/etd-03292019-220853/

  20. LeCun Y, Bottou L, Bengio Y, Haffner P et al (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324

  21. Miconi T, Clune J, Stanley KO (2018) Differentiable plasticity: training plastic neural networks with backpropagation

  22. Movellan JR (1991) Contrastive hebbian learning in the continuous hopfield model. In: Connectionist models. Elsevier, pp 10–17

  23. Olshausen BA (1996) Learning linear, sparse, factorial codes

  24. Olshausen BA, Field DJ (1996) Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381(6583):607

  25. O’Reilly RC (1996) Biologically plausible error-driven learning using local activation differences: the generalized recirculation algorithm. Neural Comput 8(5):895–938

  26. O’Reilly RC (2001) Generalization in interactive networks: the benefits of inhibitory competition and hebbian learning. Neural Comput 13(6):1199–1241

  27. O’Reilly RC, Munakata Y (2000) Computational explorations in cognitive neuroscience: Understanding the mind by simulating the brain. MIT Press

  28. Pehlevan C, Chklovskii DB (2015) Optimization theory of hebbian/anti-hebbian networks for pca and whitening. In: 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton). IEEE, pp 1458–1465

  29. Pehlevan C, Hu T, Chklovskii DB (2015) A hebbian/anti-hebbian neural network for linear subspace learning: A derivation from multidimensional scaling of streaming data. Neural Comput 27(7):1461–1495

  30. Ponulak F (2005) Resume-new supervised learning method for spiking neural networks. technical report. In: Institute of control and information engineering, Poznan University of Technology

  31. Rozell CJ, Johnson DH, Baraniuk RG, Olshausen BA (2008) Sparse coding via thresholding and local competition in neural circuits. Neural Comput 20(10):2526–2563

  32. Rumelhart DE, Zipser D (1985) Feature discovery by competitive learning. Cogn Sci 9(1):75–112

  33. Sanger TD (1989) Optimal unsupervised learning in a single-layer linear feedforward neural network. Neural Netw 2(6):459–473

  34. Shrestha A, Ahmed K, Wang Y, Qiu Q (2017) Stable spike-timing dependent plasticity rule for multilayer unsupervised and supervised learning. In: International joint conference on neural networks (IJCNN). IEEE, pp 1999–2006

  35. Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M et al (2016) Mastering the game of go with deep neural networks and tree search. Nature 529(7587):484

  36. Wadhwa A, Madhow U (2016) Bottom-up deep learning using the hebbian principle

  37. Wadhwa A, Madhow U (2016) Learning sparse, distributed representations using the hebbian principle

  38. Xie J, Girshick R, Farhadi A (2016) Unsupervised deep embedding for clustering analysis. In: International conference on machine learning, pp 478–487

  39. Yosinski J, Clune J, Bengio Y, Lipson H (2014) How transferable are features in deep neural networks?

Author information

Corresponding author

Correspondence to Gabriele Lagani.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was partially supported by the H2020 project AI4EU under GA 825619 and by the H2020 project AI4Media under GA 951911.

Appendix 1: Supplementary results

In this appendix, we present additional results on the MNIST, CIFAR10, and CIFAR100 datasets. Tables 7, 9, and 11 show the results of hybrid training, in which some network layers are trained with supervised backprop and the others with the Hebbian approach. Tables 8, 10, and 12 show the results of SHC and CHL classifiers, compared with SGD classifiers, trained on the features extracted from the various layers of a pre-trained network.

MNIST

Hybrid network models

In Table 7, we report the results obtained on the MNIST test set with hybrid networks. Each row reports the results for a network with a different combination of Hebbian and backprop layers (the first row below the header represents the network fully trained with backprop). The letter “H” denotes layers trained with the Hebbian approach, and the letter “B” denotes layers trained with backprop. The letter “G” denotes the final classifier (corresponding to the sixth layer), which was trained with SGD in all cases so that the comparisons are made on an equal footing. The last two columns show the accuracy obtained with the corresponding combination of layers.
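The letter notation described above can be read as a small per-layer training plan. The sketch below is a hypothetical helper (it is not taken from the paper's code) that turns a configuration string into lists of layer indices, assuming one letter per layer, six layers in total, and “G” allowed only in the last position.

```python
# Hypothetical mapping from the table's letters to training modes.
LAYER_MODES = {"H": "hebbian", "B": "backprop", "G": "sgd_classifier"}


def parse_config(config, num_layers=6):
    """Map a string such as 'HBBBBG' to 1-based layer indices per training mode."""
    if len(config) != num_layers:
        raise ValueError(f"expected {num_layers} letters, got {len(config)}")
    if config[-1] != "G":
        raise ValueError("the final classifier layer must be marked 'G'")
    plan = {mode: [] for mode in LAYER_MODES.values()}
    for index, letter in enumerate(config, start=1):
        if letter not in LAYER_MODES:
            raise ValueError(f"unknown layer letter: {letter!r}")
        if letter == "G" and index != num_layers:
            raise ValueError("'G' is only valid for the last layer")
        plan[LAYER_MODES[letter]].append(index)
    return plan


if __name__ == "__main__":
    print(parse_config("BBBBBG"))  # full backprop baseline
    print(parse_config("HBBBHG"))  # layers 1 and 5 switched to Hebbian
```

Under this reading, the baseline row corresponds to "BBBBBG" and the fully Hebbian feature extractor to "HHHHHG"; which intermediate combinations appear in the table is determined by the paper, not by this helper.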

Table 7 MNIST accuracy (top-1) and 95% confidence intervals of hybrid network models

Table 7 shows the effect of switching a specific layer (or group of layers) in a network from backprop to Hebbian training. The first row represents the network fully trained with backprop. The next rows show the results of networks in which a single layer was switched. Both HPCA and HWTA achieve results comparable to full backprop training. An accuracy slightly higher than that of full backprop is observed when layer 5 is replaced, suggesting that some combinations of layers might actually help to increase performance. In the subsequent rows, more layers are switched from backprop to Hebbian training and a slight performance drop is observed, but HPCA seems to perform generally better than HWTA when more Hebbian layers are involved. The most prominent difference appears when all the network layers are replaced with their Hebbian equivalents, in which case HPCA shows an increase of more than 2 percentage points over HWTA.
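The tables in this appendix report accuracies together with 95% confidence intervals. This section does not state how the intervals were computed, so the sketch below assumes one common choice: a Student's t interval around the mean accuracy of several independent training runs. The accuracy values in the usage example are placeholders chosen for illustration, not results from the paper.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t for 1..9 degrees of freedom.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}


def mean_and_ci95(accuracies):
    """Return (mean, half_width) of a 95% t-interval over repeated runs."""
    n = len(accuracies)
    if n - 1 not in T_CRIT_95:
        raise ValueError("this sketch supports between 2 and 10 runs")
    mean = statistics.mean(accuracies)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(accuracies) / math.sqrt(n)
    return mean, T_CRIT_95[n - 1] * sem


if __name__ == "__main__":
    # Placeholder accuracies (in %) from five hypothetical runs.
    mean, half = mean_and_ci95([98.9, 99.1, 99.0, 99.2, 98.8])
    print(f"{mean:.2f} +/- {half:.2f}")
```

When two configurations differ by less than the sum of their half-widths, as with the "comparable" single-layer results discussed above, the difference should be read with caution.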

Comparison of SHC and SGD

Table 8 shows a comparison between SHC, CHL, and SGD classifiers placed on the various layers of a network pre-trained with backprop. The results suggest that SHC is effective in classifying high-level features, achieving accuracy comparable to SGD while requiring fewer training epochs. On the other hand, SHC is less effective on lower-layer features, although it still converges quickly, suggesting that the supervised Hebbian approach benefits from more abstract latent representations. CHL appears to perform comparably to SGD training.
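One way to see why a supervised Hebbian classifier can converge in few epochs is to sketch a teacher-driven variant of the Hebbian rule. The rule actually used for SHC is not given in this section, so the code below is an assumption: the label acts as a teacher signal that fixes the output of the correct class to 1 and all others to 0, so each sample only pulls the weight vector of its own class toward the input features. The toy data generator is hypothetical as well.

```python
import random


def shc_train(samples, labels, num_classes, lr=0.05):
    """Single pass of a teacher-driven Hebbian rule: dw_c = lr * (x - w_c)."""
    dim = len(samples[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    for x, c in zip(samples, labels):
        # Teacher signal: only the weight vector of the labelled class moves.
        weights[c] = [w + lr * (xi - w) for w, xi in zip(weights[c], x)]
    return weights


def shc_predict(weights, x):
    """Predict the class whose weight vector has the largest dot product with x."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return scores.index(max(scores))


def make_toy_data(n, seed):
    """Hypothetical 3-class features: noisy one-hot prototypes."""
    rng = random.Random(seed)
    samples, labels = [], []
    for _ in range(n):
        c = rng.randrange(3)
        samples.append([(1.0 if i == c else 0.0) + rng.gauss(0, 0.1) for i in range(3)])
        labels.append(c)
    return samples, labels


if __name__ == "__main__":
    train_x, train_y = make_toy_data(300, seed=0)
    test_x, test_y = make_toy_data(100, seed=1)
    w = shc_train(train_x, train_y, num_classes=3)
    hits = sum(shc_predict(w, x) == y for x, y in zip(test_x, test_y))
    print("toy accuracy:", hits / len(test_y))
```

With this rule each weight vector tends toward a running average of its class's features, which a single pass over well-separated features can already estimate. That is consistent with the observation above that the approach works best on high-level features, where classes are easier to separate.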

Table 8 MNIST accuracy (top-1), 95% confidence intervals, and convergence epochs of SHC, CHL, and SGD classifiers on top of various network layer features

CIFAR10

Hybrid network models

In Table 9, we report the results obtained on the CIFAR10 test set with hybrid networks. The table, which has the same structure as that of the previous sub-section, shows the effect of switching a specific layer (or group of layers) in a network from backprop to Hebbian training. The first row represents the network fully trained with backprop. The next rows show the results of networks in which a single layer was switched. Both HPCA and HWTA achieve competitive results with respect to full backprop training when they are used to train the first or the fifth network layer. A small but more significant drop is observed when inner layers are switched from backprop to Hebbian training. In the subsequent rows, more layers are switched from backprop to Hebbian training and a larger performance drop is observed, but HPCA seems to perform better than HWTA when more Hebbian layers are involved. The most prominent difference appears when all the deep network layers are replaced with their Hebbian equivalents, in which case HPCA shows an increase of 15 percentage points over HWTA.

Table 9 CIFAR10 accuracy (top-1) and 95% confidence intervals of hybrid network models

Comparison of SHC and SGD

Table 10 shows a comparison between SHC, CHL, and SGD classifiers placed on the various layers of a network pre-trained with backprop. The results suggest that SHC is effective in classifying high-level features, achieving accuracy comparable to SGD while requiring fewer training epochs. On the other hand, SHC is less effective on lower-layer features, although it still converges quickly, suggesting that the supervised Hebbian approach benefits from more abstract latent representations. CHL appears to perform comparably to SGD training.

Table 10 CIFAR10 accuracy (top-1), 95% confidence intervals, and convergence epochs of SHC, CHL, and SGD classifiers on top of various network layer features

CIFAR100

Hybrid network models

In Table 11, we report the results obtained on the CIFAR100 test set with hybrid networks. The table, which has the same structure as those of the previous sub-sections, shows the effect of switching a specific layer (or group of layers) in a network from backprop to Hebbian training. The first row represents the network fully trained with backprop. The next rows show the results of networks in which a single layer was switched. HWTA achieves competitive results with respect to full backprop when it is used to train the first or the fifth network layer. A small but more significant drop is observed when inner layers are switched from backprop to HWTA. On the other hand, HPCA seems to perform generally better than HWTA. In particular, it slightly outperforms full backprop (by 2 percentage points) when used to train the fifth network layer, suggesting that this kind of hybrid combination might be useful when more complex tasks are involved. In the subsequent rows, more layers are switched from backprop to Hebbian training and a larger performance drop is observed, but HPCA still behaves better than HWTA. The most prominent difference appears when all the network layers are replaced with their Hebbian equivalents, in which case HPCA shows an increase of 22 percentage points over HWTA.

Table 11 CIFAR100 accuracy (top-5) and 95% confidence intervals of hybrid network models

Comparison of SHC and SGD

Table 12 shows a comparison between SHC, CHL, and SGD classifiers placed on the various layers of a network pre-trained with backprop. In this case, SHC achieves accuracy comparable to SGD (even with a slight improvement of 6 percentage points on layer 3) while requiring fewer training epochs, suggesting that the approach might be especially useful when more complex tasks are involved. On the other hand, lower performance is observed when CHL is used, suggesting that this approach has more difficulty scaling to more complex datasets.

Table 12 CIFAR100 accuracy (top-5), 95% confidence intervals, and convergence epochs of SHC, CHL, and SGD classifiers on top of various network layer features

About this article

Cite this article

Lagani, G., Falchi, F., Gennaro, C. et al. Comparing the performance of Hebbian against backpropagation learning using convolutional neural networks. Neural Comput & Applic 34, 6503–6519 (2022). https://doi.org/10.1007/s00521-021-06701-4

  • DOI: https://doi.org/10.1007/s00521-021-06701-4

Keywords

  • Hebbian learning
  • Deep learning
  • Neural networks
  • Biologically inspired