Appendix
Appendix 1: Supplementary results
In this Appendix, we present additional results on the MNIST, CIFAR10, and CIFAR100 datasets. Tables 7, 9, and 11 show the results of hybrid training, in which some network layers are trained with supervised backprop and the others with the Hebbian approach. Tables 8, 10, and 12 show the results of SHC and CHL classifiers, compared with SGD classifiers, all trained on the features extracted from the various layers of a pre-trained network.
MNIST
Hybrid network models
In Table 7, we report the results obtained on the MNIST test set with hybrid networks. Each row reports the results for a network with a different combination of Hebbian and backprop layers (the first row below the header represents the network fully trained with backprop). We use the letter “H” to denote layers trained with the Hebbian approach, and the letter “B” for layers trained with backprop. The letter “G” denotes the final classifier (corresponding to the sixth layer), which was trained with SGD in all cases in order to make the comparisons on an equal footing. The last two columns show the accuracy obtained with the corresponding combination of layers.
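To make the hybrid setup concrete, the following is a minimal sketch (not the authors' actual implementation) of how such a network could be assembled in PyTorch: the layer marked “H” is excluded from the gradient-based optimizer and is updated only by a local Hebbian rule, while the “B” layer and the final “G” classifier are trained with SGD. The architecture, layer sizes, and the choice of which layer is Hebbian are illustrative assumptions.

import torch
import torch.nn as nn

class HybridNet(nn.Module):
    """Illustrative 2-conv + 1-fc network; the paper's architecture differs."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 96, 5, padding=2)    # "H": Hebbian-trained
        self.conv2 = nn.Conv2d(96, 128, 3, padding=1)  # "B": backprop-trained
        self.pool = nn.MaxPool2d(2)
        self.fc = nn.Linear(128 * 8 * 8, num_classes)  # "G": SGD classifier

    def forward(self, x):
        x = self.pool(torch.relu(self.conv1(x)))
        x = self.pool(torch.relu(self.conv2(x)))
        return self.fc(x.flatten(1))

model = HybridNet()

# Exclude the Hebbian layer from gradient updates: its weights change only
# through the local rule, applied separately during (or before) training.
for p in model.conv1.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3)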
Table 7 MNIST accuracy (top-1) and 95% confidence intervals of hybrid network models
Table 7 allows us to understand the effect of switching a specific layer (or group of layers) in a network from backprop to Hebbian training. The first row represents the network fully trained with backprop. The following rows show the results of networks in which a single layer was switched. Both HPCA and HWTA achieve results comparable to full backprop training. A result slightly higher than full backprop is observed when layer 5 is replaced, suggesting that some combinations of layers might actually help performance. In the subsequent rows, more layers are switched from backprop to Hebbian training, and a slight performance drop is observed; however, the HPCA approach generally performs better than HWTA when more Hebbian layers are involved. The most prominent difference appears when all the network layers are replaced with their Hebbian equivalents, in which case HPCA improves over HWTA by more than 2 percentage points.
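For reference, the two local rules compared above admit common textbook formulations; the sketch below shows a hard winner-take-all Hebbian step (HWTA-like) and a Sanger-style Hebbian PCA step (HPCA-like) for a single input vector. These are generic formulations written under our own assumptions and may differ in detail from the rules used in the paper.

import torch

def hwta_step(w, x, lr=0.01):
    # Hard winner-take-all: only the most responsive unit moves its
    # weight vector toward the input. w: (units, features), x: (features,).
    winner = torch.argmax(w @ x)
    w[winner] += lr * (x - w[winner])
    return w

def hpca_step(w, x, lr=0.01):
    # Sanger's rule (generalized Hebbian algorithm): each unit subtracts the
    # reconstruction built from units up to and including itself, so the rows
    # of w converge toward the leading principal components of the input.
    y = w @ x                                        # unit activations
    mask = torch.tril(torch.ones(len(y), len(y)))    # "up to and including i"
    recon = mask @ (y.unsqueeze(1) * w)              # (units, features)
    w += lr * y.unsqueeze(1) * (x.unsqueeze(0) - recon)
    return w

# Toy usage on random data.
w = 0.1 * torch.randn(16, 64)
x = torch.randn(64)
w = hwta_step(w, x)
w = hpca_step(w, x)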
Comparison of SHC and SGD
Table 8 shows a comparison between SHC, CHL, and SGD classifiers placed on the various layers of a network pre-trained with backprop. The results suggest that SHC is effective at classifying high-level features, achieving accuracy comparable to SGD while requiring fewer training epochs. On the other hand, SHC is less effective on lower-layer features, although convergence remains fast, suggesting that the supervised Hebbian approach benefits from more abstract latent representations. CHL appears to perform comparably to SGD training.
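The evaluation protocol behind Tables 8, 10, and 12 is to fit a classifier on frozen features extracted from a given layer. As a rough illustration (with synthetic features standing in for the real layer activations, and a simple supervised Hebbian update standing in for SHC, whose exact formulation may differ), such a comparison could be set up as follows.

import torch
import torch.nn as nn

torch.manual_seed(0)
num_classes, feat_dim = 10, 256
features = torch.randn(512, feat_dim)    # stand-in for frozen layer features
labels = torch.randint(0, num_classes, (512,))

# (a) SGD-trained linear classifier on the frozen features.
clf = nn.Linear(feat_dim, num_classes)
opt = torch.optim.SGD(clf.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
for _ in range(20):
    opt.zero_grad()
    loss_fn(clf(features), labels).backward()
    opt.step()

# (b) A simple supervised Hebbian classifier (illustrative stand-in for SHC):
# the weight vector of the unit associated with the true class is pulled
# toward the feature vector, with no backpropagated error signal.
w = torch.zeros(num_classes, feat_dim)
for x, y in zip(features, labels):
    w[y] += 0.1 * (x - w[y])

sgd_acc = (clf(features).argmax(1) == labels).float().mean()
hebb_acc = ((features @ w.t()).argmax(1) == labels).float().mean()
print(f"SGD: {sgd_acc:.3f}  Hebbian: {hebb_acc:.3f}")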
Table 8 MNIST accuracy (top-1), 95% confidence intervals, and convergence epochs of SHC, CHL, and SGD classifiers on top of various network layer features
CIFAR10
Hybrid network models
In Table 9, we report the results obtained on the CIFAR10 test set with hybrid networks. The table, which has the same structure as that of the previous sub-section, allows us to understand the effect of switching a specific layer (or group of layers) in a network from backprop to Hebbian training. The first row represents the network fully trained with backprop. The following rows show the results of networks in which a single layer was switched. Both HPCA and HWTA achieve results competitive with full backprop training when used to train the first or the fifth network layer, while a small but more noticeable drop is observed when inner layers are switched from backprop to Hebbian. In the subsequent rows, more layers are switched from backprop to Hebbian training, and a larger performance drop is observed; however, the HPCA approach performs better than HWTA when more Hebbian layers are involved. The most prominent difference appears when all the deep network layers are replaced with their Hebbian equivalents, in which case HPCA improves over HWTA by 15 percentage points.
Table 9 CIFAR10 accuracy (top-1) and 95% confidence intervals of hybrid network models
Comparison of SHC and SGD
Table 10 shows a comparison between SHC, CHL, and SGD classifiers placed on the various layers of a network pre-trained with backprop. The results suggest that SHC is effective at classifying high-level features, achieving accuracy comparable to SGD while requiring fewer training epochs. On the other hand, SHC is less effective on lower-layer features, although convergence remains fast, suggesting that the supervised Hebbian approach benefits from more abstract latent representations. CHL appears to perform comparably to SGD training.
Table 10 CIFAR10 accuracy (top-1), 95% confidence intervals, and convergence epochs of SHC, CHL, and SGD classifiers on top of various network layer features
CIFAR100
Hybrid network models
In Table 11, we report the results obtained on the CIFAR100 test set with hybrid networks. The table, which has the same structure as those of the previous sub-sections, allows us to understand the effect of switching a specific layer (or group of layers) in a network from backprop to Hebbian training. The first row represents our network fully trained with backprop. The following rows show the results of networks in which a single layer was switched. HWTA achieves results competitive with full backprop when used to train the first or the fifth network layer, while a small but more noticeable drop is observed when inner layers are switched from backprop to HWTA. The HPCA approach, on the other hand, generally performs better than HWTA. In particular, it slightly outperforms full backprop (by 2 percentage points) when used to train the fifth network layer, suggesting that this kind of hybrid combination might be useful when more complex tasks are involved. In the subsequent rows, more layers are switched from backprop to Hebbian training, and a larger performance drop is observed, but the HPCA approach still behaves better than HWTA. The most prominent difference appears when all the network layers are replaced with their Hebbian equivalents, in which case HPCA improves over HWTA by 22 percentage points.
Table 11 CIFAR100 accuracy (top-5) and 95% confidence intervals of hybrid network models
Comparison of SHC and SGD
Table 12 shows a comparison between SHC, CHL, and SGD classifiers placed on the various layers of a network pre-trained with backprop. In this case, SHC achieves accuracy comparable to SGD (even improving by 6 percentage points on layer 3) while requiring fewer training epochs, suggesting that the approach might be especially useful when more complex tasks are involved. On the other hand, lower performance is observed when CHL is used, suggesting that this approach has more difficulty scaling to more complex datasets.
Table 12 CIFAR100 accuracy (top-5), 95% confidence intervals, and convergence epochs of SHC, CHL, and SGD classifiers on top of various network layer features