1 Introduction

Image classification is a primary computer vision task deployed in many industrial domains, such as healthcare or manufacturing. It is usually solved with conventional artificial intelligence (AI) algorithms [1], such as convolutional neural networks (CNNs) trained with gradient back-propagation learning algorithms. However, with the emergence of edge cameras that require real-time secured image processing, like surveillance systems, autonomous cars, or agricultural monitoring systems [2,3,4], there is a need to process information and bring AI to the edge. CNN models are not compatible with edge constraints in terms of memory, bandwidth, and energy consumption. In comparison, neuromorphic computing systems, with their low-power computing and in-memory processing, offer solutions suitable for AI at the edge [5].

Neuromorphic computing paradigms, such as spiking neural networks (SNNs) and oscillatory neural networks (ONNs), are brain-inspired approaches that emulate biological neural network functions, offering fast learning via plasticity and low-power computation for edge devices. SNNs [6,7,8] are neuromorphic algorithms inspired by the spike signals that transmit time-based information in the brain. Using spikes reduces the mean voltage amplitude and thus the energy consumption of the system. More recently, ONNs have appeared as an alternative computing paradigm for energy-efficient computation [9]. In hardware, ONNs are implemented as analog-based neural networks using oscillators to emulate oscillatory neurons, coupled by analog components, e.g., resistors or capacitors, that emulate synaptic coupling [10,11,12,13,14,15,16,17]. Unlike SNNs, ONNs encode information in the phase relationship among oscillators and compute based on the phase dynamics of coupled oscillators. Phase-based computing allows for low-voltage-amplitude operation, which ultimately reduces energy consumption [9].

Using a single-layer recurrent architecture, ONNs are shown to solve auto-associative memory (AAM) tasks [15, 17, 18], like Hopfield neural networks (HNNs) [19, 20]. AAM networks can learn patterns and retrieve them from a corrupted input. AAM tasks are mainly solved using unsupervised learning algorithms [21], such as Hebbian [22], Storkey [23], and Pseudo-Inverse [21]. In comparison, image classification is mainly addressed with supervised learning algorithms. State-of-the-art image classification typically relies on multi-layer neural networks trained with back-propagation algorithms [1, 24, 25].

Multiple benchmarks and datasets exist to evaluate models on image classification. The main ones include MNIST [26], ImageNet [27], and CIFAR-10. MNIST contains labeled 28 × 28 grayscale images of handwritten digits, while ImageNet and CIFAR-10 target object classification with more complex color images. All three are widely used to assess AI-model performance. Even though image processing and AAM are two different tasks, the authors in [28] adapted HNN to solve a simplified MNIST classification task using the unsupervised Storkey rule. They obtain 61.5% precision, while CNNs typically achieve around 99% on the standard MNIST classification task [1]. More recently, the authors in [29] adapted the contrastive Hebbian rule (CHR) [30] to perform supervised learning with energy-based recurrent neural network (RNN) models, using the so-called equilibrium propagation (EP) learning algorithm.

While previous works to solve image classification with AAM networks have mainly focused on HNNs, in this work, we investigate for the first time how to perform the MNIST classification task with ONNs. To do so, we first extend the classification method developed by [28] for HNNs to ONNs. Then, we study the classification of simplified black and white 10 × 10 MNIST images with HNN and ONN trained with both unsupervised and supervised learning algorithms. We start by comparing the precision results obtained with the Hebbian, Storkey, and pseudo-inverse unsupervised learning algorithms. Then, similar to state-of-the-art methods that apply supervised learning algorithms for classification, we adapt the EP learning algorithm to single-layer AAM networks (AAM-EP). We test and study two approaches using AAM-EP to train HNN and ONN for the simplified MNIST classification task. Our first approach trains with AAM-EP starting from random weights. Next, we explore the use of AAM-EP on AAM networks pre-trained with unsupervised learning to fine-tune the weight values and assess whether AAM-EP can increase the precision of pre-trained AAM networks. We evaluate the precision on the simplified MNIST test set with an HNN simulated in MATLAB and a digital ONN design implemented on FPGA [31].

Note, we consider only a simplified 10 × 10 MNIST dataset for image classification with HNN and ONN due to network and implementation constraints. First, HNN is limited to bipolar values {− 1,1}, and thus, it can consider only black and white images. Also, to our knowledge, there is no efficient training algorithm allowing ONNs to settle to continuous phase values. Usual ONN training algorithms are HNN unsupervised training algorithms that constrain training patterns to binary values, and thus constrain ONN outputs to binary phases {0°, 180°}. The EP learning algorithm also considered in this work should in principle be suitable for continuous ONN outputs; however, this requires further investigation and is out of the scope of this paper. Also, our current ONN implementation limits the ONN size to a hundred neurons. Thus, with our current ONN implementation, the simplified MNIST dataset is the most suitable for image classification with HNN and ONN. Considering additional image datasets would necessitate other ONN implementations and adaptations.

Our contributions can be summarized as follows: (1) we perform a first study exploring the MNIST classification task with ONNs, adapting the method introduced by [28] for HNNs to solve the handwritten digit classification task with ONN; (2) we study the precision of HNN and ONN using unsupervised learning algorithms; and (3) we apply and adjust the supervised equilibrium propagation (EP) learning algorithm for single-layer energy-based AAM, proposing AAM-EP, on both HNN and ONN to solve the classification of a simplified black and white 10 × 10 MNIST set.

The rest of this article is organized as follows. In Sect. 2, we detail the HNN and ONN computing paradigms and describe the AAM computing principle with unsupervised learning algorithms. In Sect. 3, we present the methods used to solve the simplified MNIST classification task using AAM networks. In Sect. 4, we explain the various training methods we apply to solve the MNIST classification task using HNN and ONN, including unsupervised and supervised learning. Next, in Sect. 5, we highlight and compare the precision obtained with the various training methods using both HNN and ONN. Finally, we discuss the obtained results and future work in Sect. 6.

2 Auto-associative memory (AAM) neural networks

AAM models can learn patterns and retrieve them from a corrupted input. The most common model used for the AAM task is the HNN, introduced by Hopfield in 1982 [19]. More recently, other analog-based paradigms that show AAM properties have emerged, such as the ONN. This section describes the HNN and ONN models before giving more details on the AAM task, the existing learning algorithms, and how HNN and ONN can solve AAM tasks.

2.1 Hopfield neural networks (HNNs)

HNNs are single-layer, fully connected RNNs in which each neuron is connected to the other neurons by bidirectional synaptic weights, see Fig. 1. Synaptic weight values are determined during the training step. During inference, neurons are initialized with the input information and evolve in time following a propagation and an activation function until each neuron stabilizes. The final stable state corresponds to the output information. The neuron evolution can be seen as the minimization of an energy function; thus, HNNs are labeled as energy-based models. The propagation function of each neuron i follows:

$$h_{i}=\sum_{j}W_{ij}\,\sigma_{j}$$
(1)

where \({W}_{ij}\) is the synaptic weight between neuron \(i\) and neuron \(j\), and \({\sigma }_{j}\) is the activation value of neuron \(j\).

Fig. 1

Auto-associative memory computation with HNN and ONN

The new activation of neuron i is then calculated with:

$$\sigma_{i} = \begin{cases} -1 & \text{if } h_{i} < 0 \\ \sigma_{i} & \text{if } h_{i} = 0 \\ +1 & \text{if } h_{i} > 0 \end{cases}$$
(2)
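As a minimal illustration (not the MATLAB emulator used in this work), Eqs. (1) and (2) can be iterated as in the following Python sketch, which assumes synchronous updates of all neurons and bipolar NumPy state vectors:

```python
import numpy as np

def hnn_inference(W, sigma, max_steps=100):
    """Iterate the HNN propagation (Eq. 1) and activation (Eq. 2) rules
    until the state stops changing or max_steps is reached.
    Assumption: all neurons are updated synchronously at each step."""
    sigma = sigma.copy()
    for _ in range(max_steps):
        h = W @ sigma                                                # Eq. (1): propagation
        new_sigma = np.where(h > 0, 1, np.where(h < 0, -1, sigma))   # Eq. (2): activation
        if np.array_equal(new_sigma, sigma):                         # stable state reached
            return new_sigma
        sigma = new_sigma
    return sigma
```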

2.2 Oscillatory neural networks (ONNs)

ONNs are brain-inspired networks of coupled oscillators, where each neuron is an oscillator and neurons are coupled through synapses. ONN implementations are diverse, using not only CMOS-based analog devices [32], but also emerging low-power devices [33, 34], or digital programmable logic [31, 35] to emulate ONN neurons and synapses. In this work, we perform simulations with a fully digital ONN design [31] made of phase-controlled digital oscillators and 5-bit signed register synapses.

Multiple ONN architectures and computing methods also exist. In this work, we only consider a fully connected recurrent architecture with phase computation, see Fig. 1, as it can compute AAM tasks [18]. We encode data in the phase relationships between oscillators. For example, for bipolar values, considering a reference oscillator with phase 0° ([+ 1]), another oscillator with a 180° phase shift from the reference oscillator represents the opposite bipolar value ([− 1]). Similarly, an oscillator with a 0° phase shift represents the same bipolar reference value ([+ 1]). Phase computation allows reducing the signal amplitude and consequently the energy consumption of the system [9].
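For illustration, the bipolar phase encoding can be sketched as below; the 90° decision boundary used for readout is an assumption made for this sketch, not a description of the phase measurement in the digital design:

```python
import numpy as np

def bipolar_to_phase(sigma):
    """Map bipolar values {+1, -1} to phase shifts {0 deg, 180 deg} w.r.t. the reference oscillator."""
    return np.where(sigma > 0, 0.0, 180.0)

def phase_to_bipolar(phase_deg):
    """Read a settled phase back as a bipolar value (binary phases assumed)."""
    return np.where(np.cos(np.deg2rad(phase_deg)) >= 0, 1, -1)
```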

The ONN computes based on the dynamics of the coupled oscillators. Couplings between oscillators are set during the learning step, depending on the task. Next, inference starts with the initialization of the oscillators' phases according to the input information. Then, the ONN phases evolve in time due to the natural dynamics of the oscillators and stabilize to final phase values, which are then measured to obtain the ONN output information. This computing paradigm can be associated with the minimization of an energy function (as in HNNs); thus, the ONN is also an energy-based computing model. This property makes the ONN suitable for solving AAM tasks.

2.3 Auto-associative memory (AAM) task and learning

AAM tasks involve the memorization of patterns and the retrieval of the memorized patterns from corrupted input information. An AAM operates in two main steps: a learning step to memorize patterns in the network, and an inference step to retrieve one of the memorized patterns from a corrupted input, see Fig. 1. Considering HNN or ONN, we associate each image pixel with a neuron, and the color of the pixel with the neuron value, either the binary value for HNN or the phase value for ONN.

AAM tasks are usually solved using unsupervised learning algorithms. Unsupervised learning algorithms differ from supervised ones in that they only use the data themselves, without any external feedback to evaluate the result. ONN-oriented unsupervised rules were introduced recently [36]. However, other unsupervised learning rules designed for HNN are also compatible with ONN. In order to make a meaningful comparison between HNN and ONN precision, we only consider learning algorithms compatible with both models. Unsupervised learning algorithms are classified as biologically plausible or not according to three main criteria. First, locality, which requires that the weight update only depends on the activations of the neurons on both sides of the synapse. Second, incrementality, which indicates whether a network can learn patterns one by one or needs to learn all patterns together. Third, weight symmetry: in the human brain, synaptic weights are not symmetric. Yet, for both HNN and ONN, the synapse between neurons i and j and the synapse between neurons j and i are the same, so the weights are symmetric. Even though this makes HNN and ONN non-biologically plausible paradigms, weight symmetry is necessary for HNN and ONN learning.

The first learning algorithm we use is the Hebbian learning rule [22]. Hebbian is local and incremental and has symmetric weights. The principle is based on the biological observation that “neurons that fire together wire together.” The synaptic connection \(W_{ij}\) between neurons \(i\) and \(j\) is calculated for \(k\) patterns \(\xi^{k}\) as:

$$W_{ij}= \sum_{k}\frac{1}{N}\,\xi_{i}^{k}\xi_{j}^{k}$$
(3)
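For illustration, a minimal Python sketch of Eq. (3), assuming `patterns` is a k × N array of bipolar {−1, +1} training patterns and that self-connections are set to zero (a common convention not stated explicitly here):

```python
import numpy as np

def hebbian(patterns):
    """Hebbian rule (Eq. 3): W_ij = (1/N) * sum_k xi_i^k xi_j^k."""
    k, N = patterns.shape
    W = (patterns.T @ patterns) / N
    np.fill_diagonal(W, 0)   # assumption: no self-coupling
    return W
```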

Hebbian has good AAM properties; however, its capacity is limited [21], meaning it cannot memorize and retrieve a large number of patterns correctly. Additional learning rules with higher storage capacity also exist, such as the Storkey learning rule [23] and the pseudo-inverse learning rule [21]. Storkey is also local and incremental, with symmetric weights, and is similar to Hebbian:

$$W_{ij}= \sum_{k}\frac{1}{N}\left(\xi_{i}^{k}\xi_{j}^{k}-\frac{1}{N}\,\xi_{i}^{k}h_{ij}^{k}-\frac{1}{N}\,h_{ij}^{k}\xi_{j}^{k}\right)$$
(4)

with \(h_{ij}^{k}\) the local field:

$$h_{ij}^{k}=\sum_{l\neq i,j}W_{il}\,\xi_{l}^{k}$$
(5)

On the contrary, the pseudo-inverse learning rule is neither local nor incremental, but has symmetric weights. Thus, it is not biologically plausible. The pseudo-inverse learning rule is:

$$W=N\,\xi\,\mathrm{pinv}(\xi)$$
(6)
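Similarly, a sketch of Eq. (6), assuming ξ is arranged as an N × k matrix whose columns are the training patterns (one common convention) and pinv is the Moore-Penrose pseudo-inverse:

```python
import numpy as np

def pseudo_inverse(patterns):
    """Pseudo-inverse rule (Eq. 6): W = N * xi * pinv(xi)."""
    N = patterns.shape[1]
    xi = patterns.T                        # N x k matrix with patterns as columns
    W = N * xi @ np.linalg.pinv(xi)
    np.fill_diagonal(W, 0)                 # assumption: remove self-connections
    return W
```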

Biological plausibility is useful for online learning applications. However, in this paper, we compute the weights offline using MATLAB, before using them in the HNN MATLAB model or in the digital ONN. In the digital ONN design, we normalize the weights into a 5-bit signed format. We compare the three above-mentioned learning algorithms with HNN and ONN on the simplified MNIST set.
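The offline weight preparation for the digital ONN can be sketched as follows; the symmetric scaling to the range [−15, 15] is an assumption about how real-valued weights map onto the 5-bit signed registers:

```python
import numpy as np

def quantize_5bit_signed(W):
    """Normalize real-valued weights into a 5-bit signed integer representation
    (assumed here: symmetric scaling to [-15, 15] followed by rounding)."""
    max_abs = np.max(np.abs(W))
    scale = 15.0 / max_abs if max_abs > 0 else 1.0
    return np.round(W * scale).astype(np.int8)
```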

3 MNIST classification with AAM networks

It is important to state that AAM and image classification are two distinct tasks. Thus, one needs to adapt AAM networks to perform classification. In [28], the authors propose a solution applying HNN to classify MNIST handwritten digit images when trained with the Storkey learning rule. In this work, inspired by [28], we replicate their method to evaluate and compare the precision of HNN and ONN on a simplified MNIST set for different unsupervised learning configurations. In this section, we first present the MNIST set and the simplified version we use in this work, before describing the methods to classify the simplified MNIST set using HNN and ONN.

3.1 MNIST database

The MNIST set was created to assess neural network performances on an image classification task. It contains 28 × 28 grayscale labeled images of handwritten digits, from 0 to 9. It is organized in two sets, a training set with 60,000 images and a test set with 10,000 images. The state of the art solves the MNIST classification problem with CNN models, which can achieve more than 99% accuracy when trained with supervised back-propagation algorithms [1].

3.2 MNIST dataset preparation

In this work, we employ a simplified MNIST set containing the same number of training and test images, where each image is pre-processed into a 10 × 10 black and white image in order to be compatible with our digital ONN design, see Fig. 2. We focus on a 10 × 10 format because the digital ONN design is limited in size and resources. The transformation of each image follows three steps. First, each 28 × 28 image is cropped to 20 × 20, removing the black background to reduce similarities among images. Then, the 20 × 20 image is resized to 10 × 10 by averaging 2 × 2 neighboring pixels. Finally, we binarize each image into black and white using a threshold. MNIST grayscale pixels take values in the range [0,255]; thus, for example, with a threshold of 128, a pixel value under 128 becomes black, otherwise white. We study different binarization thresholds and their influence on the simplified MNIST classification task in Sect. 3.4.

Fig. 2

Simplified MNIST classification image pre-processing method
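As an illustration of the three pre-processing steps, here is a minimal Python sketch; the centered 4-pixel border crop, the [0, 1] pixel normalization, and the bipolar {−1, +1} output coding are assumptions made for this sketch:

```python
import numpy as np

def preprocess(img28, theta=0.3):
    """Crop 28x28 -> 20x20, average 2x2 blocks -> 10x10, binarize with threshold theta.
    Pixel values are assumed normalized to [0, 1]; output is bipolar {-1, +1}."""
    img20 = img28[4:24, 4:24]                               # crop the border (assumed centered crop)
    img10 = img20.reshape(10, 2, 10, 2).mean(axis=(1, 3))   # 2x2 average pooling
    return np.where(img10 >= theta, 1, -1)                  # white -> +1, black -> -1 (assumption)
```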

3.3 Image classification with AAM

There are clear distinctions between AAM and classification tasks. On the one hand, classification problems associate input information with output classes; hence, input and output can have different dimensions. On the other hand, AAM tasks associate a corrupted input with a clean memorized output, where both have the same dimensions. To solve the MNIST classification problem using AAM, the authors in [28] propose to train an HNN with one pattern per label, so one image per digit. Thus, HNN and ONN inference stabilizes to one of the training patterns, each corresponding to a digit label class, which is equivalent to the MNIST classification task, see Fig. 3.

Fig. 3

Difference between AAM and image classification tasks, and the adaptation of AAM network for image classification task

3.4 Training patterns for MNIST classification

In this work, we use HNN and ONN configured for a pattern recognition task, while MNIST is a classification task. Thus, we need to adapt the MNIST classification dataset so that it can be solved as a pattern recognition task. MNIST classification with AAM starts with the training step to configure the weights, and thus with the choice of the training patterns. The choice of the training patterns is key to high precision. Each training pattern must be the best representation of its digit, such that each image of that digit from the MNIST set stabilizes to that training pattern. The authors in [28] propose to average the grayscale images with the same label from the MNIST training set of 60,000 images, and we re-use the same method. We define 10 training patterns corresponding to the 10 digits, which are learnt as stable points for both HNN and ONN networks. The 10 training patterns are created from the 60,000 training images. We group the training images by digit and compute a mean image for each digit, such that we obtain ten 28 × 28 grayscale images representing the digits between 0 and 9. Then, we apply the pre-processing to each mean image to obtain ten 10 × 10 black and white images, with one image per digit being the training pattern associated with the corresponding digit, see Fig. 4.

Fig. 4

Process to define training patterns from MNIST training set
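A sketch of this training-pattern construction, assuming `train_images` is a 60,000 × 28 × 28 array normalized to [0, 1], `train_labels` holds the digit labels, and re-using the hypothetical `preprocess` helper sketched above:

```python
import numpy as np

def make_training_patterns(train_images, train_labels, theta=0.3):
    """Average all training images of each digit, then pre-process the mean image
    into a 10x10 bipolar training pattern (one pattern per digit)."""
    patterns = []
    for digit in range(10):
        mean_img = train_images[train_labels == digit].mean(axis=0)  # 28x28 grayscale mean
        patterns.append(preprocess(mean_img, theta).ravel())         # flatten to 100 neurons
    return np.array(patterns)                                        # shape (10, 100)
```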

Pre-processing binarizes each training pattern into black and white. The binarization threshold determines the number of black or white pixels in each image. In AAM tasks, having uncorrelated training patterns is also key to high precision. The more the patterns are correlated, the harder it is to dissociate them. The correlation of patterns is evaluated with the Hamming distance (HD) metric d, which counts the number of differing neuron activations between two patterns. Grayscale in MNIST images is encoded between 0 and 1. Thus, as in [28], we study the ideal threshold giving the largest HD between training patterns. Figure 5a shows the average HD \(d_{\mathrm{avg}}\) and the minimum HD \(d_{\min}\) between the 10 training patterns depending on the binarization threshold \(\theta\). It highlights that the HD is maximal for \(\theta =0.3\), meaning the training patterns are the least correlated. Figure 5b depicts the HD between the 10 patterns generated after pre-processing with \(\theta =0.3\), and Fig. 5c shows the resulting 10 training patterns.

Fig. 5

a Average and minimum Hamming distance (HD) between training patterns for various binarization thresholds θ, b HD between training patterns binarized with θ = 0.3, and c training patterns with θ = 0.3
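The threshold study can be sketched as follows, re-using the hypothetical `preprocess` helper and assuming `mean_images` holds the ten 28 × 28 mean grayscale images:

```python
import numpy as np

def hd_statistics(mean_images, thresholds=np.arange(0.1, 0.9, 0.05)):
    """For each binarization threshold, compute the average and minimum Hamming
    distance between the ten resulting training patterns."""
    stats = []
    for theta in thresholds:
        pats = np.array([preprocess(m, theta).ravel() for m in mean_images])
        d = [(pats[i] != pats[j]).sum() for i in range(10) for j in range(i + 1, 10)]
        stats.append((theta, np.mean(d), np.min(d)))   # (theta, d_avg, d_min)
    return stats
```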

4 AAM training for MNIST classification

Once we have the training patterns, we need to define the synaptic weights using a learning algorithm. Image classification tasks are usually trained with supervised learning algorithms, whereas AAM tasks use unsupervised learning algorithms. In Sect. 4.1, we describe how we use various unsupervised learning algorithms to train HNN and ONN for classification. Then, Sect. 4.2 proposes to adapt the supervised EP algorithm to train HNN and ONN, with the AAM-EP. With the supervised AAM-EP, we first study training the networks from a random weight initialization, and then starting from a network pre-trained with weights generated by unsupervised learning algorithms, see Fig. 6.

Fig. 6

Study of various options to perform MNIST classification with HNN and ONN

4.1 Unsupervised learning

The authors in [28] use the Storkey unsupervised algorithm after stating that Hebbian does not have enough memory capacity. Here, we study three training algorithms, Hebbian, Storkey, and pseudo-inverse, on HNN. Note, incremental learning algorithms can be sensitive to iterative learning. We call iterative learning the possibility of learning the same patterns multiple times. Thus, we study the impact of iterative learning on the Hebbian and Storkey learning algorithms, as pseudo-inverse is not incremental.

From HNN MATLAB simulations, we extract the weights giving the best HNN precision and integrate them into the digital ONN design to compare ONN precision with HNN precision on the simplified MNIST classification task. The chosen weights are normalized into a 5-bit signed representation to be compatible with the digital ONN design. We simulate the digital ONN design using the Vivado design tool with the xc7z020-1clg400c FPGA target.

We evaluate precision from inference results on the MNIST test set of 10,000 images. Inference starts by initializing the neurons with one of the test set images. Then, the network evolves until stabilization. The stable state is then compared to the 10 training patterns to define the output class. An exact match between the network output and the corresponding training pattern is considered a correct classification. MNIST results are evaluated through four metrics for HNN and ONN. First, we evaluate the precision, representing the percentage of tested images which stabilize to the correct training pattern. Then, we compute the true negative metric, which counts the number of images that stabilize to one of the training patterns but not the expected one. We also add the percentage of spurious outputs, covering images that stabilize to none of the training patterns but to another non-memorized image. Finally, for the ONN, an additional metric is necessary to report the percentage of images that never stabilize to an output; we call them inconsistent images. In Sect. 5.1, we report the results obtained on the 10,000 images from the simplified MNIST test set with HNN and ONN trained with unsupervised learning.
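The four metrics can be sketched as below; `outputs` is assumed to hold the stabilized state for each test image (or `None` when the ONN never stabilizes), `targets` the expected training pattern per image, and `patterns` the ten training patterns:

```python
import numpy as np

def evaluate(outputs, targets, patterns):
    """Compute precision, true negative, spurious, and inconsistent percentages."""
    correct = negative = spurious = inconsistent = 0
    for out, target in zip(outputs, targets):
        if out is None:                                       # never stabilized (ONN only)
            inconsistent += 1
        elif np.array_equal(out, target):                     # exact match with the expected pattern
            correct += 1
        elif any(np.array_equal(out, p) for p in patterns):   # stabilized to a wrong training pattern
            negative += 1
        else:                                                 # stabilized to a non-memorized state
            spurious += 1
    n = len(outputs)
    return {name: 100.0 * count / n for name, count in
            [("precision", correct), ("true_negative", negative),
             ("spurious", spurious), ("inconsistent", inconsistent)]}
```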

4.2 Supervised equilibrium propagation

EP is a supervised learning algorithm proposed in [29] for energy-based RNN models. In [29], the authors demonstrate EP efficiency in solving the MNIST classification task using multi-layer energy-based continuous RNNs. HNNs and ONNs are also energy-based RNNs, yet with a single-layer architecture of non-continuous neurons. Thus, in this paper, we propose to adapt the EP algorithm to energy-based single-layer AAM networks. First, Sect. 4.2.1 describes the EP algorithm from [29]. Then, Sect. 4.2.2 explains how we adapt EP to AAM single-layer networks, creating the AAM-EP learning.

4.2.1 Equilibrium propagation (EP)

The EP algorithm was introduced in [29] to perform supervised learning on energy-based RNN models. It is inspired by the contrastive Hebbian rule (CHR) algorithm [30] proposed to solve supervised problems with multi-layer continuous Hopfield networks, a specific type of energy-based RNN. The common algorithm to solve supervised problems with multi-layer RNN models is back-propagation through time (BPTT) [37]. Even though it is efficient and gives very high precision, it requires a large amount of computational resources. Thus, the authors in [29] proposed a new solution requiring less computation to solve supervised problems with energy-based RNN models. EP computes the gradient of an objective function, similar to the Hopfield energy function, that propagates through the layers. This gradient back-propagation is transparent in the weight update algorithm, and the final weight update equation is intuitive and simple to apply. Note, the authors in [38] showed that EP and BPTT compute similar gradient updates in an RNN, thus achieving similar precision.

The EP algorithm defines two learning phases to update the weights. The first phase, called the free phase, clamps the input information to the input layer and waits until the neurons of the following layers stabilize. Then, EP performs a second phase, called the weakly clamped phase, where the input neurons remain clamped and the output neurons are weakly clamped toward the expected output information. In [29], the authors show that the signal back-propagated during the weakly clamped phase corresponds to the error derivative of their objective function, and they define the following weight update algorithm:

1. Clamp the input information at the input neurons and let the network evolve until all neurons from the hidden and output layers stabilize.

2. Save the stable state \(\sigma_{i}^{0}\) of each neuron \(i\).

3. Weakly clamp, with factor \(\beta\), the expected output information at the output neurons and let the network evolve until all neurons from the hidden layers stabilize.

4. Save the new stable state \(\sigma_{i}^{\beta}\) of each neuron \(i\).

5. Update the weight \(W_{ij}\) between neuron \(i\) and neuron \(j\) as:

   $$W_{ij}= W_{ij}+\Delta W_{ij}$$
   (7)

   with

   $$\Delta W_{ij}=\frac{1}{\beta }\left(\sigma_{i}^{\beta }\sigma_{j}^{\beta }-\sigma_{i}^{0}\sigma_{j}^{0}\right)$$
   (8)

Using \(\beta =1\), the authors in [29] show that RNNs with 1, 2, or 3 hidden layers can solve the MNIST classification problem and reach more than 95% precision. Overall, EP is a low-computation, energy-based learning algorithm expected to fit analog computing paradigms, which makes it attractive for ONN applications. However, EP cannot be applied directly to HNN or ONN as they are single-layer energy-based models.

4.2.2 EP for single-layer energy-based AAM

In this paper, we consider single-layer energy-based AAM models, HNN and ONN, and we adapt EP to them. Note, recently, the authors in [39] also proposed to use EP for pattern recognition with ONN; however, we apply it with a different method. In [39], the authors use supplementary neurons to clamp the training patterns during the clamping phase, while in this work, we do not require additional neurons. Also, in [39], the authors perform phase-dynamics simulations of memristor-based ONNs to validate their method, while here we perform simulations of a digital ONN design. This section explains how we adapt EP to fit AAM, introducing AAM-EP, and presents the methods and parameters used in this work when training HNN and ONN with AAM-EP.

EP is a supervised algorithm, meaning it needs labeled data and external interaction to define the synaptic weights. As we only have a single-layer network, the MNIST set needs to be modified to associate each input image with the correct output image of the same dimension. We consider the training patterns previously defined for unsupervised learning as the images corresponding to the digit labels. Note, each input/output pair of images is converted by pre-processing into 10 × 10 black and white images. So, we first define the expected outputs and replace each label from the MNIST training and test sets with the corresponding training pattern. Then, AAM-EP uses these new input/output pairs from the MNIST training set to clamp the input and output during the two training phases. Afterward, during inference and precision evaluation, we compare the output given by the network with the expected training pattern using the MNIST test set.
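A minimal sketch of this relabeling, assuming `patterns` is the 10 × 100 array of flattened training patterns and `train_labels`/`test_labels` are integer digit labels:

```python
# Replace each digit label by the corresponding 10x10 training pattern (flattened),
# so that input and expected output have the same dimension for the single-layer network.
train_targets = patterns[train_labels]   # shape (60000, 100)
test_targets = patterns[test_labels]     # shape (10000, 100)
```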

EP learning is a two-phase algorithm, with a first free phase where the input neurons are clamped, and a second weakly clamped phase where the output is additionally weakly clamped with a factor \(\beta\). The clamping principle works well for a multi-layer network with at least one hidden layer [29]. However, in the HNN or ONN case, input and output neurons are the same, and there are no hidden layers. Thus, we adapt the EP algorithm for AAM as the following AAM-EP:

1. Clamp the input image in the network; consider \(\sigma_{i}^{0}\) as the activation of neuron \(i\) from the input image.

2. Clamp the expected output image with \(\beta =1\) in the network; consider \(\sigma_{i}^{\beta }\) as the activation of neuron \(i\) from the expected output image.

3. Use the two activation states to update the weight between neurons \(i\) and \(j\) as \(W_{ij} = W_{ij} + \alpha \Delta W_{ij}\) with:

   $$\Delta W_{ij}=\sigma_{i}^{\beta }\sigma_{j}^{\beta }-\sigma_{i}^{0}\sigma_{j}^{0}$$
   (9)

Note, we remove the factor \(1/\beta \) because we use \(\beta =1\). Also note, we add a learning rate factor α in order to regulate the weight update for each training iteration (each image). During tests, we consider learning rates ranging between 0.0001 and 1. The initialization of the weights is also important to achieve high precision. In this work, we first initialize the weights randomly, with small values in [− 1; 1]. In particular, we study whether AAM-EP can train a single-layer energy-based RNN like HNN or ONN from scratch. Then, we initialize the AAM networks using weights previously computed with unsupervised learning. With this option, we study whether AAM-EP can improve the precision of an already trained network. At first, we apply AAM-EP for numerous epochs and observe the HNN precision at each epoch for various learning rates. Each epoch applies a random mini-batch of 1000 pre-processed images from the MNIST training set to update the weights. Precision can slightly change from one trial to another as the mini-batch images are randomly chosen from the full simplified 10 × 10 MNIST training set. The precision for each epoch is computed on the full simplified 10 × 10 MNIST test set. In a second step, we select the best configuration and collect the corresponding weights to evaluate the AAM-EP learning algorithm with the digital ONN design. The collected weights are normalized into a 5-bit signed representation compatible with the digital ONN design. We simulate the digital ONN design using the Vivado design tool with the xc7z020-1clg400c FPGA target. We compare the precision, true negative, spurious, and inconsistent results obtained with HNN and ONN for the same weight values.
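A minimal sketch of one AAM-EP epoch following the description above (β = 1, learning rate α, random mini-batch of 1000 images); the zeroed self-connections are an assumption:

```python
import numpy as np

def aam_ep_epoch(W, train_inputs, train_targets, alpha=0.0005, batch_size=1000):
    """One AAM-EP epoch: for each image, clamp the input (free state sigma^0) and the
    expected output (clamped state sigma^beta), then update W with Eq. (9) scaled by alpha."""
    idx = np.random.choice(len(train_inputs), batch_size, replace=False)
    for i in idx:
        sigma0 = train_inputs[i]                    # step 1: clamp the input image
        sigma_beta = train_targets[i]               # step 2: clamp the expected output (beta = 1)
        dW = np.outer(sigma_beta, sigma_beta) - np.outer(sigma0, sigma0)  # Eq. (9)
        W += alpha * dW                             # step 3: weight update
        np.fill_diagonal(W, 0)                      # assumption: keep zero self-connections
    return W
```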

5 Results

This section presents results obtained with both HNN and ONN trained with unsupervised and supervised learning algorithms to solve the simplified 10 × 10 MNIST classification task.

5.1 Unsupervised learning

Figure 7a highlights the precision obtained with HNN for multiple training configurations. A configuration includes the learning rule, Hebbian, Storkey, or pseudo-inverse, and up to 10 learning iterations. As expected, pseudo-inverse is not sensitive to iterative learning, as its precision does not change with the number of iterations, while Storkey is sensitive. We expected Hebbian to be sensitive to iterative learning as well, but its precision stays at 0% for every iteration. Figure 7a also shows that the best precision is obtained when training the network with the 10 training patterns using the iterative Storkey learning rule for 5 iterations. However, training HNN with pseudo-inverse reaches a precision close to the best, with 64.4% after a single iteration. We configure the digital ONN design using the synaptic weight values obtained with the best configuration, 5 Storkey iterations, to compare ONN precision with HNN precision on the simplified MNIST classification task.

Fig. 7

a HNN precision for multiple unsupervised training configurations. Training configuration includes the choice of the learning rule, and the number of iterations performed on the training patterns (each iteration learns the 10 training patterns). b Results of ONN and HNN trained with 5 iterations of the unsupervised Storkey learning rule

Figure 7b shows the results of HNN and ONN trained with 5 Storkey iterations. HNN and ONN show similar trends. First, the HNN precision and true negative percentages are higher than those of the ONN. The difference is certainly in part due to the normalization of the weights into 5-bit signed integers in the digital ONN design. Also, the number of spurious patterns detected with ONN is slightly higher than with HNN, and inconsistent patterns often occur with ONN while never with HNN. Thus, we strongly believe that the HNN settles more easily on a stable output, even if it is a true negative, while the ONN can hesitate between different outputs and keep bouncing between patterns. Figure 7b also reports the best precision, to the best of our knowledge, obtained with HNN and ONN trained with unsupervised learning algorithms to solve a simplified MNIST classification task. We report 65.2% precision with HNN and 59.1% precision with ONN. However, the reported precision is lower than the state-of-the-art precision of neural network models solving MNIST classification with supervised learning, which reaches around 99%. Hence, the next section presents the results obtained using the supervised AAM-EP algorithm to train HNN and ONN on our simplified MNIST set, to investigate whether supervised learning can increase HNN and ONN precision.

5.2 Supervised EP

In this section, we present the results obtained with both HNN and ONN trained with the supervised AAM-EP algorithm to solve the simplified MNIST classification problem.

To begin with, Fig. 8a displays the evolution of HNN precision for multiple learning rates over 50 epochs, with weights initialized with small random positive values. It shows that, for all the tested learning rates, the precision stays at 0% across the epochs. Moreover, Fig. 8b compares the results obtained with HNN and ONN using the weights reached with learning rate \({\alpha}\) = 0.0005 after 10 epochs. Note, we tried various learning rates and numbers of epochs and obtained the same results. It highlights that for HNN, each image stabilizes to a spurious pattern, while for ONN, the neuron states continuously evolve without reaching stabilization. To sum up, using AAM-EP learning from scratch with HNN or ONN does not result in high precision on the simplified MNIST classification task.

Fig. 8

a Results of simplified MNIST classification precisions obtained with HNN, for various learning rates, during numerous epochs, starting with weights initialized randomly. b Results of ONN and HNN trained with AAM-EP algorithm with random weight initialization after 10 epochs

Subsequently, we reproduce the precision tests with weights initialized using unsupervised learning. More precisely, we initialize the weights with those giving the best precision for each unsupervised learning algorithm, that is, weights generated with Hebbian after one iteration, with Storkey after 5 iterations, and with pseudo-inverse after one iteration. Figure 9a shows the precision of the HNN trained with AAM-EP over 50 epochs for various learning rates when the weights are initialized with Hebbian. It highlights that, for weights initialized with Hebbian, AAM-EP does not modify the network enough to allow classification of the simplified MNIST task.

Fig. 9

Simplified MNIST classification precisions obtained with HNN, for various learning rates, during numerous epochs, starting with weights initialized with a 1 iteration of Hebbian, b 5 iterations of Storkey, and c 1 iteration of Pseudo-inverse

Figure 9b, c shows the precision of the HNN trained with AAM-EP over 50 epochs for various learning rates when the weights are initialized with 5 iterations of Storkey and with pseudo-inverse, respectively. It illustrates a particular behavior in which precision increases during the first few epochs and decreases afterward. The larger the learning rate, the faster this increase-then-decrease phenomenon is observed. Considering the pseudo-inverse initialization, the maximum precision is obtained with learning rate \({\alpha}\) = 0.0005 after 9 epochs, for which HNN reaches 66.3% precision. Considering the Storkey initialization, the maximum precision is obtained with learning rate \({\alpha}\) = 0.0005 after 5 epochs, and HNN reaches 67.04% precision. For both pseudo-inverse and Storkey initializations, AAM-EP increases the HNN precision by around 2%. We assume this phenomenon is due to the initial weight values. If the unsupervised learning already sets the weights to an acceptable network configuration, then the AAM-EP algorithm can help increase precision slightly, up to a certain point after which it modifies the previous configuration and drastically reduces the HNN precision. However, if the weight initialization does not bring the network to an acceptable configuration, as with Hebbian or random weight initialization, AAM-EP cannot modify the network enough to reach a good configuration.

We configure the digital ONN with the best HNN configuration, that is, the weights obtained after 5 epochs of AAM-EP with learning rate \(\alpha \) = 0.0005 starting from weights initialized with the Storkey learning rule. Figure 10 plots the precision obtained with both HNN and ONN. Note that the precisions obtained in Figs. 9b and 10 differ slightly. As we use a mini-batch of 1000 randomly chosen images at each epoch, the computed weights and the obtained precision can vary slightly from one run to another. Figure 10 shows that for both HNN and ONN, precision increases with the use of the AAM-EP supervised algorithm. Additionally, the number of true negatives stays approximately stable for HNN and decreases for ONN, and the number of spurious patterns decreases. Thus, we deduce that the AAM-EP algorithm helps differentiate and reinforce the training patterns, such that input patterns are better associated with the training patterns. For example, spurious patterns often appear to be close to training patterns with a few wrong pixels. We believe AAM-EP helps modify the local weights associated with these wrong pixels such that HNN and ONN stabilize to the correct training patterns.

Fig. 10

Results of ONN and HNN trained with AAM-EP algorithm with α = 0.0005 after 5 epochs with weights initialized after 5 iterations of Storkey

The HNN and ONN precisions increase by about 2% and 3.5%, respectively. Table 1 summarizes the best HNN and ONN precisions for the three learning configurations: unsupervised learning only, supervised learning only, and both unsupervised and supervised learning.

Table 1 Best precision results obtained with both HNN MATLAB emulator and ONN digital design for the three training configurations (unsupervised, supervised, unsupervised and supervised)

5.3 Comparison of both unsupervised and supervised methods

In this work, we highlighted that the supervised AAM-EP learning algorithm can help increase the precision of HNN and ONN networks pre-trained with unsupervised learning algorithms for the MNIST classification task. However, using supervised learning can be more demanding in terms of computational effort.

In this work, training was performed in MATLAB. However, from the results obtained in MATLAB we can derive the computational effort of the different learning algorithms, as well as an estimation of the learning latency if learning were implemented on the digital ONN. We evaluate the computational effort using the number of multiply-and-accumulate operations (\({\text{NMAC}}_{{{\text{OP}}}}\)) required by each learning method. Table 2 compares the computational effort of the various learning methods for a general case, as well as for the MNIST classification application. It shows that the supervised learning algorithm drastically increases the \({\text{NMAC}}_{{{\text{OP}}}}\) per training compared with the unsupervised learning algorithms. Also, considering a system frequency of \({{F}}_{{\mathrm{sys}}}=31.25\;{\mathrm{MHz}}\) and parallelism in the \({\text{NMAC}}_{{{\text{OP}}}}\), Hebbian learning can compute in 1 clock cycle, Storkey in 15 clock cycles, and pseudo-inverse in 3 clock cycles, while AAM-EP requires approximately 50 ms to train HNN or ONN for the MNIST classification task. Thus, supervised AAM-EP takes longer to compute than the unsupervised learning algorithms. However, depending on the application, using AAM-EP to increase precision at the cost of higher computational effort and latency can be a worthwhile trade-off.

Table 2 Computational efforts required for training depending on the learning algorithm for a network of \(N\) neurons, for \(k\) training patterns, after \(it\) iterations or \(\varepsilon \) epochs

6 Discussion

This work analyzes the classification of handwritten digits from the MNIST set using two AAM networks, HNN and ONN, trained with either unsupervised learning, supervised learning, or both learning strategies. In this section, we first discuss the results obtained with our AAM-EP learning algorithm in comparison with results reported in the literature. Then, we highlight the advantages and limitations of using AAM networks to solve the MNIST classification task. Finally, we outline future work.

6.1 Comparison with other models

The results presented in Sect. 5 show that we can train HNN and ONN networks with the supervised AAM-EP algorithm. Additionally, the AAM-EP algorithm increases HNN and ONN precision on the simplified MNIST classification task compared with the precision obtained with unsupervised algorithms alone. We report an HNN maximum precision of 67% and an ONN maximum precision of 62.5%. In comparison, [28] reported 61.5% HNN precision when classifying a simplified MNIST set of 14 × 14 black and white images with additional pattern optimization. So, on the one hand, our AAM-EP solution achieves, to the best of our knowledge, the highest reported precision for single-layer AAM networks classifying handwritten digits from the MNIST set. On the other hand, state-of-the-art multi-layer RNNs trained with EP can solve the complete MNIST classification problem with more than 90% precision, see Table 3. Thus, our proposed AAM-EP learning algorithm improves the previously reported precision of AAM networks on the handwritten digit classification task, but does not surpass the precision of multi-layer RNN models on the same task.

Table 3 Comparison of various network models trained with EP on the MNIST classification task

However, multi-layer models are often heavy, requiring a lot of resources and latency to be trained. Thus, we compare our solution with state-of-the-art models in terms of resources and latency by comparing the number of parameters to tune and the number of epochs necessary to obtain the best precision. Table 3 highlights the precision, number of parameters, number of epochs, and architecture of various models trained with EP for MNIST classification. HNN and ONN are single-layer models requiring few parameters compared with multi-layer models. In terms of training latency, ONN and HNN reach their maximum precision after less than 10 epochs, while most of the multi-layer networks need more epochs to converge to their maximum precision, between 10 and 140 depending on the model. Thus, ONN can be of interest for applications with a restricted amount of resources but some flexibility in the required precision.

6.2 Future work

Handwritten digit classification is a benchmark application for neural network models. Here, we explored solutions and learning algorithms to solve handwritten digit classification with single-layer energy-based AAM models. We explored HNN and ONN, as they aim to be suitable for image processing at the edge. However, in this work, we have seen that using supervised and/or unsupervised learning does not perform better than multi-layer models. A possible improvement is to explore multi-layer associative memory architectures [43] like heterogeneous associative memory (HAM), in order to have an architecture more coherent with the classification task [44]: the input layer would be initialized with the input images, which are classified into categories indicated by the output layer. HAM networks using perceptron neurons, such as the linear associative memory [45] or the bidirectional associative memory [46, 47], could replace the single-layer HNN. In [44] and [48], the authors demonstrated that ONN can work as a HAM for edge detection applications, so it could also be applied to the handwritten digit classification task. The EP algorithm was first introduced as a supervised learning algorithm for energy-based multi-layer RNNs, and thus, we expect it to be compatible with multi-layer HNN and ONN. Also, there are numerous learning algorithms compatible with multi-layer networks, which could help increase HNN and ONN precision [49].

Finally, an extension of this work is to explore the integration of AAM-EP on ONN hardware. In [29], the authors state that the algorithm is compatible with hardware implementation. One could also integrate learning on chip and perform the two learning phases of AAM-EP directly inside the digital ONN design, or even inside an analog ONN design.

7 Conclusion

This work proposes a study of the classification of handwritten digit images from the MNIST dataset using single-layer energy-based AAM networks. In particular, we analyze the behavior of an HNN tested in MATLAB and of a digital ONN design simulated in Vivado. To classify with a single-layer AAM network, we define a training pattern for each label, so that, after initialization, the network evolves to one of the training patterns corresponding to a label. We evaluate the network precision in classifying 10 × 10 black and white images of handwritten digits from a simplified MNIST set. We compare and assess the precision for different learning configurations. First, we compute the precision for different unsupervised learning algorithms. We obtain a maximum HNN precision of 65.2% and a maximum ONN precision of 59.1% when training with 5 iterations of the Storkey algorithm. Then, we evaluate the precision using the supervised EP algorithm adapted to AAM, the AAM-EP. We highlight that tuning the weights of a pre-trained network with AAM-EP increases the precision. For example, using a pre-training configuration with 5 Storkey iterations, precision increases by 2% and 3.5% for HNN and ONN, reaching 67% and 62.6%, respectively. Thus, our AAM-EP algorithm increases single-layer AAM model precision on the simplified MNIST classification task. Despite the lower precision obtained in comparison with state-of-the-art MNIST classification using multi-layer networks trained with EP, this study explains how to solve supervised problems using AAM networks. Moreover, it introduces the AAM-EP supervised learning algorithm to help single-layer AAM networks classify images.