1 Introduction

Spiking neural networks represent the third generation of neural networks and, unlike their predecessors, model the flow of information in the human body in much greater detail. Recent years have brought much research not only on the concepts of and differences between the underlying mathematical models but also on practical applications of these structures [32]. Spiking networks differ from their predecessors mainly in two elements. The first is the addition of a notion of time to the flow of information between neurons. The second is the removal of automatic forwarding of information by a neuron: instead, the voltage level in the neuron is analyzed, and information is sent only when a certain threshold value is exceeded. Remodeling the neuron and increasing its realism changed the operation of the whole network, and as a consequence the known training algorithms are no longer applicable. There are unsupervised approaches such as spike-timing-dependent plasticity (STDP), but it is difficult to propose a supervised one. Lee et al. [17] described a proposal for spike-based backpropagation, while [26] showed a training solution using equilibrium propagation. Saunders et al. [30] described another unsupervised technique for feature training. Spiking networks can be shallow or as deep as their predecessors. Diehl and Cook [9] used a two-layer architecture and obtained accuracy above 90% on image datasets, whereas Sengupta et al. [31] analyzed deep architectures. Both works showed that shallow and deep structures can reach a high accuracy level in different classification tasks; the main issue is what kind of data is processed and how large the database is.

Recent research has brought many different architectures and modifications of these networks, such as the quantum version described by Kristensen et al. [14]. Developing a new structure is associated with many problems such as routing and sorting. The authors of [24] presented the results of an analytical performance assessment of a routing algorithm. Implementations on real equipment and electrodes can realize very large and deep networks, where the number of spikes can be huge, so sorting them is a very important problem [4]. Balaji et al. [2] presented an alternative way of mapping neural networks onto neuromorphic hardware; compared with other existing solutions, their approach reduced the average energy consumption and delay time. The perception of depth in vision is also a complicated problem, which was addressed by Haessig et al. [11]. A similar idea is the analysis of knowledge representation, which can be very useful for brain-computer interfaces, as shown by Kumarasinghe et al. [15].

Moreover, Capecci et al. [5] described evolving spiking neural networks for modeling gene interaction networks based on time-series gene expression data. Even though the development of these techniques is at an early stage, numerous applications have already been found and tested on simulators, such as hand gesture analysis [6] or discrimination of EMG signals [10]. An interesting idea was proposed by Rongala et al. [28], where the classification of naturalistic texture stimuli was analyzed using a spiking architecture. A similar application can be seen in the proposal of Wang et al. [35], which analyzes one of the most difficult image/video classification problems, namely action recognition. The number of problems that can be solved by applying different neural networks and mathematical models is very large [8, 13, 19], and such applications may improve the quality of human lives [20]. Moreover, various aspects of neural models are used in other fields such as medicine [23] or remote sensing [36].

The rapid development of spiking neural networks indicates great practical potential and much more accurate modeling of the information flow in the nerve cells of living organisms. It is worth noting that such systems and architectures have a huge number of variables, whose values are quite often chosen empirically based on conducted experiments. An automatic method can help decrease the time needed to develop the best neural structure. Based on these observations, we noticed that the selection of parameters can be reduced to an optimization problem. This is particularly relevant in view of the rapid development of heuristic techniques inspired by nature. One of the most popular heuristics is the cuckoo search algorithm described by Yang and Deb [39], whose main idea is based on the behavior of cuckoos tossing their eggs into other nests. In the last two years, the behavior of whales [21], polar bears [27], grasshoppers [29], and salp swarms [33] has also been modeled. Moreover, optimization methods have found many areas of practical implementation, for instance in plant waste analysis [34] or job scheduling [7]. Theoretical aspects of hyperparameter optimization tools are shown in [38], where such meta-heuristics were found to be a good solution for finding parameters in classic machine learning tools such as artificial neural networks. Again, in [18] a new method for the hyperparameter problem was introduced; the author proposed a bandit-based method and compared it with Bayesian methods.

In this paper, we propose using heuristic algorithms to search for the best coefficients of spiking neural networks. The proposed approach is based on a modification of the heuristic operation and on a fitness function that can be applied to hyperparameter problems together with the federated learning method. To summarize, the main contributions are as follows:

  • New mechanism for finding hyperparameters in spiking neural networks,

  • Combining collaborative learning with heuristic operations,

  • The use of multithreading in hybrid (neuro-heuristic) analysis of the coefficients used in the training process,

  • Evaluation of the proposal with five different nature-inspired heuristic algorithms in order to select the best one.

2 Spiking neural network

In this section, we describe the mathematical model of a spiking neural network and the spike-timing-dependent plasticity algorithm, which is used for training these structures. The analyzed model was proposed by Diehl and Cook [9].

2.1 Spiking neuron

One of the commonly known neuron models is the leaky integrate-and-fire (LIF) model, which takes into account the integrative properties of the neuron's membrane as well as the leakage of current due to passive ion channels. Let us denote the membrane voltage as V. The neuron releases a voltage spike when the membrane threshold value \(v_{thres}\) is exceeded, after which the membrane voltage is reset to \(v_{reset}\). After this reset, the neuron cannot fire for a limited time. The model neuron has a resting membrane potential \(E_{rest}\); \(E_{exc}\) and \(E_{inh}\) denote the equilibrium potentials of the excitatory and inhibitory synapses, and \(g_{exc}\) and \(g_{inh}\) the corresponding conductances. Taking all these values, the voltage with a time constant \(\tau \) can be defined as

$$\begin{aligned} \tau \frac{dV}{dt}=(E_{rest}-V)+g_{exc}(E_{exc}-V)+g_{inh}(E_{inh}-V). \end{aligned}$$
(1)

2.2 Synapse

Synapses model the biological connections between neurons and operate on the principle of changing conductance. Each synapse has a weight w (called the synaptic weight), and its conductance is increased when a presynaptic spike arrives. When no spike arrives, the conductance decreases exponentially. In this model, if the neuron is stimulated, the conductance \(g_{exc}\) can be calculated as

$$\begin{aligned} \tau _{g_{exc}}\frac{dg_{exc}}{dt}=-g_{exc}. \end{aligned}$$
(2)

In the same way, the inhibitory conductance \(g_{inh}\) decays with its time constant \(\tau _{g_{inh}}\):

$$\begin{aligned} \tau _{g_{inh}}\frac{dg_{inh}}{dt}=-g_{inh}. \end{aligned}$$
(3)
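
To make these dynamics concrete, the following minimal sketch integrates Eqs. (1)–(3) with an explicit Euler step. All numerical constants are illustrative assumptions rather than the values used in our experiments, and the refractory period is omitted for brevity.

```python
# Minimal Euler-step sketch of Eqs. (1)-(3). All constants are illustrative
# assumptions (mV, ms); the refractory period is omitted for brevity.
E_REST, E_EXC, E_INH = -65.0, 0.0, -100.0
V_RESET, V_THRES = -60.0, -52.0
TAU, TAU_G_EXC, TAU_G_INH = 100.0, 1.0, 2.0
DT = 0.5

def lif_step(v, g_exc, g_inh):
    """Advance one neuron by one time step; return new state and spike flag."""
    dv = ((E_REST - v) + g_exc * (E_EXC - v) + g_inh * (E_INH - v)) / TAU  # Eq. (1)
    v += dv * DT
    g_exc += (-g_exc / TAU_G_EXC) * DT  # Eq. (2): exponential decay
    g_inh += (-g_inh / TAU_G_INH) * DT  # Eq. (3)
    spike = v >= V_THRES
    if spike:
        v = V_RESET  # reset after exceeding the threshold
    return v, g_exc, g_inh, spike
```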

2.3 Spiking neural network and training algorithm

A spiking neural network can be built of two layers [9]. The first layer processes the input and is built of \(m\times n\) neurons, according to the size of the incoming data. Each neuron in the input layer emits a Poisson spike train [1]. The second layer contains excitatory and inhibitory neurons. Each excitatory neuron is connected to exactly one inhibitory neuron, so when an excitatory neuron sends a spike, only one inhibitory neuron receives it. In turn, each inhibitory neuron is connected to almost all excitatory neurons (the exception is the neuron from which it receives spikes).
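
This connectivity pattern can be captured by two weight masks; a short sketch under the assumption of 100 neurons per sub-layer:

```python
import numpy as np

N = 100  # assumed number of excitatory (and inhibitory) neurons

# Excitatory -> inhibitory: one-to-one, so a spike of excitatory
# neuron i reaches exactly one inhibitory neuron i.
w_exc_inh = np.eye(N)

# Inhibitory -> excitatory: inhibitory neuron i inhibits every
# excitatory neuron except neuron i itself (zero on the diagonal).
w_inh_exc = np.ones((N, N)) - np.eye(N)
```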

One of the known algorithms for training spiking neural networks is spike-timing-dependent plasticity [22]. The idea is based on adding an additional variable \(x_{pre}\) to each synapse, in which the presynaptic trace is kept. This variable tracks the spikes arriving through the synapse: it is increased each time a spike arrives and decays exponentially otherwise. The weights are updated using a learning rate \(\eta \) as

$$\begin{aligned} \Delta w=\eta (x_{pre}-x_{tar})(w_{max}-w)^\mu , \end{aligned}$$
(4)

where \(x_{tar}\) is the target value of the presynaptic trace, \(w_{max}\) is the maximum value of the weight, and the coefficient \(\mu \) determines the dependence of the update on the previous weight.
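
A minimal sketch of the trace update and the weight change of Eq. (4); the default parameter values are assumptions for illustration only:

```python
def stdp_weight_change(w, x_pre, eta=0.01, x_tar=0.5, w_max=1.0, mu=1.0):
    """Weight change of Eq. (4); default values are illustrative, not tuned."""
    return eta * (x_pre - x_tar) * (w_max - w) ** mu

def update_trace(x_pre, spike, dt=0.5, tau_x=20.0):
    """Presynaptic trace: incremented on an arriving spike, exponentially
    decaying otherwise (tau_x is an assumed trace time constant in ms)."""
    x_pre += (-x_pre / tau_x) * dt
    return x_pre + 1.0 if spike else x_pre
```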

3 Heuristic algorithms for hyperparameters

This section focuses on describing the heuristic idea of many collaborating populations for the hyperparameter problem. The parameters that we take into account in the coefficient selection process are the number of neurons \(n_{neurons}\) and the values of \(E_{exc}\), \(E_{inh}\), \(\mu \), \(v_{thres}\), and \(v_{decay}\) (threshold potential decay). These values form a six-element vector that is identified with a single individual in the population of the heuristic algorithm.
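
For illustration, such an individual can be represented as follows; the search ranges below are hypothetical assumptions, since the concrete bounds depend on the model variant:

```python
import random

# Hypothetical search ranges for the six optimized coefficients; the exact
# bounds are not fixed here, so the values below are assumptions.
BOUNDS = {
    "n_neurons": (50, 400),
    "E_exc": (-20.0, 20.0),
    "E_inh": (-120.0, -60.0),
    "mu": (0.5, 3.0),
    "v_thres": (-60.0, -40.0),
    "v_decay": (0.1, 10.0),
}

def random_individual():
    """One individual of the heuristic population: a six-element vector."""
    return [random.uniform(lo, hi) for lo, hi in BOUNDS.values()]
```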

3.1 General idea of heuristic approach

A heuristic is an algorithm that does not guarantee an ideal solution in finite time, so the obtained solution is an approximate one. In general, a heuristic algorithm is based on selecting an initial population of individuals, which are moved through the solution space for a specified number of iterations. The movement takes place primarily through two operations, which can be interpreted as global and local. A global movement changes the current position by a large amount, while a local one changes it only by a very small amount. Both processes are important: the global one allows the search to avoid getting stuck in a local extremum, and the local one refines the current value to increase accuracy relative to the ideal solution.

Each iteration of the heuristic algorithm consists of performing a global and a local movement. Then, each individual is assessed by the fitness function, and the best result can be treated as a solution. These operations are repeated until the stop condition is met, which is either an error value (when the solution is known) or a number of iterations.
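
The following skeleton illustrates this loop; the two movement operators are deliberately simplified placeholders, as each concrete algorithm (CSA, WOA, PBA, GOA, SWA) defines its own. It reuses the hypothetical BOUNDS dictionary from the sketch above.

```python
import random

def heuristic_search(evaluate, population, t_h, p_global=0.25, sigma=0.01):
    """Skeleton of the iteration scheme described above. `evaluate` maps an
    individual to its fitness value (lower is better)."""
    best = min(population, key=evaluate)[:]
    for _ in range(t_h):
        for ind in population:
            for i, (lo, hi) in enumerate(BOUNDS.values()):
                if random.random() < p_global:
                    ind[i] = random.uniform(lo, hi)                 # global move
                else:
                    ind[i] += random.gauss(0.0, sigma * (hi - lo))  # local move
                ind[i] = min(max(ind[i], lo), hi)                   # keep in range
        candidate = min(population, key=evaluate)
        if evaluate(candidate) < evaluate(best):
            best = candidate[:]   # copy, so later moves do not change it
    return best
```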

3.2 Collaborative work

Federated learning assumes that there are a few workers, each with a private dataset. Each of them calculates some parameters and sends them to one instance, which aggregates all of them. This process is repeated for several iterations, and in the end the last aggregated result becomes the configuration of some classifier. This idea can be transferred, with some modification, to heuristic operation. Let us assume that the set of workers is denoted as \(\{\xi _1,\xi _2,\ldots ,\xi _{M-1}\}\), where \(M>0\) is the number of all workers and, in our case, corresponds to the number of processor threads. \(\xi _0\) is treated as the main worker, which receives data from the others and chooses the best results. Each worker, for instance \(\xi _j\), has a private dataset \(d_j\).

Algorithm 1

Each worker, with its dataset, has one spiking neural network and one heuristic population composed of k individuals (each a six-element vector). At the beginning, all individuals are randomly generated, and for a specific number of iterations \(T_{h}\) a heuristic algorithm is performed to find the best coefficients. Of course, the population must be evaluated in each iteration to avoid purely random position changes. For that, we propose a fitness function based on calculating the accuracy of the trained network. To make this possible, a spiking neural network is trained for every individual in each heuristic population. To avoid huge amounts of computation, the number of individuals k, as well as the number of training iterations \(T_{train}\) and heuristic iterations \(T_{h}\) on a dataset, should be adequate to the hardware capabilities. Each dataset \(d_j\) connected to a specific worker j (heuristic population) is split into two subsets: training \(\Theta _{train(d_j)}\) and testing \(\Theta _{val(d_j)}\). The second subset is used for validation in the fitness function. The main idea is to minimize a loss function based on the mean squared error, defined as

$$\begin{aligned} f(\Theta _{val(d_j)})=\frac{1}{|\Theta _{val(d_j)}|}\sum _{i=1}^{|\Theta _{val(d_j)}|}\left( y_i-{\hat{y}}_i\right) ^2, \end{aligned}$$
(5)

where \(y_i\) is the true label of a tested sample \(x_i\), and \({\hat{y}}_i\) is the output of the network. However, other loss functions, such as the contrastive loss [25], could also be used.
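
A direct sketch of this fitness function, assuming the validation subset is a list of (sample, label) pairs and `predict` stands for the inference of the network trained with the individual's coefficients:

```python
import numpy as np

def fitness(theta_val, predict):
    """Eq. (5): mean squared error on the validation subset."""
    y_true = np.array([y for _, y in theta_val], dtype=float)
    y_pred = np.array([predict(x) for x, _ in theta_val], dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))
```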

This means that, in each collaboration round, each worker performs \(T_{h}\) heuristic iterations. During one iteration of the heuristic approach, all individuals in the population perform their local and global movements. Then, every individual is used to train a spiking neural network for \(T_{train}\) iterations on the worker's private dataset. At the end, each individual is evaluated by Eq. (5); the individual with the lowest value of this function can be treated as a solution. The worse half of the individuals is replaced by new random values, and the best ones remain in the population.
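
One such round can be sketched as follows; `move`, `train_snn`, and `evaluate` are callables supplied by the chosen heuristic and the SNN simulator, and `random_individual` comes from the earlier sketch:

```python
def worker_round(population, move, train_snn, evaluate, t_h, t_train):
    """One collaboration round on a single worker. `move` applies the
    heuristic's global/local operators in place, `train_snn(ind, t_train)`
    returns a network trained with the individual's coefficients, and
    `evaluate(net)` computes Eq. (5) on the worker's validation subset."""
    for _ in range(t_h):
        for ind in population:
            move(ind)                                    # global + local movement
        scores = [evaluate(train_snn(ind, t_train)) for ind in population]
        # Rank by loss (lower is better): keep the better half,
        # re-seed the worse half with fresh random individuals.
        ranked = [ind for _, ind in sorted(zip(scores, population),
                                           key=lambda p: p[0])]
        half = len(ranked) // 2
        population = ranked[:half] + [random_individual() for _ in ranked[half:]]
    return ranked[0], population   # best individual of the final iteration
```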

The proposed idea is based on the actions of M such workers. It is worth noting that the database is divided among all workers, so \(\xi _0\) has its own database \(d_0\), which is a subset of the whole database. However, this worker runs no heuristic, only a spiking neural network. Its main task is to wait for the best individuals from the other workers. As a result, \(\xi _0\) receives vectors with the found coefficients. Based on these vectors, the worker chooses the best coefficients according to the following formula:

$$\begin{aligned} \begin{aligned} \min \{&f(\Theta _{val(d_0)})_{\xi _1},f(\Theta _{val(d_0)})_{\xi _2},\ldots ,f(\Theta _{val(d_0)})_{\xi _{M-1}},\\&f(\Theta _{val(d_0)})_{avg}\}, \end{aligned}\end{aligned}$$
(6)

where the last element uses the average of the coefficients from all workers, which can be presented as

$$\begin{aligned} \left[ \frac{1}{M-1}\sum _{i=1}^{M-1}c_0^{(i)},\frac{1}{M-1}\sum _{i=1}^{M-1}c_1^{(i)},\ldots ,\frac{1}{M-1}\sum _{i=1}^{M-1}c_6^{(i)}\right] , \end{aligned}$$
(7)

where each vector from worker i has the following form

$$\begin{aligned} \left[ c_0^{(i)},c_1^{(i)},\ldots ,c_6^{(i)}\right] ,\end{aligned}$$
(8)

where \(c_l^{(i)}\) denotes the l-th coefficient found by worker i.
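
A sketch of this selection step, where `evaluate_on_d0` hides the network training and the Eq. (5) evaluation on \(\Theta _{val(d_0)}\):

```python
import numpy as np

def choose_best(worker_vectors, evaluate_on_d0):
    """Eqs. (6)-(8) on the main worker xi_0: test every received coefficient
    vector and their coordinate-wise average on the validation split of d_0,
    then return the vector with the lowest loss."""
    pool = [np.asarray(v, dtype=float) for v in worker_vectors]
    pool.append(np.mean(pool, axis=0))          # the averaged vector, Eq. (7)
    losses = [evaluate_on_d0(v) for v in pool]  # Eq. (6) with Eq. (5) inside
    return pool[int(np.argmin(losses))]
```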

Algorithm 2

When the best coefficient configuration has been chosen by the worker \(\xi _0\), it is sent to all other workers. Each worker then regenerates the individuals in its population by assigning them the obtained values with random changes at the level of \(\langle -5 \%, 5\%\rangle \) (clipped so that the values remain within each variable's range). This operation is repeated until the specified number of collaboration iterations \(T_{collaborative}\) is reached. The whole process of the proposal is presented in Algorithms 1–2 and an example visualization in Fig. 1.
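
A minimal sketch of this re-seeding step, reusing the hypothetical BOUNDS ranges introduced earlier:

```python
import random

def reseed_population(best, k):
    """Regenerate a worker's population around the broadcast coefficients
    with uniform +/-5% noise, clipped to the allowed ranges."""
    population = []
    for _ in range(k):
        ind = [c * random.uniform(0.95, 1.05) for c in best]
        ind = [min(max(c, lo), hi) for c, (lo, hi) in zip(ind, BOUNDS.values())]
        population.append(ind)
    return population
```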

Fig. 1 Visualization of the idea of collaboration in the training process

4 Experiments

All experiments were conducted with the BindsNET library [12], which allows for the simulation of spiking neural networks, and were performed on a 12-thread Intel Core i7-8750H processor, 32 GB RAM, and an NVIDIA GeForce GTX 1050Ti Max-Q. Classification accuracy was analyzed on two datasets: MNIST [16] and Fashion MNIST [37]. Both datasets have 10 classes, 60,000 training images, and 10,000 testing images. The datasets were split among the workers with an equal division of samples across the classes.
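
This class-balanced split can be sketched as follows, where `labels` is the vector of class labels of the training set:

```python
import numpy as np

def split_among_workers(labels, n_workers, seed=0):
    """Return index shards d_0..d_{M-1} with an equal number of samples of
    every class in each shard (equal class-wise division)."""
    rng = np.random.default_rng(seed)
    shards = [[] for _ in range(n_workers)]
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        for w, part in enumerate(np.array_split(idx, n_workers)):
            shards[w].extend(part.tolist())
    return shards
```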

The proposed approach was analyzed with 3, 6, and 12 workers. In the heuristic part, we used 5 algorithms: CSA (Cuckoo Search Algorithm), WOA (Whale Optimization Algorithm), PBA (Polar Bear Algorithm), GOA (Grasshopper Optimization Algorithm), and SWA (Salp Swarm Algorithm). The parameters of these algorithms were set as follows: the population size \(k\in \{10,25,50,100\}\) and the number of iterations \(T_{h} \in \{10,25,50,100\}\); the remaining parameters, which depend on the model of the given algorithm, were selected randomly. In the case of the spiking neural networks, the number of training iterations was set as \(T_{train}\in \{400,600,800\}\). Moreover, each incoming sample is a 2D image of size \(28\times 28\), so the input layer is composed of 784 neurons, where the value of each pixel is presented in the form of a Poisson spike train with a firing rate proportional to the pixel intensity in the analyzed image. As the maximum, the intensity value 255 divided by 4 is chosen, i.e. 63.75 Hz, so the firing rates lie in the range \(\langle 0,63.75\rangle \) Hz.
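
For illustration, a NumPy sketch of this rate coding (a Bernoulli approximation of a Poisson process per time step); BindsNET ships its own Poisson encoder, so this version only shows the principle, and the simulation window `time_ms` is an assumed value:

```python
import numpy as np

def poisson_encode(image, time_ms=350, dt=1.0, max_rate=63.75):
    """Encode a 28x28 grayscale image (values 0-255) as Poisson spike trains
    with rates proportional to pixel intensity, up to 255/4 = 63.75 Hz."""
    rates = image.reshape(-1).astype(float) / 255.0 * max_rate  # Hz per pixel
    p_spike = rates * dt / 1000.0       # spike probability per time step
    steps = int(time_ms / dt)
    return (np.random.rand(steps, rates.size) < p_spike).astype(np.uint8)
```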

In all conducted experiments, we measured the final accuracy and time. In the case of accuracy, the vector with the found coefficient values was used to train a spiking neural network (with the same architecture) for 800 iterations on the whole training dataset. The validation set was used to calculate the final accuracy for both classification problems. The results for the MNIST and Fashion MNIST databases are presented in Tables 1, 2 and 3 (for 3, 6 and 12 workers, respectively). In both cases, using only 3 workers, regardless of the chosen heuristic, gives much lower accuracy than the setting with 12 workers. These results can be explained by the much larger number of coefficient configurations analyzed: with only three workers, the number of analyzed configurations is four times smaller than with the largest number of workers. In most heuristics, the classifier accuracy was close to \(50\%\), which is a high classification result considering the small number of heuristic iterations. We noticed that in some algorithms, such as CSA, PBA, and GOA, the population got stuck in a local extremum and had problems escaping it, which often resulted in a worse accuracy level than for the other algorithms. However, this is not a rule, because for 12 workers and PBA the accuracy of the classifier was the highest. The accuracy level increased not only with the number of workers but also with the number of training iterations and the number of individual movement iterations in the heuristic algorithm. An increase in any of these parameters improves the quality of the proposed technique.

Table 1 Classification accuracy over the entire verification set after 3 iterations of collaboration using 3 workers
Table 2 Classification accuracy over the entire verification set after 3 iterations of collaboration using 6 workers
Table 3 Classification accuracy over the entire verification set after 3 iterations of collaboration using 12 workers

To visualize the heuristic algorithm selection, we counted the best results in each test and present them in Figs. 2 and 3. In the case of the MNIST database, the PBA, CSA, and SWA algorithms achieved the best results in terms of the number of wins. Two algorithms, CSA and PBA, showed the best results for a specific number of workers: CSA for 3 and PBA for 6. For the second database, the best results were achieved by PBA, SWA, and WOA. It is worth noting that PBA and SWA achieved good results compared with the other algorithms in terms of stability; by stability, we mean good results regardless of the number of workers or iterations. In general, the best accuracy levels were achieved by different coefficient values, where only the number of neurons was consistently in the range \(\langle 100,150\rangle \).

Fig. 2 Ranking of heuristics that obtained the highest accuracy in the classification problem of the MNIST database

Fig. 3 Ranking of heuristics that obtained the highest accuracy in the classification problem of the Fashion MNIST database

During the experiments, we also analyzed the time needed to find the best coefficients with the heuristic algorithms. For each set of coefficients, we measured the average time needed to perform the calculations for one worker during one heuristic search. It should be noted that the more workers there are, the smaller the private databases become (due to the equal distribution). Also, for a small number of individuals, the average time is an intermediate value between the measurements with the minimum and the maximum number of training iterations. The obtained results are presented in relation to different values of \(T_{h}\) and different numbers of workers in Figs. 4, 5 and 6. These charts show the average time needed to perform each of the selected heuristic algorithms, taking three parameters into consideration: the number of iterations, the number of individuals in the population, and the number of workers in federated learning. Based on all of these charts, the time increases with a higher value of any parameter, which is caused by the larger amount of computation. The most practical trade-off is the use of 50 iterations, which required an average time of about 500–600 seconds for one worker and reached an average accuracy of \(50\%\) in one iteration (see Figs. 4c, 5c and 6c).

Fig. 4 Averaged time for both datasets using 3 workers

Fig. 5 Averaged time for both datasets using 6 workers

Fig. 6 Averaged time for both datasets using 12 workers

The highest accuracy was obtained using the maximum values of the analyzed parameters (the number of iterations and the population size), although the time needed to accomplish this task for one worker increases by nearly 50–60% relative to the smaller number of iterations (Fig. 6d). All performed experiments show that the best results are obtained using large values of the basic coefficients of the heuristic algorithms (such as the number of iterations and the population size), because the heuristic then analyzes a larger number of possibilities. However, this makes the required computation time very high. To avoid such a long calculation time, it is possible to use a less time-consuming variant (a smaller number of iterations/individuals) together with a longer retraining of the final classifier: after specific values of the coefficients have been found by the heuristic, the final retraining can be performed with a larger number of training iterations. Based on the time measurements, most algorithms showed a linear increase in time relative to the number of individuals in the population (which can be noticed in the example of CSA). Only in a few cases, such as the GOA algorithm, was the situation different, as can be seen in Figs. 4 and 5: the average time curve increases up to 50 individuals, while at 100 it is already much lower, for instance when using 50 or 100 iterations on 3 workers. We also compared the GOA variant of the proposal with a classic approach, the Bayesian approach (BA) [3]. The set of parameters found by BA reached 83% accuracy for the MNIST database and 77% for Fashion MNIST. These are high accuracy results, but the proposal reached 89% and 82%, respectively. This gain of 5–6% in accuracy was obtained by using the highest parameter values in the conducted tests.

5 Conclusion

The selection of parameters in the training process, not only of neural networks but also of other artificial intelligence methods, is an important and open problem. In this paper, we presented an idea based on parallel collaborative learning that finds network parameters in such a way that the obtained accuracy is as high as possible. The main idea was to use federated learning to find the best parameter values. In the conducted experiments, we analyzed three scenarios of federated learning, with 3, 6, and 12 workers. Moreover, each worker performs one of five different heuristic algorithms inspired by nature, and for each of them we examined the impact of the population size and the number of iterations. In this setup, the best results were reached by CSA, PBA, and SWA for a small number of iterations and workers; when the number of workers was 12, the best results were reached by GOA, PBA, and WOA. In all cases, when the values of these parameters were increased, the time needed for the calculations was also higher. The more workers are used in federated learning, the faster a high level of accuracy can be achieved. The tests showed that the developed technique can be used in practice for this purpose; however, the best results were obtained using many threads. The research also revealed a large number of calculations, which is reduced by using multithreading and dividing the training database into equal parts for each thread. Hence, not only the heuristic coefficients but also the number of workers has a large impact.

In future research, we plan to focus on reducing the number of calculations and thus also the training time. This could be done by using parallel operations as well as threads, where a worker could finish its calculations faster. An additional element that should be analyzed is the number of training iterations.