Introduction

Kohonen's Self-Organizing Map (SOM) [1, 2] is an artificial neural network that maps high-dimensional inputs to a lower-dimensional lattice of artificial neurons [3]. Learning is usually unsupervised, as no input labels are required to train the model. SOMs have proven effective for various problems, including but not limited to clustering [4, 5], dimensionality reduction [6, 7], anomaly detection [8, 9], feature selection [10], speaker recognition [11], non-stationary real-world agents [12], and remote sensing [13]. In addition, SOM can preserve the topological relationship of the input space, which is essential for producing consistent results.

The SOM architecture consists of artificial neurons arranged in layers to map the input to the desired output. The neurons are connected via weight vectors. These weight vectors keep updating during the training process to learn the patterns in the input data. Obtaining the optimal values for the weight vectors that result in high accuracy is the main objective of the learner.

Generally, the SOM algorithm has two main stages: competition and adaptation. In the competition stage, the algorithm selects a winner neuron from the set of neurons. In the adaptation stage, the weights of the winner neuron and the nodes in its vicinity are updated. The conventional SOM algorithm has two main challenges that hinder its performance: weight initialization and topology preservation [14]. The conventional SOM uses fixed weights for initialization, and a predefined topology is selected. Using fixed weights may affect both the algorithm's accuracy and robustness. In addition, its performance degrades on non-stationary datasets [7]. Topology preservation is crucial for producing consistent results, and the literature shows that it is difficult to achieve with conventional SOM, as high topology error is reported on various datasets [7, 15,16,17].

Over the last few decades, researchers have proposed various versions of the conventional SOM algorithm to exploit its potential fully. For instance, Growing SOM (GSOM) [18] is a hierarchical clustering approach that improves the conventional SOM's accuracy. In GSOM, the number of neurons is gradually increased to produce the final maps, and a spreading factor is introduced to control the size of the generated maps. Similarly, the authors in [19] propose an asymmetric neighborhood function adopted into the GSOM algorithm to further reduce the topology error and improve accuracy. Grow-When-Required (GWR) is another variant of SOM [20] that learns prototypical representations of human-object interaction in an unsupervised manner; GWR-SOM showed superior performance for clustering human motion patterns. A common limiting factor for achieving faster convergence in conventional SOM is its sequential execution of tasks. To achieve high-speed processing, a fully parallel architecture of SOM is proposed in [21]; experimental results showed a processing speed 8.91 times faster than the sequential SOM algorithm. The authors in [14] proposed a semi-supervised learning technique called the semi-supervised Growing Self-Organizing Map (SSGSOM). Like other variants, SSGSOM also achieved higher accuracy; in addition, it is faster than conventional SOM, which is exploited for the rapid visualization of higher-dimensional data on a 2D feature map. An improved version of SOM for clustering time series data is proposed by Jayanth et al. [22]. In particular, the prototype vectors are initialized using farthest neighbors, in contrast to the random initialization in SOM, and dynamic time warping is employed as the metric for measuring the similarity between signals. This combination produced high-quality clusters for time series data; the results indicated that the proposed SOM not only performed better than agglomerative clustering but also scales better in the processing time needed to compute the clusters.

Some studies combine several machine learning techniques to obtain better clustering and interpretability. For instance, Mateusz et al. [23] first applied feature selection using PCA and then a combined SOM and K-means clustering model, which resulted in higher classification accuracy. A novel unsupervised cross-modal retrieval framework based on associative learning has been proposed in [24], where two traditional SOMs are trained separately for images and collateral text and are then associated using a Hebbian learning network to facilitate cross-modal retrieval. In [25], the authors proposed a novel SOM approach termed unsupervised borderline SOM (UB-SOM) to address two main challenges: class imbalance and high dimensionality. Industrial processes generally produce high-dimensional datasets that are also imbalanced. UB-SOM uses a small number of nodes to represent the normal samples and also helps highlight the borderline areas; highly accurate results were obtained for fault detection in an industrial process. In another study [26], an ensemble method based on SOM and a support vector machine is proposed for survival risk prediction of cancer patients.

In [27], the authors proposed the Growing Hierarchical Self-Organizing Map (GHSOM) algorithm, a dynamic variant of the original SOM. GHSOM has two main issues: a slow learning rate and an inability to process categorical data. To solve these two issues, Spark-GHSOM is proposed in [28]. Spark-GHSOM integrates the Spark platform to enable the processing of large amounts of data and introduces a new cost function that can handle both numerical and categorical data. In [29], least squares support vector machines (LSSVMs) with multiple kernels are used to address data redundancy and improve the conventional SOM's performance. In [30], the half-quadratic (HQ) approach is adopted for parameter selection in a semi-supervised growing self-organizing map (SSGSOM). Another work, [31], uses an adaptive variable learning rate to obtain optimal weight vectors for the winner neurons. The authors in [20] proposed self-organizing map-based oversampling (SOMO) to deal with class imbalance issues: SOM is first used to transform the input data to a lower-dimensional space, and within- and between-cluster synthetic data are then derived. The proposed method was tested on synthetic data and showed promising results. Zhang et al. [32] proposed a Biomimetic SLAM Algorithm Based on a Growing Self-Organizing Map (GSOM-BSLAM) to overcome uncertainty in location identification, which uses self-motion-aware information to obtain the activation response. In [33], to add automatic interpretation for better decision making, a method based on a combination of case-based reasoning, semiotic concepts, and self-organizing maps is used; moreover, a novel data-driven sign deconstruction mechanism is introduced to the problem domain.

Preserving the topological order is vital for obtaining consistent results for both the feature maps and the clusters. Since the optimization algorithms require repeating the experiment several times over a number of iterations, the algorithm must assign the data to the closest neuron [6]. Moreover, the presence of outliers in the data may result in suboptimal performance of any machine-learning algorithm. Some SOM variants have also focused on this issue. For instance, a smoothed SOM (S-SOM) is proposed in [16] that can deal with outliers without affecting the model's performance. Specifically, a new learning rule is introduced, which helps smooth out the representation of outlying vectors. Similarly, in [34], the authors exploit the Neighbor Entropy Local Outlier Factor (NELOF) to identify and remove outliers. The initial clustering is done with SOM, followed by a refinement step using the entropy of the K-relative neighborhood to redefine the local outlier factor (LOF). The LOF is then used to identify the outliers and remove them for the subsequent iterations.

Recently, deep learning-based methods have been widely used for pattern recognition problems. Although deeper networks have achieved promising performance on large datasets, they need a large amount of labeled data, and training such networks is difficult due to their highly complex structures. The literature shows that SOM techniques have also been used within deep learning-based frameworks for various problems. For instance, a new unsupervised technique for visual feature learning, known as deep convolutional self-organizing maps (DCSOM), is proposed to learn invariant image representations from unlabeled data [35]. It consists of a cascade of convolutional SOM layers that extract features at multiple levels. Similarly, an extended version of deep SOM (DSOM) is proposed in [36]: the learning algorithm is modified to be unsupervised, and the architecture is modified to learn features at different resolutions. The results indicated a significant improvement in classification performance. In [37], the authors proposed a denoising autoencoder self-organizing map (DASOM) that integrates denoising autoencoders into a hierarchically organized hybrid model. This arrangement allows the model parameters to be learned in an unsupervised fashion while maintaining the clustering properties.

Deep learning techniques have demonstrated their ability to solve various supervised learning problems. In recent times, deep neural networks have also been combined with representation learning for data clustering tasks, with specific regularization techniques introduced to learn representations that improve clustering. Florent et al. [38] proposed a deep embedded self-organizing map (DESOM) that combines representation learning and clustering as a joint task. The model consists of an autoencoder and a SOM layer that are trained jointly to learn SOM-friendly representations. Experimental results on various benchmark datasets showed that DESOM improved the quality of quantization and topology in the latent space.

This paper presents a novel variation of the conventional SOM with a variable learning rate parameter, called VLRSOM. The method obtains optimal weights that yield higher accuracy and lower topological error. Experiments are performed to evaluate the effect of the variable learning rate on the accuracy and topological error of the network. The results indicate that the proposed VLRSOM produces high accuracy, low quantization error (QE), and low topological error (TE) compared to conventional SOM and some of its popular variants.

The rest of the paper is organized as follows. The motivation and contributions of this paper are summarized in “Motivation and contributions”. The related work and an overview of the relevant SOM variants are presented in “Related work”. The proposed method is described in “Proposed method”. The experimental results are presented in “Experimental results”. Finally, concluding remarks are given in “Conclusion”.

Motivation and contributions

As mentioned earlier, SOM is an unsupervised learning approach that can maintain the topological relationship among the input data. Since SOM can extract the latent representation of the input space, it is highly useful for applications such as data clustering and visualization. However, SOM suffers from some basic limitations and may produce undesirable results. In this section, we highlight some issues and motivate the need to improve the accuracy of the SOM algorithm further. Some main issues and desirable properties of the SOM are presented as follows:

The choice of learning rate is crucial for training the SOM model. One issue with conventional SOM is that it uses a constant learning rate. However, this may lead to convergence issues. When a small learning rate is selected, it will produce a low error, but the convergence rate will be slow. In contrast, a large learning rate value provides a faster convergence, but it may result in a high error. Several versions of SOM were introduced to handle this issue [16, 25, 30, 31, 39,40,41,42]. However, its dependence on weight initialization and the presence of outliers in the input space can adversely affect its performance.

The SOM algorithm also expects a balanced class distribution. However, in some situations, obtaining balanced data can be difficult, and applying conventional SOM to such class-imbalanced data may produce undesirable results. Although various methods have been proposed to tackle this issue by generating artificial data to achieve a balanced distribution, these techniques may introduce noise, making precise prediction a challenging problem [40]. A constant learning rate may not adapt well to imbalanced data and may fail to converge due to the high error. Generally, the learning rate should adapt to the error value rather than remain fixed.

SOM suffers from several issues, such as over-sensitivity to outliers and high dependence on weight initialization. In addition, the computational cost of many proposed SOM variations is high. Therefore, new methods are needed that overcome these issues with less computational overhead.

We propose an algorithm that can deal with the above-mentioned problems. The main features of the proposed work are summarized as follows:

  • Introduced a novel variable learning rate to improve both the accuracy and the convergence behavior of the algorithm.

  • The proposed algorithm is more robust in terms of TE as it produced the optimal topology and maintained it until the end of the iterations. It can reach the steady error state faster than other variants of SOM in the training process.

  • The presence of outliers or class imbalance does not affect its performance.

  • It can deal with multiclass clustering problems with high accuracy.

  • We have performed detailed experiments on various benchmark and synthetic datasets.

Related work

This section summarizes some well-known and most relevant algorithms, followed by a detailed description of the proposed VLRSOM algorithm.

Improved parameterless self-organizing map (PLSOM2) algorithm

The original PLSOM algorithm [42] proposed an effective solution to the main issues encountered in conventional SOM when dealing with some specific types of mapping tasks. However, PLSOM is over-sensitive to outliers and highly dependent on weight initialization. An improvement of PLSOM, PLSOM2 [43], was introduced to address these problems. PLSOM2 is robust against outliers, thus resulting in improved accuracy compared to the PLSOM algorithm. Moreover, PLSOM2 is not computationally expensive and does not require prior knowledge of the input data.

PLSOM2 overcomes the problem in PLSOM [42] by using the range of the inputs to scale the weight update during training, whereas PLSOM scales the weight update by the size of the error relative to the maximum error [43]. In PLSOM2, the scaling variable is calculated as follows [42]:

$$ \epsilon \left( t \right) = \min \left( {\frac{{{\text{err}}\left( t \right)}}{{S\left( t \right)}},\;1} \right) $$
(1)

where \(\epsilon \left( t \right)\) is the scaling variable which is considered as a normalized Euclidean distance between input vector \(x\left( t \right)\) at time \(t\) and the closest weight vector \(w_{{\text{c}}} \left( t \right)\) given as

$$ {\text{err}}\left( t \right) \, = \, d\left( {x\left( t \right),w_{{\text{c}}} \left( t \right)} \right), $$
(2)

and \(S\left( t \right)\) is calculated as

$$ S\left( t \right) = \max_{i,j} \left\| {x_{i} \left( t \right) - x_{j} \left( t \right)} \right\|_{2} ,\quad i,j \le t. $$
(3)

A large value of \(\epsilon\) indicates that the output map has fit the input space poorly, while a small value of \(\epsilon\) indicates that the map fitting is acceptable. A large readjustment is required for large \(\epsilon\), which may require more iterations, while no adjustment is needed for a small value of \(\epsilon\) at time t. The PLSOM algorithm updates the weight vectors of the winner neurons as follows:

$$ w_{i} \left( {t + 1} \right) = w_{i} \left( t \right) + \epsilon \left( t \right) \cdot h_{{{\text{c}},i}} \left( t \right) \cdot \left[ {x\left( t \right) - w_{i} \left( t \right)} \right], $$
(4)

where \(h_{{{\text{c}},i}}\) is the Gaussian neighborhood function given as

$$ h_{{{\text{c}},i}} \left( t \right) = \exp \left( { - \frac{{ \left\| {r_{{\text{c}}} - r_{i} } \right\|}}{{\Theta (\epsilon \left( t \right))^{2} }}} \right). $$
(5)

\(\Theta \left( {\epsilon \left( t \right)} \right)\) is used as a scaling factor.

$$ \Theta \left( {\epsilon \left( t \right)} \right) = \beta \cdot \epsilon \left( t \right)\quad \Theta \left( {\epsilon \left( t \right)} \right) \ge \theta_{{{\text{min}}}} . $$
(6)

Moreover, two other ways to calculate \(\Theta \left( {\epsilon \left( t \right)} \right)\) are provided in (7) and (8), where \(\beta = {\text{constant}}\,\forall t\) and \(\theta_{\min }\) is constant.

$$ \Theta \left( {\epsilon \left( t \right)} \right) = \left( {\beta - \theta_{\min } } \right) \cdot \epsilon \left( t \right) + \theta_{\min } , $$
(7)
$$ \Theta \left( {\epsilon \left( t \right)} \right) = \left( {\beta - \theta_{\min } } \right) \cdot \ln \left( {1 + \epsilon \left( t \right)\left( {e - 1} \right)} \right) + \theta_{\min } , $$
(8)

where \({\text{ln}}\) (.) is the natural logarithm and \(e\) is the Euler number.
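
For concreteness, the following sketch shows how the PLSOM2 quantities above could be computed in Python. It is a minimal illustration of Eqs. (1)–(3), (5), and (7), not the authors' implementation; the function names and the values of \(\beta\) and \(\theta_{\min}\) are illustrative assumptions.

```python
import numpy as np

def plsom2_epsilon(x, w_bmu, diameter):
    """Scaling variable: quantization error normalized by the observed input
    range and capped at 1 (Eqs. 1-2)."""
    err = np.linalg.norm(x - w_bmu)              # err(t) = d(x(t), w_c(t))
    return min(err / max(diameter, 1e-12), 1.0)  # epsilon(t)

def update_diameter(diameter, x, seen_inputs):
    """Running value of S(t): largest pairwise distance among the inputs
    observed so far (Eq. 3), updated incrementally with the new input x."""
    if len(seen_inputs) == 0:
        return diameter
    dists = np.linalg.norm(np.asarray(seen_inputs) - x, axis=1)
    return max(diameter, float(dists.max()))

def theta(eps, beta=5.0, theta_min=1.0):
    """Neighborhood width scaled linearly by epsilon (Eq. 7)."""
    return (beta - theta_min) * eps + theta_min

def neighborhood(grid_dist, eps, beta=5.0, theta_min=1.0):
    """Gaussian neighborhood of Eq. (5): exp(-||r_c - r_i|| / Theta(eps)^2)."""
    return np.exp(-grid_dist / theta(eps, beta, theta_min) ** 2)
```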

RA-SOM algorithm

It has been shown that selecting a higher learning rate may result in faster convergence but leads to higher topology error. In contrast, a small learning rate may produce more accurate results, yet it requires a higher number of iterations to obtain a low QE, which may be impractical with large amounts of data. This problem is alleviated by an adaptive technique termed robust adaptive SOM (RA-SOM) [44]. The algorithm initially starts with a relatively large learning rate and then gradually reduces it over the iterations, which results in lower QE and faster convergence. The weights are updated with the adaptive learning rate according to the following equation [44]:

$${w}_{i}\left(t+1\right)={w}_{i}\left(t\right)+\alpha \left(t\right)\cdot \left[x\left(t\right)-{w}_{i}\left(t\right)\right],t=\mathrm{0,1},\dots ,$$
(9)

where \({w}_{i}(t+1)\) is the updated weight at the next iteration and \(\alpha (t)\) represents the adaptive learning rate, which is defined as

$$\alpha \left(t\right)=\frac{\lambda }{1-{\beta }^{t}}.$$
(10)

Equation (9) can be updated using Eq. (10) as

$${w}_{i}\left(t+1\right)={w}_{i}\left(t\right)+\left(\frac{\lambda }{1-{\beta }^{t}}\right)\cdot \left[x\left(t\right)-{w}_{i}\left(t\right)\right].$$
(11)

It adopts a similar approach for weight initialization as the conventional SOM, where the weights are first randomly initialized. However, unlike conventional SOM, RA-SOM is able to control the weight update via \({\beta }^{t}\). Initially, a large value of \(\beta \) is selected; since \(t\) is small, the term \(1-{\beta }^{t}\) is small and \(\alpha \left(t\right)\) is large. In subsequent iterations, as \(t\) increases, \(1-{\beta }^{t}\) approaches one and \(\alpha \left(t\right)\) gradually decreases toward \(\lambda \).
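
As a rough sketch of Eqs. (9)–(11) (not the RA-SOM reference implementation; the values of \(\lambda\) and \(\beta\) are placeholders, not the tuned ones from [44]), the rate and update can be written as:

```python
import numpy as np

def rasom_learning_rate(t, lam=0.05, beta=0.5):
    """alpha(t) = lambda / (1 - beta^t) for t >= 1 (Eq. 10): large in early
    iterations and decaying toward lambda as t grows."""
    return lam / (1.0 - beta ** t)

def rasom_update(w, x, t, lam=0.05, beta=0.5):
    """Move the winner's weight vector w toward the input x (Eq. 11)."""
    return w + rasom_learning_rate(t, lam, beta) * (x - w)

# Example: with these placeholder values the effective rate decays
# from 0.1 at t = 1 toward 0.05 as t grows.
rates = [rasom_learning_rate(t) for t in range(1, 6)]
```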

Proposed method

This section provides the mathematical details of the SOM algorithm, followed by the proposed VLRSOM. The original SOM consists of a group of neurons that gradually adapt to the input data points. It then generates a set of ordered neurons that maintains the topology of the mapped data. A similarity measure, such as the Euclidean distance, is defined to guide the adaptation of these neurons. The weights of the winner neurons are then updated in each iteration during training.

In Kohonen's SOM algorithm [1], features from a high n-dimensional input space \(x=\{ {x}_{1},{x}_{2},\dots ,{x}_{n}\}\) are mapped to a lower dimensional output space using the connection weights \({w}_{i}=\{{w}_{n1}, {w}_{n2}, \dots , {w}_{nm}\}\). It simply uses the Euclidean distance with the rule known as winner-takes-all, which is given in the following equation [1, 2]:

$$c = \underset{i}{\mathrm{arg\,min}}\left(\Vert {w}_{i}\left(t\right)-x\left(t\right)\Vert \right),$$
(12)

where \(c\) is called the best-matching unit (BMU) on the output map and \(i=1, 2,\dots , k\le n\times m\); a high-dimensional input is thus mapped to the most suitable unit \(i\) at position \(c\). For all inputs and randomly initialized weights, a competitive learning rule is applied in such a way that input data having similar features retain a similar topological output map [1, 2]:

$${w}_{i}\left(t+1\right)={w}_{i}\left(t\right)+{h}_{c,i}\left(t\right).\left[x\left(t\right)-{w}_{i}\left(t\right)\right],$$
(13)

where

$$ h_{c,i} \left( t \right) = \mu . \exp \left( { - \frac{{ \left\| {r_{c} - r_{i} } \right\|}}{{2\sigma^{2} \left( t \right)}}} \right), $$
(14)

where \({w}_{i}\left(t+1\right)\) represents the weight of the \(i\)th neuron at iteration \((t+1)\), \(x\left(t\right)\) is the training input taken at time \(t\), \(\mu \) is the learning rate, \({h}_{c,i}\) is the Gaussian neighborhood function, \(\left\| {r_{c} - r_{i} } \right\|\) is the Euclidean distance between the winning neuron \(c\) and the \(i\)th neuron in the grid, and \(\sigma (t)\) is the neighborhood size, which is set to a constant value. During training, the value of \({h}_{c,i}(t)\) decreases according to the annealing scheme used in the algorithm.

The weights are then updated during the training as given by the following equation

$${w}_{i}\left(t+1\right)={w}_{i}\left(t\right)+\alpha \left(t\right)\cdot \left[x\left(t\right)-{w}_{i}\left(t\right)\right],$$
(15)

where \(\alpha \left(t\right)\) is the learning rate. Instead of the learning rate, a Gaussian function is adopted in GF-SOM. Equation (15) can then be updated using the Gaussian function as

$${w}_{i}\left(t+1\right)={w}_{i}\left(t\right)+{h}_{c,i}\left(t\right)\cdot \left[x\left(t\right)-{w}_{i}\left(t\right)\right],$$
(16)

where \({h}_{c,i}(t)\) represents the Gaussian function, defined as follows:

$$ h_{c,i} \left( t \right) = \alpha \left( t \right) \cdot \exp \left( { - \frac{{\left\| {r_{c} - r_{i} } \right\|}}{{2\sigma^{2} \left( t \right)}}} \right). $$
(17)

In the above equation, \(\left\| {r_{c} - r_{i} } \right\|\) represents the Euclidean distance between the \(i\)th neuron and the selected winner neuron. Moreover, in the conventional SOM, the error function \(J\left(t\right)\) is defined as

$$J(t)=\frac{1}{n}\sum_{i=1}^{n}{\Vert {w}_{c}\left(t\right)-{x}_{i}\left(t\right)\Vert }^{2},$$
(18)

where \({w}_{c}\) is the BMU of \({x}_{i}, i = 1, 2, \dots , n\).
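
To make these steps concrete, here is a minimal, self-contained sketch of one conventional SOM training step (Eqs. 12–14) and of the error of Eq. (18) for a small 2-D lattice. The grid size, learning rate \(\mu\), and neighborhood width \(\sigma\) are illustrative choices, not the tuned values used later in the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
grid_h, grid_w, dim = 10, 10, 2
weights = rng.random((grid_h * grid_w, dim))   # randomly initialized map
coords = np.array([(r, c) for r in range(grid_h) for c in range(grid_w)],
                  dtype=float)                 # lattice positions r_i

def train_step(x, weights, mu=0.5, sigma=1.5):
    """One competition/adaptation step (Eqs. 12-14)."""
    bmu = int(np.argmin(np.linalg.norm(weights - x, axis=1)))   # Eq. (12)
    grid_dist = np.linalg.norm(coords - coords[bmu], axis=1)    # ||r_c - r_i||
    h = mu * np.exp(-grid_dist / (2.0 * sigma ** 2))            # Eq. (14)
    weights += h[:, None] * (x - weights)                       # Eq. (13)
    return weights

def quantization_error(data, weights):
    """Mean squared distance of each sample to its BMU (Eq. 18)."""
    d = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return float(np.mean(d.min(axis=1) ** 2))

data = rng.random((1000, dim))                 # synthetic 2-D samples in [0, 1]
for t in range(200):
    weights = train_step(data[t % len(data)], weights)
```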

The weight update rule of the SOM corresponds to a gradient descent step in minimizing the above error function.

$$ \begin{aligned} w_{i} \left( {t + 1} \right) & = w_{i} \left( t \right) + \Delta w_{i} \left( t \right) \\ & = w_{i} \left( t \right) + \mu \cdot \exp \left( { - \frac{{ \left\| {r_{c} - r_{i} } \right\|}}{{2\sigma^{2} \left( t \right)}}} \right) \cdot \left[ {x\left( t \right) - w_{i} \left( t \right)} \right] \\ \end{aligned} $$
(19)

where \(\mu \) is the fixed learning rate (\(\mu >0\) is a preset small learning rate parameter), which controls both the convergence and the steady-state behavior of the SOM algorithm. Similar to [45], we introduce a new variable learning rate into the SOM to further enhance the accuracy and decrease both TE and QE. In VLRSOM, the learning rate is adapted according to the following equation:

$${\mu }^{{\prime}}\left(t+1\right)=\alpha {\mu }^{{\prime}}\left(t\right)+\gamma {J}^{2}\left(t\right).$$
(20)

In this equation, the learning rate is updated under two conditions, \(0<\alpha <1\) and \(\gamma >0\), and \({\mu }^{{\prime}}\left(t+1\right)\) is bounded to the interval \([\mu_{\min }, \mu_{\max }]\). Then, we impose the following condition on \(\mu \left( t+1 \right)\):

$$ \mu \left( t+1 \right) = \left\{ {\begin{array}{*{20}ll} {\mu_{\max } \qquad\quad {\text{if}}\,\,\, \mu^{\prime}\left( {t + 1} \right) > \mu_{\max } } \\ {\mu_{\min } \qquad\quad {\text{if}}\,\,\, \mu^{\prime}\left( {t + 1} \right) < \mu_{\min } } \\ {\mu^{\prime}\left( {t + 1} \right) \quad\;\, {\text{otherwise}}{.}} \\ \end{array} } \right. $$
(21)

As suggested in [45], to provide maximum convergence speed, \({\mu }_{\mathrm{max}}\) is normally selected near the point of instability of the conventional SOM algorithm. The value of \({\mu }_{\mathrm{min}}\) is selected as a tradeoff between the desired steady-state misadjustment and the required tracking capability (convergence behavior) of the algorithm. From Eq. (20), it is clear that the learning rate is always positive and is controlled by \(\alpha \), \(\gamma \), and the prediction error \(J\left(t\right)\); \(\gamma \) controls both the convergence rate and the level of steady-state misadjustment of the algorithm. This technique has shown improved performance compared to the fixed-learning-rate SOM. When the model starts training, the weights are randomly initialized and produce a high prediction error, so a large learning rate is selected. As training continues, the prediction error gradually decreases, and the learning rate is gradually reduced, which then yields a smaller misadjustment near the optimum; hence, the steady-state misadjustment is reduced. The value of \({\mu }_{\mathrm{max}}\) should be chosen to guarantee bounded error [45], as given below:

$${\mu }_{\mathrm{max}}\le \frac{2}{3\mathrm{tr}\left(R\right)},$$
(22)

where \(R\) is the expected value of the autocorrelation matrix of the input vector and \(\mathrm{tr}(R)\) is the trace of \(R\).
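
As a minimal sketch of the variable learning-rate schedule described above (not the authors' implementation), the per-iteration update of Eqs. (20)–(22) can be expressed as follows; the constants \(\alpha\), \(\gamma\), \(\mu_{\min}\), and \(\mu_{\max}\) are placeholder values chosen only to satisfy the stated constraints.

```python
import numpy as np

def vlr_update(mu_prev, J, alpha=0.97, gamma=0.05, mu_min=0.01, mu_max=0.6):
    """Variable learning rate of Eq. (20), clipped to [mu_min, mu_max] (Eq. 21).
    A large prediction error J(t) pushes the rate up; otherwise it decays."""
    mu_new = alpha * mu_prev + gamma * J ** 2
    return float(np.clip(mu_new, mu_min, mu_max))

def mu_max_bound(data):
    """Stability bound of Eq. (22): mu_max <= 2 / (3 tr(R)), with R estimated
    here as the sample autocorrelation matrix of the input vectors."""
    R = (data.T @ data) / len(data)
    return 2.0 / (3.0 * np.trace(R))
```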

Experimental results

This section describes in detail the results obtained for the proposed method. Moreover, a comparative analysis is also performed with the conventional SOM and its two well-known variants: PLSOM2 and RA-SOM. We evaluated the performance of the proposed method in terms of QE, TE, accuracy, and convergence time.

Two different datasets were used to test the efficiency and robustness of the proposed method: synthetically generated data and the MNIST handwritten characters dataset [46]. For each dataset, two separate experiments were performed with a different number of iterations (200 and 500) to test the ability of the algorithms to reach the steady error state. Technical details are presented in the following subsections.

Results for synthetic data

The synthetic data was generated randomly in the two-dimensional (2D) feature space in the range \([0,1]\). Such data is widely used in experiments to validate proposed approaches [19, 42, 47,48,49,50]. 2D data is easier to visualize and analyze, which makes it a natural choice for the current study.

For this study, 1000 2-D samples were generated to validate the proposed VLRSOM architecture. Let the 2-D data be denoted \({x}_{j,2},\,\forall j,1\le j\le 1000\). The map was established with 100 neurons arranged in a \(10\times 10\) lattice, and \(t=200\) and \(500\) iterations were used. The 2-D data arranged in a grid as coordinates also makes it possible to confirm the effectiveness of the asymmetric neighborhood function. Similar units on the map are connected with each other. The weights associated with the input vectors are randomly initialized, which ensures that the initial positions of neighboring units on the map do not match the input data. The weights of the map are gradually learned as the training proceeds, and finally the algorithm is able to obtain the optimal maps.

Experiment A

This experiment was carried out for a maximum of 200 iterations. We performed a detailed analysis to evaluate the performance of each model in terms of QE and TE.

Like any other machine learning algorithm, SOM requires optimal values for its parameters, which should be obtained before training or testing the model. Therefore, a comprehensive search was performed to find the set of parameters that generates the optimal maps. Table 1 summarizes the optimal parameters obtained on the 2D synthetic data; these parameters were then used for further training and evaluation of the models.

Table 1 Optimal parameters for the randomly initialized 2-dimensional map, \((x,y)\in [0,1]\)

Table 2 summarizes the quantitative results obtained for the proposed method and its comparison with the other three SOMs (conventional SOM, PLSOM2, and RA-SOM). VLRSOM proved superior to the other three models in terms of both QE (4.6 × 10−4) and TE (1.0 × 10−4). The other models produced relatively similar QE: conventional SOM, PLSOM2, and RA-SOM resulted in 8.1 × 10−4, 5.6 × 10−4, and 4.9 × 10−4, respectively. In terms of TE, VLRSOM produced the lowest error (1.0 × 10−4). RA-SOM was relatively better (TE = 1.1 × 10−3), while PLSOM2 and conventional SOM resulted in higher TE, with scores of 3.57 × 10−2 and 2.760 × 10−1, respectively. A lower TE is an indication of a consistent result in maintaining the topology of the network. The lower TE of VLRSOM and RA-SOM compared to conventional SOM and PLSOM2 indicates that these models were able to exploit the relationships in the data, which plays an important role in producing consistent results and maintaining the topology.

Table 2 Results for synthetic 2D data for 200 iterations

To better understand the results, we visualized the input data and the corresponding output of the models to see the consistency in the generated maps (Fig. 1). All maps were randomly initialized before training each model. Figure 1a shows the generated synthetic 2-D data plot. Figure 1b shows the topology adaptation results for the conventional SOM: the maps are highly irregular and suffer from a large number of variations, indicating that the model could not reach a steady error state within the given number of iterations. This is also confirmed by the quantitative results, whose higher TE indicates that the algorithm suffers from low stability even after running all 200 iterations. The topological maps generated by PLSOM2 (Fig. 1c) were more consistent and better than those of conventional SOM, indicating that it is better in terms of TE as it reached a steady error state within the given maximum number of iterations.

Fig. 1

Experimental results of conventional SOM, PLSOM2, RA-SOM and proposed VLR-SOM algorithms after training using random initialization with 200 iterations: a Random initialized map of a 2-D SOM, b Topology adaptation results for Conventional SOM, c Topology adaptation results for PLSOM2, d Topology adaptation results for RA-SOM, e Topology adaptation results for proposed VLR-SOM

Figure 2 shows each model's visual results to further illustrate their behavior in terms of QE and TE over the maximum number of iterations. In Fig. 2a, we can see that the proposed method is consistently faster than the other models over all iterations; in addition, its convergence is smoother and more stable at every iteration. The variations are higher for SOM and PLSOM2, and their convergence rate is much slower than that of RA-SOM and VLRSOM. RA-SOM is more stable, and its results are closer to those of the proposed VLRSOM. However, the proposed method still reaches the steady state faster than RA-SOM. The better stability of the proposed method is shown in the zoomed region of Fig. 2a, b for iterations 48-56, which clearly shows that VLRSOM has a slight edge over RA-SOM in speed and in reaching a low error state. Conventional SOM produced the highest QE over all iterations. For the other models, QE was initially high but gradually decreased as training continued over the iterations and ultimately reached a stable state, showing their effectiveness in achieving the steady error state.

Fig. 2

Comparative analysis of the conventional SOM, PLSOM2, RA-SOM and proposed VLR-SOM algorithms with 200 iterations in terms of a QE, and b TE

The behavior of the models in terms of TE was similar to that for QE (Fig. 2b). Conventional SOM was highly unstable; even though it showed a slight improvement in TE in the end, it was still far behind the other models. Interestingly, the TE of PLSOM2 was also low, and it reached a lower TE faster in early iterations, but it remained unstable with a higher number of variations. The performance of RA-SOM and VLRSOM was very similar, and both produced overall optimal results. A closer look at the fine details (the zoomed region for iterations 10-20 in Fig. 2b) shows that VLRSOM was slightly better than RA-SOM. These results indicate that PLSOM2, RA-SOM, and VLRSOM are highly suitable for processing 2D data with high accuracy.

Experiment B

In this experiment, the models were evaluated for 500 iterations to test their behavior over a large number of iterations. Like experiment A, we randomly generated the 2D data and then calculated the optimal parameter values for each model.

Table 3 shows the results obtained for the proposed method and its comparison with the other three models. The overall behavior of the models was similar to experiment A. Yet, there were some interesting points to note. Increasing the number of iterations also resulted in lower QE for all models. Interestingly, the TE was very low for all the models after training over 500 iterations. It shows that, given many iterations, the models can reach a steady error state. However, a model with a faster convergence rate is better since fewer resources will be utilized to reach the steady error state.

Table 3 Results for synthetic 2D data for 500 iterations for all models

We can see that among all models, VLRSOM resulted in the lowest QE (1.5 × 10−3), followed by RA-SOM (1.6 × 10−3), PLSOM2 (2.0 × 10−3), and conventional SOM (2.2 × 10−3). These results indicate that all models were able to converge when trained over 500 iterations. The overall behavior of each model in terms of TE was also similar, with only marginal differences: VLRSOM produced a TE of 1.37 × 10−6, RA-SOM 1.75 × 10−6, PLSOM2 7.4 × 10−5, and conventional SOM 4.3 × 10−5.

Figure 3a–e shows the visual results obtained for the 2D synthetic data for each model run over 500 iterations. Figure 3a shows the original randomly generated data, and the connections between the data points indicate the weights of the topology. The resulting topology for conventional SOM (Fig. 3b) remains distorted (shapeless) at the end of training, indicating higher instability in the topology and resulting in a higher topology error, as validated by the quantitative results. However, the stability of the topology for both PLSOM2 (Fig. 3c) and RA-SOM (Fig. 3d) was much better than conventional SOM, as the figures show less twisting and deformation in the topology produced after 500 iterations. The grid shape obtained for VLRSOM (Fig. 3e) is the most consistent, and it can adapt the asymmetric neighborhood function better than the other algorithms.

Fig. 3

Experimental results of conventional SOM, PLSOM2, RA-SOM and proposed VLR-SOM algorithms after training using random initialization with 500 iterations: a Random initialized map of a 2-D SOM, b Topology adaptation results for Conventional SOM, c Topology adaptation results for PLSOM2, d Topology adaptation results for RA-SOM, e Topology adaptation results for proposed VLR-SOM

Figure 4a, b compares the performance of each model in terms of QE and TE, respectively; the results complement the quantitative results. The proposed VLRSOM algorithm consistently showed lower QE over all iterations except in very few cases. RA-SOM's performance was very similar to VLRSOM at every iteration, indicating that it can also reach a low QE early in training. However, as shown in the zoomed area from the 460th to the 500th iteration, the proposed VLRSOM has a slight edge over RA-SOM in reaching a low error state. On the other hand, both PLSOM2 and conventional SOM required a significant number of iterations to reach a steady error state.

Fig. 4

Behavior of the conventional SOM, PLSOM2, RA-SOM and proposed VLR-SOM over 500 iterations a QE, and b TE

In the case of TE, the proposed method and RA-SOM are comparatively close, as both achieved lower TE and more stability early in the training process (Fig. 4b). However, a closer look at the zoomed region for iterations 1-100 shows that VLRSOM reached a low TE earlier in the training process than all other models. Conventional SOM was highly unstable until it reached the 460th iteration, which makes it clear that it requires more time to reach a steady error state. PLSOM2 was better than conventional SOM as it consistently produced lower TE, but it also needed many iterations (around 350) to reach a steady error state. These results show that both RA-SOM and VLRSOM can reach a stable error state quite early in training, making them suitable for processing 2D data with high accuracy.

In terms of CPU time, the comparisons between the proposed method and the other variants of SOM for the two different iteration counts on the synthetic data are summarized in Table 4. Time is measured in seconds for all algorithms. The CPU time for PLSOM2 was relatively longer than for the other algorithms. In contrast, conventional SOM took less execution time than the other models.

Table 4 Comparison of CPU time for conventional SOM, PLSOM2, and proposed VLRSOM algorithms on the synthetic data

Results for handwritten characters

The second experiment was conducted for handwritten character recognition using the MNIST dataset. It consists of a total of 70,000 handwritten characters. First, we divided the data into training (60,000) and testing (10,000) sets. Like the previous experiments with the synthetic dataset, we carried out two separate experiments, with 200 and 500 iterations. The following sections provide details of each experiment performed.

Experiment A

This experiment was carried out for 200 iterations. For the handwritten dataset, we first obtained the optimal set of parameters for each model; Table 5 summarizes the parameters obtained for each algorithm. The quantitative results for this experiment are shown in Table 6. The performance of the proposed VLRSOM was superior to the other three models, with an accuracy of 83.00%, a QE of 5.8450, and a TE of 0.0024. RA-SOM also produced highly satisfactory results: accuracy = 81.1%, QE = 6.6566, and TE = 0.0711. The performance of the conventional SOM algorithm was also in line with state-of-the-art models, producing an accuracy of 80.00%, a QE of 6.8660, and a TE of 0.3377. Surprisingly, PLSOM2 showed the lowest accuracy (73.33%), the highest QE (7.3240), and the highest TE (0.644).

Table 5 Optimal parameters obtained for the conventional SOM, PLSOM2, and RA-SOM and proposed VLR-SOM
Table 6 Comparison of conventional SOM, PLSOM2, RA-SOM, and proposed VLR-SOM algorithms for handwritten character recognition for 200 iterations

The visual results obtained for some sample character recognition using 200 iterations for each algorithm are shown in Fig. 5. Figure 5a shows the visual results obtained for conventional SOM; the results are visually consistent, as the constructed characters are recognizable except in a few cases. In the case of PLSOM2 (Fig. 5b), the visual results indicate that its performance is suboptimal for character recognition: its output is difficult to comprehend in some situations, which indicates its low applicability to character recognition tasks. The visual results obtained for RA-SOM (Fig. 5c) and VLRSOM (Fig. 5d) were highly accurate, as they obtained a correct estimation of the shape of the original characters.

Fig. 5

Learning performance comparison of conventional SOM, PLSOM2, RA-SOM and proposed VLR-SOM algorithms using handwritten digits random initialization with 200 iterations: a Neighborhood construction behavior for Conventional SOM, b Neighborhood construction behavior for PLSOM2, c Neighborhood construction behavior for RA-SOM, d Neighborhood construction behavior for proposed VLR-SOM, e QE, and f TE

Figure 5e, f shows QE and TE for each model over the whole number of iterations. The overall behavior of all models for QE was the same: initially, the QE was high due to random initialization, and then it started to decrease as the models began to learn with the increasing number of iterations. However, both the convergence rate and the value of QE were consistently better for the proposed VLRSOM algorithm at each iteration until the maximum number of iterations (200) was reached, as also validated by the quantitative results. In the case of TE (Fig. 5f), however, the behavior of each model was different, as high variations were noted during the initial stage of map generation for each model. Conventional SOM, PLSOM2, and RA-SOM showed high variations until they reached the maximum number of iterations. On the other hand, although the behavior of VLRSOM was initially similar to the other models, its variations gradually reduced as it progressed over the iterations. This indicates that VLRSOM reaches a more stable state in constructing the maps, which is crucial for producing consistent results.

Experiment B

We ran the experiment for handwritten character recognition again using 500 iterations to test the behavior of the models over a higher number of iterations. The quantitative results obtained using 500 iterations are summarized in Table 7. The proposed VLRSOM produced the highest accuracy (88.89%). Interestingly, conventional SOM and RA-SOM produced higher accuracies of 87.66% and 87.77%, respectively, compared to PLSOM2, which resulted in 64.44% accuracy. In terms of QE, VLRSOM, RA-SOM, and conventional SOM were highly effective, producing 6.421, 6.461, and 6.887, respectively; however, the performance of PLSOM2 was suboptimal, with a QE of 7.279. The TE for VLRSOM was the lowest (8.10 × 10−3), as expected, since it produced consistent results. Similarly, RA-SOM and conventional SOM were also efficient in preserving topology for handwritten character recognition, producing TEs of 8.9 × 10−3 and 3.02 × 10−2, respectively. The performance of PLSOM2 was again the worst in terms of TE (6.40 × 10−1) for this dataset.

Table 7 Comparative analysis of conventional SOM, PLSOM2, RA-SOM, and proposed VLR-SOM algorithms for handwritten character recognition

The visual results obtained for each model on the handwritten character recognition dataset are shown in Fig. 6. The models showed similar performance to experiment A. However, as we can see from Fig. 6e, f, which show QE and TE, respectively, the models perform better over a larger number of iterations. Naturally, the models can gain more insight into the data as they iterate longer. The output characters produced by each model are more consistent than in the previous experiment. It is also worth mentioning that both QE and TE for VLRSOM were consistently lower, and it reached a steady error state faster than the other methods.

Fig. 6

Learning performance comparison of conventional SOM, PLSOM2, RA-SOM and proposed VLR-SOM algorithms using handwritten digits random initialization with 500 iterations: a Neighborhood construction behavior for Conventional SOM, b Neighborhood construction behavior for PLSOM2, c Neighborhood construction behavior for RA-SOM, d Neighborhood construction behavior for proposed VLR-SOM, e QE, and f TE

Table 8 shows the execution time taken by each algorithm for the handwritten character dataset. Compared to the synthetic data, the time taken was longer for this dataset. As with the synthetic data, the CPU time for PLSOM2 was relatively longer than for the other algorithms. In contrast, conventional SOM took less execution time than the other models.

Table 8 Comparison of CPU time for conventional SOM, PLSOM2, and proposed VLR-SOM algorithms on the handwritten character dataset

Experiments with UCI benchmark datasets

Additional experiments were performed on four benchmark datasets to test the applicability of the proposed method. These datasets were obtained from the University of California, Irvine (https://archive.ics.uci.edu). Four datasets were considered: Balance, Wisconsin Breast, Dermatology, and Ionosphere. Table 9 summarizes the datasets used; they vary in the number of samples, features, and classes. Generally, models face challenges when the number of features and classes is greater than two.

Table 9 Dataset used for experiments in this study

Before executing the experiments, we divided the data into training (80%) and testing (20%) subsets. The number of iterations was empirically set to 50 as the models converge before reaching these maximum iterations. In addition, weights were generated in the same fashion for all algorithms to achieve fairness in the evaluation.

The algorithms have many parameters that need fine-tuning before training, as their performance depends on the optimal values of these parameters. Therefore, the optimal values for those parameters were first obtained by applying a grid search. Since the numerical values for each dataset vary significantly, optimal parameter values were obtained separately for each dataset. Table 10 summarizes the fine-tuned parameter values obtained for each algorithm; these values were then used in the subsequent training of the models.

Table 10 Optimal parameters obtained for conventional SOM, PLSOM2, RA-SOM and proposed VLR-SOM algorithms on four UCI benchmark datasets

The quantitative results in terms of accuracy and QE obtained for all the algorithms applied on the four datasets are summarized in Table 11. We can see that the proposed VLRSOM produced the highest accuracy and lowest QE for all data sets. Generally, all models produce good classification accuracy on the Wisconsin Breast dataset. Both PLSOM2 and RA-SOM proved better than conventional SOM but were suboptimal compared to the proposed VLRSOM.

Table 11 Comparative analysis of conventional SOM, PLSOM2, RA-SOM, and proposed VLR-SOM algorithms on four UCI benchmark datasets

In the case of the Balance dataset, the highest accuracy (76.47%) and lowest QE (0.206) were obtained by VLRSOM. The performance of RA-SOM was comparable with VLRSOM, with an accuracy of 75.94% and a QE of 0.208. In contrast, the performance of both conventional SOM and PLSOM2 was not satisfactory: conventional SOM resulted in 63.10% accuracy and 0.242 QE, and PLSOM2 produced 63.10% accuracy and 0.222 QE.

All models produced high accuracies for the Wisconsin Breast dataset. Both RA-SOM and VLRSOM produced 100% accuracy, while conventional SOM and PLSOM2 resulted in 99.02% accuracy. Similarly, the QE for conventional SOM, PLSOM2, RA-SOM, and VLRSOM was 0.148, 0.152, 0.147, and 0.142, respectively. The main reason behind the high accuracy can be ascribed to the relatively small number of distinct features and classes.

VLRSOM outperformed the other three models for the Dermatology dataset, producing the highest accuracy (80.91%). The accuracies obtained for the other three models were similar: conventional SOM produced 68.18%, while both PLSOM2 and RA-SOM produced 70.91%. In terms of QE, all classifiers responded similarly. For the Ionosphere dataset, the proposed method obtained an accuracy of 82.41% with a QE of 0.099. For this dataset, conventional SOM resulted in 68.52% accuracy and 0.100 QE, PLSOM2 produced 68.52% accuracy and 0.110 QE, and RA-SOM resulted in 80.56% accuracy and 0.100 QE.

These results indicated that the proposed VLRSOM is superior in terms of accuracy compared to the other three algorithms. In addition, in terms of QE, it also produced optimal results on all datasets. This indicates that the proposed VLRSOM is more robust against noise and outliers.

We performed further analysis to investigate the learning behavior of the proposed method and compared it with the conventional SOM, PLSOM2, and RA-SOM algorithms. Figure 7 shows the visual results obtained for all algorithms on the four datasets. Figure 7a shows the learning behavior of the four models on the Balance dataset. We observe that the proposed algorithm adapts much faster, as it reached a steady error state quite early in the training process (8th iteration). The learning behavior of RA-SOM was similar to the proposed method; however, it started with a higher error, and at the end of the maximum iterations its QE remained higher than that of the proposed method. On the other hand, the learning behavior of conventional SOM and PLSOM2 differed from that of RA-SOM and VLRSOM. Conventional SOM reached a stable error state but did not attain a low QE; in fact, its QE increased after reaching a lower error early in training. PLSOM2 also resulted in higher QE and did not converge to a lower QE even after reaching the maximum iterations. In terms of TE, the proposed method showed low variations throughout the iterations compared to the other methods.

Fig. 7

The comparison of learning behavior between conventional SOM, PLSOM2, RA-SOM and proposed VLR-SOM for each dataset: a Balance b Wisconsin c Dermatology and d Ionosphere

Figure 7b shows the learning behavior for the Wisconsin dataset, where the models behaved similarly to each other. All models had high variations in the beginning due to the random initialization of weights and then gradually reached a lower QE; after the 10th iteration, the models reached a lower QE, thus attaining a steady error state. All models also showed similar behavior for both the Dermatology (Fig. 7c) and Ionosphere (Fig. 7d) datasets. It can also be observed that the models produced higher TE at the beginning of the iterations and then gradually reached lower TE as the training continued over a higher number of iterations.

Table 12 shows the execution time taken by each algorithm for the UCI benchmark datasets. For all datasets, the CPU time taken by all algorithms is very similar, except for the conventional SOM, which took less execution time for the Ionosphere dataset. This indicates that the proposed variable learning rate does not cost much additional processing time compared to the other methods while achieving higher accuracy.

Table 12 Comparison of CPU time for conventional SOM, PLSOM2, and VLR-SOM algorithms on four UCI benchmark datasets

Time complexity of the VLRSOM algorithm

This section provides an insight into the time complexity of the proposed VLRSOM algorithm. In general, the SOM algorithm suffers more from memory complexity than from time complexity. Vesanto et al. [51] showed that the memory complexity of the SOM is \(\mathcal{O}\left({N}^{2}\right)\), where \(N\) is the total number of neurons/prototypes in the SOM lattice; that is, memory usage grows quadratically with \(N\), so the algorithm becomes less memory-efficient as the map size increases. In contrast, the processing time for the SOM algorithm is much smaller than the amount of memory it consumes during processing [39]. The processing time complexity from the input to the output layer is of order \(\mathcal{O}(NM)\) [51], i.e., linear in the number of nodes in the output (\(M\)) and input \((N)\) layers of the SOM model.

Each sample is passed to the SOM model during training, which calculates the distance between the input and the weight vector (Eq. 12). The time required for calculating this distance for each sample can be estimated to be \(O(NM)\) as the initial clustering is performed only once. Similarly, according to Eq. 12, the time complexity required for calculating the winning neuron in the output layer is also \(O(N)\). According to Eq. 13, the total time required for weight updating for each pattern is \(O\left(NM\right)\). Therefore, the total time complexity for SOM can be calculated as [52]:

$${\mathrm{TC}}_{\mathrm{SOM }}= O\left(\mathrm{ts}\left(\hspace{0.17em}NM+N+NM\hspace{0.17em}\right)\right),$$
(23)

where \({\mathrm{TC}}_{\mathrm{SOM}}\) is the time complexity of SOM, \(t\) represents the number of iterations, and \(s\) is the number of patterns. Therefore, the asymptotic time complexity is \(O(\mathrm{ts}NM)\).
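
As a rough worked example (the numbers are assumed for illustration only), with \(s = 1000\) training patterns, \(t = 200\) iterations, \(N = 2\) input dimensions, and \(M = 100\) map units, the dominant term \(\mathrm{ts}NM\) amounts to about \(200 \times 1000 \times 2 \times 100 = 4 \times 10^{7}\) elementary operations per run.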

The proposed approach does not affect the architecture or the layers of the conventional SOM algorithm. This indicates that the underlying principles of the SOM algorithm remain intact except for the introduction of an adaptive learning rate instead of a fixed one, as shown in Eqs. (20) and (21). Therefore, the time complexity remains \(\mathcal{O}(NM)\), the same as for the conventional SOM: the big-\(\mathcal{O}\) notation drops constant and lower-order terms, and the extra learning-rate update contributes only such terms. For memory consumption, the proposed method requires slightly more memory than the conventional SOM due to the additional calculations; yet, after simplification, the memory usage remains \(\mathcal{O}\left({N}^{2}\right)\), the same as for the conventional SOM algorithm.

According to [53], the time complexity of the conventional SOM algorithm can be estimated as \(O(\mathrm{ts}(3N+3))\). In the case of VLRSOM, the additional steps of Eqs. (20) and (21) are also executed in each iteration, which raises the per-pattern cost to approximately \(5N+3\) operations. Therefore, the total time can be estimated as:

$${\mathrm{TC}}_{\mathrm{VLRSOM }}= O\left(\mathrm{ts}\left(5N+3\right)\right).$$
(24)

The correctness of these results is confirmed by comparing the results of the CPU time shown in Tables 4, 8, and 12 for the conventional SOM, PLSOM2, RA-SOM, and VLRSOM algorithms. It is interesting to note that each iteration of the VLRSOM algorithm takes slightly longer than the conventional SOM. Yet, the convergence speed of the VLRSOM compensates for it in such a way that VLRSOM reaches an acceptable level of quantization error much faster than the conventional SOM.

Conclusion

The main objective of this paper was to improve the accuracy and topology preservation capability of the SOM algorithm. The improvement in accuracy is achieved by introducing a new variable learning rate (VLR) parameter. The adaptive learning rate helps improve the accuracy of the SOM technique by reducing the steady-state misadjustment. The VLR adaptively adjusts itself to the error (increasing or decreasing), allowing the SOM model to track changes in the training data and resulting in a small steady-state error. The VLR adjustment is controlled by the estimated error; moreover, it leads to faster convergence and robust steady-state behavior. The goal is to make a large adjustment to the VLR for a large estimation error, for faster tracking, and a small adjustment for a small estimation error. Hence, the VLR controls the amount of misadjustment needed to produce optimal maps.

Detailed experiments were performed to evaluate the accuracy and robustness of the proposed VLRSOM algorithm. Two different datasets were used, and for each dataset, two independent experiments were performed with different numbers of iterations to test the speed of convergence, the accuracy, and the ability to preserve the topology. The results confirmed the capability of the proposed method, as it produced highly satisfactory results. Moreover, VLRSOM was compared with conventional SOM, the parameterless self-organizing map (PLSOM2), and RA-SOM in terms of accuracy, quantization error (QE), and topology error (TE). The proposed method proved superior to the other three techniques in all experiments.

In future work, we want to focus on the Markov blanket to make SOM algorithms more efficient. Moreover, we would like to integrate the proposed algorithm with a deep neural network to find the optimal set of parameters for classification tasks. A greedy search algorithm will help improve the efficiency of deep neural networks by selecting the optimal set of parameters needed to perform the classification task. The method can be further improved by adopting a parallel implementation of the proposed algorithm. In addition, the theoretical aspects of the algorithm will be explored to formally establish its behavior.