1 Introduction

Deep learning models now play a significant role in extracting information and hidden patterns from enormous amounts of data with high accuracy. Compared to conventional machine learning approaches, deep learning can solve complex problems and correlate interdependent variables. However, the traditional neural networks SOM, MLP and DBN offer limited high-level data abstraction, a limitation that can be alleviated by combining them with deep neural networks. This paper analyzes hybrid neural networks that combine MLP, SOM and DBN with deep learning models [1], together with the techniques used to improve model performance. MLP, SOM, and DBN models are highly applicable when human experts are unavailable or unable to adequately explain the decisions they make using their knowledge, when problem solutions evolve and grow in size, and when solutions must be modified in light of new information [2].

2 Multi-Layer Perceptron (MLP)

The multi-layer perceptron (MLP) is a commonly used neural network based on supervised learning, in which information flows in one direction and there are no loops. The main objective is to find the optimized function f() that maps the input to the desired output and to learn the optimized weight and bias values (θ) for it. Learning occurs through the back-propagation algorithm, which adjusts the connection weights whenever the actual output deviates from the expected output. MLPs are mainly applied to optimization problems in finance, transportation, fitness, and energy (Figs. 1 and 2).

Fig. 1 Architecture of Multi-Layer Perceptron

Fig. 2 Steps in MLP algorithm

3 Architecture, Algorithm and Characteristics of MLP

An MLP has three kinds of layers: an input layer, an output layer and one or more hidden layers. The input layer collects the input features to be processed. An arbitrary number of hidden layers lies between the input and output layers and works as the computational unit of the MLP. The output layer performs tasks such as prediction and classification.
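
The layer structure and the back-propagation update described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the method of any cited work: the single hidden layer, the sigmoid activations, the squared-error loss, the learning rate, and the names `init_mlp`, `forward` and `train_step` are all assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_mlp(n_in, n_hidden, n_out, seed=0):
    """Create weights and biases for one hidden layer (illustrative sizes)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.5, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.5, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(params, X):
    """Information flows one way: input -> hidden -> output."""
    H = sigmoid(X @ params["W1"] + params["b1"])
    Y = sigmoid(H @ params["W2"] + params["b2"])
    return H, Y

def train_step(params, X, T, lr=0.5):
    """One full-batch back-propagation step; returns the loss before the update."""
    H, Y = forward(params, X)
    err = Y - T                                  # deviation from expected output
    loss = 0.5 * np.mean(np.sum(err ** 2, axis=1))
    n = X.shape[0]
    dY = err * Y * (1.0 - Y) / n                 # gradient at output pre-activation
    dH = (dY @ params["W2"].T) * H * (1.0 - H)   # gradient at hidden pre-activation
    params["W2"] -= lr * (H.T @ dY)
    params["b2"] -= lr * dY.sum(axis=0)
    params["W1"] -= lr * (X.T @ dH)
    params["b1"] -= lr * dH.sum(axis=0)
    return loss
```

Calling `train_step` repeatedly on a small data set such as the four XOR patterns should lower the loss over time, although convergence to a perfect classifier depends on the initialization and the assumed hyperparameters.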

  • Property 1: Universality

    MLP is capable of learning both linear and non-linear functions. MLPs can approximate any continuous function and can solve problems that are not linearly separable.

  • Property 2: Adaptive learning and Optimal

    MLP can learn how to perform tasks from the training data and initial experience. Training minimizes a loss function, and in this sense the model is optimal: learning the function that maps the inputs to the outputs reduces the loss to an acceptable level.

  • Property 3: Stochastic

    MLP is a stochastic program. In a stochastic program, some or all problem parameters are uncertain and are described by probability distributions, which are used to solve highly complex optimization problems.

  • Property 4: The power of depth

    Compared to shallow networks, deep networks can represent some functions more compactly. The parity function is one example: a deep network whose size is linear in the number of inputs can compute it.
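
As a small illustration of the parity example in Property 4, the sketch below builds parity from a chain of XOR blocks, each made of three threshold units, so a network with n inputs uses 3(n - 1) units. The threshold values and the helper names `step`, `xor_block` and `deep_parity` are assumptions chosen for this example and are not taken from the cited works.

```python
import numpy as np

def step(z):
    """Threshold unit: outputs 1 where the pre-activation is positive."""
    return np.where(np.asarray(z) > 0, 1, 0)

def xor_block(a, b):
    """XOR from three threshold units: an OR unit, an AND unit, and an output unit."""
    h_or = step(a + b - 0.5)
    h_and = step(a + b - 1.5)
    return step(h_or - h_and - 0.5)   # fires for "OR but not AND"

def deep_parity(bits):
    """Parity of the last axis, computed by chaining n - 1 XOR blocks."""
    bits = np.asarray(bits)
    acc = bits[..., 0]
    for i in range(1, bits.shape[-1]):
        acc = xor_block(acc, bits[..., i])
    return acc
```

The depth of this construction grows with the number of inputs, which is the trade-off the property describes: the network stays small because it is deep.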

3.1 Application of MLP

Various convolutional neural network-based models have been proposed to improve the performance of remote sensing image classification. Very high resolution (VHR) remote sensing image scene classification plays a vital role in remote sensing research because it helps with land resource management, urban planning, disaster tracking, and traffic monitoring. Osama A. Shawky et al. [3] proposed a VHR image scene classification model comprising three phases: data augmentation to learn robust features, a pre-trained CNN model to extract features from the original image, and a multi-layer perceptron trained with an adaptive gradient algorithm to improve the accuracy of the classifier.

With the advent of modern remote sensing technologies, various very fine spatial resolution (VFSR) datasets are now commercially available. These VFSR images have opened up many opportunities such as urban land use rescue, agriculture, and tree crown description. Zhang et al. [4] proposed a hybrid classification system that combines the contextual-based classifier CNN and the pixel-based classifier MLP through a rule-based decision fusion strategy. The decision fusion rules are formed from the confidence distribution of the contextual-based CNN classifier. If the input image patch lies in a homogeneous region, the confidence is high. If, on the other hand, the image pixel contains other land cover classes as related information, the confidence is low. As a result, the MLP can rectify, at the pixel level, the pixels that were classified with low confidence. The authors also compare the proposed method's performance with benchmark standards such as pixel-based MLP, spectral texture-based MLP, and contextual-based CNN classifiers.
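
The confidence-based fusion idea can be sketched as follows. The text does not give the actual fusion rules of [4], so the single confidence threshold used here, its value of 0.8, and the function name `fuse_decisions` are assumptions for illustration only.

```python
import numpy as np

def fuse_decisions(cnn_probs, mlp_probs, threshold=0.8):
    """Keep the CNN label where the CNN is confident, otherwise use the MLP label.

    cnn_probs, mlp_probs: arrays of shape (n_pixels, n_classes) holding class
    probabilities. The threshold is a hypothetical value, not one from [4].
    """
    cnn_probs = np.asarray(cnn_probs, dtype=float)
    mlp_probs = np.asarray(mlp_probs, dtype=float)
    cnn_conf = cnn_probs.max(axis=1)           # confidence of the CNN decision
    cnn_label = cnn_probs.argmax(axis=1)
    mlp_label = mlp_probs.argmax(axis=1)
    use_cnn = cnn_conf >= threshold            # high confidence: homogeneous region
    return np.where(use_cnn, cnn_label, mlp_label), use_cnn
```

The returned mask shows which pixels kept the CNN decision, which makes it easy to inspect where the pixel-based classifier took over.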

Md Manjurul Ahsan et al. [5] proposed a hybrid model that combines a Convolutional Neural Network (CNN) and a Multilayer Perceptron (MLP), in which the MLP handles the numerical and categorical data and the CNN extracts features from the X-ray images. The grid search method was used for parameter tuning to decide the number of hidden layers, number of neurons, epochs, and batch size. Meha Desai et al. [6] compare and analyze the function and design of MLP and CNN for breast cancer detection and conclude that CNN gives slightly higher accuracy than MLP.
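
Grid search of the kind mentioned for [5] simply evaluates every combination of candidate values and keeps the best one. The sketch below is generic: the function name `grid_search`, the parameter names and the scoring callback are hypothetical, and in practice the callback would train and validate a model rather than return a toy score.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Evaluate every combination in param_grid and return the best one.

    param_grid: dict mapping a parameter name to a list of candidate values.
    score_fn:   callable taking a dict of parameters and returning a score
                where higher is better (for example validation accuracy).
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

A grid over the number of hidden layers, neurons, epochs and batch size grows multiplicatively, so the number of candidate values per parameter is usually kept small.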

Vinod Kumar et al. [7] suggested a hybrid CNN-MLP model for analyzing novel and diversified attacks, treating intrusion detection as a classification task for machine learning and deep learning techniques. The model used feature selection and reduction techniques, namely a random forest regressor along with the correlation parameter. On the CICIDS2017 dataset, the proposed model outperformed the individual CNN and MLP models. Hanwen Feng et al. [8] suggested a CNN model for classifying Points of Interest (POI) in side-channel attacks, compared it with MLP, and concluded that MLP is more suitable for PCA traces and CNN for POI traces, and that shorter traces improve the classification results. Bikku [9] proposed an MLP-based model that predicts future health risk with a certain probability, compared it with LSTM and RNN, and reported that MLP outperforms the other two. Salah, L. B. presented a model to control a bioreactor using deep learning feed-forward neural networks with different MLP structures. The trained model emulates the inverse dynamics of the bioreactor, and neural controllers are then used to apply neural control strategies to the chosen bioreactor [10].

Bairavel et al. [11] suggested a model for multimodal sentiment analysis that uses a feature-level fusion technique and a novel oppositional grass bee optimization (OGBEE) algorithm to fuse the features extracted from different modalities, with an MLP for classification. The authors of [12] compared three neural network approaches, MLP, RBF and PNN, for thematic mapping from remotely sensed data; in that study, PNN outperformed the other two. Singh et al. [13] designed an MLP-based model that finds the optimal collision-free path and controls the robot's speed so that mobile robots can reach their destination in a dynamic environment. The ultrasonic sensors in the robot sense an obstacle in its path and calculate the distance to it. Meng Wang et al. [14] devised a model to detect Distributed Denial of Service (DDoS) attacks using an MLP combined with feature selection to choose the optimal features, and the back-propagation algorithm to reconstruct the detector when errors are perceived. The model comprises three modules, a knowledge base, a detection model, and a feedback mechanism, and the MLP acts as a binary classifier during attack detection [15] (Table 1).

Morteza Taki et al. [15] proposed a model that predicts the output energy of irrigated and rainfed wheat using the artificial neural network models MLP and RBF and Gaussian Process Regression (GPR). The RBF model performed better than the other two models in predicting wheat output energy across various irrigated and rainfed farms [16].

Table 1 Applications of Deep Learning Algorithms: MLP

To decrease error and packet loss in the network, Jafari-Marandi et al. [18] suggested an MLP whose layers are arranged as an input layer, an output layer, and several hidden layers. MLPs are trained with back-propagation learning, using a collection of vectors derived from actual data or produced by a realistic simulator during the training phase. Perceptrons learn by adjusting the connection weights to reduce the output error relative to the desired outcome for each training vector. The authors of [19] propose secure routing on the Internet of Vehicles based on V2V (Vehicle to Vehicle) communication [20], using a multilayer perceptron (MLP) to detect intruders or attackers on the network [21]. The MLP algorithm generated a classification report for denial-of-service attacks, which displays the MLP's precision, recall, F1 score, and support [22]. The report was used to compare classification models and choose those with more robust or balanced classification scores. The underlying measures are true and false positives as well as true and false negatives [23].
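
The scores in such a classification report follow directly from the four counts named above. The sketch below computes them for a binary attack/no-attack task; the function name `binary_report` and the convention of returning 0.0 when a denominator is zero are assumptions for this example.

```python
def binary_report(y_true, y_pred, positive=1):
    """Precision, recall, F1 and support for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0   # assumed zero-division value
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "support": tp + fn,   # number of true samples of the positive class
    }
```

Precision measures how many flagged samples were real attacks, recall measures how many real attacks were flagged, and F1 is their harmonic mean, which is why a balanced report is preferred when the classes are uneven.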

4 Self-organizing Map (SOM)

A self-organizing map (SOM) is an unsupervised neural network algorithm. It is also referred to as a dimensionality reduction algorithm or Kohonen network, and it has input and output layers but no hidden layer. Because the algorithm reduces the dimension of the input data, the final output is represented as a feature map in which similar samples are mapped close together. Generally, this algorithm is used to convert a high-dimensional dataset into a 2D discretized pattern [4].

4.1 Architecture, Algorithm and Characteristics of SOM

The SOM architecture has two layers: the input layer and the output layer, which holds the feature map. Unlike other neural networks, it does not contain any hidden layer, so it passes weight values to the output layer without applying an activation function in the neurons. Each neuron is assigned a weight value based on the input space. The SOM architecture has a feed-forward structure with a 2D computational layer of nodes arranged in rows and columns, each fully connected to all source nodes of the input layer [24]. Figure 3 depicts an architectural overview of Kohonen’s SOM.

Fig. 3 Kohonen Self-Organising Map: an overview

The SOM uses competitive learning to update its weights. Learning consists of three processes: competition, cooperation and adaptation. In the competition process, the distance between each neuron of the Kohonen layer and the input is computed, and the neuron with the minimum (or, depending on the application, maximum) distance is identified as the winner. In the cooperation process that follows, the neighbouring neurons are selected depending on time and on their distance from the winning neuron.

Finally, the adaptation process updates the weight values of the winner and its cooperating neighbours, which produces a feature map from the input variables. The main properties of the SOM are as follows.

  • Property 1: Approximation of the input space

    The feature map in the output space, which is expressed by a collection of weight vectors, is a fair approximation of the input space.

  • Property 2: Topological ordering

    The SOM algorithm produces a topologically ordered feature map, meaning that the spatial position of a neuron in the output lattice or grid corresponds to a specific domain or feature of the input data.

  • Property 3: Density matching

    The feature map reflects variations in the statistics of the input distribution. Regions of the input space from which training samples are drawn with high probability are mapped onto larger domains of the output space, and thus with higher resolution, than regions from which training vectors are drawn with low probability.

  • Property 4: Feature selection

    Given input data with a non-linear distribution, the self-organizing map picks a set of optimal features for quantifying the underlying distribution.

Figure 4 represents the algorithmic steps of SOM.

Fig. 4 Essential steps of SOM algorithm
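
The competition, cooperation and adaptation processes can be sketched as a single update step. This is a minimal illustration under stated assumptions: the winner is chosen by minimum Euclidean distance, the neighbourhood is Gaussian on the grid, and the learning rate and neighbourhood width decay exponentially. None of these choices, nor the names `som_step` and `train_som`, are prescribed by the text.

```python
import numpy as np

def som_step(weights, x, lr, sigma):
    """One SOM update on a (rows, cols, dim) weight grid, modified in place."""
    rows, cols, _ = weights.shape
    # Competition: the neuron closest to the input wins.
    dists = np.linalg.norm(weights - x, axis=2)
    winner = np.unravel_index(np.argmin(dists), dists.shape)
    # Cooperation: neighbours are weighted by grid distance to the winner.
    rr, cc = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    grid_d2 = (rr - winner[0]) ** 2 + (cc - winner[1]) ** 2
    h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
    # Adaptation: move the winner and its neighbours towards the input.
    weights += lr * h[..., None] * (x - weights)
    return tuple(int(i) for i in winner)

def train_som(data, rows=5, cols=5, epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Train a SOM with an assumed exponential decay of lr and sigma."""
    rng = np.random.default_rng(seed)
    weights = rng.random((rows, cols, data.shape[1]))
    n_steps = epochs * len(data)
    t = 0
    for _ in range(epochs):
        for x in rng.permutation(data):      # shuffle samples each epoch
            decay = np.exp(-t / n_steps)
            som_step(weights, x, lr0 * decay, sigma0 * decay)
            t += 1
    return weights
```

After training, each input can be assigned to its winning grid cell, which gives the 2D discretized representation described earlier.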

4.2 Application of SOM

Kohonen et al. [62] initially used SOMs for speech recognition. Today they are used in various applications such as pattern recognition, speech processing, industrial and medical diagnostics [25, 26], and data mining, often through hybrid architectures merged with RNN, CNN and backtracking approaches. Kohonen's phonetic typewriter is one of the earliest and best-known applications of the SOM. In this speech recognition task, the challenge is to identify phonemes in real time so that they can drive a typewriter from dictation. The speech signals [27] are pre-processed before being applied to the SOM: Fourier transformation and filtering are used to sample the data as 24-dimensional spectral vectors. The proposed network was effectively trained using speech waveforms, and the output nodes clustered naturally around the ideal phonemes. Finally, the model generated logical phoneme strings from real-world speech (Table 2).

Table 2 Applications of Deep Learning Algorithms: SOM

Behnisch and Ultsch [64] used an extension of the SOM, the emergent self-organizing map (ESOM), for clustering and classification of data. It preserves the neighborhood relationships of the high-dimensional data. On the other hand, the finite grid has a drawback, because neurons on the map's edges have somewhat different mapping qualities than neurons in the middle. Growing hierarchical self-organizing maps (GHSOM), used by Chifu and Letia [65], consist of a set of SOMs (particularly bidimensional grids) organized as nodes in a hierarchy. The GHSOM is initialized with a hierarchy mirroring the one in the taxonomy, and concepts are mapped to some nodes in the corresponding SOM by initializing each node’s weights with the vector description of the concept; all other unmapped nodes are initialized randomly [28].

Potok [29] suggested an architecture for breast cancer diagnosis using neural network-based self-organizing maps (SOM). It classified breast lesions as benign or malignant using samples of 243 breast tumours, with SOM-based autocorrelation texture features used for the classification. Osman [30] suggested a coronavirus detection technique based on locality weighted learning and a self-organization map (LWL-SOM) for capturing images and identifying COVID-19 cases. The chest X-ray data patterns were grouped with a SOM strategy to separate positive from negative COVID-19 cases, and a locality weighted learning model was then built for diagnosing the cases.

In a materials informatics application, Jimin et al. [28] used the SOM machine learning technique to visualize and validate relationships within a high-dimensional materials dataset. Compared to conventional methods, the SOM categorized and validated the materials using various mapping techniques such as the U-matrix map, heat maps, cluster-based maps, and the Gruneisen parameter. Nicolas et al. [35] proposed the classification of vertebral problems using the K-means and SOM algorithms; in their approach, SOM outperformed K-means clustering analysis. Felix et al. [36] and Srivatsa et al. [37] used hybrid versions of the self-organising map algorithm, with the Susi framework and the Cellular Self-Organizing Map, for classifying hyperspectral datasets. Farooq et al. [40] proposed a self-organization-based clustering network in MANET that employs zone-based group mobility to increase scalability and decrease additional energy consumption in the network. They employed the flocking behavior of birds as a microbe model to build and maintain clusters in MANETs. The approach includes a cluster size management plan that lowers network traffic, enhances the performance of MANETs under group mobility, and uses less energy. Barletta et al. [41] proposed a distance-based intrusion detection system based on an unsupervised SOM network. To identify intrusions, the SOM network connects to the CAN buses inside the car. Many hybrid methods that combine the SOM network with other clustering methods, such as the k-means algorithm, have been proposed to improve the accuracy of the model. Several data sets, including those for automobile hacking, spoofing, fuzzy data, and denial-of-service attacks, are used for simulation.

5 Deep Belief Network

Machine learning now dominates research interest because of its wide application in various fields. Deep learning, a type of machine learning also known as representation learning [42], is applied in multimedia concept retrieval, text mining, social network analysis and video recommendation. Deep learning represents artificial neural networks (ANNs) with layered network topologies of neuron models (Fig. 5).

Fig. 5 Layer architecture of Deep Belief Network

5.1 Architecture, Algorithm and Characteristics of DBN

Introduced by Hinton et al. [43], the Deep Belief Network (DBN) [22] is a popular deep learning algorithm that represents an advanced learning methodology, a deeper architecture, and a high-level abstraction of biological modelling, giving rise to simplified mathematical models. Its network architecture is inspired by artificial intelligence (AI) research that aims to replicate human-level intelligence.

DBN, an alternate class of deep neural network, is a graphical model with multiple layers of ‘hidden units’, with connections between layers but not between the units within each layer [22]. When trained without supervision, a DBN learns to reconstruct its inputs probabilistically and its layers act as feature detectors, whereas a DBN trained with supervision is used for classification. A DBN can be viewed as a stack of simple unsupervised networks such as Restricted Boltzmann Machines (RBMs) or autoencoders; an RBM is an undirected, generative energy-based model, and the hidden layer of each network acts as the visible layer for its successor. When trained greedily layer by layer, the DBN leads to an effective deep learning algorithm [44]. Real-time application scenarios of deep belief networks include computer vision, electroencephalography [45], drug discovery [46], natural language processing, speech recognition, material inspection, and board game programs, in some cases with outcomes that surpass human expert achievement. Figure 6 represents the algorithmic steps of the Deep Belief Network.

Fig. 6 Fundamental steps of DBN algorithm
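
The greedy layer-wise idea can be sketched with a small stack of RBMs, each trained with one step of contrastive divergence (CD-1), the algorithm named for feature extraction in Section 5.2. This is an illustrative sketch only: the class name `RBM`, the function `train_dbn`, the layer sizes, the learning rate, the epoch count, and the use of full-batch updates are all assumptions, and no supervised fine-tuning stage is included.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class RBM:
    """Binary restricted Boltzmann machine trained with CD-1 (assumed setup)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.normal(0.0, 0.01, (n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_update(self, v0, lr=0.1):
        """One CD-1 step on a batch; returns the mean reconstruction error."""
        ph0 = self.hidden_probs(v0)                               # positive phase
        h0 = (self.rng.random(ph0.shape) < ph0).astype(float)     # sample hidden
        pv1 = self.visible_probs(h0)                              # reconstruction
        ph1 = self.hidden_probs(pv1)                              # negative phase
        n = v0.shape[0]
        self.W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
        self.b_v += lr * (v0 - pv1).mean(axis=0)
        self.b_h += lr * (ph0 - ph1).mean(axis=0)
        return float(np.mean((v0 - pv1) ** 2))

def train_dbn(data, layer_sizes, epochs=50, lr=0.1):
    """Greedy layer-wise training: each RBM's hidden layer feeds the next RBM."""
    rbms, layer_input = [], data
    for i, n_hidden in enumerate(layer_sizes):
        rbm = RBM(layer_input.shape[1], n_hidden, seed=i)
        for _ in range(epochs):
            rbm.cd1_update(layer_input, lr)
        rbms.append(rbm)
        layer_input = rbm.hidden_probs(layer_input)   # becomes next visible layer
    return rbms
```

For classification, the stacked feature detectors would typically be followed by a supervised output layer, which this sketch leaves out.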

5.2 Application of DBN

Liu et al. [47] suggested a model for image classification using a deep belief network. In this model, the stacked restricted Boltzmann machines use the contrastive divergence algorithm for feature extraction, and the softmax layer uses an Evolutionary Gradient Descent (EGD) strategy to classify the extracted features. EGD accelerates training remarkably compared with the gradient descent algorithm. Jianjian Yang et al. [48] proposed a model for deep fault recognition using a Deep Belief Network. To refine the result of the DBN, the method uses a stochastic adaptive particle swarm optimization (RSAPSO) algorithm. To address limitations of conventional PSO algorithms such as local optimization and low search accuracy, RSAPSO allows particles to reset their position with a certain probability and then continue searching, which reduces the probability of the particle swarm becoming trapped in a local minimum. Dan Wang et al. [49] presented a system that applies raw physiological data to Deep Belief Networks (DBNs) with three classifiers to predict the levels of emotions such as arousal, valence, and liking based on the known features [50].

The classification accuracies obtained are better than those of the Gaussian Naïve Bayes classifier. Xiaoai Dai et al. [51] devised a DBN model to extract artificial target features in cities from hyperspectral images. The DBN performs dimensionality reduction and extracts the depth features of pixels, and it provides better robustness and separability than principal component analysis. O’Connor et al. [52] proposed a method based on the Siegert approximation for Integrate-and-Fire neurons to map an offline-trained DBN onto an efficient event-driven spiking neural network suitable for hardware implementation. The method is demonstrated in simulation and by a real-time implementation of a 3-layer network with 2694 neurons used for visual classification of MNIST handwritten digits with input from a 128 × 128 Dynamic Vision Sensor (DVS) silicon retina, and for sensory fusion using additional input from a 64-channel AER-EAR silicon cochlea. The system is implemented with the open-source software of the jAER project and runs in real time on a laptop computer. The system can recognize digits in the presence of distractions, noise, scaling, translation and rotation, and the degradation of recognition performance caused by the event-based approach is less than 1%. Faezeh Movahedi et al. [46] discussed the state-of-the-art algorithms for deep belief networks and their performance in electroencephalographic applications in medical fields such as emotion recognition, sleep stage classification, and seizure detection. Their work also covers the challenges and future research directions of DBN in electroencephalographic applications. Abdellaoui and Douik [53] suggested an optimal HAR system with a two-phase DBN model that offers a better quality of classification prediction [54] (Table 3).

Table 3 Applications of Deep Learning Algorithms: DBN

Kuldeep et al. [55] proposed building clusters of vehicles based on their relay nodes, distrust values, and recommendation- and experience-based faith in vehicles. The Deep Belief Network (DBN) uses the discovered threshold value to classify the vehicles into three lists: regular, irregular, and vicious. By removing the malicious nodes from the VANET network [56], the belief result from the FBTRP-DBN model enables the selection of an ideal cluster head as the most trustworthy node [57]. This improved the overall connectivity and throughput of the network [26, 58]. Moreover, the classification algorithm produced varied results depending on the types and counts of the vehicles involved, i.e., on real-time data sets [59]. The challenges faced by these algorithms are depicted in Fig. 7.

Fig. 7 Challenges of MLP, SOM and DBN algorithms

5.3 Limitations of MLP, SOM and DBN Algorithms

MLP with back-propagation has weak generalization ability for statistically neutral problems. In such problems the model has no knowledge of the expected output, and the output is determined by the relation between different input variables. Because it is fully connected, MLP has too many parameters, resulting in redundancy and inefficiency [6]. MLP also disregards spatial information when solving problems. SOM reduces the volume of the dataset, making it easier to visualize and to form clusters. However, it has several flaws, including poor handling of categorical variables, and solutions for enormous data sets are computationally demanding and potentially inaccurate. DBN has high energy and large space requirements for its execution, and its heavy reliance on Random Number Generators (RNGs) reduces the energy efficiency of deep belief networks [60].

Moreover, in DBN, learning time is prolonged in a back-propagation neural network with multiple hidden layers. Greedy learning is inefficient in the directed module because the posterior is not factorizable in each training case. Integrating the overall organization of higher variables before the first hidden layer makes layer-wise learning difficult in a sigmoid belief network. Research on MLP has recently declined sharply, although MLP has been used in many tasks, including content retrieval, intrusion detection, video recommendation, and image classification. Compared with MLP, SOM and DBN are the more often used algorithms in these applications [61]. The ongoing emergence of new research in deep and distributed learning results from the ever-increasing ease with which data can be obtained and from the remarkable advancements in hardware technologies such as high-speed computing.

The main challenges of these algorithms are the lack of training data in intrusion detection, imbalanced data in ad-hoc network applications, interpretability of data, uncertainty scaling in health care, catastrophic forgetting in biological applications, model compression in health care, overfitting in medical applications and in misbehaviour detection in mobile networks, exploding and vanishing gradients in energy-efficient network formation, and underspecification of data in secure routing. Cloud-based platforms are anticipated to be crucial in the future creation of computational deep learning applications. Cloud computing can handle the massive volume of data, helps to reduce cost and improve efficiency, and gives the option to train deep learning architectures [61].

6 Conclusion

Machine learning based on deep learning methods has gradually become the dominant domain of this era. This chapter discusses the issues and challenges and aggregates numerous existing solutions in medical, material and NLP applications. However, several issues and challenges still need to be tackled in the future. MLP is one of the deep learning approaches with universality and stochastic properties. Hence its applications are versatile, including intrusion detection in ad-hoc networks, medical image processing for disease prediction, cluster-based data transformation in VANET, remote sensing for image scene classification, attack identification in networks, energy engineering to predict the output energy in various applications, and robot navigation to determine the optimal collision-free path to the destination. The self-organizing map and its modified versions are discussed with respect to various application platforms. This unsupervised learning algorithm has standard steps and properties for creating a map from a large or complicated input data set. Owing to its high adaptiveness, many applications such as medical diagnosis, data compression, bibliographic classification, and image browsing have used this algorithm for classification. Deep learning is a fast-growing approach that provides solutions for demanding challenges in various applications. Moreover, machine learning is moving into a new phase of intelligent Artificial Intelligence applications.