Introduction

News is a medium that keeps everyone informed about events that have taken place. A news item has two parts: the headline and the body. News has the highest reach among all forms of media [88]. In this era of digitization, news also spreads through social media platforms such as Facebook, Twitter, and WhatsApp. Nowadays, doubts about the credibility of news have become widespread, since some miscreants deliberately spread fake news through these platforms. Because of its extensive reach, fake news can pose a severe threat to the community [86]. Every day, terabytes of fake news are created and shared on online platforms, and human fact-checkers cannot process such a volume of information in real time. Thus, several artificial intelligence (AI)-based techniques have been introduced in the literature to automate fake news detection. One popular approach to detecting fake news is to identify the stance of the news [50]. Stance detection involves estimating the relative view (or stance) of two pieces of text with respect to a claim, issue, or topic. In general, a stance is defined as a relationship between two textual bodies. In stance detection, a headline and a body text are given. The objective is to classify the headline–body pair into one of four categories, namely agree, disagree, discuss, and unrelated, as described below.

  1. Agree stance: A stance is an agree stance when the claim of one body is validated by the other body. For example: Body 1: Mango is known as the king of fruits. Body 2: Mango has a great taste; that is why it is known as the king of fruits.

  2. Disagree stance: A stance is a disagree stance when the claim of one body is denied by the other body, as in the following example: Body 1: Mango is known as the king of fruits. Body 2: Mango has a great taste, but we cannot call it the king of fruits.

  3. Discuss stance: A stance is a discuss stance when the claim of one body is neither validated nor denied by the other body. Instead, the other body discusses the claim. For example: Body 1: Mango is known as the king of fruits. Body 2: Mango has a great taste and a texture that make it stand apart from other fruits.

  4. Unrelated stance: A stance is an unrelated stance when the claim of one body is neither validated nor denied by the other body. Instead, the other body discusses something unrelated to the claim. For example: Body 1: Mango is known as the king of fruits. Body 2: Arvind Kejriwal is the chief minister of Delhi.

To determine whether a particular piece of news is fake, its stance is identified [32]. If the stance of the news belongs to the “agree” category, this confirms that the news is genuine. If the stance is “disagree” or “unrelated”, it implies that the news is fake. It is very difficult to determine whether the news is fake if its stance belongs to the “discuss” category, and news in this category must be analyzed manually. Many models have been proposed in the literature to detect the stance of fake news, including traditional stance-based, machine learning, deep learning, and natural language processing (NLP)-based models. Traditional stance-based models compare the body and the headline to check the credibility of news [27]. Shu et al. [71] presented an NLP-based approach for identifying the stance of fake news. NLP-based approaches investigate fake news from textual and network perspectives. However, these approaches cannot capture semantics from textual data, so they sometimes do not perform well. Therefore, word embeddings such as word2vec and GloVe are combined with machine learning to identify the stance [76]. Ghanem et al. [28] introduced a hybrid model for stance detection of fake news that combines n-grams, lexical representations of indicative words, and word embeddings. The literature indicates that stance classification improves when appropriate representative features are used. However, existing approaches extract features from textual data using predefined lexicons and word embeddings, so the extracted features may be irrelevant. Deep learning models are generally used to extract relevant features.
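The decision rule above can be written as a small lookup. The sketch below is a hypothetical illustration: the `Stance` enum and the `verdict` function are not from the paper. It encodes only the mapping stated in the text and leaves out the stance classifier itself.

```python
from enum import Enum


class Stance(Enum):
    AGREE = "agree"
    DISAGREE = "disagree"
    DISCUSS = "discuss"
    UNRELATED = "unrelated"


def verdict(stance: Stance) -> str:
    """Map a predicted headline-body stance to a credibility verdict."""
    if stance is Stance.AGREE:
        return "genuine"
    if stance in (Stance.DISAGREE, Stance.UNRELATED):
        return "fake"
    # "discuss" cannot be resolved automatically and is sent for manual analysis.
    return "manual review"


for s in Stance:
    print(s.value, "->", verdict(s))
```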

Deep neural network-based models extract useful features automatically from data using backpropagation [63]. However, the performance of backpropagation degrades rapidly as the complexity of the problem increases [58]. The efficiency of neural network models also depends on hyper-parameters such as filter window size, learning rate, word embedding technique, and batch size. Therefore, metaheuristic methods are generally used to improve performance and to optimize the hyper-parameter values. The whale optimization algorithm (WOA) has been used successfully in the literature to optimize feedforward neural networks (FFNNs) [5]. WOA has also been used to solve text classification and sentiment analysis problems [49, 78]. In this paper, an improved whale optimization algorithm (IWOA) is proposed to optimize the hyper-parameters of neural networks. The proposed model uses a word embedding technique to normalize the textual data, followed by an optimized neural network that outputs the stance. The proposed IWOA-optimized FFNN can be used to solve single-objective minimization and maximization problems. The proposed algorithm can also be extended to multi-objective optimization problems, such as allocating parking lots in a distribution network and optimizing RO desalination plants. The main contributions of this article are as follows.

  • A new variant of the whale optimization algorithm has been proposed and validated on standard benchmark functions.

  • The proposed improved whale optimization algorithm (IWOA) enhances efficiency by balancing the exploration and exploitation capabilities of WOA.

  • The proposed IWOA method is used to optimize the hyper-parameters of a neural network that identifies the stance of textual data.

The rest of the paper is organized as follows. The section “Background study” reviews related work in the field of stance detection. Preliminaries are discussed in the section “Preliminaries”, and the proposed work is presented in the section “Proposed algorithms”. The section “Evaluating IWOA for bias(es)” evaluates the proposed algorithm for bias(es). The section “Experimental results” reports the experimental outcomes, and the section “Conclusion and future work” concludes the paper.

Background study

Recently, many deep neural network-based techniques have been proposed for stance detection. Riedel et al. [68] proposed a two-step model for stance detection. The authors first created a vocabulary of the 5000 most frequent words using TF-IDF. The resulting features were then passed to a multilayer perceptron with a single hidden layer to obtain the stance. This technique gives acceptable results for the agree stance but unsatisfactory results for the other stances. Therefore, Davis and Proctor [21] presented three approaches for identifying the stances of news:

  • bag-of-words with a three-layer multilayer perceptron (BoW-MLP);
  • bag-of-words with a bidirectional LSTM (BoW-BiLSTM);
  • bag-of-words with a concatenated multilayer perceptron (BoW-CMLP).

The BoW-MLP model outperforms the other two BoW models with a classification accuracy of 93%. However, the performance of these models is limited by the information that the BoW representation captures. Thus, Chen et al. [14] used a vectorization technique based on LSTM and an attention model to predict stances. To this end, they used three different neural network architectures with a bag of vectors. The results show that the bag-of-vectors technique performs well with neural networks for related and agree news articles. However, it does not perform well in classifying the discuss and unrelated categories.

Furthermore, Chaudhry et al. [12] used GloVe word embeddings along with LSTMs and three encoding schemes for stance detection, namely conditional, independent, and bidirectional conditional. In LSTM with conditional encoding, the RNNs are arranged sequentially, while in LSTM with bidirectional conditional encoding, a softmax layer is used for prediction. The results show that LSTM with bidirectional conditional encoding outperforms the other two schemes with an accuracy of 97%. However, the accuracy could be improved further by combining an attention mechanism with high-dimensional pre-trained embeddings. Mrowca et al. [52] used 100-dimensional GloVe vector representations with a bidirectional LSTM for stance detection. Pfohl et al. [65] used an attention model and conditional encoding-based LSTM to identify the stance of news articles. Basic LSTM, LSTM with attention, and conditional encoding LSTM with attention (CEA-LSTM) have also been used for stance detection [9, 58]. Zeng et al. [86] compared deep neural network models with a hand-crafted feature-based system and found that the deep neural network model outperformed it. Sun et al. [74] used a hierarchical attention network to identify the stance of news. The performance of this attention model is improved further by combining hand-crafted features with hidden features. Additionally, Yu et al. [85] implemented an RNN encoder–decoder using LSTM and GRU cells, with and without attention, for stance detection. In this model, the LSTM and GRU cells can flexibly remember or forget context, and they also mitigate the vanishing gradient problem. Yoon et al. [84] detected incongruity between the headline and the body of news articles using a deep hierarchical encoder.

Moreover, Le and Mikolov [42] combined doc2vec with word2vec word embeddings for stance detection. Lau and Baldwin [41] presented an improved bag-of-words model for stance detection. However, existing machine learning approaches extract features from textual data using predefined lexicons and word embeddings, so the extracted features may be irrelevant. Deep learning models, in turn, suffer from the vanishing gradient problem. Metaheuristic algorithms are generally used to overcome the vanishing gradient problem [7, 24, 57]. Metaheuristic methods normally show better results than traditional and state-of-the-art methods on NP problems [11, 57, 59,60,61]. Mosavi et al. [51] introduced a hybrid model based on gray wolf optimization and a neural network for the classification of sonar data. Mukherjee et al. [53] introduced a PSO-optimized multilayer perceptron for malignant melanoma detection. Kohli and Arora [38] presented a chaotic GWO algorithm to enhance the global convergence speed of GWO. The krill herd algorithm has also been used to solve a number of optimization problems. For example, Abualigah [2] employed an improved krill herd algorithm for feature selection and document clustering. Cuevas et al. [20] proposed a social spider algorithm based on the behavior of spiders. It has been used to solve many real-world optimization problems, such as the economic dispatch problem [23], the transmission expansion planning problem [22], and single-objective optimal power flow [54]. Another metaheuristic, symbiotic organisms search (SOS), has been proposed to solve engineering design and numerical optimization problems [17]. SOS has been applied to complex real-world optimization problems such as predicting sea wave height [3], truss optimization with natural frequency constraints [75], and the optimal operation of reservoir systems [10]. Along the same lines, Mirjalili et al. [46] proposed a bio-inspired optimization algorithm, the salp swarm algorithm (SSA), for engineering design problems, and a number of SSA variants have been presented in the literature for solving various NP problems. Yılmaz et al. [82] presented a bi-objective optimization model for a seru production system that minimizes the makespan and reduces the workload imbalance among workers. Yilmaz and Durmusoglu [83] compared the performance of different metaheuristics for the batch scheduling problem in a multi-hybrid cell manufacturing system. Distribution optimization approaches have also been used to find optimal parameters for numerous applications. For example, Sun et al. [73] presented a wind forecasting approach based on two-step short-term probabilistic distribution optimization.

WOA has also been applied to a number of NP problems. WOA was proposed by Mirjalili and Lewis [44] and is motivated by the hunting mechanism of humpback whales. Mafarja and Mirjalili [43] proposed a feature selection algorithm based on WOA and simulated annealing. Alzaqebah et al. [6] employed WOA to prioritize software requirements, treating the requirements as the search space and the priorities as the hunting behavior of the whales. Petrović et al. [64] presented a new variant of WOA to find optimal solutions for an NP-hard scheduling problem. Jiang et al. [37] introduced a discrete WOA to solve the green job shop scheduling problem. Hussien et al. [35] proposed a binary WOA based on S-shaped and V-shaped transfer functions to solve discrete optimization problems. Chen et al. [15] used Lévy flight and chaotic local search together to balance the exploration and exploitation capabilities of standard WOA. They applied the resulting balanced WOA to complex constrained engineering design problems. Aljarah et al. [5] used WOA along with an MLP to identify the stance of news. To this end, the authors used WOA to update the weights and biases in place of gradient-descent backpropagation. The WOA-trained MLP of Aljarah et al. [5] mitigates the vanishing gradient problem and also improves the convergence speed. The literature shows that WOA has been used successfully to solve diverse real-world optimization problems, including single-objective and multi-objective problems [1, 16], and especially text-mining tasks [4, 36, 49, 78]. Therefore, WOA has been used in this study to identify the stance of fake news.

From the above discussion, it is evident that traditional machine learning and deep learning models generally suffer from the vanishing gradient problem and are computationally expensive. These methods also perform poorly on NP problems. In contrast, metaheuristic methods do not suffer from the vanishing gradient problem and show better results on NP problems. A number of metaheuristic methods have been introduced in the literature to optimize different types of neural networks and to solve various real-world problems. Therefore, this paper proposes a new variant of WOA named IWOA. The proposed IWOA can be used to solve various engineering design and NP problems, such as truss optimization, the traveling salesman problem, cell manufacturing systems, and text classification. In this work, the proposed IWOA has been used to optimize a neural network for the stance detection of fake news.

Preliminaries

The proposed approach uses an improved whale optimization algorithm (IWOA) to optimize the weights and biases of a multilayer perceptron. Both components are discussed in the following subsections.

Whale optimization algorithm

The whale optimization algorithm (WOA) is a nature-inspired metaheuristic algorithm that is generally used to solve optimization problems [44]. WOA is motivated by the hunting mechanism of humpback whales. It uses three phases to find the optimal solution: encircling the prey, bubble-net attacking, and searching for prey. All the phases of WOA are discussed below.

Encircling the prey

In this phase, each search agent updates its position vector with respect to the position vector of the current best agent using Eqs. (1) and (2):

$$\begin{aligned}&\mathbf {Q} = |\mathbf {B}\cdot \mathbf {W^*(t)}- \mathbf {W(t)}| \end{aligned}$$
(1)
$$\begin{aligned}&\mathbf {W(t+1)}= \mathbf {W^*(t)} - \mathbf {A}\cdot \mathbf {Q}, \end{aligned}$$
(2)

where t denotes the iteration, \(\mathbf {A}\) and \(\mathbf {B}\) are coefficient vectors, \(\mathbf {W^*}\) is the position vector of the best solution obtained so far, and \(\mathbf {W}\) denotes the position vector of a whale. The values of \(\mathbf {A}\) and \(\mathbf {B}\) are computed according to Eqs. (3) and (4):

$$\begin{aligned} \mathbf {A}= & {} 2\mathbf {a}\cdot \mathbf {d} - \mathbf {a} \end{aligned}$$
(3)
$$\begin{aligned} \mathbf {B}= & {} 2\cdot \mathbf {d}, \end{aligned}$$
(4)

where \(\mathbf {a}\) is a control parameter and \(\mathbf {d}\) is a random vector in the range of [0, 1].
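The following is a minimal sketch of Eqs. (1)–(4), assuming position vectors are plain Python lists. Following common WOA implementations, it draws one scalar A and one scalar B per agent from independent random numbers rather than full random vectors; this is an assumption, as is the function name `encircle`.

```python
import random


def encircle(w, w_best, a, rng=random):
    """Encircling-prey update of one whale, Eqs. (1)-(4).

    w, w_best: current and best position vectors (lists of floats).
    a: control parameter, decreased from 2 to 0 over the iterations.
    """
    A = 2 * a * rng.random() - a  # Eq. (3)
    B = 2 * rng.random()          # Eq. (4)
    new_w = []
    for x, x_best in zip(w, w_best):
        Q = abs(B * x_best - x)       # Eq. (1), per dimension
        new_w.append(x_best - A * Q)  # Eq. (2)
    return new_w


print(encircle([1.0, 2.0], [0.5, 0.5], a=1.5, rng=random.Random(42)))
```

With a = 0 the coefficient A vanishes, so the whale lands exactly on the best position. Larger values of a allow larger moves around it.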

Bubble-net attacking

The attacking mechanism of humpback whales is known as the bubble-net attack. It is essentially an exploitation phase in which the obtained solution is refined further to find the optimal solution. Bubble-net attacking has two mechanisms: shrinking encircling and spiral position updating. In each iteration, one of these mechanisms is chosen at random. A random number p in [0, 1] is generated. If p is less than 0.5, the shrinking encircling mechanism is used; otherwise, the spiral updating position mechanism is used. Both mechanisms are discussed in the following paragraphs.

Shrinking encircling mechanism To achieve the shrinking encircling behavior, the value of \(\mathbf {a}\) in Eq. (3) is decreased linearly from 2 to 0 over the course of the iterations. Consequently, A is a random value in the interval \([-a, a]\). When A takes values in \([-1, 1]\), the new position of a search agent can lie anywhere between its original position and the position of the current best agent.
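In the original WOA formulation [44], the linear schedule is usually written as follows, where \(T_{\max }\) denotes the maximum number of iterations (a symbol introduced here for illustration). Because \(\mathbf {d}\) lies in [0, 1], Eq. (3) then bounds each component of the coefficient by the current value of a:

$$\begin{aligned} a(t) = 2 - \frac{2t}{T_{\max }}, \qquad \mathbf {A} = 2\mathbf {a}\cdot \mathbf {d} - \mathbf {a} \in [-a(t),\, a(t)]. \end{aligned}$$

For example, with \(T_{\max } = 1000\), a(500) = 1. From the midpoint of the run onward, every component of A therefore satisfies |A| ≤ 1, which is the shrinking encircling regime described above.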

Fig. 1 Block diagram of MLP

Spiral updating position mechanism In this phase, the search agents move toward the prey along a helix-shaped path using Eqs. (5) and (6):

$$\begin{aligned}&\mathbf {Q'} = |\mathbf {W^*(t)}-\mathbf {W(t)}| \end{aligned}$$
(5)
$$\begin{aligned}&\mathbf {W(t+1)} = \mathbf {Q'}\cdot e ^{bk}\cdot \cos {2\pi k} + \mathbf {W^*(t)}; \end{aligned}$$
(6)

here, \(\mathbf {W^*(t)}\) is the position vector of the best search agent, \(\mathbf {W(t)}\) is the position vector of a search agent in iteration t, \(\mathbf {Q'}\) is the distance vector computed using Eq. (5), b is a constant that defines the shape of the logarithmic spiral, and k is a random number in the range \([-1,1].\)
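The following is a minimal sketch of the spiral update in Eqs. (5) and (6), assuming list-based positions. The spiral constant b = 1 is an assumption: it is the value commonly used in WOA implementations, but the paper does not state it here. `spiral_update` is a hypothetical name.

```python
import math
import random


def spiral_update(w, w_best, b=1.0, rng=random):
    """Spiral (helix-shaped) move of one whale toward the best agent, Eqs. (5)-(6)."""
    k = rng.uniform(-1.0, 1.0)  # k in [-1, 1], shared by all dimensions
    factor = math.exp(b * k) * math.cos(2 * math.pi * k)
    new_w = []
    for x, x_best in zip(w, w_best):
        q = abs(x_best - x)                # Eq. (5): distance to the best agent
        new_w.append(q * factor + x_best)  # Eq. (6)
    return new_w


print(spiral_update([2.0, -1.0], [0.0, 0.0], rng=random.Random(7)))
```

Since |e^{bk} cos(2πk)| ≤ e^{b} for k in [-1, 1], each coordinate ends up within e^{b} times the whale's current distance from the best agent.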

Search for prey (exploration phase)

In this phase, a search agent searches for prey randomly. This random movement serves two purposes. First, it helps to explore the search region thoroughly. Second, it prevents the algorithm from getting stuck in local optima. Mathematically, it can be formulated using Eqs. (7) and (8):

$$\begin{aligned}&\mathbf {Q} = |\mathbf {B}\cdot {\mathbf {W}_{\mathrm{rand}}} - \mathbf {W(t)}| \end{aligned}$$
(7)
$$\begin{aligned}&\mathbf {W(t+1)} = \mathbf {W_{\mathrm{rand}}} - \mathbf {A}\cdot \mathbf {Q}, \end{aligned}$$
(8)

where \(\mathbf {W}_{\mathrm{rand}}\) is the position vector of a whale chosen at random from the current population, \(\mathbf {W(t)}\) is the position vector of a search agent at iteration t, \(\mathbf {A}\) and \(\mathbf {B}\) are the coefficient vectors defined in Eqs. (3) and (4), and \(\mathbf {Q}\) is the distance vector.
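One WOA position update that combines the three phases can be sketched as below. The switch between encircling and random search, based on whether |A| < 1, comes from the original WOA formulation [44]. The text above does not spell this rule out, so treat it as an assumption. Scalar A and B per agent and b = 1 are also assumptions, and `woa_step` is a hypothetical name.

```python
import math
import random


def woa_step(w, w_best, population, a, b=1.0, rng=random):
    """One WOA update of whale w (list of floats) given the current population."""
    A = 2 * a * rng.random() - a   # Eq. (3)
    B = 2 * rng.random()           # Eq. (4)
    if rng.random() < 0.5:
        # |A| < 1: exploit around the best whale (Eqs. 1-2);
        # |A| >= 1: explore around a randomly chosen whale (Eqs. 7-8).
        ref = w_best if abs(A) < 1 else rng.choice(population)
        return [r - A * abs(B * r - x) for x, r in zip(w, ref)]
    # Otherwise: spiral update around the best whale (Eqs. 5-6).
    k = rng.uniform(-1.0, 1.0)
    factor = math.exp(b * k) * math.cos(2 * math.pi * k)
    return [abs(r - x) * factor + r for x, r in zip(w, w_best)]


rng = random.Random(3)
pop = [[rng.uniform(-5, 5) for _ in range(2)] for _ in range(5)]
print(woa_step(pop[0], pop[1], pop, a=1.8, rng=rng))
```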

Multilayer perceptron neural network

A multilayer perceptron (MLP) is a neural network that consists of multiple perceptrons, as depicted in Fig. 1. An MLP has an input layer that accepts the inputs, one or more hidden layers, and an output layer that produces the prediction [55]. The hidden layers are the true computing engine of the MLP and learn the more complex features of the data. MLPs are generally used for supervised learning problems, in which they are trained on many input–output pairs to learn the dependencies (or correlations) between inputs and outputs. An MLP is a specific form of feedforward neural network (FFNN) [25]. In FFNNs, neurons are connected in a single, forward direction. Connections are represented by weights, which are real numbers in the range \([-1, 1].\) Figure 1 illustrates an FFNN with only one hidden layer.

Proposed algorithms

In this paper, weights and hyper-parameters of MLP have been optimized using an improved whale optimization algorithm for enhancing the efficiency of stance detection. The next subsections describe the proposed improved whale optimization algorithm followed by the proposed stance detection method.

Improved whale optimization algorithm

The success of metaheuristic methods depends on their diversification and intensification steps [48, 56, 61, 62]. Metaheuristic methods that maintain a good trade-off between diversification and intensification are considered superior. In the whale optimization algorithm (WOA), the search agent (humpback whale) that guides the search process during exploration is selected at random. Therefore, WOA usually suffers from slow convergence. It can also get stuck in local optima because of premature exploitation. Thus, this paper introduces a new variant of WOA, named the improved whale optimization algorithm (IWOA), to boost the convergence speed and attain better results. The proposed variant combines WOA with roulette wheel and tournament selection, as discussed below.

Improved search for prey using hybrid TS–RS selection

In the exploration (diversification) phase of WOA, the humpback whales that guide the search process are selected at random. Because of this random selection, WOA sometimes takes a long time to find the optimal solution. Therefore, to balance exploration and exploitation and to boost the convergence speed, the proposed IWOA uses tournament selection and roulette wheel selection [43] in alternate iterations. In tournament selection, the search agent with the best fitness in a randomly drawn group is selected to guide the search process. However, tournament selection-based WOA can become trapped in a local optimum if locally good search agents are repeatedly selected as the guiding whales. Thus, to maintain diversity, roulette wheel selection [49] and tournament selection are used alternately to select the guiding whales. The proposed IWOA uses the following steps to find the search agent that guides the search process. First, in odd iterations, tournament selection selects the guiding search agents (humpback whales) as follows:

  1. Select a few search agents randomly from the population (a tournament).

  2. Compute the fitness of each selected search agent and sort them according to their fitness.

  3. Select the search agent with the best fitness value (the winner).
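The three tournament steps can be sketched as follows, assuming a minimization objective such as the MSE used later, so that the "best" fitness is the smallest one. The tournament size of 3 is a hypothetical default; the paper does not give it. Taking the minimum is equivalent to sorting the tournament and keeping the first entry.

```python
import random


def tournament_select(population, fitness, size=3, rng=random):
    """Tournament selection for a minimization problem.

    Step 1 draws `size` distinct agents at random (the tournament);
    steps 2-3 compare their fitness values and return the winner (lowest value).
    """
    size = min(size, len(population))
    contenders = rng.sample(range(len(population)), size)  # step 1
    winner = min(contenders, key=lambda i: fitness[i])     # steps 2-3
    return population[winner]


pop = [[0.1, 0.2], [1.5, -0.3], [0.9, 0.9], [-2.0, 0.4]]
fit = [0.8, 0.2, 0.5, 1.3]
print(tournament_select(pop, fit, size=2, rng=random.Random(0)))
```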

Second, in even iterations, roulette wheel selection is used to control the trade-off between exploration and exploitation. Formally, consider n search agents in a population \(P_s= \{s_1, s_2, \ldots , s_n\}\), where the fitness value of each search agent \(s_i\) is \(f(s_i)\). Roulette wheel selection assigns each search agent a selection probability \(P_m(s_i)\) and a cumulative probability \(P_l(s_k)\), as given in Eqs. (9) and (10). An agent is then selected by drawing a uniform random number r in [0, 1] and choosing the first agent \(s_k\) whose cumulative probability \(P_l(s_k)\) is at least r:

$$\begin{aligned} P_m{(s_i)}= & {} \frac{f(s_i)}{\sum _{j=1}^{n}f(s_{j})},\quad i= 1, 2, \ldots , n \end{aligned}$$
(9)
$$\begin{aligned} P_l{(s_k)}= & {} \sum _{j=1}^{k}P_{m}(s_{j}), \quad \text {for} \ k= 1, 2, \ldots , n. \end{aligned}$$
(10)
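Eqs. (9) and (10) translate directly into fitness-proportional sampling. The sketch below assumes non-negative fitness values where larger is better, as Eq. (9) implies. For a minimization objective such as the MSE, a transform like 1/(1 + MSE) would be needed first; that transform is a hypothetical choice not stated in the paper.

```python
import bisect
import itertools
import random


def roulette_select(population, fitness, rng=random):
    """Fitness-proportional (roulette wheel) selection via Eqs. (9)-(10)."""
    total = sum(fitness)
    probs = [f / total for f in fitness]            # Eq. (9): P_m(s_i)
    cumulative = list(itertools.accumulate(probs))  # Eq. (10): P_l(s_k)
    r = rng.random()                                # spin the wheel: r in [0, 1)
    # First k with P_l(s_k) > r, so agents with zero probability are never chosen.
    k = bisect.bisect_right(cumulative, r)
    return population[min(k, len(population) - 1)]  # guard against round-off


print(roulette_select(["s1", "s2", "s3"], [1.0, 3.0, 6.0], rng=random.Random(0)))
```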

In the diversification phase, the position of each search agent is updated according to Eqs. (11)–(13):

$$\begin{aligned}&\mathbf {\mathrm{W}}(t+1) = \mathbf {\mathrm{W}}_{{\mathrm{TS}}}- \mathbf {\mathrm{A}}\cdot \mathbf {\mathrm{Q}}\quad \text {if iteration is odd} \end{aligned}$$
(11)
$$\begin{aligned}&\mathbf {\mathrm{W}}(t+1) = \mathbf {\mathrm{W}}_{{\mathrm{RS}}}- \mathbf {\mathrm{A}}\cdot \mathbf {\mathrm{Q}}\quad \text {if iteration is even} \end{aligned}$$
(12)
$$\begin{aligned}&\mathbf {\mathrm{Q}} = | \mathbf {\mathrm{B}}\cdot \mathbf {\mathrm{W}}_{({\mathrm{RS}}\ {\mathrm{or}}\ {\mathrm{TS}})} - \mathbf {\mathrm{W(t)}}| \end{aligned}$$
(13)

here, \(\mathbf {\mathrm{W}}_{{\mathrm{RS}}}\) and \(\mathbf {\mathrm{W}}_{{\mathrm{TS}}}\) are the search agents selected by roulette wheel and tournament selection, respectively. The improved WOA is depicted in Algorithm 1. The proposed IWOA has also been used to optimize the neural network to enhance the efficiency of stance detection.
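Combining the two selection schemes with Eqs. (11)–(13) gives the IWOA exploration move sketched below. As in the previous sketches, the fitness is an error to be minimized. The following are hypothetical choices:

  • the 1/(1 + error) transform for the roulette wheel;
  • a tournament size of 3;
  • A and B drawn as per-agent scalars.

`select_guide` and `iwoa_explore` are illustrative names.

```python
import bisect
import itertools
import random


def select_guide(population, error, t, rng=random, tour_size=3):
    """Choose the guiding whale: tournament on odd t (Eq. 11), roulette on even t (Eq. 12)."""
    n = len(population)
    if t % 2 == 1:
        group = rng.sample(range(n), min(tour_size, n))
        return population[min(group, key=lambda i: error[i])]
    scores = [1.0 / (1.0 + e) for e in error]  # hypothetical: lower error -> larger slice
    total = sum(scores)
    cumulative = list(itertools.accumulate(s / total for s in scores))  # Eqs. (9)-(10)
    k = bisect.bisect_right(cumulative, rng.random())
    return population[min(k, n - 1)]


def iwoa_explore(w, population, error, t, a, rng=random):
    """IWOA diversification move of whale w relative to the selected guide (Eqs. 11-13)."""
    guide = select_guide(population, error, t, rng)
    A = 2 * a * rng.random() - a  # Eq. (3)
    B = 2 * rng.random()          # Eq. (4)
    return [g - A * abs(B * g - x) for x, g in zip(w, guide)]  # Eq. (13) inside Eq. (11)/(12)


pop = [[0.0, 0.0], [5.0, 5.0], [-2.0, 1.0]]
err = [0.1, 9.0, 2.5]
print(iwoa_explore([3.0, 4.0], pop, err, t=2, a=1.2, rng=random.Random(4)))
```

With a tournament that covers the whole population, odd iterations always follow the lowest-error whale. Even iterations let weaker whales guide the move with smaller probability, which is the diversity-preserving role the text assigns to the roulette wheel.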

Fig. 2 Structure of three-layer FFNNs

Algorithm 1 Improved whale optimization algorithm (IWOA)

Proposed stance detection method

Nowadays, metaheuristic methods are also used to train multilayer perceptron neural networks (MLPNNs) to improve their performance. A metaheuristic-based neural network differs from a traditional multilayer perceptron in that its weights and biases are updated by a metaheuristic algorithm instead of a gradient descent algorithm. Hence, such networks generally do not suffer from the vanishing and exploding gradient problems of large neural networks. Metaheuristic methods are applied to three aspects of neural networks. First, they are used to find the combination of weights and biases that minimizes the error. Second, they are used to find a suitable FFNN structure for a problem. Finally, they are used to tune parameters such as momentum and learning rate for gradient-based learning. In this work, an improved whale optimization algorithm-based stance detection method (IWOASD) is introduced to find the optimal values of the weights and biases. The proposed IWOASD combines the capabilities of IWOA and MLP. Its most important task is finding the optimal values of the weights and biases. To achieve this, the weights and biases passed to IWOA must be encoded as a vector, as given in Eq. (14):

$$\begin{aligned} \mathbf {V}=\{\mathbf {W}, \mathbf {b}\}=\{W_{11}, W_{12}, W_{13}, \ldots , W_{nn}, h, b_1, b_2, b_3, \ldots , b_{j}\}, \end{aligned}$$
(14)

where n is the number of input nodes, \(W_{ij}\) is the connection weight from node i to node j, and \(b_{j}\) is the bias (threshold) of the jth hidden node.
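For a network with n inputs, h hidden nodes, and m outputs, the agent vector needs n·h + h + h·m + m entries once the hidden-to-output weights and output biases of Eq. (16) are included. Eq. (14) lists only the input-to-hidden part explicitly, so this full layout is an assumption. The following minimal sketch flattens and unflattens such a vector; the helper names are hypothetical.

```python
def n_params(n_in, n_hidden, n_out):
    """Length of one agent vector: W (n x h), b (h), W_out (h x m), b_out (m)."""
    return n_in * n_hidden + n_hidden + n_hidden * n_out + n_out


def decode(v, n_in, n_hidden, n_out):
    """Split a flat agent vector into weight matrices and bias vectors."""
    if len(v) != n_params(n_in, n_hidden, n_out):
        raise ValueError("vector length does not match the network shape")
    it = iter(v)
    W1 = [[next(it) for _ in range(n_hidden)] for _ in range(n_in)]  # W1[i][j]: input i -> hidden j
    b1 = [next(it) for _ in range(n_hidden)]
    W2 = [[next(it) for _ in range(n_out)] for _ in range(n_hidden)]  # W2[j][k]: hidden j -> output k
    b2 = [next(it) for _ in range(n_out)]
    return W1, b1, W2, b2


def encode(W1, b1, W2, b2):
    """Inverse of decode: row-major weights (W_11, W_12, ...) followed by biases."""
    return [x for row in W1 for x in row] + list(b1) + [x for row in W2 for x in row] + list(b2)


v = [0.1 * i for i in range(n_params(2, 3, 1))]
W1, b1, W2, b2 = decode(v, 2, 3, 1)
print(len(v), W1, b1)
```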

The next step is to define the objective function of the proposed IWOASD algorithm, which is discussed below.

Objective function Consider the three-layer FFNN (one input, one hidden, and one output layer) depicted in Fig. 2. The output of each hidden node in every epoch is computed according to Eq. (15):

$$\begin{aligned} f(p_j) = \frac{1}{1+\exp \left( -\left( \sum _{i=1}^{n} W_{ij} \cdot x_i - b_j \right) \right) }, \quad j=1,2, \ldots , h, \end{aligned}$$
(15)

where n is the number of input nodes, h is the number of hidden nodes, \(W_{ij}\) is the connection weight from the ith input node to the jth hidden node, \(x_i\) is the ith input, and \(b_j\) is the bias of the jth hidden node.

After the outputs of the hidden layer have been computed, the final output \(O^k\) of the FFNN is computed using Eq. (16):

$$\begin{aligned} O^k = \sum \limits _{j=1}^h W_{kj} \cdot f(p_j) - b_k, \quad k=1,2, \ldots , m; \end{aligned}$$
(16)

here, \(W_{kj}\) is the connection weight from the jth hidden node to the kth output node, \(b_k\) is the bias of the kth output node, and m is the number of output nodes.

Finally, the mean square error (MSE) is computed using Eq. (17). It measures the squared difference between the MLP outputs \(O^k_j\) and the desired outputs \(d^k_j\) for the kth training sample. To evaluate the FFNN over the whole training set, the MSE is then averaged over all training samples using Eq. (18):

$$\begin{aligned} \text {MSE}= & {} \sum _{j=1}^m {(O^k_{j} - d^k_{j})^2} \end{aligned}$$
(17)
$$\begin{aligned} {\overline{\text {MSE}}}= & {} \sum _{k=1}^S {\frac{\sum _{j=1}^m(O^k_{j} - d^k_{j})^2}{S}}, \end{aligned}$$
(18)

where S is the number of training samples, \(d^k_{j}\) is the desired output, and \(O^k_{j}\) is the actual output of the jth output node when the kth training sample is used.
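Eqs. (15)–(18) can be sketched in plain Python as below. Weights are stored as W1[i][j] (input i to hidden j) and W2[j][k] (hidden j to output k); this storage layout and the function names are assumptions. The sigmoid is written in a numerically stable form so that large weighted sums do not overflow `math.exp`.

```python
import math


def sigmoid(z):
    """Logistic function, stable for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(x, W1, b1, W2, b2):
    """FFNN outputs for one input vector x."""
    hidden = [sigmoid(sum(W1[i][j] * x[i] for i in range(len(x))) - b1[j])
              for j in range(len(b1))]                                # Eq. (15)
    return [sum(W2[j][k] * hidden[j] for j in range(len(hidden))) - b2[k]
            for k in range(len(b2))]                                  # Eq. (16)


def average_mse(samples, targets, W1, b1, W2, b2):
    """Fitness of one weight configuration: Eq. (17) averaged over S samples, Eq. (18)."""
    total = 0.0
    for x, d in zip(samples, targets):
        o = forward(x, W1, b1, W2, b2)
        total += sum((o_j - d_j) ** 2 for o_j, d_j in zip(o, d))      # Eq. (17)
    return total / len(samples)                                       # Eq. (18)


W1, b1, W2, b2 = [[0.0], [0.0]], [0.0], [[2.0]], [0.0]
print(forward([3.0, -1.0], W1, b1, W2, b2))                               # [1.0]
print(average_mse([[3.0, -1.0], [0.0, 0.0]], [[1.0], [0.0]], W1, b1, W2, b2))  # 0.5
```

With zero input weights, the hidden node always outputs sigmoid(0) = 0.5, so the network predicts 1.0 for every input. The first sample therefore has zero error, the second has squared error 1, and their average is 0.5.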

To find the optimal combination of weights and biases, the proposed IWOASD algorithm uses the average MSE defined in Eq. (18) as its fitness function. To design an optimized neural network, the encoding strategy also needs to be defined.

Encoding strategy The literature describes three methods for encoding weights and biases in metaheuristic methods: vector, matrix, and binary. In the vector scheme, every search agent is encoded as a vector. In the matrix scheme, every search agent is encoded as a matrix. In the binary scheme, each search agent is represented as a string of binary bits. In the vector scheme, decoding a particle's vector back into weights and biases is a complicated task. The binary scheme represents particle variables in binary form, so the length of each particle grows as the neural network structure becomes more complex. In contrast, decoding in the matrix scheme is simple and easy to execute. Therefore, the matrix encoding strategy has been used in this work.

The detailed steps of stance detection using an optimized neural network are presented in Algorithm 2.

Algorithm 2 Stance detection using an optimized neural network

Fig. 3 Flow graph for optimized neural network

The workflow of the proposed model can be summarized as follows.

  1. First, the weights of the MLPNN are initialized randomly by IWOA.

  2. Second, the fitness of each individual is computed using the objective function, which minimizes the mean squared error given in Eq. (18).

  3. The weights are updated using IWOA.

  4. Steps 2–3 are repeated until convergence.
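The four steps can be combined into a compact end-to-end sketch. Every element here is an assumption made for illustration, not the authors' MATLAB implementation:

  • toy XOR data in place of a stance dataset;
  • a fixed iteration budget in place of a convergence test;
  • weights clipped to [-10, 10];
  • the |A| < 1 switch of the original WOA [44] deciding when the TS/RS-selected whale of Eqs. (11)–(13) replaces the best whale;
  • the 1/(1 + MSE) roulette transform.

All function names are hypothetical.

```python
import bisect
import itertools
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))


def avg_mse(v, X, D, shape):
    """Objective of Eq. (18) for one agent vector v (Eq. (14) layout plus output layer)."""
    n_in, n_hid, n_out = shape
    it = iter(v)
    W1 = [[next(it) for _ in range(n_hid)] for _ in range(n_in)]
    b1 = [next(it) for _ in range(n_hid)]
    W2 = [[next(it) for _ in range(n_out)] for _ in range(n_hid)]
    b2 = [next(it) for _ in range(n_out)]
    total = 0.0
    for x, d in zip(X, D):
        h = [sigmoid(sum(W1[i][j] * x[i] for i in range(n_in)) - b1[j]) for j in range(n_hid)]
        o = [sum(W2[j][k] * h[j] for j in range(n_hid)) - b2[k] for k in range(n_out)]
        total += sum((ok - dk) ** 2 for ok, dk in zip(o, d))
    return total / len(X)


def select_guide(pop, fit, t, rng, tour_size=3):
    """Tournament selection on odd iterations, roulette wheel on even ones."""
    if t % 2 == 1:
        group = rng.sample(range(len(pop)), min(tour_size, len(pop)))
        return pop[min(group, key=lambda i: fit[i])]
    scores = [1.0 / (1.0 + f) for f in fit]  # hypothetical: lower MSE -> larger slice
    total = sum(scores)
    cum = list(itertools.accumulate(s / total for s in scores))
    return pop[min(bisect.bisect_right(cum, rng.random()), len(pop) - 1)]


def train_iwoa(X, D, shape, n_agents=20, max_iter=200, b=1.0, bound=10.0, seed=0):
    rng = random.Random(seed)
    n_in, n_hid, n_out = shape
    dim = n_in * n_hid + n_hid + n_hid * n_out + n_out
    pop = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_agents)]  # step 1
    fit = [avg_mse(v, X, D, shape) for v in pop]                                # step 2
    i_best = min(range(n_agents), key=lambda i: fit[i])
    best, best_fit = pop[i_best][:], fit[i_best]
    for t in range(1, max_iter + 1):  # fixed budget stands in for "until convergence"
        a = 2 - 2 * t / max_iter
        for i, w in enumerate(pop):   # step 3: move every whale
            A, B = 2 * a * rng.random() - a, 2 * rng.random()
            if rng.random() < 0.5:
                ref = best if abs(A) < 1 else select_guide(pop, fit, t, rng)  # Eq. (2) or (11)-(13)
                new = [r - A * abs(B * r - x) for x, r in zip(w, ref)]
            else:
                k = rng.uniform(-1, 1)
                g = math.exp(b * k) * math.cos(2 * math.pi * k)
                new = [abs(r - x) * g + r for x, r in zip(w, best)]          # Eq. (6)
            pop[i] = [max(-bound, min(bound, x)) for x in new]  # keep weights bounded
            fit[i] = avg_mse(pop[i], X, D, shape)               # step 2 again
            if fit[i] < best_fit:
                best, best_fit = pop[i][:], fit[i]
    return best, best_fit


X = [[0, 0], [0, 1], [1, 0], [1, 1]]  # toy XOR data, not a stance dataset
D = [[0], [1], [1], [0]]
weights, err = train_iwoa(X, D, shape=(2, 4, 1))
print(f"best average MSE: {err:.4f}")
```

Every move is evaluated with Eq. (18), and the best agent is replaced only by a strictly better one. The returned error therefore never exceeds the best error in the initial population.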

The complete steps of the proposed IWOASD algorithm are depicted in Fig. 3. From the figure and the above discussion, it can be observed that the proposed IWOASD algorithm optimizes the weights and biases of the MLP by minimizing the MSE. Since the MLP part is the same for all the algorithms, the computational complexity of the IWOA part can be expressed as \(O(I \times D\times N^2)\), where I is the number of iterations, D is the dimension, and N is the total number of individuals (whales) in the population. The computational complexity of an FFNN depends on its training and inference phases. Training an FFNN with n nodes (backpropagation) takes \(O(n^5)\) time, compared with \(O(n^4)\) for forward propagation (the inference phase). Thus, the backpropagation phase is much slower than forward propagation. For this reason, pre-trained neural networks are now generally used.

Evaluating IWOA for bias(es)

Metaheuristic methods often show biased results on benchmark functions. It is therefore useful to understand the inherent bias(es) of a metaheuristic method before testing its performance on benchmark functions. Metaheuristic methods may exhibit central bias, edge bias, and/or axial bias. A centrally biased method mostly explores solutions close to the center of the search space, while an edge-biased method mostly explores solutions near the boundaries (edges). An axially biased method, on the other hand, favors solutions along the x- and y-axes. Therefore, a signature test [19] has been carried out to inspect the bias(es) of the proposed IWOA and standard WOA. The test uses the minimization problem given in Eq. (19), in which every point in the search region is an optimal solution, so an unbiased metaheuristic method produces solutions similar to those of random search. The outcome of the signature test for WOA and IWOA is depicted in Fig. 4. The figure shows that IWOA explores the entire search space evenly and without bias, while WOA shows center-biased results:

$$\begin{aligned} \min \; f(x_1,x_2) = 3, \quad x_1, x_2 \in [-3,3]. \end{aligned}$$
(19)
Fig. 4 Signature test analysis of (a) WOA and (b) IWOA
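The signature test can be reproduced in miniature: because Eq. (19) is constant, the only informative output is where the final solutions lie. The sketch below compares uniform random sampling with a deliberately centre-biased sampler on \([-3,3]^2\) using a grid-coverage score. The coverage score and the Gaussian sampler are illustrative assumptions and are not part of the procedure in [19]; the analysis in this section is based on the plotted solutions in Fig. 4.

```python
import random
from typing import Iterable, Tuple


def coverage(points: Iterable[Tuple[float, float]], lo: float = -3.0,
             hi: float = 3.0, bins: int = 10) -> float:
    """Fraction of the bins x bins grid cells over [lo, hi]^2 that contain at least one point."""
    width = (hi - lo) / bins
    cells = set()
    for x1, x2 in points:
        i = min(int((x1 - lo) / width), bins - 1)  # clamp x == hi into the last cell
        j = min(int((x2 - lo) / width), bins - 1)
        cells.add((i, j))
    return len(cells) / (bins * bins)


def clip(v: float, lo: float = -3.0, hi: float = 3.0) -> float:
    """Keep a sample inside the feasible box."""
    return max(lo, min(hi, v))


rng = random.Random(0)
n = 2000
# Eq. (19) is constant, so every sample is "optimal"; only the spatial spread matters.
uniform = [(rng.uniform(-3, 3), rng.uniform(-3, 3)) for _ in range(n)]
# Hypothetical centre-biased sampler standing in for a biased optimiser.
central = [(clip(rng.gauss(0, 0.5)), clip(rng.gauss(0, 0.5))) for _ in range(n)]

print(f"uniform: {coverage(uniform):.2f}, centre-biased: {coverage(central):.2f}")
```

A uniform sampler fills nearly every cell, whereas the centre-biased sampler leaves the outer cells empty, which is the kind of pattern the signature test flags as central bias.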

Table 1 Description of standard benchmark functions

Experimental results

The performance of the proposed stance detection method is discussed in the following subsections. First, the section “Performance analysis of IWOA” investigates the efficacy of IWOA on unimodal and multimodal benchmark functions. Then, the section “Performance estimation of stance detection method” analyzes the performance of the IWOA-optimized neural network on five stance detection benchmark datasets. All the experiments have been performed in MATLAB R2017a on a computer with 8 GB of RAM and a 2.66 GHz Core i3 processor.

Performance analysis of IWOA

The efficiency of IWOA has been evaluated on 17 benchmark functions, including both unimodal (\(F_1\)–\(F_{10}\)) and multimodal (\(F_{11}\)–\(F_{17}\)) functions [60, 72]. Unimodal functions generally assess convergence performance, while multimodal functions assess the tendency of an algorithm to become trapped in local optima. The considered unimodal and multimodal benchmark functions are listed in Table 1. To assess the efficiency of the proposed IWOA, the mean fitness values of IWOA and other state-of-the-art metaheuristic algorithms, namely cuckoo search (CS) [40, 80], gray wolf optimization (GWO) [45], whale optimization algorithm (WOA) [44], grasshopper optimization algorithm (GOA) [47], bat algorithm (BA) [81], hybrid cuckoo search (CSK) [61], vector co-evolving particle swarm optimization (VCPSO) [87], global and local neighborhood-based PSO (GLNPSO) [18], scout particle swarm optimization (ScPSO) [39], random spare reinforced whale optimization algorithm (RDWOA) [16], hybrid whale optimization algorithm (HWOA) [1], naive Bayes-based whale optimization algorithm (NB-WOA) [34], and enhanced whale optimization algorithm (EWOA) [49], have been computed. For a fair comparison, the controlling parameters are identical for all the algorithms: the problem dimension (\(P_{{\mathrm{dims}}}\)) is set to 30 and 50, and the maximum number of iterations (mitr) is 1000. The other parameters are taken from the respective literature, and the parameter values of the proposed algorithm have been chosen empirically by testing its performance under different parameter settings of the standard WOA algorithm.
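Table 1 is not reproduced in this section, so the sketch below uses two classic functions, the sphere (unimodal) and Rastrigin (multimodal) functions, only to illustrate the distinction; they are not claimed to be the exact definitions of any \(F_i\) in Table 1. Counting local minima along a 1-D slice shows why multimodal functions test an optimizer's resistance to local optima.

```python
import math
from typing import Callable, Sequence


def sphere(x: Sequence[float]) -> float:
    """Unimodal: a single minimum f(0) = 0, useful for measuring convergence."""
    return sum(v * v for v in x)


def rastrigin(x: Sequence[float]) -> float:
    """Multimodal: global minimum f(0) = 0 surrounded by many regularly spaced local minima."""
    return 10 * len(x) + sum(v * v - 10 * math.cos(2 * math.pi * v) for v in x)


def count_local_minima_1d(func: Callable[[Sequence[float]], float],
                          lo: float = -5.12, hi: float = 5.12, steps: int = 10001) -> int:
    """Count interior grid points that are strictly lower than both neighbours (1-D slice)."""
    xs = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    ys = [func([x]) for x in xs]
    return sum(1 for k in range(1, steps - 1) if ys[k] < ys[k - 1] and ys[k] < ys[k + 1])


print(count_local_minima_1d(sphere), count_local_minima_1d(rastrigin))
```

On \([-5.12, 5.12]\), the sphere slice has one minimum, while the Rastrigin slice has 11, one near each integer from -5 to 5; a search that converges greedily on the latter can easily stop at one of the 10 non-global minima.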

Since nature-inspired algorithms are stochastic, all the algorithms have been run 30 times and their average values are used for comparison. The mean fitness values of all the algorithms are reported in Tables 2 and 3. The tables show that the proposed IWOA achieves better results than the state-of-the-art methods on 85% of the benchmark functions. For a few functions, other algorithms perform better than or on par with IWOA. EWOA shows comparable performance on \(F_1\), \(F_{13}\), and \(F_{15}\) for only one of the two dimensions and on both dimensions of function \(F_3\), while WOA performs better than the proposed IWOA on \(F_7\) for dimension 30 and on \(F_{14}\) for dimension 50. RDWOA achieves the best fitness values on benchmark functions \(F_{5}\) and \(F_{6}\) for dimension 50, while on benchmark functions \(F_{10}\) and \(F_{11}\), ScPSO, RDWOA, HWOA, EWOA, and NB-WOA show equivalent performance. Hence, the proposed IWOA performs better than the compared methods on the majority of the benchmark functions.

Table 2 Comparative analysis of existing and proposed algorithms for mean fitness over 30 runs on the benchmark functions (\(F_1\)–\(F_{9}\)) for 30 and 50 dimensions
Table 3 Comparative analysis of existing and proposed algorithms for mean fitness over 30 runs on the benchmark functions (\(F_{10}{-} F_{17}\)) for 30 and 50 dimensions
Table 4 Mean ranking of all the considered methods

Furthermore, the non-parametric Friedman test [89] has also been conducted to statistically validate the efficacy of IWOA. The Friedman test ranks the methods on every benchmark function according to their performance [79]. The best-performing method gets rank 1, the next best gets rank 2, and so on up to rank n. If two or more methods have the same performance, each of them receives the average of the ranks they would otherwise occupy [70]. The p value returned by the Friedman test is 0.0004518, which is far below the significance level \((\alpha = 0.05)\), indicating that the differences among the obtained results are statistically significant. Table 4 tabulates the mean ranks of all the methods returned by the Friedman test. From Table 4, it is evident that the proposed IWOA has the lowest mean rank among all the considered models. Hence, the experimental and statistical outcomes reflect the high efficiency of IWOA.
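The ranking procedure can be sketched as follows: ranks are assigned per benchmark function (1 = lowest mean fitness), ties share the average rank, and the Friedman statistic is computed from the mean ranks \(R_j\) as \(\chi^2_F = \frac{12N}{k(k+1)}\left[\sum_j R_j^2 - \frac{k(k+1)^2}{4}\right]\), with N benchmark functions and k methods. This is a minimal sketch without a tie correction; the toy fitness table is invented for illustration, and a library routine such as `scipy.stats.friedmanchisquare` could be used to obtain p values.

```python
from typing import List, Sequence, Tuple


def average_ranks(row: Sequence[float]) -> List[float]:
    """Rank values in ascending order (1 = best for minimisation); ties share the mean rank."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j correspond to ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman(results: Sequence[Sequence[float]]) -> Tuple[List[float], float]:
    """results[f][m] = mean fitness of method m on benchmark function f."""
    n = len(results)      # number of benchmark functions
    k = len(results[0])   # number of methods
    rank_rows = [average_ranks(row) for row in results]
    mean_ranks = [sum(r[m] for r in rank_rows) / n for m in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4)
    return mean_ranks, chi2


toy = [
    [0.1, 0.5, 0.3],
    [0.0, 0.2, 0.2],   # methods 1 and 2 tie here and share rank 2.5
    [1e-5, 3.0, 2.0],
]
mean_ranks, chi2 = friedman(toy)
print(mean_ranks, round(chi2, 3))
```

For the toy table, method 0 is best on every function and gets mean rank 1.0; the statistic is 15.5/3 (about 5.17), which would be compared with a chi-square distribution with k - 1 = 2 degrees of freedom.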

Performance estimation of stance detection method

The performance of the proposed IWOASD method has been analyzed on five stance datasets, namely, the argument reasoning comprehension (ARC) dataset [31], the headline and article bodies dataset (FNC-1) [52], the claim polarity dataset Research [67], the perspectrum dataset [69], and the Snopes corpus [30]. The ARC dataset was manually created by Habernal et al. [29] and consists of 188 debate topics from the user debate section of the New York Times, while the FNC-1 dataset is obtained from the Fake News Challenge (FNC-1). Both the ARC and FNC-1 datasets are annotated with four categories, namely, unrelated, discuss, agree, and disagree. On the other hand, the claim polarity, perspectrum, and Snopes corpora are annotated with two classes, namely, support (agree) and contest (disagree). All the datasets consist of pairs of claims and evidence texts along with their stances. Hence, each stance belongs to one of the categories unrelated, discuss, agree, or disagree, with only agree and disagree occurring in the two-class datasets. A complete description of the datasets is given in the following subsections.

FNC-1 dataset

This dataset is derived from the Emergent dataset, which originated from a digital journalism project at Columbia University [26]. The Emergent dataset comprises 2595 news articles and 300 rumored claims, which journalists grouped into true, false, and unverified categories. The FNC-1 dataset [77] extends the Emergent dataset by assigning one of four labels, namely, unrelated, discuss, agree, and disagree, to each headline–body pair.

Table 5 The dataset statistics

ARC dataset

This dataset covers typical controversial subjects, such as schooling challenges, immigration, and international affairs, from different news domains. It is similar to the FNC-1 dataset, with some considerable differences. In the FNC-1 dataset, the news articles are more balanced and complete, while in the ARC dataset, multi-sentence statements represent users’ perspectives on a topic. Table 5 lists the complete statistics of the FNC-1 and ARC datasets. The table shows that both datasets are skewed. Prior work has shown that weak models perform unsatisfactorily on skewed datasets [13, 57]. Therefore, the efficacy of the proposed IWOA has been evaluated on these skewed (imbalanced) datasets to demonstrate its performance.

Claim polarity dataset

This dataset contains 2394 claims and evidence texts for 55 topics Research [67]. Topics in the dataset were randomly selected from the debate motions database [8]. All the claims in the dataset are manually annotated with one of two classes, namely, support (agree) and contest (disagree). The dataset comprises 1324 support claims and 1070 contest claims.

Perspectrum dataset

This dataset contains users’ views and perspectives collected from various debate websites such as debatewise.org, idebate.com, and procon.org [66]. Each claim in the dataset is associated with different views and stances. The dataset consists of 6125 supporting (agree) claims and 5751 opposing (disagree) claims [66, 69]. For a fair comparison, the dataset is divided into three parts, i.e., training, development (dev), and test sets. The complete dataset statistics are listed in Table 5.

Snopes dataset

This dataset has been collected from the Snopes platform and consists of 8291 claims [30, 33]. The dataset is annotated with two classes, namely, agree and refute. An instance is annotated as agree (support) if the claim is supported by the evidence text; otherwise, it is annotated as refute (disagree). In total, the dataset contains 6178 agree instances and 2113 refute instances [33].
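The class counts quoted in the dataset descriptions can be turned into simple imbalance summaries. The sketch uses only the three two-class datasets whose counts appear in the text (the FNC-1 and ARC statistics are in Table 5), and the imbalance ratio, majority count divided by minority count, is a common summary chosen here for illustration.

```python
from typing import Dict, Tuple

# Class counts quoted in the text: (agree/support, disagree/contest/refute)
COUNTS: Dict[str, Tuple[int, int]] = {
    "claim polarity": (1324, 1070),
    "perspectrum": (6125, 5751),
    "snopes": (6178, 2113),
}


def class_stats(agree: int, disagree: int) -> Dict[str, float]:
    """Total size, share of agree instances, and majority/minority imbalance ratio."""
    total = agree + disagree
    return {
        "total": total,
        "agree_share": agree / total,
        "imbalance_ratio": max(agree, disagree) / min(agree, disagree),
    }


for name, (a, d) in COUNTS.items():
    s = class_stats(a, d)
    print(f"{name:15s} total={s['total']:6d} "
          f"agree={s['agree_share']:.3f} IR={s['imbalance_ratio']:.2f}")
```

For example, the Snopes corpus has roughly 2.9 agree instances per refute instance (6178/2113), whereas perspectrum is close to balanced (6125/5751, about 1.07).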

To assess the efficiency of the proposed IWOASD, the datasets are first partitioned into training and testing parts, in which each instance contains a pair of claim and evidence texts. Since stop words and fuzzy words do not carry relevant information, the stance datasets are preprocessed to eliminate unwanted words such as ‘a’, ‘is’, and ‘are’. Afterward, vocabulary lists are built from the distinct terms occurring in the headlines and article texts, and these lists are used to extract pre-trained word embeddings from the Stanford GloVe corpus. The pre-trained word embeddings obtained for the training sets are used to train the proposed IWOASD method. Finally, the test dataset is fed to the trained model to examine the efficiency of the proposed IWOA-based MLP.
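The preprocessing steps above can be sketched as follows. The stop-word list, the punctuation handling, and the choice to average token vectors and concatenate the headline and body vectors into one MLP input are assumptions for illustration; the section does not specify how the embeddings are combined. `load_glove` reads the GloVe text format, in which each line holds a word followed by its vector components, and keeps only words in the vocabulary.

```python
from typing import Dict, Iterable, List, Sequence

# Small illustrative stop-word list (assumption; the list used for IWOASD is not given).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and"}


def tokenize(text: str) -> List[str]:
    """Lower-case, split on whitespace, strip surrounding punctuation, drop stop words."""
    tokens = (t.strip(".,;:!?\"'()").lower() for t in text.split())
    return [t for t in tokens if t and t not in STOP_WORDS]


def build_vocab(texts: Iterable[str]) -> List[str]:
    """Sorted list of distinct terms occurring in the given headlines/bodies."""
    return sorted({tok for text in texts for tok in tokenize(text)})


def load_glove(path: str, vocab: Iterable[str]) -> Dict[str, List[float]]:
    """Read a GloVe text file ('word v1 v2 ...' per line), keeping only vocabulary words."""
    wanted = set(vocab)
    emb: Dict[str, List[float]] = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            word, *vals = line.rstrip().split(" ")
            if word in wanted:
                emb[word] = [float(v) for v in vals]
    return emb


def mean_embedding(text: str, emb: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the embeddings of known tokens; a zero vector if none are found."""
    vecs = [emb[t] for t in tokenize(text) if t in emb]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def pair_features(headline: str, body: str, emb: Dict[str, List[float]], dim: int) -> List[float]:
    """Concatenate headline and body vectors into one MLP input (an assumed design)."""
    return mean_embedding(headline, emb, dim) + mean_embedding(body, emb, dim)


# Toy 2-d vectors standing in for embeddings loaded from a GloVe file.
toy_glove = {"mango": [1.0, 0.0], "king": [0.0, 1.0], "fruits": [1.0, 1.0]}
x = pair_features("Mango is the king of fruits", "Mango tastes great", toy_glove, 2)
print(x)
```

With the toy table, the headline keeps ‘mango’, ‘king’, and ‘fruits’ after stop-word removal and averages to [2/3, 2/3], while only ‘mango’ is known in the body, giving a 4-dimensional input vector.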

The efficacy of the proposed IWOASD model is evaluated in terms of mean classification accuracy, mean error, standard deviation, and mean computational time for different numbers of hidden nodes (\(5, 7,9,\ldots ,20\)), and compared with CS, GWO, BA, WOA, and CSK. The mean classification accuracy is computed from a confusion matrix. In the confusion matrix \(C_x\) of size \(m \times m\), the entry \(C_{kk}\) represents the number of samples of class k that are predicted as class k; hence, the diagonal entries count the correctly predicted samples, and the accuracy is \(\sum_{k} C_{kk} / \sum_{i,j} C_{ij}\). The mean accuracy, mean error, standard deviation, and mean computational time of the proposed and state-of-the-art methods are tabulated in Tables 6, 7, 8, 9, and 10. The tables show that the proposed IWOA with 20 hidden nodes returns the highest accuracy on the FNC-1 dataset and the Snopes corpus, while the proposed IWOA with 11 hidden nodes attains the highest accuracy on the ARC, claim polarity, and perspectrum datasets. The proposed IWOA also surpasses the other methods in terms of mean error, whereas in terms of standard deviation, other methods such as CS, BA, and CSK show better results than the proposed method. Moreover, the proposed IWOA also performs better than the considered methods in terms of computational time.
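Accuracy from a confusion matrix, and its mean and standard deviation over repeated runs, can be computed as below. The label encoding is hypothetical, and whether the reported standard deviation is the sample or population version is not stated, so using the sample version from Python's `statistics` module is an assumption.

```python
import statistics
from typing import List, Sequence, Tuple


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], m: int) -> List[List[int]]:
    """C[k][j] counts samples of true class k predicted as class j."""
    C = [[0] * m for _ in range(m)]
    for t, p in zip(y_true, y_pred):
        C[t][p] += 1
    return C


def accuracy(C: Sequence[Sequence[int]]) -> float:
    """Correct predictions (diagonal entries C[k][k]) divided by all samples."""
    correct = sum(C[k][k] for k in range(len(C)))
    total = sum(sum(row) for row in C)
    return correct / total


def summarize(run_accuracies: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of accuracy over independent runs."""
    return statistics.mean(run_accuracies), statistics.stdev(run_accuracies)


# Hypothetical encoding: 0 = agree, 1 = disagree, 2 = discuss, 3 = unrelated
y_true = [0, 1, 2, 3, 3, 2]
y_pred = [0, 2, 2, 3, 3, 3]
C = confusion_matrix(y_true, y_pred, 4)
print(C, accuracy(C))
```

Here four of the six samples lie on the diagonal, so the accuracy is 4/6; repeating training with different random seeds and passing the resulting accuracies to `summarize` gives the mean and spread reported per configuration.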

Table 6 Comparison of mean accuracy, standard deviation, mean error, and execution time over FNC-1 dataset for the different numbers of hidden nodes
Table 7 Comparison of mean accuracy, standard deviation, mean error, and execution time over ARC dataset for the different numbers of hidden nodes
Table 8 Comparison of mean accuracy, standard deviation, mean error, and execution time over claim polarity dataset for the different numbers of hidden nodes
Table 9 Comparison of mean accuracy, standard deviation, mean error, and execution time over perspectrum dataset for the different numbers of hidden nodes
Table 10 Comparison of mean accuracy, standard deviation, mean error, and execution time over the Snopes annotated corpus for the different numbers of hidden nodes

Additionally, to validate the efficacy of the proposed IWOASD, it is also compared with baseline and state-of-the-art models, including recent variants of WOA, such as TF-IDF, Doc2Vec + GRU-GRU, GloVe + EWOA, and GloVe + RDWOA, in terms of mean classification accuracy, mean error, standard deviation, and mean computational time. Table 11 reports the results of all the considered methods. The table shows that the proposed IWOA with GloVe word embeddings attains accuracies of 76.53%, 74.72%, 78.45%, 79.63%, and 59.02% on the FNC-1, ARC, claim polarity, perspectrum, and Snopes corpora, respectively. Besides, the proposed method also outperforms the other methods in terms of mean error. However, Doc2Vec + GRU-GRU shows the least variation on all the considered datasets except FNC-1, on which TF-IDF + GRU-GRU performs best.

Table 11 Comparison of proposed IWOA-based neural network model with the state-of-the-art models
Table 12 Stance detection results

Furthermore, the confusion matrices are presented in Table 12 to show the number of stances correctly predicted by the proposed method. The table shows that the proposed IWOASD performs poorly when all four stances are considered. For the FNC-1 dataset, only 14 agree instances and 5 disagree instances are correctly predicted. In contrast, the proposed IWOASD has a better recognition rate on the ARC dataset than on the FNC-1 dataset. For the two-class stance datasets, i.e., claim polarity, perspectrum, and the Snopes corpus, the proposed method shows much better results. The performance of the proposed IWOASD method degrades on the FNC-1 and ARC datasets because these datasets contain instances of the discuss category. This analysis indicates that the efficiency of stance detection algorithms depends on the number and type of stance categories in the datasets.
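The effect described above, where good overall accuracy can hide poor recognition of minority stances, can be quantified with per-class recall, i.e., the diagonal entry of a confusion-matrix row divided by that row's sum. The matrix below is invented for illustration and does not reproduce Table 12.

```python
from typing import List, Optional, Sequence


def per_class_recall(C: Sequence[Sequence[int]]) -> List[Optional[float]]:
    """Recall of class k = C[k][k] / (row sum k); None for classes with no samples."""
    out: List[Optional[float]] = []
    for k, row in enumerate(C):
        n = sum(row)
        out.append(C[k][k] / n if n else None)
    return out


def macro_recall(C: Sequence[Sequence[int]]) -> float:
    """Unweighted mean of the defined per-class recalls."""
    vals = [r for r in per_class_recall(C) if r is not None]
    return sum(vals) / len(vals)


# Hypothetical matrix; rows = true agree, disagree, discuss, unrelated (NOT Table 12).
C = [
    [2, 0, 5, 3],
    [0, 1, 4, 5],
    [1, 0, 30, 9],
    [0, 0, 10, 90],
]
acc = sum(C[k][k] for k in range(len(C))) / sum(map(sum, C))
print(per_class_recall(C))
print(f"accuracy={acc:.3f}  macro recall={macro_recall(C):.3f}")
```

Here accuracy is about 0.77 while macro-averaged recall is under 0.49, because most agree and disagree samples are absorbed by the larger discuss and unrelated classes.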

Conclusion and future work

This article proposes a new variant of WOA, named IWOA, for automated stance detection of fake news. The proposed IWOA uses tournament selection and roulette wheel selection in alternate iterations to manage the trade-off between exploration and exploitation. The proposed IWOA has been validated on 17 benchmark functions. Furthermore, an optimized neural network model that combines the improved whale optimization algorithm with a multilayer perceptron neural network has been presented for the stance detection of fake news. The proposed optimized neural network updates the weights and biases of the FFNN using the improved whale optimization algorithm. The performance of the proposed model has been tested on five stance detection datasets and compared with CS, GWO, BA, WOA, and CSK. The proposed IWOA-optimized FFNN model achieves the highest accuracy among the considered models. Moreover, the proposed model also outperforms the other considered models in terms of mean error and execution time on more than 80% of the datasets. Although the proposed IWOA-optimized neural network shows better results, there is still room for improvement. The proposed IWOA relies on random choices to find the optimal solution, which means that the computing time and the solution quality are random variables. Due to this stochastic nature, a rigorous analysis would be very difficult. Moreover, it cannot be guaranteed that the solutions found in different runs will be globally optimal or of high quality. To mitigate this, IWOA has been executed 30 times and its mean value has been used for comparison. The results show that the proposed IWOA performs much better than the other algorithms on single-objective problems. However, on multi-objective problems, the proposed IWOA and the other algorithms show the same performance. In future work, feature selection methods and different optimization methods can be explored to improve accuracy.
Furthermore, deep learning models such as CNN, LSTM, and BiLSTM could also be investigated. Besides, transfer learning and multitask methodologies could also be considered to exploit knowledge from other related domains.
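The alternating selection scheme summarized above can be sketched as follows. Only the alternation itself comes from the text; the tournament size, the fitness-to-weight mapping used for roulette selection under minimization, the even/odd assignment of operators, and the use of the selected whale (for example, in place of the random whale \(X_{\mathrm{rand}}\) in WOA's exploration step) are all hypothetical choices.

```python
import random
from typing import Sequence


def tournament_select(fitness: Sequence[float], rng: random.Random, k: int = 2) -> int:
    """Index of the best (lowest-fitness) whale among k drawn without replacement."""
    contenders = rng.sample(range(len(fitness)), k)
    return min(contenders, key=lambda i: fitness[i])


def roulette_select(fitness: Sequence[float], rng: random.Random) -> int:
    """Fitness-proportional pick for minimisation, using weights 1 / (1 + f - f_min)."""
    f_min = min(fitness)
    weights = [1.0 / (1.0 + f - f_min) for f in fitness]
    return rng.choices(range(len(fitness)), weights=weights, k=1)[0]


def select_guide(iteration: int, fitness: Sequence[float], rng: random.Random) -> int:
    """Alternate operators: tournament on even iterations, roulette wheel on odd ones."""
    if iteration % 2 == 0:
        return tournament_select(fitness, rng)
    return roulette_select(fitness, rng)


rng = random.Random(42)
fitness = [5.0, 0.0, 3.0]
picks = [select_guide(t, fitness, rng) for t in range(10)]
print(picks)
```

Tournament selection with a larger k increases selection pressure toward the best whales (exploitation), while the roulette wheel still gives weaker whales a chance to guide the search (exploration); alternating the two is one simple way to balance them.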