1 Introduction

Cloud computing is dynamic and uncertain by nature: a consumer can request services on demand according to business needs [1]. This uncertainty makes it pivotal for the service provider to proactively manage the risk of possible Service Level Agreement (SLA) violations [2]. Quality of Service (QoS) is the primary indicator of the performance of service-oriented applications. QoS describes the functional and non-functional attributes of services that are encapsulated within the SLA formed between a consumer and the provider. Implementing these QoS parameters, such as security, availability and reusability, ensures adequate service quality and management, resulting in a trusted relationship among stakeholders [3]. An SLA breach occurs when the service provider fails to meet the promised target for the agreed SLA metrics. The provider is then liable for SLA violation penalties, including service credit, penalty fees, licence extension, and support. An SLA violation also affects the reputation of, and trust in, the service provider, which could deter potential new consumers. One possible way to mitigate the risk of SLA violation is QoS prediction.

Several approaches have tried to predict QoS parameters to avoid SLA violations. Most of them adopted collaborative filtering methods [4,5,6] to find the relationship between services and consumers [7]. Li et al. [4] proposed a time-aware cloud service recommendation algorithm based on a Time-aware Matrix Factorisation (TMF) model for QoS prediction. The approach used collaborative filtering with matrix factorisation to predict QoS parameters. The predicted results are then passed to a temporal smoothing method to obtain the final time-aware QoS predictions for service recommendations. Hussain et al. [5] used user-based and item-based collaborative filtering with an enhanced K-NN algorithm to predict future QoS parameters and avoid SLA violation. These approaches sought optimal QoS prediction using user-based or item-based filtering and attempted to support informed decisions about SLA violation; however, they could not accommodate complex QoS predictions, nor could they prioritise a particular set of QoS parameters over another. In another approach, Nagarajan and Thirunavukarasu [8] proposed a service context-aware cloud broker method that pulls service features from cloud services using relevant data and evaluates service similarity using QoS parameters. The matrix factorisation concept addresses the cold start problem and forecasts higher QoS values for arriving customers. Shadabfar et al. [9] proposed a susceptible-exposed-infected-vaccinated-recovered (SEIVR) model to predict the spread of COVID-19. The authors considered multiple variables: transmission, recovery, and mortality. In another method, Sioofy Khoojine et al. [10] proposed an autoregressive network model to predict COVID-19 disease. The discussed approaches work well for different problems, including healthcare, cloud and web services.
However, in a complex nonlinear service-oriented framework where the QoS parameters are distributed widely across the distributed network, it is very difficult for the decision-maker to prioritise a certain set of data to make a complex prediction.

Machine learning (ML) algorithms are widely used in predictive models because they can capture complex nonlinear relationships between response variables and predictors. Neural networks are data-driven algorithms that learn patterns from the dataset [11]. The main issue with supervised learning algorithms is their specific requirements. To obtain optimal prediction results, the training dataset must be of high quality with low avoidable bias, and a model fitted on the training set must generalise well to the development set. In addition, computational complexity increases significantly as the training dataset grows [12]. Gradient descent optimisation techniques such as RMSProp, Adamax [13], and Adam are commonly used to address computational complexity problems. Deep autoencoders [14] are used to reduce the dimensionality of the input in a labelling layer [15, 16]. Most machine learning methods have difficulty converging to the global minimum. Moreover, it is challenging to manage high-order datasets with varying time intervals, such as QoS data. Like the traditional prediction methods, ML algorithms have no mechanism to assign variable weights to different intervals or to reduce data dimensions without losing information. Furthermore, the computational complexity increases with the data dimensions [12].

Computational complexity describes the amount of computational resources required to solve a specific problem by systematic application of an algorithm [17]. It is commonly categorised into time and space complexity, and problems are classified as P, NP, NP-complete or NP-hard [18]. Different approaches try to address the issue. For example, Scutari, Vitolo and Tucker [19] analysed the time complexity of Bayesian network structure learning with greedy search. The study found that considering closed-form estimators for local distributions with few parents can significantly impact the complexity of a network. In another experiment, Alizadeh, Allen and Mistree [20] found that Multivariate Adaptive Regression Splines, Kriging and response surface models are optimal methods to reduce time complexity for large problems. The computational complexity of prediction methods with multiple variables grows much faster than the dataset itself [21]. Moreover, it is a big challenge for the decision-maker to prioritise a specific set of data within an extensive dataset for nonlinear prediction without losing any information [12].

Yager [22] introduced the OWA operator in a neural network to overcome the high dimensionality of a dataset. The OWA operator is a parameterised class of mean-type aggregation operators [23]. The approach reorders inputs before feeding them to the network. The process reduces input size significantly, consequently reducing the computational complexity. Building on the same concept, Cheng et al. [12] used the approach in the ANFIS model to handle a large dataset of the TAIEX stock index and predict future indexes. Bo et al. [24] used the same method, combining the IOWA layer with the Fruit Fly algorithm to predict vegetable prices. Although the discussed OWA approaches work well for simple reordering and decision-making processes, they cannot handle the complex reordering of inputs, such as the QoS parameters in complex SLA management. In our recent experimental work [25,26,27], multiple OWA operators were combined with the Analytic Hierarchy Process (AHP), Adaptive Neuro-Fuzzy Inference Systems (ANFIS) and different fuzzy clustering methods to accommodate the complexities of prediction data. The experimental results demonstrated high efficiency and better accuracy. The approach in [28] assists cloud service provider (CSP) selection by combining QoS and Quality of Experience (QoE).

To address the limitations of the approaches discussed above, the paper uses an induced ordered weighted averaging (IOWA) layer in the neural network structure. The distinctive features of the paper are as follows:

  • The paper proposes a novel hybrid prediction model using the IOWA operator with multiple neural network methods for optimal QoS prediction.

  • Existing QoS prediction approaches are unable to handle the complex relationships between parameters. The proposed approach prioritises a specific set of data within a large dataset for complex prediction.

  • The method reduces data size without losing any information, which lowers computational complexity while retaining accuracy.

  • Unlike existing approaches, the proposed approach can accommodate the custom requirements of decision-makers for complex predictions.

To achieve the above objectives, the paper combines the IOWA operator with nine neural network methods: Cascade-forward backpropagation (CFBP), Elman backpropagation (EBP), Feedforward backpropagation (FFBP), Generalised regression (GR), Nonlinear autoregressive exogenous (NARX), Layer recurrent neural network (LRNN), Long short-term memory (LSTM), Gated recurrent unit (GRU) and a combined LSTM-GRU method. The paper analyses prediction accuracy on a real cloud QoS dataset extracted from an Amazon EC2 US-West IaaS instance. Prediction accuracies are compared using RMSE, MAD and MAPE as benchmarks. The rest of the paper is organised as follows: Sect. 2 discusses related literature and preliminaries. Section 3 discusses the proposed approach. Section 4 demonstrates the evaluation results, and finally, Sect. 5 concludes the paper with future research directions.

2 Preliminaries

This section discusses preliminaries and related studies that highlight the QoS prediction in a service-oriented environment.

2.1 QoS prediction approaches

Several approaches have used various methods to predict QoS parameters. Smahi, Hadjila, Tibermacine and Benamar [29] proposed a Deep AutoEncoder (DAE)-based Matrix Factorization model for predicting the QoS of Web services. Gao et al. [30] applied a memory-augmented autoencoder for IoT time-series data. The model uses a clustering technique for input gathering to mitigate the data sparsity problem and enhance web QoS prediction accuracy. It also considers the influence of services/users' geographical characteristics to achieve accuracy [31]. Boutaba et al. [32] discuss the role of machine learning methods in QoS prediction. The survey highlights network management of traffic prediction, resource management, network security and QoS and QoE management. It also identifies parameters for QoS prediction and QoE factors to control the network-related problems. Rehman et al. [33] proposed a medical QoE (m-QoE) prediction model for ultrasound video streaming. The approach used Multilayer Perceptron Neural Network to extract device features to predict medical applications' QoS. Hussain et al. [34] analysed different soft computing approaches to predict QoS to form a viable SLA. Haytamy and Omara [3] proposed a Deep Learning-based Service Composition framework (DLSC). The approach assists cloud consumers to predict QoS-based services of cloud providers. The framework implements the LSTM deep learning method compounding with a Particle Swarm Optimization (PSO) algorithm. LSTM predicts the possible QoS values and is fed into the PSO, where the best service provider selection is made based on the resources required and minimised cost function of the consumer. Integration of Induced Ordered Weighted Average (IOWA), Weighted Average (WA), and Fuzzy time series are used to provide a novel prediction approach in the neural network framework [26]. The strategy has the advantage to manipulate difficult nonlinear predictions in the neural network architecture. 
Moreover, the technique also anticipates nonlinear statistical data. Harandizadeh et al. [35] created a novel hybrid intelligence system, ANFIS-PNN-ICA, that combined an adaptive neuro-fuzzy inference system (ANFIS) with a polynomial neural network (PNN), improved using the imperialist competitive algorithm (ICA), for forecasting tunnel boring machine (TBM) performance. In another approach [27], the authors proposed a clustered Induced Ordered Weighted Averaging (IOWA) Adaptive Neuro-Fuzzy Inference System (CI-ANFIS) model. The approach used a fuzzy time series prediction model to minimise the data dimension and manage the nonlinear correlation of cloud QoS data. It incorporates a fuzzy neural network architecture for optimum forecasting results and an intelligent sorting method to handle prediction uncertainties.

Liu and Chen [36] defined two QoS prediction approaches in dynamic Cloud Manufacturing (CMfg). The personalised clustering approach uses textual and rating information to find the task similarity through a clustering algorithm. There is some probability of inaccuracy in predicting QoS values; however, combining both prediction approaches addresses the issue. Chen et al. [37] proposed a self-adaptive resource allocation framework that allows dynamic allocation of services on request. The process runs in an iterative feedback loop utilising an iterative QoS prediction model and a PSO-based runtime decision algorithm. The model makes resource provisioning decisions based on iterations and repetitive feedback. The prediction model predicts QoS values in iterations, and the resultant value is then fed to the decision algorithm to determine the future automatic resource allocation procedures [38]. Liu and Chen [39] introduced a hybrid QoS prediction approach for dynamic cloud manufacturing prediction. The approach used a similarity-enhanced collaborative filtering method for better prediction results and then applied a case-based reasoning method to better extract user and service details. The Bayesian function raises the accuracy of the proposed approach and reduces data uncertainty. To represent user-service interactions, Ma et al. [40] introduced a neural network-based framework called GCF (Generic Collaborative Filtering). The approach performs dropout regularisation to reduce the bias caused by the continuous values of QoS. It also decreases the high variance that arises from low-rank assumptions over a wide range of values.

Li et al. [41] formulated a Bayesian network model for cloud service prediction. First, the approach correlates the QoS parameters and hardware details from the infrastructure and platform layers. It then used a Bayesian network algorithm to predict future QoS parameters better. Hussain et al. [21] applied various neural network algorithms and compared them with stochastic methods to analyse the prediction accuracy at different intervals. Xu et al. [42] proposed a Neural Fusion Matrix Factorisation model for QoS predictions. The approach merged neural networks with the matrix factorisation technique to conduct nonlinear collaborative filtering for consumer and service latent selected features. Huang et al. [43] modelled an optimisation-based allocation mechanism in a cloud data centre depending on the user requirements. The approach initially assigns the arriving virtual machines that request from mobile devices. Users are then assigned to suitable physical machines depending on their hardware resource usage and the data centre's throughput status. CPU usage criteria are defined to determine which virtual machines are reassigned before and after allocation. Hussain et al. [25] presented a CQoES architecture for centralised Quality of Experience (QoE) and Quality of Service (QoS). The approach enabled cloud users in locating the best service provider by taking into account their top priorities. It also aided the service provider in intelligent resource management and decision-making for finite resources. The model used a combination of AHP, IOWA, POWA and Collaborative Filtering using KNN methods for evaluation that facilitates cloud stakeholders to establish a long-term, mutually beneficial relationship. Fu et al. [44] proposed a QoS prediction method using an improved nearest neighbour method for cloud service recommendation. 
The approach used a quantisation method to represent the stable status of services and users and then applied a NearestGraph method to obtain better prediction results. Keshavarzi et al. [45] proposed an enhanced time-aware QoS prediction method to avoid SLA violations in the cloud. The approach employed a modified k-medoids algorithm to cluster data and addresses the cold start problem using the DTW barycenter averaging algorithm. Zou et al. [46] proposed a neural network-based technique for temporal-aware service QoS prediction. The approach combined the binarisation facility and the similarity features for better temporal feature representation of users and services.

Deep learning models of Gated Recurrent Units (GRU) learn and extract temporal features across entities. Parameter optimisation is then used to train the DeepTSQP model to forecast undefined service QoS. Gao et al. [30] used the time series data of IoT sensors to predict the deviation in the system's behaviour and possible anomaly detection. The authors [47,48,49] applied different deep learning models such as LSTM, GRU and RNN to cloud QoS data. Alkalbani and Hussain [50] applied multiple machine learning methods such as SVM, KNN, Decision Tree and others to analyse cloud QoS data for optimal service discovery. Chowdhury et al. [51] proposed a QoS prediction model using Hybrid filtering and a Hierarchical prediction process. The hybrid filtering approach seeks to find a group of users and services similar to a target user. The hierarchical prediction process used hierarchical neural regression to forecast the QoS value properly. A comparative analysis of related approaches is presented in Table 1.

Table 1 Comparative analysis of existing approaches

The comparative analysis shows that although the above-discussed approaches forecast QoS parameters to support the decision-maker in the decision-making process, they have several drawbacks. These include difficulty in managing nonlinear relationships, manipulating high-dimensional datasets, and handling complex nonlinear predictions where different QoS parameters have additional weightage. Moreover, the approaches were unable to control the complex reordering of the decision-making parameters, and they did not address data reduction without information loss. This paper introduces the IOWA layer in the neural network to address these shortcomings, as discussed in Sect. 3.

2.2 OWA operator and families

The Ordered Weighted Averaging (OWA) operator introduced by Yager [23] is a family of mean-type operators. The OWA operator allows aggregation anywhere between the two extremes of OR (the maximum) and AND (the minimum) [52, 53]. The operator is defined as follows:

Definition 1

The OWA operator of dimension n is a mapping \(OWA: R^{n} \to R\) that has an associated weighting vector \(W = \left( { w_{1} , w_{2} , w_{3} , \ldots , w_{n} } \right)\) such that \(w_{i} \in [0,1]\), \(i = 1, \ldots , n\) and \(\sum\nolimits_{i = 1}^{n} {w_{i} = 1}\). The operator is presented as:

$$OWA \left( { x_{1} , x_{2} , x_{3} , \ldots \ldots , x_{n} } \right) = \mathop \sum \limits_{i = 1}^{n} w_{i} y_{i}$$
(1)

where \(\left( {y_{1} , y_{2} , y_{3} , \ldots , y_{n} } \right)\) is the reordered set of \(\left( {x_{1} , x_{2} , x_{3} , \ldots , x_{n} } \right)\) from largest to smallest.
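
Definition 1 can be illustrated with a short Python sketch. The function name `owa` and the sample values are assumptions made for this illustration; they do not come from the paper. The sketch shows how the choice of weights moves the result between the maximum, the minimum and the arithmetic mean.

```python
def owa(values, weights):
    """Ordered weighted average (Eq. 1): weights attach to positions after sorting."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 or w > 1 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must lie in [0, 1] and sum to 1")
    ordered = sorted(values, reverse=True)  # y_1 >= y_2 >= ... >= y_n
    return sum(w * y for w, y in zip(weights, ordered))


# Illustrative values only (not taken from the paper).
x = [20, 50, 30]
print(owa(x, [1.0, 0.0, 0.0]))  # all weight on the largest argument (OR-like)
print(owa(x, [0.0, 0.0, 1.0]))  # all weight on the smallest argument (AND-like)
print(owa(x, [0.5, 0.3, 0.2]))  # a compromise between the two extremes
```

Because the weights belong to positions in the sorted list rather than to particular inputs, permuting the inputs does not change the result.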

Another family of OWA operators is the Induced OWA (IOWA) operator. The IOWA operator [54] introduced by Yager and Filev is an aggregation operator that uses an induced variable to reorder input variables. The IOWA operator is defined as:

Definition 2

The IOWA operator of dimension n is a function \(IOWA: R^{n} \to R\) with an associated weighting vector W of dimension n, \(W = \left( { w_{1} , w_{2} , w_{3} , \ldots , w_{n} } \right)\), such that \(w_{i} \in [0,1]\), \(i = 1, \ldots , n\) and \(\sum\nolimits_{i = 1}^{n} {w_{i} = 1}\). It aggregates the arguments using a second set of arguments, the inducing variables \(u_{i}\), such that:

$$IOWA \left( {u_{1} , j_{1} , u_{2} , j_{2} , \ldots \ldots , u_{n} , j_{n} } \right) = \mathop \sum \limits_{a = 1}^{n} w_{a} k_{a}$$
(2)

where \(\left( { k_{1} , k_{2} , k_{3} , \ldots , k_{n} } \right)\) is the set of input arguments \(\left( { j_{1} , j_{2} , j_{3} , \ldots , j_{n} } \right)\) reordered in decreasing order of the inducing variables \(\left( { u_{1} , u_{2} , u_{3} , \ldots , u_{n} } \right)\).
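
The difference from the plain OWA operator is that the ordering comes from the inducing variables rather than from the argument values. A minimal Python sketch of Definition 2 follows; the function name `iowa`, the pair layout and the sample numbers are assumptions made for illustration, and ties between inducing values are resolved here by input order, which the definition itself does not prescribe.

```python
def iowa(pairs, weights):
    """Induced OWA (Eq. 2). `pairs` holds (inducing value, argument) tuples."""
    if len(pairs) != len(weights):
        raise ValueError("pairs and weights must have the same length")
    if any(w < 0 or w > 1 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must lie in [0, 1] and sum to 1")
    # Reorder the arguments by decreasing inducing value. Python's sort is
    # stable, so pairs with equal inducing values keep their input order.
    ordered = [arg for _, arg in sorted(pairs, key=lambda p: p[0], reverse=True)]
    return sum(w * k for w, k in zip(weights, ordered))


# Illustrative values only: the largest argument (40) has the lowest inducing
# value, so it receives the last weight instead of the first.
pairs = [(3, 10.0), (1, 40.0), (2, 20.0)]
print(iowa(pairs, [0.5, 0.3, 0.2]))
```

With the same weights, a plain OWA over (10, 40, 20) would put the weight 0.5 on 40, whereas here it falls on 10 because that argument carries the highest inducing value.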

2.3 Neural network prediction methods

This study analysed nine neural network methods to compare their prediction accuracy with the proposed approach. The methods are discussed as follows:

  • Feedforward backpropagation network: A type of neural network, also referred to as a multi-layer perceptron, that feeds the values forward, calculates the error and propagates it back to the previous layer. The network has a hidden layer. Signals from the input layer are sent to the neurons of the hidden layer in weighted form and are then processed by the activation function. The output of each hidden neuron is then sent to the output layer. The formulation of the network is presented as follows:

    $$O = f_{o} \left( {w_{b} + \mathop \sum \limits_{i = 1}^{k} w_{i}^{o} f_{h} \left( {w_{i}^{b} + \mathop \sum \limits_{j = 1}^{m} w_{ij}^{h} x_{j} } \right)} \right)$$
    (3)

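    where \(f_{o}\) and \(f_{h}\) are the activation functions of the output layer and hidden layer, respectively, \({w}_{b}\) is the weight from the bias to the output, \({w}_{i}^{b}\) is the weight from the bias to hidden neuron i, \(w_{i}^{o}\) is the weight from hidden neuron i to the output, and \(w_{ij}^{h}\) is the weight from input \(x_{j}\) to hidden neuron i.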

  • Cascade-forward backpropagation network: In this type of neural network, there is a connection from the input and every preceding layer to the subsequent layers. The method accommodates a nonlinear relationship between the input and the output. The formulation of the network is presented as follows:

    $$O = \mathop \sum \limits_{i = 0}^{n} f_{o} w_{i}^{i} x_{i} + f_{o} \left( {w_{b} + \mathop \sum \limits_{i = 1}^{k} w_{i}^{o} f_{h} \left( {w_{i}^{b} + \mathop \sum \limits_{j = 1}^{m} w_{ij}^{h} x_{j} } \right)} \right)$$
    (4)

    where \(w_{i}^{i}\) is the weight from the input layer to the output layer, \(f_{o}\), \(f_{h}\) is the activation function in the output layer and hidden layer, \(w_{b}\) is the weight from bias to output and \(w_{i}^{b}\) is the weight from bias to the hidden layer.

  • Elman backpropagation network: This is a feedforward neural network with an extra layer of recurrent connection with tap delay. The network comprises four layers: the input layer, the hidden layer, the undertake (context) layer, which memorises the hidden layer output, and the output layer. The formulation of the network is presented as follows:

    $$O = TF \left( {w_{Out} \times \left( {f \left( {w_{h} \times x \left( {k - 1} \right)} \right) + w_{In} \times u \left( {k - 1} \right)} \right)} \right)$$
    (5)

    where TF is the transfer function, \(w_{Out}\) is the weight from the hidden layer to the output layer, \(x \left( {k - 1} \right)\) is the output of the undertake layer, \(w_{h}\) is the weight from the undertake layer to the hidden layer, \(w_{In}\) is the weight from the input layer to the hidden layer, and \(u \left( {k - 1} \right)\) is the input of the neural network.

  • Generalised regression neural network: This is a probabilistic neural network with a radial basis layer and a special linear layer. The method does not need the training process. Instead, it approximates the arbitrary function between input and output vectors. The approach is mostly used for function approximation. The generalised regression (GR) neural network comprises four layers: input, pattern, summation, and output. The formulation of the GR neural network is presented as follows [55]:

    $$O = \frac{{Sum_{w} }}{{Sum_{s} }}$$
    (6)
    $$O = \frac{{\sum _{{i = 1}}^{n} w_{i} p_{i} }}{{\sum _{{j = 1}}^{n} p_{j} }}, \quad p_{j} = \exp \left( { - \frac{{(in - a_{j} )^{t} (in - a_{j} )}}{{2\sigma ^{2} }}} \right)$$
    (7)

    where \(Sum_{w}\) is the weighted sum of the pattern layer outputs, \(Sum_{s}\) is a simple summation of the pattern layer outputs, \(p_{j}\) is the output of pattern neuron j, σ is a smoothing parameter, in is the input to the network, \(a_{j}\) is the pattern vector for neuron j, w is the connection weight of a particular neuron to related neurons in the summation layer, and O is the network output.

  • NARX: The nonlinear autoregressive exogenous (NARX) is a nonlinear autoregressive method that is widely used for time series prediction. The method is designed as a feedforward time-delay neural network that considers the same series of previous data. The method only takes the output neuron's feedback instead of the hidden neurons. The method can be mathematically represented as follows:

    $$O\left( {t + 1} \right) = f\left[ {o\left( t \right), \cdots , o \left( {t - d_{o} } \right);in\left( t \right), \ldots , in\left( {t - d_{in} } \right)} \right]$$
    (8)
    $$O\left( {t + 1} \right) = f_{o} \left[ {b_{o} + \mathop \sum \limits_{k = 1}^{Nh} w_{ko} f_{k} \left( {b_{h} + \mathop \sum \limits_{j = 0}^{{d_{in} }} w_{jh} in\left( {t - j} \right) + \mathop \sum \limits_{i = 0}^{{d_{o} }} w_{ih} o\left( {t - i} \right)} \right)} \right]$$
    (9)

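    where in(t) and o(t) are the input and output of the model at time interval t, \(d_{in} \ge 1, d_{o} \ge 1\) are the input and output delays, \(w_{jh}\), \(w_{ih}\) and \(w_{ko}\) are the weights from the delayed inputs to the hidden layer, from the delayed outputs to the hidden layer and from the hidden layer to the output layer, respectively, Nh is the number of hidden neurons, and \(b_{h} , b_{o}\) are the biases of the hidden and output layers.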

  • Layer recurrent neural network (LRNN): This is similar to a feedforward neural network, except that each layer has a recurrent connection with an associated tap delay, which gives the network a finite dynamic response to the input dataset. The method is also widely used in time series prediction. The current state of the RNN is given by:

    $$h_{t} = \tanh \left( {W_{inh} in_{t} + W_{hh} h_{t - 1} + b_{h} } \right),$$
    (10)
    $$O_{t} = W_{ho} h_{t} + b_{o}$$
    (11)

    where in and o represent the input and output sequences, h represents the hidden vector sequence at time interval t, tanh is the activation function used in the hidden layer, W represents the weight matrices and b the bias vectors.

  • LSTM: The long short-term memory (LSTM) network is the extended version of the recurrent neural network developed by Hochreiter and Schmidhuber [56]. The standard recurrent neural networks cannot learn when the time lags are more than 5 – 10 distinct time steps between the observed and target data. The LSTM method overcomes the vanishing gradients and exploding gradients problems by introducing memory units or cell states. The typical formulation of a single LSTM cell is presented as follows [57]:

    $$In_{t} = \sigma \left( { W_{In} \times \left( {h_{t - 1} ,x_{t} } \right) + bi_{In} } \right),$$
    (12)
    $$Fg_{t} = \sigma \left( { W_{Fg} \times \left( {h_{t - 1} ,x_{t} } \right) + bi_{Fg} } \right),$$
    (13)
    $$Ou_{t} = \sigma \left( { W_{Ou} \times \left( {h_{t - 1} ,x_{t} } \right) + bi_{Ou} } \right),$$
    (14)
    $$Cell_{t} = Fg_{t} \times Cell_{t - 1} + In_{t} \times \widetilde{{Cell_{t} }},$$
    (15)
    $$\widetilde{{Cell_{t} }} = \tanh \left( { W_{Cell} \times \left( {h_{t - 1} ,x_{t} } \right) + bi_{Cell} } \right),$$
    (16)
    $$h_{t} = Ou_{t} \times \tanh \left( {Cell_{t} } \right),$$
    (17)

    where W represents the weight matrices, In represents the input gate, Fg the forget gate, Ou the output gate, Cell the memory cell content, \(\widetilde{Cell}\) the new memory cell content, tanh the hyperbolic tangent function, σ the sigmoid function, bi the biases, h the hidden vector at time interval t, and x the input. The approach is commonly used in various time series prediction problems.

  • GRU: The gated recurrent unit (GRU) is a network similar to LSTM, with gating units that control the flow of information within the unit. It has fewer parameters than LSTM, as it has no output gate, and is more efficient to train. The general formulation of the GRU network is presented as follows:

    $${\text{Re}}_{t} = sigf\left( { W_{{x{\text{Re}} }} x_{t} + W_{{y{\text{Re}} }} y_{t - 1} + bi_{{\text{Re}}} } \right),$$
    (18)
    $$Up_{t} = sigf\left( { W_{xUp} x_{t} + W_{yUp} y_{t - 1} + bi_{Up} } \right),$$
    (19)
    $$\tilde{y}_{t} = \tanh \left( { W_{xy} x_{t} + W_{yy} \left( {{\text{Re}}_{t} \odot y_{t - 1} } \right) + bi_{y} } \right),$$
    (20)
    $$y_{t} = Up_{t} \odot y_{t - 1} + \left( { 1 - Up_{t} } \right) \odot \tilde{y}_{t} ,$$
    (21)

    where Re represents the reset gate, Up the update gate, x and y the input and output vectors, sigf the sigmoid activation function, tanh the hyperbolic tangent function, W the weight matrices, and bi the biases.
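
Some of the formulations above can be sketched in a few lines of plain Python. The sketch below is illustrative only: it implements Eq. 3 for a network with one hidden layer, and it reduces the LSTM cell (Eqs. 12–17) and the GRU cell (Eqs. 18–21) to a single scalar unit so that every weight is a number rather than a matrix. The function names, the parameter dictionaries and all numeric values are assumptions made for this example and are not the configuration used in the experiments.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def ffbp_forward(x, w_h, b_h, w_o, b_o, f_h=math.tanh, f_o=lambda z: z):
    """Eq. 3 with one hidden layer; w_h[i][j] links input j to hidden neuron i."""
    hidden = [
        f_h(b_h[i] + sum(w_h[i][j] * x[j] for j in range(len(x))))
        for i in range(len(w_h))
    ]
    return f_o(b_o + sum(w_o[i] * hidden[i] for i in range(len(hidden))))


def lstm_step(x_t, h_prev, cell_prev, p):
    """Eqs. 12-17 for one scalar unit; p[name] = (w_h, w_x, bias)."""
    def pre(name):
        w_h, w_x, bias = p[name]
        return w_h * h_prev + w_x * x_t + bias

    in_gate = sigmoid(pre("In"))
    forget_gate = sigmoid(pre("Fg"))
    out_gate = sigmoid(pre("Ou"))
    candidate = math.tanh(pre("Cell"))          # new memory cell content
    cell_t = forget_gate * cell_prev + in_gate * candidate
    h_t = out_gate * math.tanh(cell_t)
    return h_t, cell_t


def gru_step(x_t, y_prev, p):
    """Eqs. 18-21 for one scalar unit; p[name] = (w_x, w_y, bias)."""
    w_x, w_y, bias = p["Re"]
    reset = sigmoid(w_x * x_t + w_y * y_prev + bias)
    w_x, w_y, bias = p["Up"]
    update = sigmoid(w_x * x_t + w_y * y_prev + bias)
    w_x, w_y, bias = p["y"]
    candidate = math.tanh(w_x * x_t + w_y * (reset * y_prev) + bias)
    return update * y_prev + (1.0 - update) * candidate


# Hypothetical parameters, chosen only to exercise the functions.
print(ffbp_forward([0.2, 0.4], [[0.5, -0.3], [0.1, 0.8]], [0.0, 0.1], [1.0, -1.0], 0.05))
lstm_p = {name: (0.1, 0.5, 0.0) for name in ("In", "Fg", "Ou", "Cell")}
print(lstm_step(0.3, 0.0, 0.0, lstm_p))
gru_p = {name: (0.5, 0.1, 0.0) for name in ("Re", "Up", "y")}
print(gru_step(0.3, 0.0, gru_p))
```

Processing a sequence amounts to calling `lstm_step` or `gru_step` once per time step and feeding the returned state back in as the previous state.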

3 Proposed approach

This section introduces an IOWA layer in the neural network structure to prioritise certain data for complex prediction [58]. The OWA operator aggregates information without losing any detail. The proposed approach informs the prediction model that a certain set of data is more important than the rest, which existing methods cannot do. The approach uses the IOWA operator, where the weightage is assigned based on the inducing variable. The IOWA operator aggregates not only numerical values but can also accommodate objects such as intervals, which enables the decision-maker to prioritise any particular set of data and prepare it for complex predictions. This paper uses cloud QoS data for complex QoS prediction. However, the approach can be applied to other complex predictions, such as stock market, IoT sensor data and web service recommendation prediction. The key feature of the approach is that it reduces the size of a dataset significantly without losing any information, which reduces computational time and complexity. The IOWA layer in a neural network is defined as follows:

Definition 6

The IOWA operator in a neural network structure with inputs of dimension k is a mapping \(IOWA: R^{k} \to R\) defined by an associated weighting vector w of dimension k, such that \(w_{i} \in [0,1]\) and \(\sum\nolimits_{i = 1}^{k} {w_{i} = 1}\), and a set of inducing variables \(u_{i}\) that determine the order of the inputs, as presented in Fig. 1 and Eqs. 22 and 23.

$$IOWA - NN \left( {u_{1} , x_{1} , u_{2} , x_{2} , \ldots \ldots ., u_{k} , x_{k} } \right) = AF_{i}$$
(22)
Fig. 1 IOWA layer in neural network structure

\(AF_{i}\) is the activation function, computed as the sum of the products of the weights \(w_{i}\) and the reordered inputs \(y_{i}\):

$$AF_{i} = \mathop \sum \limits_{i = 1}^{k} w_{i} y_{i}$$
(23)

where \(\left\langle {u_{i} , x_{i} } \right\rangle\) is a two-tuple input in which \(u_{i}\) is the inducing variable associated with the input \(x_{i}\), \(y_{i}\) is the input \(x_{i}\) reordered in descending order of \(u_{i}\), \(w_{i}\) is the weight applied to the i-th reordered input, and \(O_{i}\) is the actual output of the output neuron.

The aggregated result is then compared with the threshold value \(\theta_{i}\). If the value is greater than or equal to \(\theta_{i}\), the information is passed to the neurons of the next layer; otherwise, it is dropped, as presented in the following equation:

$$AF_{i} \ge \theta_{i} , O_{i} > 0 \Rightarrow pass \vee AF_{i} < \theta_{i} ,O_{i} = 0 \Rightarrow drop$$
(24)
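
A minimal Python sketch of Eqs. 22–24 is given below. The function name `iowa_neuron`, the sample inputs and the threshold values are hypothetical. Eq. 24 only states that the output is positive when the information passes and zero when it is dropped, so forwarding the aggregated value itself is an assumption of this sketch.

```python
def iowa_neuron(pairs, weights, theta):
    """Aggregate (u_i, x_i) pairs with IOWA (Eq. 23) and apply the threshold (Eq. 24).

    Returns (output, passed). Forwarding the aggregated value when it passes is
    an assumption of this sketch; a dropped value is reported as 0.0.
    """
    if len(pairs) != len(weights):
        raise ValueError("pairs and weights must have the same length")
    # y_i: inputs reordered by descending inducing variable u_i.
    ordered = [x for _, x in sorted(pairs, key=lambda p: p[0], reverse=True)]
    af = sum(w * y for w, y in zip(weights, ordered))
    passed = af >= theta
    return (af if passed else 0.0), passed


# Hypothetical inputs: the aggregated value is 0.6 * 10 + 0.4 * 30 = 18.
pairs = [(2, 10.0), (1, 30.0)]
print(iowa_neuron(pairs, [0.6, 0.4], theta=15.0))  # reaches the threshold: passed on
print(iowa_neuron(pairs, [0.6, 0.4], theta=20.0))  # below the threshold: dropped
```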

To better illustrate the approach, the paper considers an example in which the decision-maker has time-series data and prioritises a certain data set, without losing any information, to predict the future interval.

Example

Let us assume a decision-maker has five input arguments with the following values: x = (xt = 50, xt-1 = 60, xt-2 = 20, xt-3 = 30, xt-4 = 50). The inducing variables for the inputs are u = (7, 2, 4, 5, 3). The paper considers the following weights for each interval: w = (w1 = 0.25, w2 = 0.10, w3 = 0.30, w4 = 0.15, w5 = 0.20). The inputs, rearranged in descending order of the inducing variable, are x = (xt = 50, xt-3 = 30, xt-2 = 20, xt-4 = 50, xt-1 = 60).

The activation function AF using Eq. 23 is calculated as:

  • AF = (w1 × xt) + (w2 × xt-3) + (w3 × xt-2) + (w4 × xt-4) + (w5 × xt-1).

  • AF = (0.25 × 50) + (0.10 × 30) + (0.30 × 20) + (0.15 × 50) + (0.20 × 60).

  • AF = 12.5 + 3 + 6 + 7.5 + 12.

  • AF = 41.
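
The worked example can be reproduced with a few lines of Python. The values are those given in the example; the dictionary layout and variable names are chosen only for this illustration.

```python
# Inputs and inducing variables from the example, keyed by time interval.
x = {"t": 50, "t-1": 60, "t-2": 20, "t-3": 30, "t-4": 50}
u = {"t": 7, "t-1": 2, "t-2": 4, "t-3": 5, "t-4": 3}
w = [0.25, 0.10, 0.30, 0.15, 0.20]

# Reorder the intervals by descending inducing variable.
order = sorted(x, key=lambda k: u[k], reverse=True)
print(order)  # ['t', 't-3', 't-2', 't-4', 't-1']

# Eq. 23: weighted sum of the reordered inputs.
terms = [w_i * x[k] for w_i, k in zip(w, order)]
af = sum(terms)
print(f"AF = {af:.1f}")
```

Note that the weights sum to 1, as the definition requires, and that the two inputs equal to 50 receive different weights (0.25 and 0.15) because their inducing variables differ.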

4 Implementation and evaluation

This section presents the performance and efficiency of the proposed approach and demonstrates the accuracy and improved computational complexity using a case study.

4.1 Case study

To better illustrate the approach, the paper considers a complex scenario in which the decision-maker prioritises certain data from a large dataset to make a complex nonlinear prediction. The paper takes the example of cloud services, where the decision-maker (a service provider or a consumer) has a periodic record of QoS data and wants to prioritise certain data over the rest of the dataset to meet custom requirements. The paper assumes that the decision-maker wants to analyse the QoS data of a cloud service for certain hours of the day. The decision-maker divides the working hours into three categories: peak hours, shoulder hours and off-peak hours. Peak hours are the working hours in which most business activities are performed. In shoulder hours, some business activities are performed, while in off-peak hours very few tasks are performed.

The decision-maker categorises working hours as follows:

  • Peak hours (PH): Let's assume that the decision-maker defines and prioritises the working hours from 9:00 AM to 5:00 PM as the peak hours. During these hours, the decision-maker executes their main tasks and rarely compromises on QoS variations.

  • Shoulder hours (SH): Let's assume that the decision-maker takes two time periods as shoulder hours. The first period runs from 5:00:00 PM to 9:00:00 PM, and the second period runs from 5:00:00 AM to 9:00:00 AM.

  • Off-peak hours (OH): Let's assume that the decision-maker treats the hours between 9:00:00 PM and 5:00:00 AM as off-peak hours.

The decision-maker is most concerned about the QoS behaviour during peak hours for the next interval but also wants to take the QoS data for all previous hours into account. The decision-maker prioritises the 24 hours as follows: PH < SH < OH, where the symbol ‘<’ means ‘precedes’ in terms of priority and weightage.
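The categorisation above can be sketched as a small Python function. The case study does not say which category owns a boundary time such as 9:00 AM, so this sketch assumes half-open intervals (start included, end excluded); the function name `working_hours_category` is hypothetical.

```python
from datetime import time


def working_hours_category(t):
    """Return 'PH', 'SH' or 'OH' for a time of day.

    Assumption: each period includes its start time and excludes its end time.
    """
    if time(9, 0) <= t < time(17, 0):
        return "PH"  # peak hours, 9:00 AM to 5:00 PM
    if time(5, 0) <= t < time(9, 0) or time(17, 0) <= t < time(21, 0):
        return "SH"  # shoulder hours, 5:00 AM to 9:00 AM and 5:00 PM to 9:00 PM
    return "OH"      # off-peak hours, 9:00 PM to 5:00 AM


# Count how many 5-minute intervals of one day fall into each category.
counts = {"PH": 0, "SH": 0, "OH": 0}
for minute in range(0, 24 * 60, 5):
    counts[working_hours_category(time(minute // 60, minute % 60))] += 1
print(counts)  # {'PH': 96, 'SH': 96, 'OH': 96}
```

With 5-minute sampling, each category covers eight hours and therefore 96 of the 288 daily intervals.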

4.2 Experimental setup and dataset

The paper evaluates the approach in MATLAB R2020a on a machine with a 1.8 GHz CPU, 4.00 GB of RAM and 1 TB of storage. The Amazon EC2 US-West IaaS instance dataset is extracted from the PRTG monitoring service by Paessler (www.paessler.com) for seven days, from 20-04-2015 to 26-04-2015. The dataset comprises 5-minute measurements of the cloud QoS data, giving a total of 2,016 records (288 per day) for training to predict the next eight hours. The traditional neural networks use all 2,016 records; with the OWA neural network method, the training set is reduced to 672 records to predict the 96 intervals (8 peak hours) of the next day.

The approach evaluates the EBP, FFBP, CCFBP, NARX and LR neural networks and their respective OWA methods with the following configuration settings. These networks use TRAINLM as the training function and LEARNGDM as the adaption learning function, with two layers. The first layer has 20 neurons, the transfer function for the hidden layer is TANSIG, and the transfer function for the output layer is PURELIN. Training parameters are set to 1000 epochs with a maximum fail of 600. For LSTM and OWA-LSTM, the paper uses two LSTM layers, each with 100 units representing the dimension of the hidden state. The dropout value is set to 0.5, with SIGMOID as the activation function and ADAM as the optimiser. The model runs for 50 epochs. The GRU and OWA-GRU models contain three layers with 100 units each and a dropout value of 0.5. The activation function is LINEAR, and the optimiser is SGD. The number of epochs is set to 50. The LSTM-GRU and respective OWA approaches consist of two LSTM and two GRU layers with a dropout of 0.5 each. The activation function is LINEAR, and the optimiser is ADAM.

4.3 Evaluation

The proposed approach works in two steps – IOWA aggregation and prediction, as presented in Fig. 2.

Fig. 2 IOWA layer in prediction methods

IOWA aggregation: Assume the service provider has an optimistic attitude; the OWA weights are therefore assigned as w1 = 0.55, w2 = 0.35 and w3 = 0.10. The paper uses the priority of the time intervals as the inducing variable for OWA aggregation. The intervals and working hours, reordered by inducing variable, are as follows:

u1 = PH = 9:00:00 AM to 5:00:00 PM = 96 data intervals

u2 = SH = 5:00:00 PM to 9:00:00 PM, 5:00:00 AM to 9:00:00 AM = 96 data intervals

u3 = OH = 9:00:00 PM to 5:00:00 AM = 96 data intervals

Applying Eq. (23), the paper gets the IOWA aggregated result for each day.
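One way to read the per-day aggregation is sketched below in Python. The pairing of intervals is an assumption, not something this section states: the i-th interval of PH, SH and OH is combined with the weights w1, w2 and w3, so that the 288 daily records collapse to 96 aggregated values and seven days give 672 training records. The QoS values are synthetic placeholders, and `aggregate_day` is a hypothetical name.

```python
def aggregate_day(ph, sh, oh, weights=(0.55, 0.35, 0.10)):
    """Combine the PH, SH and OH series of one day into a single series.

    Assumption: intervals are paired by position, and the weights follow the
    priority order PH, SH, OH given by the inducing variable.
    """
    if not (len(ph) == len(sh) == len(oh)):
        raise ValueError("PH, SH and OH must contain the same number of intervals")
    w1, w2, w3 = weights
    return [w1 * p + w2 * s + w3 * o for p, s, o in zip(ph, sh, oh)]


# Synthetic placeholder data: 7 days, 96 five-minute intervals per category.
training = []
for day in range(7):
    ph = [100.0 + day] * 96
    sh = [80.0 + day] * 96
    oh = [60.0 + day] * 96
    training.extend(aggregate_day(ph, sh, oh))

print(len(training))  # 672
```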

Neural network prediction: The paper considers nine neural network backpropagation algorithms for the experiment. For each of the approaches, the paper performs two sets of experiments. First, the paper predicts the QoS parameter for the future peak hours, that is, 27-04-2015 from 9:00:00 AM to 5:00:00 PM, using the default neural network method. Second, the paper applies the proposed approach with the respective neural network method to predict the same future peak hours. The prediction accuracy of both approaches is measured using the following accuracy benchmarks:

  • Root Mean Square Error (RMSE):

    RMSE is one of the most commonly used methods to measure prediction accuracy. It presents how far the prediction falls from the actual data using Euclidean distance. RMSE is calculated as the square root of the mean of the squared errors, as presented in the equation below.

    $$RMSE = \sqrt{\frac{\sum_{i=1}^{N} \left( O_{i} - P_{i} \right)^{2}}{N}}$$
    (25)

    where O represents the observed data, P represents the predicted data and N is the number of predictions.

  • Mean Absolute Error (MAE):

    MAE is another widely used metric to measure prediction accuracy. The method measures the average magnitude of the errors in a set of prediction results irrespective of their directions. MAE is calculated as the average of the absolute errors, that is, the absolute differences between the observed and predicted data, where all individual differences have equal weights. MAE is presented in the equation below.

    $$MAE = \frac{1}{N} \sum_{i=1}^{N} \left| O_{i} - P_{i} \right|$$
    (26)

  • Mean Absolute Percentage Error (MAPE):

    MAPE is the average of the absolute percentage errors of the predicted results. It expresses the error as a percentage, which makes it easier to interpret. For each interval, the difference between the observed and predicted values is divided by the observed value; MAPE is the mean of the absolute values of these ratios, multiplied by 100. MAPE is presented in the equation below.

    $$MAPE = 100 \times \frac{1}{N} \sum_{i=1}^{N} \left| \frac{O_{i} - P_{i}}{O_{i}} \right|$$
    (27)
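The three measures in Eqs. (25) to (27) can be written directly in Python. The following is a minimal sketch using only the standard library; the function names and the sample values are illustrative and are not taken from the paper's dataset.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, Eq. (25)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error, Eq. (26)."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def mape(observed, predicted):
    """Mean absolute percentage error, Eq. (27); observed values must be non-zero."""
    n = len(observed)
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / n


# Illustrative values only.
observed = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 380.0]

print(round(rmse(observed, predicted), 3))  # 14.142
print(round(mae(observed, predicted), 3))   # 13.333
print(round(mape(observed, predicted), 3))  # 6.667
```

Because MAPE divides by the observed value, it is undefined for intervals where the observed QoS value is zero; such intervals would need to be handled separately.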

Table 2 presents the prediction results of each method for the first six hours. Figure 3 presents the predicted results of approaches for all intervals of peak hours. Figure 4 presents each neural network prediction behaviour with its respective OWA approach. The RMSE, MAE and MAPE of different methods are presented in Table 3 and Fig. 5.

Table 2 Prediction results of approaches for the next six hours

Fig. 3 Prediction results of all methods

Fig. 4 Neural network with respective OWA method

Table 3 Prediction accuracy comparisons

Fig. 5 RMSE, MAE for all prediction methods

The analysis demonstrates that the approach significantly decreases the data size, by about 66%, from 2,016 to 672 records. In terms of prediction accuracy, the proposed approach gives better or equal accuracy for almost all algorithms. The GRU method improves significantly when the approach includes the OWA layer: the RMSE improves by about 24%, from 17.144 to 12.937, the MAE decreases from 16.83 to 12.74, and the MAPE decreases from 3% to 2.3%. Figure 6 presents a comparative overview of all OWA methods. The analysis shows that OWA-LSTM gives the best prediction result among all OWA methods, with an RMSE of 12.15, an MAE of 11.91 and a MAPE of 2.10%.
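The percentages quoted above follow from the reported before and after values. The short Python sketch below recomputes them; the helper `relative_reduction` is a hypothetical name, and the inputs are the figures stated in the text.

```python
def relative_reduction(before, after):
    """Fractional decrease from `before` to `after` (0.25 means a 25% drop)."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return (before - after) / before


# (before, after) pairs as reported in the evaluation.
reported = {
    "training records": (2016, 672),
    "GRU RMSE": (17.144, 12.937),
    "GRU MAE": (16.83, 12.74),
    "GRU MAPE (%)": (3.0, 2.3),
}

for name, (before, after) in reported.items():
    print(f"{name}: {100 * relative_reduction(before, after):.1f}% lower")
```

The computed reductions are approximately 66.7%, 24.5%, 24.3% and 23.3%, consistent with the rounded figures in the text.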

Fig. 6 Comparative analysis of OWA approaches

5 Conclusion

QoS prediction is one of the key factors in measuring the quality of offered services. In a cloud environment, the agreed SLA comprises multiple offered services with several QoS parameters. The computational complexity of the system increases with the size of the dataset. Because cloud QoS data is large and widely distributed, it is challenging to reduce the size of a dataset without losing any information. Existing approaches try to address the problem, but they cannot handle complex nonlinear predictions. The paper added an IOWA layer to the prediction method to make nonlinear QoS predictions. The approach was tested using nine neural network methods, and their accuracies were compared using RMSE, MAE and MAPE. The experimental results demonstrate a notable data size reduction with better or equal prediction accuracy. The proposed method reduced the data size by about 66%, from 2,016 to 672 records, without losing any information. The GRU method improved significantly when the approach included the OWA layer: the RMSE improved by about 24%, from 17.144 to 12.937, the MAE decreased from 16.83 to 12.74, and the MAPE decreased from 3% to 2.3%. The experimental results show that the approach handles complex nonlinear prediction while reducing data size with better or equal accuracy. In future, we will evaluate the approach on data from an extensive IoT sensor network to support informed decision-making.