Artificial intelligence (AI) models have been successfully used to forecast time series data in different fields of study, such as hydrology [2, 18, 19, 27, 28], supply chain demand forecasting [10, 24], electricity price and load forecasting [16, 21] and flood forecasting [14]. The following subsections briefly describe the models used in this study. Time series data of waste generated in New Delhi over the period 1993–2011 were used, under the assumption that the series contains all the patterns associated with waste generation.
Artificial neural network (ANN)
ANN derives its roots from the biological nervous system, which consists of interconnected operational units referred to as neurons [3, 8]. ANN is frequently used because of its capability to detect patterns in data sets [25]. A neural network learns on the basis of the inputs provided and the outputs generated. The first layer of an ANN consists of input neurons, which pass information on to the second (hidden) layer, which in turn passes the processed information to the output neurons of the third layer. The number of neurons in the hidden layer is selected by trial and error.
The ANN is trained using a gradient descent back-propagation algorithm. Each input is multiplied by its connection weight, and the weighted inputs are summed to give the value "x", which is then passed through the sigmoidal transfer function "f". In the training phase, the predicted output (Opi) is compared with the desired output (Dpi), which forms the basis of the squared error (Ep). In the last step, the error is propagated back from the output to the input and the weights are modified, until the error is minimized and reaches its limiting value.
$$f(x) = \frac{1}{1 + \exp ( - x)}$$
(1)
$$E_{p} = \sum\limits_{i = 1}^{n} {(Dpi - Opi)^{2} }$$
(2)
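The training step described by Eqs. (1) and (2) can be sketched as follows for a single sigmoid neuron; the inputs, target and learning rate are made-up illustrations, not the network used in the study.

```python
import math
import random

def sigmoid(x):
    # Transfer function f(x) = 1 / (1 + exp(-x))  (Eq. 1)
    return 1.0 / (1.0 + math.exp(-x))

def train_step(weights, inputs, desired, lr=0.5):
    """One gradient-descent step for a single sigmoid neuron.

    The weighted sum of the inputs is squashed by the sigmoid; the
    squared error between desired and predicted output (Eq. 2) is
    reduced by moving each weight along the negative gradient.
    """
    s = sum(w * x for w, x in zip(weights, inputs))
    predicted = sigmoid(s)
    error = desired - predicted
    # dE/dw_j = -2 * error * f'(s) * x_j, with f'(s) = f(s) * (1 - f(s))
    grad_common = 2 * error * predicted * (1 - predicted)
    new_weights = [w + lr * grad_common * x for w, x in zip(weights, inputs)]
    return new_weights, error ** 2

random.seed(0)
weights = [random.uniform(-1, 1) for _ in range(2)]
for _ in range(500):
    weights, err = train_step(weights, [1.0, 0.5], desired=0.9)
```

After repeated steps the squared error approaches its limiting (minimum) value, which is the stopping criterion described above.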
Adaptive neuro-fuzzy inference system
For this study, we have used a Sugeno-type fuzzy inference system (FIS). In this type of FIS, the output membership function is either constant or linear. ANFIS incorporates the techniques followed in both ANN and fuzzy inference systems [12]. For a FIS with two inputs a and b and one output o, the rule base may look like:
-
Rule 1 If a is U1 and b is V1, then F1 = s1a + r1b + t1
-
Rule 2 If a is U2 and b is V2, then F2 = s2a + r2b + t2
These rules represent the fuzzy inference rules. Here U1, U2, V1 and V2 are membership functions, a and b are the inputs supplied to the system, and F1 and F2 are the outputs obtained from the system.
The nodes present in the different layers perform the following functions:
Layer 1 nodes are the input nodes. Each node here establishes the degree of membership of the inputs on the basis of the suitable fuzzy set or membership function. The output from the node can be defined as:
$$O_{i}^{1} = \mu_{{U_{i} }} (a)$$
(3)
Layer 2 has the rule nodes. These nodes output the firing strength of each rule by multiplying all the incoming signals:
$$O_{i}^{2} = \mu_{{U_{i} }} \left( a \right) \times \mu_{{V_{i} }} \left( b \right),\;\;i = 1,{\kern 1pt} 2$$
(4)
Layer 3 has the averaging nodes. The nodes of this layer output the normalized firing strength of each rule.
$$O_{i}^{3} = {\overline{W}_{i}} = \frac{{W_{i} }}{{W_{1} + W_{2} }},\;\; \, i = 1,2$$
(5)
Layer 4 contains the consequent nodes. Nodes of this layer compute the contribution of the ith rule towards the output of the model.
$$O_{i}^{4} = {\overline{W}_{i}} f_{i} = {\overline{W}_{i}} \left( {s_{i} a + r_{i} b + t_{i} } \right),\;\;i = 1,2$$
(6)
\({\overline{W}}_{i}\) is the output of the third layer, and si, ri and ti are the consequent parameters.
Layer 5 consists of a single output node, which sums all the incoming signals and computes the overall output,
$$O_{i}^{5} = \frac{{\sum w_{i} f_{i} }}{{\sum w_{i} }}$$
(7)
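The forward pass through the five layers (Eqs. 3–7) can be traced in a short sketch for a two-rule, first-order Sugeno system; the Gaussian membership functions and the rule parameters below are illustrative assumptions, not those fitted in this study.

```python
import math

def gaussmf(x, c, sigma):
    # Gaussian membership function, one common choice for U_i and V_i
    return math.exp(-((x - c) ** 2) / (2 * sigma ** 2))

def sugeno_forward(a, b, rules):
    """Forward pass through the five ANFIS layers (Eqs. 3-7)."""
    # Layers 1-2: membership degrees, then firing strengths
    # w_i = mu_U_i(a) * mu_V_i(b)
    w = [gaussmf(a, r["U"][0], r["U"][1]) * gaussmf(b, r["V"][0], r["V"][1])
         for r in rules]
    # Layer 3: normalized firing strengths
    total = sum(w)
    wbar = [wi / total for wi in w]
    # Layer 4: rule consequents f_i = s_i*a + r_i*b + t_i, weighted by wbar_i
    f = [r["s"] * a + r["r"] * b + r["t"] for r in rules]
    # Layer 5: overall output = sum(wbar_i * f_i)
    return sum(wb * fi for wb, fi in zip(wbar, f))

# Each rule carries (centre, width) for its membership functions
# plus the consequent parameters s, r, t -- all illustrative values.
rules = [
    {"U": (0.0, 1.0), "V": (0.0, 1.0), "s": 1.0, "r": 2.0, "t": 0.5},
    {"U": (1.0, 1.0), "V": (1.0, 1.0), "s": -1.0, "r": 0.5, "t": 1.0},
]
out = sugeno_forward(0.5, 0.5, rules)
```

At a = b = 0.5 both rules fire equally, so the output is the plain average of the two rule consequents, which makes the normalization of Layer 3 easy to verify by hand.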
Discrete wavelet theory
In wavelet analysis, an input signal is decomposed into its constituent parts. In Fourier analysis, when a signal is transformed from the time domain into frequency-based sinusoidal constituents, the time information about breakdown points, local trends, self-similarity and discontinuities is lost [26]. In the wavelet transform, the signal is instead decomposed into "wavelets", which are scaled and shifted versions of the "mother wavelet":
$$\mathop \smallint \limits_{ - \infty }^{\infty } \varPsi \left( t \right)dt = 0$$
(8)
$$\varPsi_{\alpha ,\beta } = \frac{1}{\sqrt \alpha }\varPsi \left( {\frac{t - \beta }{\alpha }} \right)$$
(9)
Here α is the scaling parameter (α > 0), β is the translation parameter, and t is finite; Ψ is the mother wavelet. Translation and scaling of the mother wavelet allow localization in the time and frequency domains, respectively. In the DWT, the scaling and translation parameters are dyadic (powers of two): scale α = 2^m and translation β = 2^m n. In the DWT, a signal is passed through two filters, yielding one detail component and one approximation component. The low-frequency, high-scale components are known as approximations, while the high-frequency, low-scale components are known as details. The approximation is usually the most essential constituent of the signal, as it contains its background information [2]. This decomposition is repeated until the desired resolution level is reached.
$${\text{Signal}} = {\text{approx}}_{n} \left( t \right) + \mathop \sum \limits_{i = 1}^{n} {\text{detail}}_{i} \left( t \right)$$
(10)
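A one-level db1 (Haar) decomposition and its inverse can be written in a few lines; the signal values used here are arbitrary illustrations. Repeating the forward step on the approximation yields the multi-level decomposition of Eq. (10).

```python
def haar_dwt(signal):
    """One-level Haar (db1) decomposition: returns (approximation, detail).

    Pairs of samples are averaged (low-pass -> approximation) and
    differenced (high-pass -> detail); the 1/sqrt(2) factor preserves
    the energy of the signal.
    """
    h = 2 ** -0.5
    approx = [h * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [h * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    # Inverse transform: recombines approximation and detail exactly
    h = 2 ** -0.5
    out = []
    for a, d in zip(approx, detail):
        out += [h * (a + d), h * (a - d)]
    return out

signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]  # illustrative values
a1, d1 = haar_dwt(signal)      # level 1: approximation and detail
a2, d2 = haar_dwt(a1)          # level 2: repeat on the approximation
reconstructed = haar_idwt(haar_idwt(a2, d2), d1)
```

The reconstruction recovers the original signal exactly, which is the property Eq. (10) expresses: the signal equals the final approximation plus all the details.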
Genetic algorithm (GA)
GA belongs to the class of evolutionary algorithms, which draw on Darwin's theory of evolution. Initially, a genetic algorithm contains a set of solutions (represented by chromosomes) called the population [7]. Solutions from one population are selected and processed to form a new population, in the expectation that the new population will perform better than the old one. Solutions selected to form new solutions (offspring) are chosen according to their fitness: the fitter they are, the more likely they are to pass their attributes to the next generation. This process is repeated until a pre-defined condition (for example, a number of generations or no further improvement in the best solution) is satisfied.
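The loop described above can be sketched as a minimal real-coded GA; the operator choices (truncation selection, one-point crossover, Gaussian mutation) and the toy objective are assumptions made only for illustration.

```python
import random

def evolve(fitness, n_genes, pop_size=50, generations=100, mutation_rate=0.1):
    """Minimal real-coded GA: selection, crossover, mutation.

    Keeps the fitter half of each population (elitism) and fills the
    rest with offspring, as in the scheme described above.
    """
    random.seed(1)
    pop = [[random.uniform(-5, 5) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)       # fittest first
        parents = pop[: pop_size // 2]            # selection
        children = []
        while len(children) < pop_size - len(parents):
            p1, p2 = random.sample(parents, 2)
            cut = random.randrange(1, n_genes) if n_genes > 1 else 0
            child = p1[:cut] + p2[cut:]           # one-point crossover
            if random.random() < mutation_rate:   # mutation
                child[random.randrange(n_genes)] += random.gauss(0, 0.5)
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

# Toy problem: maximize -(x - 2)^2 - (y + 1)^2, optimum at (2, -1)
best = evolve(lambda c: -(c[0] - 2) ** 2 - (c[1] + 1) ** 2, n_genes=2)
```

On this toy objective the best chromosome converges towards the known optimum, illustrating how fitness-driven selection concentrates the population around good solutions.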
Data set
Data in this study were collected from two sources, primary and secondary. Secondary data were collected from governmental, semi-governmental and private publications, whereas primary data were recorded in a field survey. Data and published work related to municipal solid waste management were collected from the Municipal Corporation of Delhi (MCD), Central Pollution Control Board (CPCB), New Delhi Municipal Council (NDMC), Delhi Cantonment Board (DCB) and various departments of the Government of the National Capital Territory of Delhi (GNCTD). These data consist of yearly waste generation over the period 1993–2011. The inputs fed to the models were formed by applying an adjacent windowing technique to this time series, with window sizes of 2 and 4.
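The windowing step might look like the following; the yearly values shown are placeholders for illustration, not the actual MCD/CPCB figures.

```python
def make_windows(series, window):
    """Adjacent windowing: each run of `window` consecutive yearly values
    becomes an input pattern, and the value that follows it is the target."""
    inputs, targets = [], []
    for i in range(len(series) - window):
        inputs.append(series[i : i + window])
        targets.append(series[i + window])
    return inputs, targets

# Illustrative values only -- not the actual waste-generation figures.
yearly_waste = [4.0, 4.3, 4.6, 5.0, 5.1, 5.5, 5.9, 6.2]
X2, y2 = make_windows(yearly_waste, window=2)   # window size 2
X4, y4 = make_windows(yearly_waste, window=4)   # window size 4
```

Each (input, target) pair thus encodes "given the last 2 (or 4) years, predict the next year", which is the form the ANN and ANFIS models expect.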
Description of models used
Pure ANN
For the analysis in this paper, we have used a three-layered neural network containing an input layer, a hidden layer and an output layer. Combinations of different numbers of delays (input neurons) and hidden nodes were used to form a number of models.
-
No. of input neurons = 2, 4
-
No. of hidden layer neurons = 5, 10
The network was trained with the Levenberg–Marquardt back-propagation optimization method. The structure of the network is shown in Fig. 1. The activation function used in the hidden layer is tan-sigmoid, while the one used in the output layer is linear (purelin). The models forecast one value ahead, so the number of output nodes is 1. The accuracy of the different models in terms of RMSE and IA is given in Tables 1 and 2, respectively (Figs. 2, 3, 4).
Table 1 RMSE values for ANN models
Table 2 Index of agreement (IA) values
DWT-ANN
In this hybrid model, discrete wavelet transformation is used to pre-process the input data to improve the forecast efficiency. DWT decomposition breaks the input signal into simpler constituents, which makes it easier for the ANN to analyse the data. A schematic for this model is given in Fig. 5. First, the initial time series signal is broken down into its constituent high- and low-frequency components (details and approximation) up to the required resolution. In the second step, the sub-series are fed into the ANN and the output of each sub-series is obtained. In the third step, the outputs obtained by passing each sub-series through the ANN are added to obtain the final output. In this study, the Daubechies order 1 (db1) mother wavelet is used to decompose the input signal up to level 3.
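The three steps can be sketched as below. This is a simplified illustration: the pairwise-average low-pass filter stands in for the db1 filters, and a persistence predictor stands in for the trained ANN; only the structure of the pipeline (decompose, forecast each sub-series, sum) matches the model described above.

```python
def dwt_ann_forecast(signal, predict_next, levels=3):
    """Sketch of the DWT-ANN pipeline.

    The signal is split into an approximation plus `levels` detail
    sub-series such that their sum reconstructs the signal at every
    point (the additive decomposition of Eq. 10); each sub-series is
    forecast separately and the sub-forecasts are summed.
    `predict_next` is any callable mapping a sub-series to its next
    value -- a placeholder for the trained ANN.
    """
    details, approx = [], list(signal)
    for _ in range(levels):
        smoothed = [approx[0]] + [(approx[i - 1] + approx[i]) / 2
                                  for i in range(1, len(approx))]
        details.append([a - s for a, s in zip(approx, smoothed)])
        approx = smoothed
    # At every t: signal(t) == approx(t) + sum_i details[i](t)
    return predict_next(approx) + sum(predict_next(d) for d in details)

# Placeholder predictor: persist the last value of each sub-series.
forecast = dwt_ann_forecast([4.0, 4.3, 4.6, 5.0, 5.1, 5.5, 5.9, 6.2],
                            predict_next=lambda s: s[-1], levels=3)
```

With the persistence placeholder, the summed sub-forecasts simply reproduce the last observed value, which confirms that the decomposition is additive; a trained ANN on each sub-series is what turns this into a genuine forecast.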
GA-ANN
We have used genetic algorithm techniques to determine the optimal biases as well as the weights of the ANN, instead of using the back-propagation optimization technique. Similar to the approach used in pure ANN, different combinations of number of hidden nodes and number of input nodes were used to create the ANN models, which were then optimized using GA. The steps followed are:
1. The architecture of the ANN was created by fixing the number of delays, hidden and output nodes.
2. A population of 200 solutions was randomly generated.
3. Each of the solution sets was used to obtain the biases and weights of the ANN, and output was obtained using each solution.
4. The fittest individuals of the current generation were used to create the next generation by crossover method.
5. Some of the individuals were mutated to obtain the next generation.
6. Some migrations were also allowed between generations to increase diversity of the solutions.
7. Steps (3) to (6) were repeated till the solutions stopped improving.
8. The best solution was used to obtain the final output from the ANN.
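Steps (1)–(8) can be sketched end-to-end as follows; the tiny 2-2-1 network, the toy training pairs and the operator settings are illustrative assumptions, not the configuration used in the study (migration between sub-populations, step 6, is omitted for brevity).

```python
import math
import random

def ann_output(chrom, x):
    """2-2-1 network whose weights and biases are read from the
    chromosome (step 3); tan-sigmoid hidden layer, linear output."""
    h = [math.tanh(chrom[0] * x[0] + chrom[1] * x[1] + chrom[2]),
         math.tanh(chrom[3] * x[0] + chrom[4] * x[1] + chrom[5])]
    return chrom[6] * h[0] + chrom[7] * h[1] + chrom[8]

def fitness(chrom, data):
    # Negative squared error over the training pairs (higher is fitter)
    return -sum((t - ann_output(chrom, x)) ** 2 for x, t in data)

random.seed(2)
data = [([0.2, 0.4], 0.5), ([0.4, 0.6], 0.7), ([0.6, 0.8], 0.9)]  # toy pairs
pop = [[random.uniform(-1, 1) for _ in range(9)] for _ in range(200)]  # step 2
for _ in range(100):                                    # steps 4-7
    pop.sort(key=lambda c: fitness(c, data), reverse=True)
    parents = pop[:100]                                 # step 4: fittest half
    children = []
    for _ in range(100):
        p1, p2 = random.sample(parents, 2)
        cut = random.randrange(1, 9)
        child = p1[:cut] + p2[cut:]                     # crossover
        if random.random() < 0.1:                       # step 5: mutation
            child[random.randrange(9)] += random.gauss(0, 0.3)
        children.append(child)
    pop = parents + children
best = max(pop, key=lambda c: fitness(c, data))         # step 8
```

The key difference from pure ANN training is visible here: no gradients are computed; the GA searches the weight space directly, using only the fitness of each candidate weight vector.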
Pure ANFIS
ANFIS models with the following parameters were created:
-
Number of inputs: 4
-
Number of input membership functions: 2
-
The models used the hybrid algorithm as the optimization method
-
Input membership function types for grid partitioning FIS generation were used
DWT-ANFIS
On the basis of the results obtained with ANFIS alone, it can be said that ANFIS fails to deal with dynamic time series data directly; therefore, the data require pre-processing before being given to ANFIS. Here DWT plays the important role of improving the performance of the model. First, the data are decomposed using the Daubechies wavelet (db1) at three levels. Then every sub-signal is checked for randomness. A sub-signal containing a large number of random points is treated as noise and is unpredictable; such noisy signals are removed from the analysis, because they produce inaccurate results.
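The text does not specify which randomness test is applied to the sub-signals; one simple possibility is a lag-1 autocorrelation screen, sketched here with made-up sub-series and an assumed threshold of 0.5.

```python
def lag1_autocorr(series):
    """Lag-1 autocorrelation of a sub-signal.

    A value near zero means consecutive points carry little information
    about each other, i.e. the sub-signal behaves like noise. This
    criterion is an assumption; the study's actual test is unspecified.
    """
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        return 0.0
    cov = sum((series[i] - mean) * (series[i + 1] - mean) for i in range(n - 1))
    return cov / var

trend = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]            # structured
noisy = [0.3, 0.9, -0.6, 0.1, -0.8, 0.5, -0.2, -0.4]        # noise-like
keep_trend = abs(lag1_autocorr(trend)) > 0.5   # predictable -> keep
keep_noisy = abs(lag1_autocorr(noisy)) > 0.5   # noise-like -> drop
```

Sub-series that fail the screen would be excluded before the remaining components are passed to ANFIS.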
GA-ANFIS
Genetic algorithm was implemented to optimize weights and biases of ANFIS. Similar to the approach used in pure ANFIS, a structure was created, which was then optimized using GA. The steps followed were:
1. The architecture of the ANFIS was created by fixing the number of input membership functions (for example, 2).
2. A population of 200 solutions was randomly generated.
3. Each of the solution sets was used to evaluate the weights and biases of the ANFIS, and output was obtained using each solution.
4. The fittest individuals of the current generation were used to create the next generation by crossover method.
5. Some of the individuals were mutated to obtain the next generation.
6. Some migrations were also allowed between generations to increase diversity of the solutions.
7. Steps (3) to (6) were repeated till the solutions stopped improving.
The best solution from the pool of solutions was then used to obtain the final output from the ANFIS model.