1 Introduction and literature review

Accurate forecasting of the quantity of municipal solid waste (MSW) generated is a prime requirement for the design and operation of an efficient waste management system. Imprecise forecasts can lead to widespread inefficiencies in the waste management infrastructure, such as excessive or insufficient capacity for collection, landfill, incineration or recycling [5]. The need for a correct forecast is even greater for a metropolitan city like New Delhi, India, so that appropriate measures can be planned in advance. These measures include not only developing and improving existing facilities but also sensitizing people so as to promote reduction, reuse and recycling of the solid waste generated. If the waste generated is not managed properly, it may lead to devastating environmental and health hazards [9]. Because of the significant impact of waste management on the ecosystem, waste management systems with lower global and regional environmental impacts need to be developed [17]. Methods of predicting solid waste generation can be broadly classified into five main groups: descriptive statistical models, regression analysis, material flow method, time series analysis and artificial intelligence methods [1].

According to Navarro-Esbrí et al. [19], garbage generation, which is dynamic in nature, can be analysed using time series data of the quantity of garbage produced. Such data capture this dynamic behaviour and provide a means to overcome the dearth of data on other parameters that may influence waste production [6]. As per Beigl et al. [4], the quantity of waste generated cannot be determined directly at an individual level because of the presence of several parallel disposal channels (civic amenity sites for green waste, public kerbside collection, bulky waste, etc.). According to Rimaityte et al. [22], the time series method does not depend on other estimated economic or social parameters and hence has an advantage over regression-based techniques, since estimates of those economic or social parameters may themselves be inaccurate. Another advantage of time series analysis is its capability of forecasting seasonal variations in MSW generation. According to Kannangara et al. [13], artificial intelligence-based models have better prediction abilities than regression-based models. Hence, in this study, time series analysis has been carried out using different artificial intelligence techniques, as these methods are able to cope with historical data of a nonlinear nature.

The models evaluated in this paper are: artificial neural network (ANN); discrete wavelet-based ANN; genetic algorithm (GA) optimized ANN; adaptive neuro-fuzzy inference system (ANFIS); discrete wavelet-based ANFIS; and GA optimized ANFIS. Jalili et al. [11] used ANN to forecast the quantity of waste produced in the city of Mashhad for different seasons with the help of time series data of waste generation. Singh et al. [23] used ANN for time series forecasting of waste generation for the city of Faridabad. Similarly, ANN was used by Sodanil et al. [26] for forecasting waste generation in the city of Bangkok. However, with a large and noisy data set the results of ANN may suffer; hence, the data may need to be pre-processed by the wavelet method. ANN may also get stuck in a local minimum; hence, a hybrid model with GA has also been used in this study so that a global minimum can be sought. Abbasi et al. [1] used ANFIS to predict waste generation in the city of Logan, Australia, using time series data and compared the model with SVM (support vector machine) and KNN (K nearest-neighbour) models. Noori et al. [20] used a support vector machine along with principal component analysis to model the waste generation in the city of Mashhad using weekly time series data. Fuzzy logic was used to forecast solid waste for the Canary archipelago by Charles et al. [9]. However, not much work was found on hybrid ANFIS models. Hence, the performance of hybrid ANFIS models in waste prediction was also evaluated in this study. The time series data used were taken from Kumar [15] and contain the yearly MSW generated in the city of New Delhi, India. The data were used with the different models, which were evaluated by calculating the root mean square error (RMSE) and index of agreement (IA) values.

2 Artificial intelligence model overview

Artificial intelligence (AI) models have been successfully used to forecast time series data in different fields of study such as hydrology [2, 18, 19, 27, 28], supply chain demand forecasting [10, 24], electricity price and load forecasting [16, 21] and flood forecasting [14]. The following sections give a brief description of the models used in this study. Time series data of waste generated in New Delhi for the period 1993–2011 were used, and it was assumed that the time series data contain all the patterns associated with waste generation.

2.1 Artificial neural network (ANN)

ANN derives its roots from the biological nervous system, which consists of interconnected operational units referred to as neurons [3, 8]. ANN is frequently used because of its capability to detect patterns in data sets [25]. A neural network learns on the basis of the inputs provided and the outputs generated. The first layer of an ANN consists of input neurons, which pass information on to the second (hidden) layer, which in turn passes the processed information to the output neurons of the third layer. The number of neurons in the hidden layer is selected by trial and error.

ANN is trained using a gradient descent back-propagation algorithm. Each input is multiplied by its connecting weight, and the weighted inputs are summed to give the value “x”, which is then passed through the sigmoidal transfer function “f”. In the training phase, the predicted output (Opi) and the desired output (Dpi) are compared, and their difference forms the basis of the squared error (Ep). The last step is back-propagation of this error from output to input with further modification of the weights, until the error is minimized and reaches its limiting value.

$$f(x) = \frac{1}{1 + \exp ( - x)}$$
(1)
$$E_{p} = \sum\limits_{i = 1}^{n} {(D_{pi} - O_{pi} )^{2} }$$
(2)
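The training loop described above can be illustrated with a minimal sketch for a single sigmoid neuron. It assumes the standard logistic form f(x) = 1/(1 + exp(−x)) for the transfer function; the learning rate, initial weights and the input/target values are hypothetical and are not taken from the study.

```python
import math

def sigmoid(x):
    """Logistic (sigmoidal) transfer function f."""
    return 1.0 / (1.0 + math.exp(-x))

def squared_error(desired, predicted):
    """Sum of squared differences between desired and predicted outputs (E_p)."""
    return sum((d - o) ** 2 for d, o in zip(desired, predicted))

def train_step(weights, bias, inputs, desired, lr=0.5):
    """One gradient-descent update for a single sigmoid neuron."""
    x = sum(w * a for w, a in zip(weights, inputs)) + bias  # weighted sum
    out = sigmoid(x)
    # dE/dx for E = (desired - out)^2, using f'(x) = f(x) * (1 - f(x))
    delta = -2.0 * (desired - out) * out * (1.0 - out)
    new_weights = [w - lr * delta * a for w, a in zip(weights, inputs)]
    new_bias = bias - lr * delta
    return new_weights, new_bias, (desired - out) ** 2

# Hypothetical starting point and training pair (illustration only).
weights, bias = [0.1, -0.2], 0.0
errors = []
for _ in range(50):
    weights, bias, err = train_step(weights, bias, [0.5, 0.8], 0.9)
    errors.append(err)
```

The list `errors` records the squared error before each update, so it can be inspected to see the error shrinking towards its limiting value.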

2.2 Adaptive neuro-fuzzy inference system

For this study, we have used the Sugeno-type fuzzy inference system (FIS). In this type of FIS, the output membership function is either constant or linear. ANFIS incorporates the techniques followed in both ANN and FIS [12]. For a FIS with two inputs a and b and one output o, the rule base may look like:

  • Rule 1 If a is U1 and b is V1, then F1 = s1·a + r1·b + t1

  • Rule 2 If a is U2 and b is V2, then F2 = s2·a + r2·b + t2

These are the fuzzy inference rules. Here U1, U2, V1 and V2 are membership functions, a and b are the inputs supplied to the system, and F1 and F2 are the outputs obtained from the rules.

The nodes present in the different layers perform the following functions:

Layer 1 nodes are the input nodes. Each node here establishes the degree of membership of the inputs on the basis of the suitable fuzzy set or membership function. The output from the node can be defined as:

$$O_{i}^{1} = \mu_{{U_{i} }} (a)$$
(3)

Layer 2 has the rule nodes. These nodes output the firing strength Wi of each rule by multiplying all the incoming signals:

$$O_{i}^{2} = W_{i} = \mu_{{U_{i} }} \left( a \right) \times \mu_{{V_{i} }} \left( b \right),\;\;i = 1,2$$
(4)

Layer 3 has the averaging nodes. The nodes of this layer output the normalized firing strength of each rule.

$$O_{i}^{3} = {\overline{W}_{i}} = \frac{{W_{i} }}{{W_{1} + W_{2} }},\;\; \, i = 1,2$$
(5)

Layer 4 contains the consequent nodes. Nodes of this layer perform the computation of the contribution of the ith rule towards the output of the model.

$$O_{i}^{4} = {\overline{W}_{i}} f_{i} = {\overline{W}_{i}} \left( {s_{i} a + r_{i} b + t_{i} } \right),\;\;i = 1,2$$
(6)

\({\overline{W}}_{i}\) is the result of the third layer and si, ri and ti are the consequent parameters.

Layer 5 consists of the single output node, which sums all the incoming signals to compute the overall output:

$$O^{5} = \sum\limits_{i} {\overline{W}_{i}} f_{i} = \frac{{\sum\nolimits_{i} W_{i} f_{i} }}{{\sum\nolimits_{i} W_{i} }}$$
(7)
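The five layers can be traced in a short forward-pass sketch for the two-rule system, using the rule consequents Fi = si·a + ri·b + ti from the rule base. The Gaussian membership functions and all parameter values below are illustrative assumptions, not the settings used in the study.

```python
import math

def gauss_mf(x, center, sigma):
    """Gaussian membership function (assumed here for illustration)."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))

def anfis_forward(a, b, premise, consequent):
    """Forward pass of a two-input Sugeno-type ANFIS.

    premise:    one ((center_U, sigma_U), (center_V, sigma_V)) pair per rule
    consequent: one (s, r, t) tuple per rule
    """
    # Layer 1: membership degrees of the inputs
    mu_u = [gauss_mf(a, *u) for u, _ in premise]
    mu_v = [gauss_mf(b, *v) for _, v in premise]
    # Layer 2: firing strength of each rule (product of incoming signals)
    w = [mu * mv for mu, mv in zip(mu_u, mu_v)]
    # Layer 3: normalized firing strengths
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: contribution of each rule, w_bar_i * (s_i*a + r_i*b + t_i)
    f = [s * a + r * b + t for s, r, t in consequent]
    contrib = [wb * fi for wb, fi in zip(w_bar, f)]
    # Layer 5: overall output is the sum of the contributions
    return sum(contrib), w_bar

# Hypothetical parameters: two rules centred at 0 and 2 on both inputs.
premise = [((0.0, 1.0), (0.0, 1.0)), ((2.0, 1.0), (2.0, 1.0))]
consequent = [(1.0, 1.0, 0.0), (2.0, 0.0, 1.0)]
output, w_bar = anfis_forward(1.0, 1.0, premise, consequent)
```

With both inputs midway between the two rule centres, the rules fire equally and the output is the plain average of the two rule consequents.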

2.3 Discrete wavelet theory

In wavelet analysis, an input signal is decomposed into its constituent parts. In Fourier analysis, when a signal is transformed from the time domain into frequency-based sinusoidal constituents, the time information of breakdown points, local trends, self-similarity and discontinuities is lost [26]. In the wavelet transform, the signal is decomposed into “wavelets” that are scaled and shifted versions of the “mother wavelet” Ψ, which has zero mean:

$$\mathop \smallint \limits_{ - \infty }^{\infty } \varPsi \left( t \right)dt = 0$$
(8)
$$\varPsi_{\alpha ,\beta } = \frac{1}{\sqrt \alpha }\varPsi \left( {\frac{t - \beta }{\alpha }} \right)$$
(9)

Here α is called the scaling parameter (α > 0), β is known as the translation parameter, and t is finite. Ψ is the mother wavelet. The translation and scaling of the mother wavelet allow localization in the time and frequency domains, respectively. In the discrete wavelet transform (DWT), the scaling and translation parameters are dyadic (powers of two) in nature: the scale is \(\alpha = 2^{m}\) and the translation is \(\beta = 2^{m} n\), where m and n are integers. In DWT, a signal is passed through two filters (a low-pass and a high-pass filter), yielding one approximation component and one detail component. The low-frequency, high-scale components are known as approximations, while the high-frequency, low-scale components of the signal are known as details. Usually the most essential constituent of the signal is the approximation, as it contains the background information of the signal [2]. This breakdown is continued until the desired resolution level is reached.

$${\text{Signal}} = {\text{approx}}_{n} \left( t \right) + \mathop \sum \limits_{i = 1}^{n} {\text{detail}}_{i} \left( t \right)$$
(10)
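The additive decomposition of Eq. (10) can be checked numerically with a minimal sketch for the Haar (db1) wavelet. For this wavelet the reconstructed level-j approximation is the mean of each block of 2^j consecutive samples, which is what the code computes directly instead of calling a wavelet library. The function name `haar_mra` and the sample signal are hypothetical, and the sketch assumes the signal length is a multiple of 2^levels.

```python
def haar_mra(signal, levels):
    """Haar multiresolution analysis.

    Returns (approximation, details) where every component has the same
    length as the input and details[j - 1] is the level-j detail.
    """
    n = len(signal)
    if n % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    details = []
    current = list(signal)  # approximation at the previous level
    for level in range(1, levels + 1):
        block = 2 ** level
        approx = []
        for start in range(0, n, block):
            mean = sum(signal[start:start + block]) / block
            approx.extend([mean] * block)
        # Detail = what is lost when moving to the coarser approximation
        details.append([c - a for c, a in zip(current, approx)])
        current = approx
    return current, details

# Hypothetical 8-sample signal decomposed to level 3.
signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, details = haar_mra(signal, 3)
rebuilt = [approx[k] + sum(d[k] for d in details) for k in range(len(signal))]
```

Here `rebuilt` reproduces `signal`, and the level-3 approximation is the overall mean of the eight samples.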

2.4 Genetic algorithm (GA)

GA belongs to the class of evolutionary algorithms, which are inspired by Darwin’s theory of evolution. A genetic algorithm starts with a set of solutions (represented by chromosomes) called the population [7]. Solutions are selected from a population and processed to form a new population, in the hope that the new population will perform better than the old one. The solutions used to form new solutions (offspring) are selected according to their fitness; the more suitable they are, the more likely they are to pass their attributes to the next generation. This process is repeated until a pre-defined condition (for example, the number of populations or the improvement in the best solution) is satisfied.

2.5 Data set

Data in this study were collected from two sources, namely primary and secondary. Secondary data were collected from governmental, semi-governmental and private publications, whereas primary data were recorded from the field survey. The required data and published work related to municipal solid waste management were collected from the Municipal Corporation of Delhi (MCD), Central Pollution Control Board (CPCB), New Delhi Municipal Council (NDMC), Delhi Cantonment Board (DCB) and various departments of the Government of National Capital Territory of Delhi (GNCTD). These data consist of yearly waste generation for the period 1993–2011. The data fed into the models were formed by applying an adjacent windowing technique to this time series, with window lengths of 2 and 4.
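One common reading of the windowing step is a sliding window in which each group of consecutive values is paired with the value that follows it. The sketch below shows that construction under this assumption; the function name `make_windows` is hypothetical, and the series is a placeholder of 19 values standing in for the 1993–2011 yearly data, not the actual measurements.

```python
def make_windows(series, window):
    """Build (inputs, targets) pairs for one-step-ahead forecasting.

    Each input is `window` consecutive values; the target is the next value.
    """
    inputs, targets = [], []
    for i in range(len(series) - window):
        inputs.append(series[i:i + window])
        targets.append(series[i + window])
    return inputs, targets

# Placeholder series with one value per year (19 values), illustration only.
series = list(range(1, 20))
inputs_2, targets_2 = make_windows(series, 2)
inputs_4, targets_4 = make_windows(series, 4)
```

Under this reading, a 19-point yearly series yields 17 samples for a window of 2 and 15 samples for a window of 4.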

2.6 Description of models used

2.6.1 Pure ANN

For the analysis done in this paper, we have used a three-layered neural network containing an input layer, a hidden layer and an output layer. Different combinations with different number of delays (input neurons) and different number of hidden nodes were used to form a number of models.

  • No. of input neurons = 2, 4

  • No. of hidden layer neurons = 5, 10

The network was trained with the Levenberg–Marquardt back-propagation optimization method. The structure of the network is shown in Fig. 1. The activation function used in the hidden layer is tan-sigmoid, while the one used in the output layer is pure linear. The model has been created to forecast the next value of the series (one step ahead), so the number of output nodes is 1. The modelling accuracy of the different models in terms of RMSE and IA is given in Tables 1 and 2, respectively (Figs. 2, 3, 4).
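The forward pass of such a network can be sketched as follows, with a tan-sigmoid hidden layer (mathematically the hyperbolic tangent) and a linear output node. Training by Levenberg–Marquardt is not shown; the random initial weights, the seed and the sample input are assumptions made only for illustration.

```python
import math
import random

def init_network(n_inputs, n_hidden, seed=0):
    """Randomly initialize a network with one hidden layer and one output."""
    rng = random.Random(seed)
    return {
        "w_hidden": [[rng.uniform(-1, 1) for _ in range(n_inputs)]
                     for _ in range(n_hidden)],
        "b_hidden": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "w_out": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b_out": rng.uniform(-1, 1),
    }

def forward(net, inputs):
    """Tan-sigmoid hidden layer followed by a linear output node."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(net["w_hidden"], net["b_hidden"])
    ]
    return sum(w * h for w, h in zip(net["w_out"], hidden)) + net["b_out"]

def count_parameters(n_inputs, n_hidden):
    """Number of weights and biases for one output node."""
    return n_hidden * (n_inputs + 1) + (n_hidden + 1)

# The four input/hidden combinations listed above.
configs = [(2, 5), (2, 10), (4, 5), (4, 10)]
param_counts = {cfg: count_parameters(*cfg) for cfg in configs}

net = init_network(4, 5)
prediction = forward(net, [0.1, 0.2, 0.3, 0.4])  # hypothetical scaled inputs
```

The dictionary `param_counts` gives the number of trainable weights and biases for each of the four configurations.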

Fig. 1 A schematic diagram of a two-layered network

Table 1 RMSE values for ANN models
Table 2 Index of agreement (IA) values
Fig. 2 Architecture used in ANFIS

Fig. 3 Life cycle of genetic algorithmic program

Fig. 4 Solid waste production of New Delhi

2.6.2 DWT-ANN

In this hybrid model, the discrete wavelet transformation is used to pre-process the input data to improve the forecast efficiency. DWT decomposition breaks the input signal down into simpler constituents, which makes it easier for the ANN to analyse the data. The schematic for this model is given in Fig. 5. First, the initial time series signal is broken down into its constituent high- and low-frequency components (detail and approximation) up to the required resolution. In the second step, the sub-series are fed into the ANN and the output for each sub-series is obtained. In the third step, the outputs obtained by passing each sub-series through the ANN are added to obtain the final output. In this study, Daubechies order 1 (db1) mother wavelets are used to decompose the input signal up to level 3.

Fig. 5 Schematic of the DWT-based model

2.6.3 GA-ANN

We have used genetic algorithm techniques to determine the optimal biases as well as the weights of the ANN, instead of using the back-propagation optimization technique. Similar to the approach used in pure ANN, different combinations of number of hidden nodes and number of input nodes were used to create the ANN models, which were then optimized using GA. The steps followed are:

  1. The architecture of the ANN was created by fixing the number of delays, hidden and output nodes.

  2. A population of 200 solutions was randomly generated.

  3. Each of the solution sets was used to obtain the biases and weights of the ANN, and output was obtained using each solution.

  4. The fittest individuals of the current generation were used to create the next generation by crossover method.

  5. Some of the individuals were mutated to obtain the next generation.

  6. Some migrations were also allowed between generations to increase diversity of the solutions.

  7. Steps (3) to (6) were repeated till the solutions stopped improving.

  8. The best solution was used to obtain the final output from the ANN.
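The steps above can be sketched as a small genetic algorithm that searches for the weights and biases of a one-hidden-layer network by minimizing RMSE. This is a simplified illustration, not the configuration used in the study: only the population size of 200 comes from the text, while the elitism, the selection of the top quarter as parents, the single-point crossover, the mutation rate, the fixed number of generations and the toy data are all assumptions. The migration step is omitted.

```python
import math
import random

def decode(chrom, n_in, n_hid):
    """Split a flat chromosome into hidden weights/biases and output weights/bias."""
    k = n_in * n_hid
    w_h = [chrom[i * n_in:(i + 1) * n_in] for i in range(n_hid)]
    b_h = chrom[k:k + n_hid]
    w_o = chrom[k + n_hid:k + 2 * n_hid]
    b_o = chrom[k + 2 * n_hid]
    return w_h, b_h, w_o, b_o

def predict(chrom, x, n_in, n_hid):
    """Network output for input x using the weights encoded in chrom."""
    w_h, b_h, w_o, b_o = decode(chrom, n_in, n_hid)
    hidden = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
              for row, b in zip(w_h, b_h)]
    return sum(w * h for w, h in zip(w_o, hidden)) + b_o

def rmse(chrom, X, y, n_in, n_hid):
    """Fitness measure: lower RMSE means a fitter individual."""
    sq = sum((predict(chrom, x, n_in, n_hid) - t) ** 2 for x, t in zip(X, y))
    return math.sqrt(sq / len(y))

def run_ga(X, y, n_in=2, n_hid=3, pop_size=200, generations=30, seed=0):
    rng = random.Random(seed)
    n_genes = n_in * n_hid + 2 * n_hid + 1
    # Step 2: random initial population
    pop = [[rng.uniform(-1, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Step 3: evaluate every solution and rank by fitness
        pop.sort(key=lambda c: rmse(c, X, y, n_in, n_hid))
        history.append(rmse(pop[0], X, y, n_in, n_hid))
        parents = pop[:pop_size // 4]      # fittest quarter (assumption)
        children = [pop[0][:]]             # keep the best as-is (assumption)
        while len(children) < pop_size:
            # Step 4: single-point crossover of two fit parents
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)
            child = p1[:cut] + p2[cut:]
            # Step 5: occasional mutation of one gene
            if rng.random() < 0.2:
                child[rng.randrange(n_genes)] += rng.gauss(0, 0.3)
            children.append(child)
        pop = children
    # Step 8: best solution of the final population
    best = min(pop, key=lambda c: rmse(c, X, y, n_in, n_hid))
    history.append(rmse(best, X, y, n_in, n_hid))
    return best, history

# Toy data (hypothetical): predict the next value of a linear ramp.
X = [[0.1 * i, 0.1 * (i + 1)] for i in range(8)]
y = [0.1 * (i + 2) for i in range(8)]
best, history = run_ga(X, y)
```

Because the best individual is carried over unchanged, the best RMSE recorded in `history` never increases from one generation to the next. The sketch stops after a fixed number of generations, whereas step 7 stops when the solutions no longer improve.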

2.6.4 Pure ANFIS

ANFIS models with the following parameters were created:

  • Number of inputs: 4

  • Number of input membership functions: 2

  • Optimization method: hybrid algorithm

  • Input membership function types used for grid partitioning FIS generation:

    • triangle-shaped (trimf)

    • trapezoid-shaped (trapmf)

    • Gaussian curve (gaussmf)
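The three membership function shapes can be written down in a few lines. The sketch below uses the usual textbook definitions and borrows the names trimf, trapmf and gaussmf from the list above; these are plain Python functions written for illustration, and the breakpoints used in the example are hypothetical. The triangular and trapezoidal forms assume strictly increasing breakpoints.

```python
import math

def trimf(x, a, b, c):
    """Triangular membership: feet at a and c, peak at b (a < b < c)."""
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def trapmf(x, a, b, c, d):
    """Trapezoidal membership: feet at a and d, shoulders at b and c (a < b < c < d)."""
    return max(min((x - a) / (b - a), 1.0, (d - x) / (d - c)), 0.0)

def gaussmf(x, sigma, c):
    """Gaussian membership with spread sigma and centre c."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

# Grid partitioning combines every membership function of every input,
# so 4 inputs with 2 membership functions each give 2**4 rules.
n_inputs, n_mfs = 4, 2
n_rules = n_mfs ** n_inputs

# Hypothetical evaluation on an input scaled to [0, 1].
degrees = (trimf(0.25, 0.0, 0.5, 1.0),
           trapmf(0.5, 0.0, 0.25, 0.75, 1.0),
           gaussmf(1.0, 2.0, 1.0))
```

With grid partitioning, the configuration listed above corresponds to 16 rules.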

2.6.5 DWT-ANFIS

The results obtained with ANFIS alone suggest that it does not deal well with dynamic time series data directly; the data therefore need to be pre-processed before being passed to ANFIS. Here DWT plays an important role in improving the performance of the model. The data are first decomposed using the Daubechies wavelet (db1) at three levels. Each resulting signal is then checked for randomness. A signal containing a large number of random points is termed a noisy signal, which is unpredictable. Such a noisy signal is removed from the analysis because it leads to inaccurate results.

2.6.6 GA-ANFIS

Genetic algorithm was implemented to optimize weights and biases of ANFIS. Similar to the approach used in pure ANFIS, a structure was created, which was then optimized using GA. The steps followed were:

  1. The architecture of the ANFIS was created by fixing the number of input membership functions (for example, 2).

  2. A population of 200 solutions was randomly generated.

  3. Each of the solution sets was used to evaluate the weights and biases of the ANFIS, and output was obtained using each solution.

  4. The fittest individuals of the current generation were used to create the next generation by crossover method.

  5. Some of the individuals were mutated to obtain the next generation.

  6. Some migrations were also allowed between generations to increase diversity of the solutions.

  7. Steps (3) to (6) were repeated till the solutions stopped improving.

The best solution from the pool of solutions was then used to obtain the final output from the ANFIS model.

3 Results and discussions

The six models developed were compared by determining the RMSE, R² and IA values of the forecasted data. The results of the different ANN-based models are given in Tables 1, 2 and 3. On the basis of the values obtained, it can be observed that when the number of input delays is small, the performance of pure ANN and GA-ANN is comparable, while the performance of the DWT-ANN model is poorer than that of the others. However, when the number of input delays is increased, using the genetic algorithm to determine the optimal weights and biases of the neural network gives a lower RMSE and higher IA and R² values. In that setting, the wavelet-based hybrid ANN models also performed better than the pure ANN models. The RMSE, R² and IA values for the different ANFIS-based models are given in Tables 4, 5 and 6. For the ANFIS-based models, too, the performance improves when the data are pre-processed using the wavelet-based signal processing technique. The best results of the different ANN- and ANFIS-based hybrid models are given in Table 7, which shows that the GA-ANN hybrid model has the lowest RMSE value and the highest IA and R² values. It is further observed that the overall performance of the hybrid models is in general better than that of the pure models. The authors have thus validated the usability of different AI models in time series prediction of waste generation. There are other methods to forecast waste generation, such as regression analysis and material flow analysis; however, because the detailed data these methods require are scarce, the time series method becomes preferable. Though work has been done previously on AI-based time series forecasting of waste generation, the hybrid ANFIS algorithms and the GA optimized ANN have been used for the first time in this study.

Table 3 Coefficient of determination (R²) values
Table 4 RMSE values for ANFIS
Table 5 Index of agreement (IA) for ANFIS
Table 6 Coefficient of determination (R²) for ANFIS
Table 7 Comparison of all models
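The three comparison measures can be computed as in the sketch below. The paper does not state the formulas, so the definitions here are assumptions: IA follows Willmott's index of agreement, and R² is taken as one minus the ratio of the residual to the total sum of squares (a squared correlation coefficient is another common choice). The observed and predicted values are hypothetical.

```python
import math

def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def index_of_agreement(obs, pred):
    """Willmott's index of agreement (assumed definition of IA)."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
              for o, p in zip(obs, pred))
    return 1.0 - num / den

def r_squared(obs, pred):
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and forecast values (illustration only).
observed = [1.0, 2.0, 3.0, 4.0]
forecast = [1.0, 2.0, 3.0, 5.0]
scores = (rmse(observed, forecast),
          index_of_agreement(observed, forecast),
          r_squared(observed, forecast))
```

A perfect forecast gives an RMSE of 0 and IA and R² values of 1, so a better model has a lower RMSE and higher IA and R², which is the ordering used in the comparison above.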

4 Conclusion

A fundamental step in designing an effective garbage management system is to predict the amount of waste that a city will generate. Determining the volume of waste beforehand enables proper planning of landfill sites and recycling units, as well as the development and operation of garbage collection infrastructure. In a metropolitan city like New Delhi, it is of extreme importance to plan in advance so that the waste generated can be handled properly and environmental and health hazards can be prevented. This study shows that artificial intelligence-based techniques can be successfully employed for forecasting the volume of waste. In this paper, six models (ANN, ANFIS, GA-ANN, GA-ANFIS, DWT-ANN and DWT-ANFIS) were evaluated to determine their effectiveness in MSW forecasting. When the models are ranked by performance, GA-ANN is found to be the most accurate.