1 Introduction

The beneficiation of waste for power generation has gained significant momentum across the globe. Every year, billions of tons of waste are generated around the world. By 2050, the global community is projected to generate 3.40 billion tons of garbage annually, a drastic increase from the current value of 2.01 billion tons [1]. According to a United Nations (UN) report, 17% of the global annual food production, estimated at 1.0 billion tons, was wasted as of 2019 [2]. The breakdown shows that 61%, 26%, and 13% of food waste came from households, food services, and retail, respectively [2]. To add further context, the waste disposed of yearly in the USA alone is valued at $408 billion [3]. In terms of greenhouse gas emissions, the carbon footprint associated with global food wastage has been estimated at 3.3 billion tons of CO2 [4]. With the growth in population and food processing industries, the amount of waste is likely to increase. It has also been suggested that waste will increase with industrialization and urbanization [5].

Millions of tons of food, fruit, and vegetable (FFV) wastes have been channelled to landfills, accounting for nearly 50% of the fruits and vegetables produced globally on a yearly basis [6]. FFV waste represents not only product waste along the full value chain but also other associated resources, including the land, water, labor, and energy used in the production of such food products. It contributes significantly to climate change since greenhouse gases are emitted during food production and distribution. Landfill or incineration approaches to FFV disposal are not advisable because of the fast biodegradability of FFV in the presence of contaminating microbes. Moreover, the methane produced in the landfill serves no useful purpose unless it is captured and utilized as a clean energy source rather than being allowed to leak into the air or disperse as waste, leading to severe environmental pollution. FFV wastes are rich in organic substances, making them good candidates for anaerobic digestion. While the overarching target is to eliminate waste or at least reduce waste generation, its beneficiation for value-added products could be the sustainable solution. When FFV is beneficiated, the climate benefits in two ways: landfill emission is prevented, while the fossil fuel that may have been used for energy generation is reduced or ruled out. FFV, as a low-cost waste, could be beneficially deployed in anaerobic digestion (AD) for energy application and value-added co-products toward the enhancement of the circular economy. AD has traditionally been deployed as an effective, sustainable, and environmentally friendly technology for transforming liquid or organic waste into biogas and other value-added products such as organic fertilizer. The biogas produced in the anaerobic digestion process can be directly converted to electricity and heat.

AD is a complex process involving microbial consortia with numerous metabolic processes and kinetic reactions leading to CO2, N2, and CH4 production [7]. Other gases such as H2S and NH3 are also produced in traces depending on feedstock characteristics and operating conditions. The generation of biogas from FFV proceeds in four stages: hydrolysis, acidogenesis, acetogenesis, and methanogenesis [8, 9]. At the hydrolysis stage, complex organic compounds such as lipids, proteins, and polysaccharides are converted into soluble monomers or oligomers. Further, during acidogenesis, the sugars, fatty acids, and amino acids generated during hydrolysis are used to produce organic acids such as acetic, propionic, butyric, and other fatty acids, together with hydrogen and CO2, via the activities of fermentative anaerobic bacteria. The action of these bacteria makes acidogenesis the fastest process in anaerobic digestion. Alcohols and volatile fatty acids are anaerobically oxidized into acetate, H2, and CO2 during acetogenesis. This process takes place in the presence of hydrogen-producing acetogenic bacteria. At the end of this stage, the methanogens (acetotrophic and hydrogenotrophic) convert acetate, H2, and CO2 into a mixture of CH4 and CO2. The acetotrophic methanogens produce around 70% of the methane, while the hydrogenotrophic pathway yields more energy than the acetate pathway since it is not rate-limited. It has been reported that methane-producing methanogens are very sensitive to environmental changes, but hydrogenotrophic methanogens are more resistant to such changes [10, 11]. Several factors affect the AD process; these include pH, temperature, C:N ratio, organic loading rate (OLR), hydraulic retention time (HRT), and nutrients [12, 13]. The microbes are sensitive to pH since each group survives within a different range. 
If the partial pressure of hydrogen increases, the methanogenesis phase of the anaerobic process may fail because of the accumulation of volatile fatty acids and the resulting reduction in pH [13]. Microbes which act during AD are equally sensitive to temperature [14, 15]. For instance, under mesophilic temperature, the activity and growth rate of bacteria decrease by 50% for each 10 °C drop in temperature. If the temperature is increased up to 37 °C, the time required for the digestion process is reduced, up to a point beyond which a further increase in temperature leads to a reduction in biogas yield. Biogas production falls when the temperature decreases to 20 °C and stops entirely at 10 °C [14, 16]. Nutrients are added to a digester to support the bacteria required for biodegradation. Also, the proper composition of the feed must be maintained so as to keep the C:N ratio at the appropriate level, because a low C:N ratio may result in ammonium inhibition, especially for nitrogen-rich substrates [17]. Apart from temperature, total solids (TS) and organic loading rate (OLR) are critical to the stable operation of the AD process [16]. The loading rate determines the amount of FFV feedstock to be added to a digester on a daily basis, subject to the size of the digester. Hydraulic retention time measures the time the solids or slurry spend in the digester during the AD process. Depending on the type of substrate and the climatic conditions, HRT can be as high as 100 days [18, 19]. Apart from the aforementioned operating parameters, several kinetic and stoichiometric parameters are associated with microbial growth and chemical reaction [20, 21]. Moreover, anaerobic digestion is vulnerable to process instability due to substantial dissimilarity in feedstock composition and the unpredictability of microbial activities [22]. This further complicates the AD process.

Although FFV wastes are readily available for AD, their degradation process can be very complex because of their characteristics. As a result of this complexity, mathematical prediction is highly challenging, though AD is theoretically well understood [23]. Moreover, the conventional analytical techniques are time consuming, highly expensive, and demanding in terms of equipment. Therefore, there is a need for modelling approaches which can provide dynamic information regarding the condition of the AD process. Artificial intelligence (AI) models are data-driven techniques which treat the physical process or system as a black box described by its input and output measurements. This ensures high predictive capability based on observations [24]. AI can provide superior techniques compared to theoretical or mathematical approaches for complex systems in which multiple parameters and non-linear dependencies influence the process [23]. AI models have been profitably applied in predicting biogas yield due to their ability to generalize and learn complex input-output relationships [25,26,27,28,29]. Several AI models have been used to model AD processes, though only a few exist for the prediction of biogas yield from FFV waste [7, 30,31,32]. Kanat and Saral [33] and Yetilmezsoy et al. [34] developed models for the prediction of biogas production from molasses wastewater using ANN. They noted the ability of ANN to determine the interdependence in an AD process without prior awareness of the mathematical principles or governing equations. Good prediction results were obtained based on statistical metrics. In a study carried out by Neto et al. [35], the effects of seven (7) critical operating parameters were evaluated and deployed to predict biogas yield using an artificial neural network (ANN). 
Before this, most existing studies considered only one, two, or three operating factors despite the importance of others in extracting valuable information for the prediction and optimization of AD processes [36,37,38]. The major limitation of ANN is its inability to guarantee a global optimal solution and its difficulties in knowledge representation [39, 40]. This is almost unavoidable considering the fundamental black box processing paradigm and the several topologies which exist in neural computing [40]. In that case, the prediction of FFV biogas production can benefit from a system which combines the advantages of ANN and fuzzy systems with evolutionary algorithms. Fuzzy systems can represent comprehensive linguistic knowledge while reasoning through fuzzy rules, though they do not have a mechanism for parameter tuning [40, 41]. Neural networks, on the other hand, can be trained and tuned from a stream of input-output data. The combination of a fuzzy system and a neural network gives rise to the adaptive neuro-fuzzy inference system (ANFIS). ANFIS is very robust, combining the learning ability of neural networks with the modelling of uncertainty, linguistic concepts, and expert knowledge. An ANFIS model can adapt to variation in system conditions, handle noisy data, and quickly model systems with non-linear process structures using low computational resources [42]. ANFIS is a prediction technique used in numerous fields of study in bioenergy exploration and conversion due to its capacity to map inputs to outputs inside a solution space so that local optimal values are avoided while considering fuzzy factors [43, 44]. While building an ANFIS model, the choice of clustering technique must be thoroughly considered given its impact on the accuracy and precision of the model [45]. Clustering methods help to identify the group to which an observation belongs; thus, an unsuitable clustering approach may reduce the accuracy of the model [45, 46].

The genetic algorithm (GA), as an evolutionary technique, has been extensively used in different fields [47,48,49]. Its preference over other evolutionary-based techniques has been associated with its parallelism and its ability to deal with complex problems. Evolutionary genetic algorithms are applicable to any optimization problem: stationary or non-stationary, continuous or discontinuous, linear or non-linear, or random with noise [50]. The GA optimization method stems from Darwin's theory of evolution, which focuses on natural selection and survival of the fittest in biological genetics [51]. The main objective of the approach is to reproduce offspring with better genetic fitness than their progenitors. This principle of evolution is deployed through reproduction, crossover, and mutation. For hyperparameter optimization, the GA technique optimizes the base model hyperparameters according to an objective function within the solution search space until convergence on an improved solution is reached. With this prowess, the GA method has been hybridized with other intelligent predictive models, leading to improved prediction accuracy and error minimization via parameter optimization [25, 52]. A hybrid model comprising ANFIS optimized with GA promises better results by minimizing the curse of dimensionality and internal loss of interpretability of the model when used on large input datasets [53].

From the existing studies, the hybridization of GA and ANFIS models has been successfully adopted in a wide range of applications in order to improve prediction and optimization capabilities. However, to the best of the authors' knowledge, the use of GA-ANFIS, including the investigation of the effect of clustering techniques, for the prediction of biogas yield from FFV has not been previously reported. The closest application was in the prediction of the heating value of biomass [54]. Therefore, building on the advantages of adaptive neuro-fuzzy inference system (ANFIS) functionality [38], this study applies a GA model to predict the cumulative biogas production from FFV waste. Accordingly, the specific objectives of the present study are: (1) to develop a GA-ANFIS model for the prediction of biogas yield from FFV; (2) to investigate the effect of clustering techniques on the performance of the developed model; and (3) to compare the developed predictive models based on several statistical performance indicators. The proposed model utilizes feeding, VS, pH, HRT, OLR, temperature, and reactor volume data as input variables. Sensitivity analysis was carried out to determine the relative contribution of each input parameter toward the prediction of the output. The results of these models were compared with a previously proposed model based on known performance metrics.

2 Materials and methodology

2.1 Overview of ANFIS model

Binary logic techniques often fail to closely approximate complex and non-linear problems. This is largely due to the insufficient knowledge or judgment error associated with human experts and the dynamic nature of the system. In this case, the adaptive neuro-fuzzy inference system (ANFIS) has an advantage since it combines the linguistic transparency of fuzzy inference with the self-learning capability of a neural network. In ANFIS modelling, the fuzzy inference system and artificial neural network can be combined according to the Mamdani system [55] or Takagi-Sugeno system [56] topologies. The Takagi-Sugeno system is more computationally efficient and amenable to rule generation alongside the optimization technique of ANN, while ensuring the continuity of the output space [57]. Succinctly, an ANFIS structure is a combination of information obtained from a fuzzy logic system and an artificial neural network. It comprises numerous membership function (MF) parameters tuned using optimization methods [58]. A typical ANFIS structure is a five-layer topology, each layer of which has a number of nodes defined by logical sets of statements. These logical statements perform specific tasks [26]. Specifically, the ANFIS structure comprises the fuzzification layer, the rule layer, the normalization layer, the defuzzification layer, and the total output layer, as discussed by Adedeji et al. [43] and Miller et al. [59]. Once the links between the layers have been established, the outputs of each layer are used as the inputs of the next layer. A typical ANFIS structure with two input parameters, x and y, and a single output parameter fi, is governed by the rules expressed in Eqs. (1) and (2):

$$\mathbf{Rule\ 1}:\ \mathrm{If}\ x\ \mathrm{is}\ M_1\ \mathrm{and}\ y\ \mathrm{is}\ N_1,\ \mathrm{then}\ z=f_1\left(x,y\right)$$
(1)
$$\mathbf{Rule\ 2}:\ \mathrm{If}\ x\ \mathrm{is}\ M_2\ \mathrm{and}\ y\ \mathrm{is}\ N_2,\ \mathrm{then}\ z=f_2\left(x,y\right)$$
(2)

where fuzzy terms are represented by M and N and fi(x, y) is a first-order Takagi-Sugeno fuzzy model with f1 and f2 indicating the fuzzy-if-then rules. The details of the fuzzy layers and their mathematical expressions have been discussed elsewhere [43, 60].

The five layers of the ANFIS model are shown in Fig. 1 and are briefly discussed below. Each node in a layer performs a different function, and the nodes are optimized through learning processes. The node-to-node connection lines show the flow direction and do not imply any weight.

Fig. 1 Structure of an ANFIS

2.1.1 Fuzzification layer

This is the first ANFIS layer, where each neuron is adaptive such that individual weights are updated in the course of learning. In this case, the input parameters are expressed by Gaussian membership functions, with the node outputs given by Eqs. (3) and (4). Other membership functions can be used in addition to the Gaussian membership function, such that the one with the lowest error is selected in the learning process:

$${O}_i^1={\mu}_{Ai}\left(x\right),\quad \mathrm{for}\ i=1,2$$
(3)
$${O}_i^1={\mu}_{Bi-2}\left(y\right),\quad \mathrm{for}\ i=3,4$$
(4)

where \({\mu}_{Ai}\) and \({\mu}_{Bi-2}\) are fuzzy membership functions.

2.1.2 Rule layer

The nodes in this layer are non-adaptive, and each rule has a firing strength whose value is estimated using a simple multiplication [3]. Each node receives the incoming signals from the “IF” part of the fuzzy rule and outputs their product. This output, as expressed in Eq. (5), represents the fitness of the fuzzy rule.

$${O}_i^2={w}_i={\mu}_{Ai}(x).{\mu}_{Bi}(y),\kern0.75em for\ i=1,2$$
(5)

2.1.3 Normalization layer

This is the third layer of ANFIS and is also called the summation layer. This layer computes the normalized firing strength of the nodes as expressed in Eq. (6):

$${O}_i^3=\overline{w_i}=\frac{w_i}{w_1+{w}_2}\kern2.25em i=1,2$$
(6)

2.1.4 Defuzzification layer

The fourth layer is a defuzzification layer, and its nodes are adaptive. The normalized firing strength of each rule is multiplied by a first-order polynomial function, as shown in Eq. (7):

$${O}_i^4=\overline{w_i}\left(\ {p}_ix+{q}_iy+{r}_i\right)=\overline{w_i}{f}_i\kern3.25em for\ i=1,2$$
(7)

where \(\left\{{p}_i,{q}_i,{r}_i\right\}\) is the parameter set of the node and \(\overline{w_i}\) is the normalized firing strength from the third layer.

2.1.5 Output layer

This is the last layer of ANFIS, and it is also called the summation output layer. This layer has a single non-adaptive node that sums all the incoming signals to produce the output. The overall output is computed as shown in Eq. (8). This value is a crisp, continuous quantity rather than a fuzzy set.

$${O}^5={f}_{out}=\sum_{i=1}^2\overline{w_i}{f}_i=\frac{\sum_{i=1}^2{w}_i{f}_i}{\sum_{i=1}^2{w}_i}$$
(8)
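The five layers described by Eqs. (3) to (8) can be traced numerically for the two-input, two-rule case. The following is a minimal sketch, not the authors' MATLAB implementation: the function names, the Gaussian centers and widths, and the consequent coefficients are hypothetical values chosen only for illustration.

```python
import math

def gaussian_mf(value, center, sigma):
    """Layer 1: Gaussian membership grade of `value` (assumed MF shape)."""
    return math.exp(-0.5 * ((value - center) / sigma) ** 2)

def anfis_forward(x, y, premise, consequent):
    """Forward pass of a two-input first-order Takagi-Sugeno ANFIS.

    premise:    one (center_A, sigma_A, center_B, sigma_B) tuple per rule
    consequent: one (p_i, q_i, r_i) tuple per rule
    """
    # Layers 1 and 2: firing strength w_i = mu_Ai(x) * mu_Bi(y), Eq. (5)
    w = [gaussian_mf(x, cA, sA) * gaussian_mf(y, cB, sB)
         for cA, sA, cB, sB in premise]
    # Layer 3: normalized firing strengths, Eq. (6)
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: weighted first-order consequents, Eq. (7)
    rule_out = [wb * (p * x + q * y + r)
                for wb, (p, q, r) in zip(w_bar, consequent)]
    # Layer 5: crisp overall output, Eq. (8)
    return sum(rule_out), w_bar

# Hypothetical parameters for two rules (illustration only)
premise = [(0.0, 1.0, 0.0, 1.0), (1.0, 1.0, 1.0, 1.0)]
consequent = [(1.0, 1.0, 0.0), (2.0, -1.0, 0.5)]
output, w_bar = anfis_forward(0.5, 0.5, premise, consequent)
print(output, w_bar)
```

In training, the premise parameters (layer 1) and the consequent parameters (layer 4) are the quantities that a learning or optimization routine would adjust; the remaining layers are fixed operations.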

The performance of a fuzzy logic system is linked to how well the membership functions are normalized. Moreover, a fuzzy logic system applies the correlation between the antecedents and consequents, including linguistic variables, to produce a specific output. Data clustering is an essential process used in assigning membership functions such that the tuned membership functions are generated according to the expert system’s knowledge. Three (3) clustering techniques were deployed in this study, and they are briefly discussed below. Detailed information about these techniques can be obtained from the studies conducted by Adedeji et al. [43] and Rao et al. [61].

2.1.6 Fuzzy c-means clustering

Fuzzy c-means (FCM) is a data clustering approach that divides a dataset into several clusters. Each data point in the dataset belongs to each cluster to a varying degree, so a data point may belong to more than one cluster. FCM was first presented by Dunn [62] and later enhanced by Bezdek [63]. However, the number of clusters must be fixed in advance based on prior assumptions. In FCM clustering, U is a characteristic matrix holding the membership of each element in each cluster. The c-means objective function Jm(U, V) is defined such that the clustering algorithm minimizes it, as presented in Eq. (9).

$${J}_m\left(U,V\right)=\sum\nolimits_{i=1}^N\sum\nolimits_{j=1}^CU{\left(i,j\right)}^mD{\left(i,j\right)}^2,\kern3.25em 1\le m\le \infty$$
(9)

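where

\(D{\left(i,j\right)}^2={\left\Vert {X}_i-{V}_j\right\Vert}^2\): the squared distance between the element \({X}_i\) and the centroid \({V}_j\)

\({V}_j\): the centroid of cluster j

C: the number of clusters, with 2 ≤ C < N

m: the fuzzification index of the algorithm

\(U\left(i,j\right)\): the degree of membership of element \({X}_i\) in cluster j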
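To make the role of U, V, and m in Eq. (9) concrete, the sketch below evaluates the objective and alternates the standard FCM membership and centroid updates. It is an illustrative, pure-Python sketch under stated assumptions: the update formulas are the textbook FCM ones (which require m > 1), and the function names, the toy one-dimensional data, and the initial centroids are hypothetical rather than taken from the study.

```python
def sq_dist(a, b):
    """Squared Euclidean distance D(i, j)^2 between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def fcm_objective(X, V, U, m=2.0):
    """J_m(U, V) = sum_i sum_j U(i, j)^m * D(i, j)^2, as in Eq. (9)."""
    return sum(U[i][j] ** m * sq_dist(X[i], V[j])
               for i in range(len(X)) for j in range(len(V)))

def update_memberships(X, V, m=2.0, eps=1e-12):
    """Textbook membership update for fixed centroids (needs m > 1)."""
    U = []
    for x in X:
        # eps guards against division by zero when x sits on a centroid
        inv = [max(sq_dist(x, v), eps) ** (-1.0 / (m - 1.0)) for v in V]
        total = sum(inv)
        U.append([value / total for value in inv])
    return U

def update_centroids(X, U, m=2.0):
    """Centroid update: membership-weighted mean of the data."""
    dim, C = len(X[0]), len(U[0])
    V = []
    for j in range(C):
        weights = [U[i][j] ** m for i in range(len(X))]
        total = sum(weights)
        V.append([sum(w * x[k] for w, x in zip(weights, X)) / total
                  for k in range(dim)])
    return V

def fuzzy_c_means(X, V_init, m=2.0, iterations=20):
    """Alternate the two updates and record the objective per iteration."""
    V = [list(v) for v in V_init]
    history = []
    for _ in range(iterations):
        U = update_memberships(X, V, m)
        history.append(fcm_objective(X, V, U, m))
        V = update_centroids(X, U, m)
    return V, update_memberships(X, V, m), history

# Toy data with two obvious groups (illustration only)
X = [[0.0], [0.1], [1.0], [1.1]]
V, U, history = fuzzy_c_means(X, [[0.2], [0.8]], iterations=10)
print(V, history[0], history[-1])
```

Each row of U sums to one, which is what allows a data point to belong partially to several clusters, and the recorded objective does not increase from one iteration to the next.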

2.1.7 Subtractive method

Subtractive clustering is a fast, one-pass algorithm used in estimating the number of clusters and the centers for a set of data [64]. The subtractive clustering (SC) algorithm is utilized to automatically generate the tuned membership functions in accordance with the domain. For this intent, the radius that determines the cluster’s influence in the space is specified [61]. The size of the cluster would be small if the radius of the cluster is set to be too small, thus increasing the number of clusters. On the other hand, the size of the cluster would be large, if the radius of the cluster is set to be large, thus reducing the number of the clusters. The cluster formulation is based on density calculated by Eq. (10):

$${D}_i=\sum\nolimits_{j=1}^n\exp \left(-\frac{{\left\Vert {x}_i-{x}_j\right\Vert}^2}{{r_a}^2/4}\right)$$
(10)

where \({r}_a\) is the cluster radius. The point with the greatest density is chosen as the first cluster center \({x}_{c1}\); after this, the density measure of each data point \({x}_i\) in the next iteration is obtained from Eq. (11):

$${D}_i={D}_i-{D}_{ci}\sum\nolimits_{j=1}^n\exp \left(-\frac{{\left\Vert {x}_i-{x}_j\right\Vert}^2}{{r_a}^2/4}\right)$$
(11)

The iteration process continues until a sufficient number of clusters are achieved, and all the data points fall within the radii of a cluster center.
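The density-based selection of cluster centers can be sketched as follows. The density measure follows Eq. (10). The revision step uses the commonly cited form of subtractive clustering, in which the density of every point is reduced according to its distance from the newly chosen center; this form, the second radius `r_b`, the stopping ratio, and the toy data are assumptions made for illustration and may differ in detail from Eq. (11) and from the settings used in the study.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def densities(points, r_a):
    """D_i = sum_j exp(-||x_i - x_j||^2 / (r_a^2 / 4)), as in Eq. (10)."""
    scale = r_a ** 2 / 4.0
    return [sum(math.exp(-sq_dist(xi, xj) / scale) for xj in points)
            for xi in points]

def subtractive_clustering(points, r_a, r_b=None, stop_ratio=0.15,
                           max_centers=10):
    """Pick cluster centers by repeatedly taking the densest point.

    r_b (assumed 1.5 * r_a) and stop_ratio are hypothetical settings.
    """
    r_b = 1.5 * r_a if r_b is None else r_b
    scale_b = r_b ** 2 / 4.0
    density = densities(points, r_a)
    first_peak = max(density)
    centers = []
    while len(centers) < max_centers:
        c = max(range(len(points)), key=density.__getitem__)
        # Stop once the remaining peak is small relative to the first one
        if density[c] < stop_ratio * first_peak:
            break
        centers.append(points[c])
        peak = density[c]
        # Revision: lower the density of points near the chosen center
        density = [d - peak * math.exp(-sq_dist(p, points[c]) / scale_b)
                   for d, p in zip(density, points)]
    return centers

# Toy data with two tight groups (illustration only)
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (1.0, 1.0), (1.1, 1.0), (1.0, 1.1)]
print(subtractive_clustering(points, r_a=0.5))
```

Because the number of centers is an outcome of the procedure rather than an input, a smaller radius tends to yield more, smaller clusters, which matches the behavior described above.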

2.1.8 Grid partitioning (GP)

Grid partitioning is often deployed when the dimension of the input space is small. As the number of input parameters increases, the number of fuzzy rules increases exponentially, posing a significant limitation to the performance. GP differs from subtractive clustering and fuzzy c-means because GP uses similar membership functions on the input space to generate identical partitions within the symmetric function [43]. Fuzzy rules can be generated from the input-output dataset deployed during training. This allows a rapid learning process and optimization of the computation time (CT). The number of fuzzy if-then rules is equal to \({M}^n\), where n is the input dimension and M is the number of fuzzy subsets into which each input variable is partitioned. The performance of this clustering technique depends significantly on the size of the input and the grid. A finer grid typically performs better, though adaptive grid partitioning can be deployed to optimize the size and location of the fuzzy grid regions [65]. However, the GP technique is limited by the exponential explosion of the number of fuzzy rules as the number of input parameters increases [66].
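The growth of the rule base under grid partitioning can be checked with a short calculation. The sketch below assumes a first-order Takagi-Sugeno consequent, which has n + 1 coefficients per rule; the function names and the chosen values of M are illustrative only.

```python
def grid_partition_rule_count(num_inputs, mfs_per_input):
    """Number of fuzzy if-then rules under grid partitioning: M ** n."""
    return mfs_per_input ** num_inputs

def first_order_consequent_params(num_inputs, mfs_per_input):
    """Linear coefficients to fit: (n + 1) per rule for a first-order model."""
    rules = grid_partition_rule_count(num_inputs, mfs_per_input)
    return rules * (num_inputs + 1)

# Seven inputs, as in this study, with a hypothetical number of MFs per input
for M in (2, 3, 5):
    print(M, grid_partition_rule_count(7, M),
          first_order_consequent_params(7, M))
```

With seven inputs, even three membership functions per input already give 2187 rules and 17,496 consequent coefficients, which is far more than the number of available data points in a dataset of a few hundred records.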

2.2 Data collection and processing

The dataset of 864 data points deployed in this study was developed by Neto et al. [35]. Specifically, the data came from the experimental study carried out by the aforementioned authors and from several other authors whose works have been published in reputable journals. The data were gathered across different seasons in different geospatial locations under varying conditions [35]. The dataset covers a wide array of actual output and input parameters that dictate the behavior of FFV wastes. In this case, the seven (7) variables considered as inputs are organic loading rate (OLR), volatile solids (VS), pH, hydraulic retention time (HRT), temperature, retention time, and reaction volume, while cumulative biogas production was the output. The impact of each input variable on the output value was established based on sensitivity analysis. This was done to determine the influential variables which contribute significantly to the prediction of cumulative biogas yield. The type of reactor (anaerobic sequencing batch reactor (ASBR) and continuous stirred tank reactor (CSTR)) and feeding (semi-continuous and continuous), as well as the number of stages in the digesters (one, two, or three), were described by discrete parameters. The data collected from the database were randomized and subsequently divided in a ratio of 7:3 for training and testing, respectively. The statistical distribution of the FFV waste is presented in Table 1. The training performance is expressed through a comparison between the actual and predicted data in this step.

Table 1 Descriptive statistical distribution of FFV dataset used in the development of the model [35]
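The randomization and 7:3 division of the data can be sketched as below. This is an illustrative stand-in for the MATLAB workflow: the function name, the fixed seed, and the rounding convention for the split point are assumptions, and integer indices are used in place of the actual records.

```python
import random

def train_test_split(records, train_fraction=0.7, seed=0):
    """Shuffle the records and split them into training and testing sets."""
    shuffled = list(records)
    # A fixed seed (an assumption here) makes the split reproducible
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

# 864 placeholder record indices standing in for the FFV dataset rows
train, test = train_test_split(range(864))
print(len(train), len(test))
```

Under this rounding convention, 864 records give 605 training and 259 testing records; the exact counts used in the study are not stated in the text.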

2.3 Model implementation

A hybrid genetic evolutionary algorithm based on a fuzzy inference system was deployed using three separate clustering techniques (fuzzy c-means, grid partitioning, and subtractive clustering). The decision to assess the impact of the clustering technique was based on the studies in [15, 20], which suggested that the choice of hyperplane tuning parameters such as the clustering method affects the model’s performance. Figure 2 shows the flow diagram for the estimation of cumulative biogas yield. The GA starts by initializing a randomly generated population. In the course of each successive generation, a percentage of the existing population is selected to breed a new generation. The population is ranked, and the final solution is then selected through a fitness-based procedure. In this study, the Roulette wheel selection technique was used since it has been identified as the most efficient in parent selection. The crossover of the fittest parents is performed to produce new offspring which reflect the attributes of the pairing parents. The crossover process is followed by mutation, where new solutions are searched within the available search space to obtain novel results which could help in arriving at an efficient solution. After the optimal solution has been selected, a clustering technique is deployed as per the requirement of the ANFIS model; then, the model is trained using the processed data. If the stopping criteria are met, the model is tested with hold-out data; otherwise, the training process is repeated until the stopping criteria are satisfied. The evolutionary genetic algorithms were coded in MATLAB (version 2020a), installed on a laptop with an 11th Gen Intel(R) Core(TM) i7-1165G7 CPU @ 2.80 GHz, 32 GB RAM, and a 2 TB SSD. Iteration values of 1000, 800, and 600 were tested, but there was no discernible difference in the three cost functions. 
In each case, training ended because no further progress was made and the maximum number of iterations had been reached. As a result, the smallest iteration count (600 iterations) was chosen for this investigation to reduce computational time.
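The selection, crossover, and mutation loop described above can be sketched in a few lines. This is a generic real-coded GA offered for illustration, not the authors' MATLAB code: the toy `sphere` cost stands in for the ANFIS training error, and the population size, crossover and mutation rates, search bounds, arithmetic crossover, and Gaussian mutation are all hypothetical choices. Only the Roulette wheel selection and the 600-iteration budget come from the text.

```python
import random

def roulette_select(population, costs, rng):
    """Roulette wheel: selection probability proportional to 1 / (1 + cost)."""
    weights = [1.0 / (1.0 + c) for c in costs]  # assumes non-negative costs
    pick = rng.uniform(0.0, sum(weights))
    running = 0.0
    for individual, weight in zip(population, weights):
        running += weight
        if running >= pick:
            return individual
    return population[-1]

def genetic_algorithm(cost_fn, dim, pop_size=30, iterations=600,
                      crossover_rate=0.8, mutation_rate=0.1,
                      bounds=(-5.0, 5.0), seed=0):
    """Minimize cost_fn over a box; returns (best individual, best cost)."""
    rng = random.Random(seed)
    lo, hi = bounds
    population = [[rng.uniform(lo, hi) for _ in range(dim)]
                  for _ in range(pop_size)]
    best = min(population, key=cost_fn)
    for _ in range(iterations):
        costs = [cost_fn(ind) for ind in population]
        children = []
        while len(children) < pop_size:
            a = roulette_select(population, costs, rng)
            b = roulette_select(population, costs, rng)
            if rng.random() < crossover_rate:
                # Arithmetic crossover: blend the two parents
                alpha = rng.random()
                child = [alpha * g + (1.0 - alpha) * h for g, h in zip(a, b)]
            else:
                child = list(a)
            # Gaussian mutation, clipped to the search bounds
            child = [min(hi, max(lo, g + rng.gauss(0.0, 0.1 * (hi - lo))))
                     if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
        candidate = min(population, key=cost_fn)
        if cost_fn(candidate) < cost_fn(best):
            best = candidate  # keep the best solution seen so far
    return best, cost_fn(best)

def sphere(vector):
    """Toy non-negative cost standing in for the model's training error."""
    return sum(g * g for g in vector)

best, best_cost = genetic_algorithm(sphere, dim=3)
print(best, best_cost)
```

In a GA-ANFIS setting, each individual would encode the membership function and consequent parameters of the fuzzy system, and the cost would be a prediction error such as RMSE on the training data.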

Fig. 2 Flow diagram of hybrid genetic evolutionary algorithm

Table 2 presents the step-by-step procedure that was followed in the implementation of GA-ANFIS. It should be noted that the iterative process continues until the stopping criteria are satisfied.

Table 2 Step-by-step procedure for the implementation of GA-ANFIS

The genetic algorithm is hyperparameter-sensitive, which makes tuning very important because the appropriate selection of these parameters could reduce the prediction errors [67]. Also, clustering techniques are selected to enhance the overall performance of the model. Clustering techniques reveal the intrinsic relationships within the dataset [68, 69]. The learning and optimization parameters that were used in this study are shown in Table 3.

Table 3 Learning and network optimization parameters

2.4 Model performance analysis

The models applied in this study were evaluated based on the relevant statistical metrics. The mathematical expression of these statistical metrics is presented in Table 4.

Table 4 Statistical performance metrics [43, 70]
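For readers without access to Table 4, the metrics named later in the text (RMSE, MAD, MAPE, and R2) can be sketched using their commonly used definitions. These definitions are assumptions and may differ in detail from the expressions in Table 4; in particular, R2 is computed here as the coefficient of determination, and MAD is taken as the mean absolute deviation between actual and predicted values. The sample values are made up for illustration.

```python
import math

def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))

def mad(actual, predicted):
    """Mean absolute deviation between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values non-zero)."""
    return 100.0 * sum(abs((a - p) / a)
                       for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Made-up values for illustration only
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
print(rmse(actual, predicted), mad(actual, predicted),
      mape(actual, predicted), r_squared(actual, predicted))
```

RMSE, MAD, and MAPE approach zero as predictions improve, while R2 approaches one.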

3 Results and discussions

This section discusses the results obtained during the training and testing phases of the model. The performance evaluation results are reported, and a further comparison is drawn with the study conducted by Neto et al. [35]. The influence of the 7 input variables on the cumulative biogas production is shown in Fig. 3. The sensitivity analysis indicates that all variables influence the cumulative biogas yield, though to different degrees. It can be seen that HRT and VS were the most influential, contributing 35% and 31%, respectively, to the prediction of cumulative biogas production. Further, pH contributes 22% to the output, whereas the reactor volume is the least influential in cumulative biogas production. The significance of HRT is due to its impact on other variables such as temperature and substrate composition [71, 72]. When the HRT is reduced, the activity of the bacteria is reduced, while a longer HRT would require a larger digester, leading to higher cost and lower efficiency. The HRT changes with temperature: a lower HRT is required at thermophilic temperature, while a greater HRT is required at mesophilic temperature. Also, the VS concentration of the substrate could affect the biogas production. For instance, comparison of the influent and effluent of the digester can help in determining the feedstock degradation.

Fig. 3 Sensitivity analysis for biogas process input variables

It is noteworthy that microbial growth during anaerobic digestion can be affected by the pH. In a study conducted by Jayaraj et al. [73], the effects of different pH values (5, 6, 7, 8, 9, 10) on biogas yield from food waste after 30 days of retention were investigated. pH 7 produced the best biogas yield and bacterial growth. There is a relation between the feeding rate and the OLR. A higher OLR at a higher feeding rate can produce a larger biogas volume, and vice versa [74]. OLR also affects the microbial population as well as the reactor performance. It is critical to optimize OLR since an increase in its value may produce an acidification effect and a subsequent reduction or stoppage of biogas production, while a decrease in OLR may reduce the biogas production efficiency [75]. Temperature affects the HRT, OLR, VS, microbial growth, and the cumulative biogas yield. The rate of biogas production is enhanced at higher temperature [12, 76]. When the operating temperature changes, the biogas yield also changes [32, 77, 78].

After the effects of all variables had been verified, the GA-ANFIS model was successfully implemented in the MATLAB environment, and the resulting performance of each clustering technique is discussed further. Figures 4, 5, and 6 show the experimental and predicted values based on SC, FC, and GP at the testing stage. The prediction based on SC shows a strong relationship and satisfactory agreement, with significantly fewer mispredictions between the predicted and experimental values of biogas production. However, there were more notable instances of misprediction of biogas production with the FC clustering technique. This may have been due to the unequal sizes and densities of the clusters [79]. The worst prediction scenario, characterized by gross misprediction and underfitting, was noted when the grid partitioning clustering method was deployed. This affected the model accuracy, as the MAPE, R2, MAD, and RMSE reported were poor. Although several tunings were performed using different parameters to validate the GP result, the same prediction pattern was obtained. A similar observation was made by Adedeji et al. [69] in their short-term prediction of wind turbine power output. This may be due to high bias and dimensionality, which increased the complexity of the model and hampered its ability to learn the pattern of the training data. The better performance of SC may be a result of its capability to automatically extract rules that fully account for the mobility and distribution of nodes. FCM, on the other hand, is highly sensitive to outliers in the biogas dataset, due to the Euclidean norm which measures the similarity between the center of the cluster and the data points [66]. Moreover, the modelling was performed on a real-world FFV dataset, which contained noise and outliers, to demonstrate the model’s effectiveness, efficiency, and robustness. Therefore, the sensitivity of FC to noise and outliers may have been partly responsible for its poor performance [79, 80].

Fig. 4 Experimental and evolutionary prediction of cumulative biogas yield (SC) (testing)

Fig. 5 Experimental and evolutionary prediction of cumulative biogas yield (FC) (testing)

Fig. 6 Experimental and evolutionary prediction of cumulative biogas yield (GP) (testing)

Apart from the visual observation of the agreement between the experimental and predicted biogas yield, a statistical evaluation was performed, and the results are presented in Table 5. Generally, MAPE estimates the model’s forecast accuracy and fitness, while MAD and RMSE assess the magnitude of the average prediction error. Thus, values of these metrics close to zero are preferred. Lower values of RMSE, MAPE, and standard deviation error indicate that the predictive model is more accurate and has less error. The SC clustering technique offered the best result of the three clustering methods at the training and testing stages for all evaluation metrics except the computation time (CT). This could be attributed to the ability of SC to handle numerically different but similar data groups and sparse data points in a multidimensional space [81]. Although the computation time (CT) was 70 s, the MAPE of the FC model was approximately 59%, which means that its predictions deviated from the experimental values by about 59% on average. The poor statistical performance of GP shows that the global optimum could not be attained despite the optimization technique.
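As an illustration of how such metrics are computed, the following Python sketch uses the common textbook definitions of MAPE, MAD (taken here as the mean absolute deviation between experimental and predicted values), RMSE, and R2. The function names and the sample values are assumptions made for this example. They are not the study's data, and the exact formulas used in the study may differ in detail.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


def mad(actual, predicted):
    """Mean absolute deviation between experimental and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical cumulative biogas yields, for illustration only.
experimental = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

metrics = {
    "MAPE (%)": mape(experimental, predicted),
    "MAD": mad(experimental, predicted),
    "RMSE": rmse(experimental, predicted),
    "R2": r_squared(experimental, predicted),
}
```

MAPE, MAD, and RMSE are all zero for a perfect prediction, and R2 is then exactly 1, which is why values near zero for the error metrics and near 1 for R2 are read as a good fit.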

Table 5 Statistical evaluation of SC, FC, and GP

Since the data used in this study were sourced from Neto et al. [35], the results from the current study were compared with their biogas production models based on the determination coefficient (R2). These are presented in Table 6. For the ANN models from Neto et al. [35], biogas production was predicted using the gradient descent algorithm (Traindx), Bayesian regularization backpropagation (Trainbr), and the Levenberg-Marquardt function (Trainlm). Compared with all these ANN variants, the SC-based evolutionary algorithm produced a better performance for cumulative biogas production, since its R2 deviates from 1 by only 0.0004%. On the contrary, all the ANN models deployed by Neto et al. [35] performed better than the fuzzy c-means clustering and GP methods deployed in this study, with the R2 of GP being much closer to zero. The R2 value (R2 = 0.1872) based on the GP method suggests that the model fits the data poorly and could not explain the majority of the dataset. The likely cause may be the high dimensionality of the dataset. Therefore, it is reasonable to conclude that the evolutionary algorithm based on the subtractive clustering technique outperforms the other clustering techniques in the prediction of cumulative biogas yield.

Table 6 Comparative assessment of prediction performance

4 Conclusions

A hybrid evolutionary (genetic) algorithm based on an adaptive neuro-fuzzy inference system (ANFIS) was applied to predict biogas yield from FFV. Three clustering techniques (SC, FC, and GP) were considered, and the sensitivity of the input variables was evaluated. The sensitivity analysis indicates that all variables influence the cumulative biogas yield, with HRT and VS being the most influential, contributing 35% and 31%, respectively, to the prediction of cumulative biogas production. The results also demonstrate the effect of the clustering technique on the prediction of biogas production. The hybrid genetic algorithm using the subtractive clustering approach agreed closely with the experimental data for biogas production. The statistical performance metrics for the training and testing phases indicated that evolutionary ANFIS based on SC could predict the cumulative biogas yield with high accuracy and low error. It also provided better reliability than the other models reviewed in this study. The results confirm the capacity of the hybrid evolutionary (genetic) algorithm based on the subtractive clustering technique to predict the biogas yield from FFV and to serve as an effective tool for the upscaling of anaerobic digestion units as well as in techno-economic studies toward more efficient energy utilization.