1 Introduction

Electricity is integral to daily life, and the economic development of any country is closely tied to the infrastructure, network, and accessibility of its electrical supply [1, 2]. Demand for energy in domestic and commercial use has consequently risen sharply all over the world. Forecasting future power demand is central to the electric sector because it underpins decisions about the operation and planning of power systems. With the development of renewable energy sources and smart networks, load forecasting, that is, the prediction of electrical energy consumption, is becoming increasingly important [2]. Good grid management requires careful forecasting of load demand, a regular repair schedule for generators and for transmission and distribution lines, and a judicious allocation of loads across these facilities [3]. In deregulated energy markets, forecasts of electricity demand and price are a key input to efficient, dependable, and risk-free management strategies for the energy system [4]. Sustaining energy stability therefore depends on strategies that encourage the development of technologies capable of forecasting demand [5], and investigating such technologies supports precise, well-informed planning decisions.

Prediction methods have advanced considerably over time. Non-linear forecasting models have recently attracted more attention than traditional linear models, driven by the recognition that real-world problems are inherently nonlinear and that precise, reliable forecasts require methods able to accommodate this nonlinearity [6]. The artificial neural network (ANN) is built on a nonlinear mapping structure inspired by the configuration of human neurons, and it has proven effective across many fields [7]. The adaptive neuro-fuzzy inference system (ANFIS) combines an ANN with the architecture of a fuzzy inference system (FIS), with the aim of improving the speed, error resilience, and adaptability of the modeling system [8, 9]. Because it draws on both ANN and FIS at once, ANFIS has performed well in time series prediction and forecasting, often surpassing conventional approaches, and previous researchers have applied it in various domains [10,11,12]. However, ANFIS can be imprecise in some situations because its parameters must be determined and optimized before it can be used effectively. Evolutionary algorithms (EAs) are one means of optimizing these parameters.

Applying EAs in this way improves the precision and performance of ANFIS. EAs have also proven very effective when applied to other machine learning models [13, 14], and their versatility and adaptability make them valuable across many applications. By incorporating EAs into the optimization process, researchers can fine-tune ANFIS parameters to suit the requirements of different scenarios, improving outcomes and robustness [9, 15, 16]. Combining EAs with ANFIS structures yields a potent AI-based forecasting method that draws on the learning ability of the traditional ANN and the flexibility of fuzzy logic to make precise forecasts [17]. EA-based ANN and ANFIS models have therefore rapidly gained popularity in the research community and have been investigated in a variety of contexts, including wind [18, 19], heating, ventilation and air-conditioning (HVAC) systems [20], agriculture [21], economics [22], education [23], medicine [24], and sport [25]. The same is true for the prediction of electricity consumption. For instance, the authors of [26] used MLR, ANFIS, and particle swarm optimization (PSO)-ANFIS to determine the industrial energy demand in Turkey. According to their findings, the PSO-ANFIS model was more accurate than the MLR and ANFIS models and had a smaller estimation error. In ref. [27], the forecasting accuracy for Bonneville, Oregon was improved using NSGA II, ANFIS, and GA; the proposed NSGA II-ANFIS-GA model performed best among the approaches compared. ANFIS and PSO were used together to simulate the geometric characteristics of the scour hole in ski-jump spillways [28], and simulation findings showed that the proposed model performed better than competing approaches on a number of well-known error indices. Kumaran and Ravi [29] applied an ANN-biogeography optimization (BOA) model to long-term forecasting (LTF) of electric power demand in India. Their model uses two BOA-tuned ANNs to find the optimal nonlinear map between input and output values based on socio-economic factors such as population. Ahmad et al. [30] investigated AI-based techniques such as SVM and neural networks for predicting the electrical load in buildings, and showed that combining two prediction methods produces better results than using either one alone. Banda and Folly [31] examined how well a hybrid PSO-ANN model forecasts hourly to weekly changes in electrical load demand. The PSO algorithm was used to fine-tune the structure of the standard ANN model and reduce the forecasting error, and the PSO-ANN hybrid outperformed the ANN alone.

Even though hybrid models are effective energy forecasters, much prior research has overlooked the impact of clustering methods and other factors that are crucial to the success of an ANFIS model. Care must be taken when selecting a clustering strategy for approximating an output function [32], because the clustering method and its parameters directly affect forecast accuracy, and a poor choice degrades the model. Moreover, hybrid PSO-ANFIS models have been used in a variety of contexts across a number of papers, perhaps because PSO is among the easiest and most adaptable algorithms to use. However, the inertia weight (IW), a key component of the PSO algorithm, must also be chosen carefully for the algorithm to function optimally.

The IW parameter holds significant importance in determining the equilibrium and convergence of the exploration–exploitation phase of the PSO algorithm. The inception of the IW concept, initially proposed in [33], was directed towards enhancing the performance of the conventional PSO algorithm. As the field advanced, subsequent research has actively sought to improve the conventional PSO algorithm, primarily through subtle adjustments to the IW. Consequently, this study systematically investigates the effects of employing distinct IW strategies. Furthermore, the research extends its horizon to probe into the multifaceted impact of diverse clustering methods and other pivotal parameters on a sophisticated hybrid model that integrates PSO and ANFIS for the purpose of forecasting electricity consumption. The geographical scope of the case study is specified to districts within Lagos, Nigeria. In addition to scrutinizing the predictive accuracy, the study systematically evaluates the robustness of the proposed model by subjecting it to rigorous comparisons with various PSO variants.

The main contributions of this study are as follows:

(1) Develop a hybrid model that combines particle swarm optimization and ANFIS for forecasting electricity consumption, using weather data and historical electric loads.

(2) Investigate the impact of hyperparameters and of two well-known clustering techniques, subtractive clustering (SC) and fuzzy c-means (FCM), on the developed model.

(3) Conduct a comparative study between the developed model and other hybridized PSO-based ANFIS variants that use different inertia weight strategies.

The subsequent sections provide an overview of the remaining aspects of this investigation. Section 2 details the materials and procedures employed, while Sect. 3 delves into the analysis of experimental findings. Finally, Sect. 4 concludes the study and presents potential avenues for future research.

2 Materials and methods

2.1 Description of the study area

Southwest Nigeria is home to one of the most populous cities on the African continent, which goes by the name of Lagos. It is widely renowned for being the most populous city in Nigeria as well as the key regional hub for transportation through air, land, and sea. Because of its location on the coast of Nigeria’s Atlantic Ocean and the excellent trade routes it provides, the area’s geographic setting is of special significance. In addition to having a big airport, it has road and rail connections to the Nigerian cities that are located in the vicinity. Situated at 6° 27′ 55.5192″ N latitude and 3° 24′ 23.2128″ E longitude, the metropolitan area with high population density encompasses 16 out of the 20 local government areas (LGA) within the region (refer to Fig. 1). The climate in the state manifests itself through two well-defined seasons: the rainy season spanning from April to October and the dry season prevailing from November to March. This climatic occurrence arises from the convergence of the hot and arid air mass from the continental interior with the warm and moisture-laden tropical air mass from the marine environment [34]. Changes in temperature and humidity throughout the year influence people’s routines and, as a result, the amount of energy they use. This research considered the development of a hybrid modelling scheme for predicting the energy usage during the wet season.

Fig. 1 Geographical representation indicating the location of the study area [9]

2.2 Data collection

In this study, we investigate the influence of clustering methods and other critical parameters on hybrid models for estimating the energy used in 10 districts in Lagos (see Fig. 1). The forecasting model was constructed from data for the wet months of 2020 obtained from the Eko Electricity Distribution Company (EKEDC). Climatic data, comprising maximum temperature, minimum temperature, humidity, wind speed, and dew, were sourced from the Visual Crossing Weather Data stations covering the geographical scope of the study. The model’s output is the electricity consumption, measured in megawatt-hours (MWh). The model was developed using 214 sets of daily records of consumption and environmental factors over that period. Training used 150 data samples (about 70%), while the remaining 64 hold-out samples (about 30%) were used to assess the model’s precision. The two clustering methods are analyzed in detail while their key parameters are varied, which yields many sub-models. After a number of simulations, the best model is chosen as the one with the least error. To provide insight into the data used, Table 1 presents the statistical properties of the input and output data.

Table 1 Statistical properties of the input and output data
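The 150/64 partition described above can be sketched as follows. This is only an illustration: the text does not say whether the split was random or chronological, so the random shuffle, the seed, and the function name `split_train_test` are assumptions.

```python
import numpy as np

def split_train_test(n_samples=214, n_train=150, seed=0):
    """Return sorted index arrays for the training and hold-out sets.

    The random shuffle is an assumption; a chronological split would
    simply use the first n_train indices instead.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    train_idx = np.sort(shuffled[:n_train])
    test_idx = np.sort(shuffled[n_train:])
    return train_idx, test_idx

train_idx, test_idx = split_train_test()
hold_out_share = len(test_idx) / (len(train_idx) + len(test_idx))
```

With 214 records, the 64 hold-out samples amount to roughly 30% of the data.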

2.3 Adaptive neuro-fuzzy inference system (ANFIS)

Jang [8] introduced ANFIS in 1993 as a hybrid model that combines neural networks and fuzzy logic in a single structure. One of its notable advantages is its ability to extract fuzzy rules from numerical data and expert knowledge and to build an adaptive rule base from them. This fusion aims to enhance the speed, robustness, and adaptability of the modeling system [8, 9], and it has made ANFIS widely used for time series prediction and forecasting, where it often outperforms traditional methods by drawing on the strengths of both ANN and FIS. ANFIS addresses the complex task of translating human intelligence into fuzzy systems [35]. It seeks a model that maps input values to the desired target or predicted values. This is a multi-stage process: input characteristics are mapped to input membership functions (MFs), the input MFs to a set of TSK-type fuzzy if–then rules, the rules to a set of output features, the output features to output MFs, and the output MFs to a single output value or decision [36]. Assuming the fuzzy inference system has two inputs (x, y) and a single output (f), a first-order Sugeno fuzzy model with two fuzzy if–then rules can be written as:

$${\text{Rule 1: If }}x{\text{ is }}I_{1} {\text{ and }}y{\text{ is }}J_{1} ,{\text{ then }}F_{1} = a_{1} x + b_{1} y + c_{1}$$
(1)
$${\text{Rule 2: If }}x{\text{ is }}I_{2} {\text{ and }}y{\text{ is }}J_{2} ,{\text{ then }}F_{2} = a_{2} x + b_{2} y + c_{2}$$
(2)

where \({I}_{1}\), \({I}_{2}\), \({J}_{1}\), and \({J}_{2}\) denote the membership functions; \(x\) and \(y\) are the inputs; \({F}_{1}\) and \({F}_{2}\) are the rule outputs; and \(a\), \(b\), and \(c\) are the consequent parameters. The structure of ANFIS for this simple case, depicted in Fig. 2, has five layers: the first performs fuzzification, the second computes the firing strength of each rule, the third normalizes the firing strengths, the fourth performs defuzzification, and the fifth sums the rule outputs into the overall output.

Fig. 2 ANFIS model architecture

In the layer equations that follow, \(I_{1}\) and \(I_{2}\) denote the two inputs, and \(\mu_{A_{j}}\) and \(\mu_{B_{j}}\) denote the membership functions of the first and second input, respectively. The product, normalization, and output layers consist of fixed nodes, whereas the fuzzification and defuzzification layers are adaptive. In the first layer, each adaptive node holds the parameters of a fuzzy MF, and its output is given by:

$$O_{j}^{1} = \mu_{{A_{j} }} \left( {I_{1} } \right),\quad j = 1,2$$
(3)
$$O_{j}^{1} = \mu_{{B_{j} }} \left( {I_{2} } \right),\quad j = 1,2$$
(4)

In addition, the second layer consists of nodes that are not adaptive, and the firing strength of each rule is calculated by utilizing Eq. (5).

$$O_{j}^{2} = w_{j} = \mu_{{A_{j} }} \left( {I_{1} } \right) \times \mu_{{B_{j} }} \left( {I_{2} } \right),\quad j = 1,2$$
(5)

The third layer normalizes the firing strength at the jth node. Its output is the ratio of that node’s firing strength to the sum of the firing strengths of all rules, as shown in Eq. (6), so the normalized firing strengths lie between 0 and 1.

$$O_{j}^{3} = \overline{{w_{j} }} = \frac{{w_{j} }}{{w_{1} + w_{2} }},\quad j = 1,2$$
(6)

Defuzzification is carried out in the fourth layer, in which every node is adaptive. Each node combines the inputs with the normalized firing strength from the third layer to determine the contribution of the jth rule to the output, as given in Eq. (7).

$$O_{j}^{4} = \overline{{w_{j} }} z_{j} = \overline{{w_{j} }} \left( {p_{j} I_{1} + q_{j} I_{2} + r_{j} } \right)$$
(7)

where \({p}_{j}\), \({q}_{j}\), and \({r}_{j}\) are the consequent parameters of the node \(j\).

In the fifth layer, non-adaptive nodes are present, and a summation function is employed to aggregate all incoming signals from the preceding layers [37].

$$O^{5} = \mathop \sum \limits_{j} \overline{{w_{j} }} z_{j} = \frac{{\mathop \sum \nolimits_{j} w_{j} z_{j} }}{{\mathop \sum \nolimits_{j} w_{j} }}$$
(8)
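The five layers of Eqs. (3)–(8) can be traced in a short forward pass. The sketch below is illustrative rather than the implementation used in this study: the Gaussian membership function, the function names, and every parameter value are assumptions chosen for the example.

```python
import numpy as np

def gaussian_mf(x, center, sigma):
    """Gaussian membership function (an assumed MF shape)."""
    return np.exp(-0.5 * ((x - center) / sigma) ** 2)

def anfis_forward(i1, i2, premise, consequent):
    """Forward pass of a two-input, two-rule first-order Sugeno ANFIS.

    premise: dict with keys "A" and "B", each a list of (center, sigma).
    consequent: array of shape (2, 3) holding (p_j, q_j, r_j) per rule.
    """
    # Layer 1: fuzzification, Eqs. (3)-(4)
    mu_a = np.array([gaussian_mf(i1, c, s) for c, s in premise["A"]])
    mu_b = np.array([gaussian_mf(i2, c, s) for c, s in premise["B"]])
    # Layer 2: rule firing strengths, Eq. (5)
    w = mu_a * mu_b
    # Layer 3: normalized firing strengths, Eq. (6)
    w_bar = w / w.sum()
    # Layer 4: weighted first-order rule outputs, Eq. (7)
    z = consequent[:, 0] * i1 + consequent[:, 1] * i2 + consequent[:, 2]
    rule_out = w_bar * z
    # Layer 5: overall output as the sum of rule outputs, Eq. (8)
    return rule_out.sum(), w_bar

# Hypothetical parameters, for illustration only
premise = {"A": [(0.0, 1.0), (1.0, 1.0)], "B": [(0.0, 1.0), (1.0, 1.0)]}
consequent = np.array([[1.0, 1.0, 0.0], [2.0, -1.0, 0.5]])
output, w_bar = anfis_forward(0.5, 0.5, premise, consequent)
```

In a hybrid scheme, the premise and consequent values passed to such a function are the quantities an optimizer would tune.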

2.4 PSO-ANFIS model

Kennedy and Eberhart [38] introduced particle swarm optimization (PSO), an evolutionary algorithm that has gained widespread acceptance. This population-based, bio-inspired approach is modeled on the collective dynamics of fish schooling and bird flocking and has been applied across diverse domains. PSO is valued for its simplicity, stability, and computational efficiency, particularly on nonlinear, high-dimensional, and multimodal problems [39], and it has therefore become one of the most widely used algorithms in the optimization field. Each candidate solution is represented by a particle of the swarm, and the position of every particle in the search space is updated continually until an optimal solution is reached or a predefined computational limit is met [40]. The PSO-ANFIS hybrid model integrates the PSO algorithm with the ANN and FIS structures of ANFIS: it merges the relational structure and learning ability of the ANN, the fuzzy-logic decision-making encapsulated in ANFIS, and the parameter-tuning ability of PSO. In a population of N particles, each particle i has a position \(x_{i}\) and a velocity \(v_{i}\), which are updated at iteration t as follows:

$$v_{i}^{t + 1} = wv_{i}^{t} + c_{1} r_{1} \left( {p_{best} - x_{i}^{t} } \right) + c_{2} r_{2} \left( {g_{best} - x_{i}^{t} } \right)$$
(9)
$$x_{i}^{t + 1} = x_{i}^{t} + v_{i}^{t + 1}$$
(10)

where \({r}_{1}\) and \({r}_{2}\) are random numbers in [0, 1]; \({c}_{1}\) and \({c}_{2}\) are the cognitive and social constants, respectively; \(p_{best}\) is the best position found so far by particle i; \(g_{best}\) is the best position found so far by the whole swarm; and w is the inertia weight. Figure 3 displays the employed PSO-ANFIS model.

Fig. 3 Proposed PSO-ANFIS model
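The update rules in Eqs. (9) and (10) can be illustrated with a compact PSO loop. The following is a sketch under stated assumptions, not the authors' code: the swarm size, iteration count, search bounds, position clipping, damped inertia weight schedule, and the sphere test function are all chosen for illustration. In a PSO-ANFIS setting, the position vector would hold the ANFIS parameters and the objective would be the training error.

```python
import numpy as np

def pso_minimize(objective, dim, n_particles=20, n_iter=100,
                 c1=1.0, c2=2.0, w=1.0, w_damp=0.99,
                 bounds=(-5.0, 5.0), seed=0):
    """Minimize `objective` with a basic PSO and a damped inertia weight."""
    rng = np.random.default_rng(seed)
    low, high = bounds
    x = rng.uniform(low, high, size=(n_particles, dim))  # positions
    v = np.zeros((n_particles, dim))                      # velocities
    p_best = x.copy()
    p_best_cost = np.array([objective(p) for p in x])
    g_idx = int(np.argmin(p_best_cost))
    g_best, g_best_cost = p_best[g_idx].copy(), p_best_cost[g_idx]

    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Eq. (9): inertia, cognitive, and social terms
        v = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)
        # Eq. (10), followed by clipping to the bounds (an assumption)
        x = np.clip(x + v, low, high)
        cost = np.array([objective(p) for p in x])
        improved = cost < p_best_cost
        p_best[improved] = x[improved]
        p_best_cost[improved] = cost[improved]
        g_idx = int(np.argmin(p_best_cost))
        if p_best_cost[g_idx] < g_best_cost:
            g_best, g_best_cost = p_best[g_idx].copy(), p_best_cost[g_idx]
        w *= w_damp  # one possible inertia weight strategy
    return g_best, g_best_cost

def sphere(p):
    """Simple convex test function, used only for the demonstration."""
    return float(np.sum(p ** 2))

best_x, best_cost = pso_minimize(sphere, dim=2)
```

Swapping the line `w *= w_damp` for another schedule is how alternative inertia weight strategies could be compared within the same loop.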

2.5 Clustering techniques

Clustering partitions a data set into distinct groups so that the members of each group are more similar to one another than to members of other groups. It plays a pivotal role in data mining and statistical analysis, and it is a fundamental factor in the precision of ANFIS models. ANFIS can use either of two clustering methods to organize the data into fuzzy clusters, which are then used to assign MFs and generate the FIS structure from the data [41]. The following subsections describe these two widely used techniques, each of which is examined in this research for predicting energy consumption.

2.5.1 Fuzzy c-means clustering (FCM)

FCM allows each data point to belong to more than one cluster. Each data point is assigned a membership value for every cluster according to its distance from that cluster’s center. As an unsupervised method for data analysis and model construction, FCM is applied across many disciplines. Within the ANFIS framework, identifying the MFs is essentially a clustering problem, and the primary purpose of FCM here is to minimize the number of fuzzy rules used. In FCM, the degree to which each data point belongs to each cluster is found by minimizing an objective function. Equation (11) gives this objective function, which sums the membership-weighted squared distances between each data vector \({x}_{i}\), \(i = 1, 2, \ldots, N\), and each cluster center.

$$E = \mathop \sum \limits_{i = 1}^{N} \mathop \sum \limits_{j = 1}^{C} U_{ij}^{m} \left\| {x_{i} - c_{j} } \right\|^{2}$$
(11)

Here \(m\), with \(1 < m < \infty\), is the weighting exponent; \(U_{ij} \in \left[ {0, 1} \right]\) is the degree of membership of data point \(x_{i}\) in cluster \(j\); \(c_{j}\) is the centroid of cluster \(j\); and \(C\) is the number of clusters. At any iteration, \(U_{ij}\) is calculated as follows:

$$U_{ij} = \left( {\mathop \sum \limits_{k = 1}^{C} \left( {\frac{{\left\| {x_{i} - c_{j} } \right\|}}{{\left\| {x_{i} - c_{k} } \right\|}}} \right)^{{\frac{2}{m - 1}}} } \right)^{ - 1}$$
(12)
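The alternating FCM iteration can be sketched as below. This is a minimal illustration, not the routine used in the study: the membership update follows Eq. (12), while the center update (the membership-weighted mean, the standard FCM step that the text does not state), the random initialization, the tolerance, and the toy data are assumptions.

```python
import numpy as np

def fcm(data, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Fuzzy c-means: returns centers, memberships, and objective value."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    # Random initial memberships, each row normalized to sum to 1
    u = rng.random((n, n_clusters))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        um = u ** m
        # Standard center update: membership-weighted mean of the data
        centers = (um.T @ data) / um.sum(axis=0)[:, None]
        # Distances ||x_i - c_j||, floored to avoid division by zero
        dist = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)
        # Eq. (12): U_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        ratio = (dist[:, :, None] / dist[:, None, :]) ** (2.0 / (m - 1.0))
        u_new = 1.0 / ratio.sum(axis=2)
        shift = np.abs(u_new - u).max()
        u = u_new
        if shift < tol:
            break
    # Eq. (11), evaluated with the last computed centers and distances
    objective = float(np.sum((u ** m) * dist ** 2))
    return centers, u, objective

# Toy data with two well-separated groups (illustrative only)
data = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                 [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
centers, u, objective = fcm(data, n_clusters=2)
labels = u.argmax(axis=1)
```

Each row of `u` sums to one, which reflects the fact that a point's membership is shared among the clusters rather than assigned to a single one.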

2.5.2 Subtractive clustering (SC)

Clustering methods arrange data into discrete groups according to a measure of similarity. The SC technique assumes that every data point is a potential cluster center and rates the likelihood of each point being a cluster center by the density of the data points in its immediate vicinity [42]. Let the data set x be formed by combining the system’s input data set X and its output data set Y, and assume that each dimension has been normalized so that x lies within a hypercube. SC then treats each point as a candidate cluster center and computes its density measure (potential) using Eq. (13) [43]:

$$D_{i} = \mathop \sum \limits_{j = 1}^{n} \exp \left[ { - \frac{{\left| {x_{i} - x_{j} } \right|^{2} }}{{\left( {\frac{{r_{a} }}{2}} \right)^{2} }}} \right]$$
(13)

Here \({r}_{a}\) is the cluster radius, |.| denotes the Euclidean distance between data points, and \(n\) is the number of data points. After Eq. (13) has been evaluated for every point, the first cluster center, \({x}_{c1}\), is placed at the point with the highest potential, \({D}_{c1}\). The potential of each data point \({x}_{i}\) is then updated using Eq. (14) [43]:

$$D_{i} = D_{i} - D_{c1} \exp \left[ { - \frac{{\left| {x_{i} - x_{c1} } \right|^{2} }}{{\left( {\frac{{r_{b} }}{2}} \right)^{2} }}} \right]$$
(14)

The radius of the neighborhood in which the potential is significantly reduced is denoted \({r}_{b}\). To avoid closely spaced cluster centers, it is advisable to set \({r}_{b}\) to a value greater than \({r}_{a}\) [43]. The next center is the point with the highest remaining potential, and the process continues until one of the predetermined stopping criteria is satisfied.
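The SC procedure can be sketched in a few lines. This is an illustration under assumptions rather than the routine used in the study: the ratio between the two radii (`squash`), the single-threshold stopping rule (`stop_ratio`), the cap on the number of centers, and the toy data are all hypothetical, and the data are assumed to be normalized already.

```python
import numpy as np

def subtractive_clustering(data, r_a=0.5, squash=1.5,
                           stop_ratio=0.15, max_centers=10):
    """Select cluster centers by iteratively picking the densest point."""
    r_b = squash * r_a  # larger radius used when reducing the potential
    # Pairwise squared Euclidean distances between all points
    sq_dist = np.sum((data[:, None, :] - data[None, :, :]) ** 2, axis=2)
    # Initial potential of every point (negative exponent: density decays
    # with distance)
    potential = np.exp(-sq_dist / (r_a / 2.0) ** 2).sum(axis=1)
    first_peak = potential.max()
    centers = []
    while len(centers) < max_centers:
        c = int(np.argmax(potential))
        peak = potential[c]
        # Simplified stopping rule (assumed): stop once the remaining
        # peak is small relative to the first one
        if peak < stop_ratio * first_peak:
            break
        centers.append(data[c])
        # Reduce the potential around the newly chosen center
        potential = potential - peak * np.exp(
            -sq_dist[:, c] / (r_b / 2.0) ** 2)
    return np.array(centers)

# Toy data in the unit square with two dense groups (illustrative only)
points = np.array([[0.10, 0.10], [0.12, 0.10], [0.10, 0.12],
                   [0.90, 0.90], [0.88, 0.90], [0.90, 0.88]])
sc_centers = subtractive_clustering(points, r_a=0.5)
```

A smaller `r_a` narrows each point's neighborhood and so tends to produce more centers, which is why the cluster radius is treated as a tuning parameter.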

2.6 Model performance evaluation

The precision of the developed models is assessed by comparing observed and predicted power consumption using five performance indicators that have been widely employed in prior studies. RMSE (38%) has been reported as the most popular error measure in electricity forecasting, with MAPE (35%) a close second [3]. Each metric offers a different view of the accuracy and reliability of the forecasts. The mean absolute percentage error (MAPE) is the average absolute percentage difference between predicted and actual electricity consumption. The mean absolute error (MAE) is the average absolute difference between them. The root mean square error (RMSE) measures the typical deviation of the model’s predictions from the actual values. The coefficient of variation of the RMSE (CVRMSE) normalizes the RMSE by the mean consumption level. For all four, lower values indicate better predictive accuracy. The correlation coefficient (R) was also used to evaluate the performance of the developed models. The criteria for choosing the best model are outlined in Table 2. The performance metrics (PM) are defined as follows:

$$MAPE = \frac{1}{N}\mathop \sum \limits_{k = 1}^{N} \left| {\frac{{P_{k} - O_{k} }}{{O_{k} }}} \right| \times 100{\text{\% }}$$
(15)
$$MAE = \frac{1}{N}\mathop \sum \limits_{k = 1}^{N} \left| {P_{k} - O_{k} } \right|$$
(16)
$$RMSE = \sqrt {\frac{1}{N}\mathop \sum \limits_{k = 1}^{N} \left( {P_{k} - O_{k} } \right)^{2} }$$
(17)
$$CVRMSE = \frac{100}{{\overline{P}}}\sqrt {\frac{{\mathop \sum \nolimits_{k = 1}^{N} \left( {P_{k} - O_{k} } \right)^{2} }}{N}}$$
(18)
$$R = \left[ {\frac{{\mathop \sum \nolimits_{k = 1}^{N} \left( {O_{k} - \overline{O}} \right)\left( {P_{k} - \overline{P}} \right)}}{{\sqrt {\mathop \sum \nolimits_{k = 1}^{N} \left( {O_{k} - \overline{O}} \right)^{2} \times \mathop \sum \nolimits_{k = 1}^{N} \left( {P_{k } - \overline{P}} \right)^{2} } }}} \right]$$
(19)
Table 2 Respective acceptability criteria for performance metrics

In the equations, k represents the index of the sample, N denotes the total number of samples, \({P}_{k}\) stands for the predicted electricity consumption value for the kth sample, and \({O}_{k}\) represents the observed electricity consumption for the same sample. Additionally, \(\overline{O}\) and \(\overline{P}\) denote the average observed and predicted values, respectively.
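The five metrics can be computed together as in the sketch below. The function name and the sample values are hypothetical. MAPE is computed with the observed value in the denominator, which is the standard definition, and CVRMSE is normalized by the mean predicted value, following Eq. (18) as written.

```python
import numpy as np

def forecast_metrics(observed, predicted):
    """Return MAPE, MAE, RMSE, CVRMSE, and R for two equal-length series."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    err = p - o
    rmse = np.sqrt(np.mean(err ** 2))
    return {
        "MAPE": np.mean(np.abs(err / o)) * 100.0,  # percent
        "MAE": np.mean(np.abs(err)),
        "RMSE": rmse,
        "CVRMSE": 100.0 * rmse / p.mean(),         # mean predicted, Eq. (18)
        "R": np.corrcoef(o, p)[0, 1],              # Pearson correlation
    }

# Hypothetical values in MWh, for illustration only
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
metrics = forecast_metrics(observed, predicted)
```

In this toy case every absolute error is 10 MWh, so MAE and RMSE coincide, while MAPE weights the same error more heavily for the smaller observations.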

3 Results and discussion

In this section, the experimental findings and statistical results obtained from the developed models are discussed. The models were evaluated on the hold-out portion of the data (30%) using the performance metrics of Sect. 2.6. We examined how the choice of clustering technique and its parameters affects the performance of the hybrid model. Table 3 lists the essential parameters used in each clustering technique. These parameters were varied to find the combinations that yielded the most dependable PSO-ANFIS simulations. For the FCM-based hybrid model, 2–6 clusters were tested to investigate the impact of the number of clusters (NoC). For the SC-based hybrid model, a cluster radius (CR) from 0.40 to 0.60 in increments of 0.05 was examined. The parameter settings for the PSO algorithm are \(c_{1} = 1\), \(c_{2} = 2\), inertia weight \(\omega = 1\), and damping factor \(\omega_{damp} = 0.99\) [46]. Various sub-models were thus developed and analyzed under different hyperparameter configurations.

Table 3 Clustering method parameters
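The parameter sweep described above amounts to ten sub-models, five per clustering method, which can be enumerated as follows. The sequential numbering of the sub-model names is an assumption; the text only confirms that PSO-ANFISSC1 uses a CR of 0.40 and PSO-ANFISFCM1 uses two clusters.

```python
import numpy as np

# Cluster radii 0.40 to 0.60 in steps of 0.05 (five values)
cluster_radii = np.round(np.linspace(0.40, 0.60, 5), 2)
# Number of clusters 2 to 6 (five values)
cluster_counts = list(range(2, 7))

sub_models = []
for i, cr in enumerate(cluster_radii, start=1):
    # Names beyond index 1 follow an assumed sequential pattern
    sub_models.append((f"PSO-ANFISSC{i}", {"cluster_radius": float(cr)}))
for i, noc in enumerate(cluster_counts, start=1):
    sub_models.append((f"PSO-ANFISFCM{i}", {"n_clusters": noc}))

# PSO settings quoted in the text
pso_settings = {"c1": 1.0, "c2": 2.0, "w": 1.0, "w_damp": 0.99}
```

Each entry would then be trained and scored on the hold-out data so that the best configuration per clustering method can be selected.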

3.1 Performance implications of clustering parameters

The SC algorithm is widely used for clustering data. Its basic idea is to place the center of each cluster at the data point with the highest density (potential) across the variables or dimensions [48]. The quality of the resulting clusters depends strongly on the radius parameter, which governs both the number and the placement of cluster centers. A radius that is too small may exclude relevant data points near a cluster center, while one that is too large lets all data points contribute and thereby weakens the intended density effect [47]. Consequently, PSO-ANFIS was tested with CR values from 0.40 to 0.60 in increments of 0.05, giving five distinct sub-models. The results of the PSO-ANFISSC sub-models are presented in Table 4, while Fig. 4 plots the observed and predicted electricity consumption together with the corresponding error plots. In the testing phase, no consistent trend was observed as the CR increased from 0.40 to 0.60. Among these sub-models, the best performance was achieved by PSO-ANFISSC1, which had the smallest CR value (0.40) and the lowest values of MAPE (8.3794%), RMSE (1018.8), and CVRMSE (10.3782), corresponding to a forecasting accuracy of 91.6%. Nonetheless, PSO-ANFISSC2 recorded a commendable MAE of 608.8522. As shown in Fig. 5, the regression plot of the forecasting model gives R = 0.67118.

Table 4 Evaluation of the performance of PSO-ANFISSC models
Fig. 4 Optimal PSO-ANFISSC sub-model’s actual and forecast energy consumption graph

Fig. 5 Target versus network output for optimal PSO-ANFISSC testing data

The choice of the NoC within the neuro-fuzzy model clustered by FCM can influence performance, computational complexity, and interpretability. Striking an appropriate balance is essential, as too many clusters can lead to overfitting, while too few may result in underfitting. To address this, we conducted experiments to determine the optimal NoC for our proposed PSO-ANFISFCM model, considering specific application requirements and data characteristics. Table 5 compares the performance of the PSO-ANFISFCM models in the testing phase. Figure 6 plots the observed and forecast electricity consumption together with the corresponding error plots. The sub-model PSO-ANFISFCM1, with 2 clusters, outperformed the other sub-models on the key performance metrics. It achieved a MAPE of 7.7778%, indicating a prediction accuracy of 92.2%. Additionally, it exhibited a lower MAE of 712.6094, a lower CVRMSE of 9.5464, and a lower RMSE of 909.4998. Figure 7 shows the regression performance of the forecasting model, with R = 0.68647.

Table 5 Evaluation of the performance of PSO-ANFISFCM models

Fig. 6 Optimal PSO-ANFISFCM sub-model’s actual and forecast energy consumption graph

Fig. 7 Target versus network output for optimal PSO-ANFISFCM testing data

A careful observation of the results shows that different numbers of clusters produced different results. In addition, increasing the NoC in ANFIS-based FCM does not always lead to better performance, so it may be necessary to carry out many tests to find the ideal number [9]. If the number of clusters is increased above 6, more ambiguity, noise, and overfitting may result.
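For reference, the error metrics quoted above can be computed as in the short Python sketch below. The formulas are the conventional definitions and are assumed to match those used in this study; in particular, CVRMSE is taken as RMSE divided by the mean of the observed values (in percent), and accuracy is taken as 100 minus MAPE, which agrees with the reported pairs (for example, a MAPE of 7.7778% and an accuracy of 92.2%). The function name and the sample values are hypothetical.

```python
import numpy as np

def forecast_metrics(actual, forecast):
    """Return MAE, RMSE, MAPE (%), CVRMSE (%), and accuracy (%) for a forecast."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    error = actual - forecast

    mae = np.mean(np.abs(error))
    rmse = np.sqrt(np.mean(error ** 2))
    mape = 100.0 * np.mean(np.abs(error / actual))  # assumes no zero observations
    cvrmse = 100.0 * rmse / np.mean(actual)         # assumed definition
    return {
        "MAE": mae,
        "RMSE": rmse,
        "MAPE": mape,
        "CVRMSE": cvrmse,
        "accuracy": 100.0 - mape,                   # assumed: accuracy = 100 - MAPE
    }

# Hypothetical observations and forecasts, for illustration only.
metrics_demo = forecast_metrics([100.0, 200.0], [110.0, 180.0])
print(metrics_demo)
```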

When evaluating the regression R value, the optimal PSO-ANFISFCM model surpassed its counterpart, demonstrating a higher R value (0.68647) in contrast to the optimal PSO-ANFISSC model, which attained a value of 0.67118. This highlights the superior predictive capability of the PSO-ANFISFCM model in electricity prediction. This difference underscores the importance of model optimization techniques in achieving more accurate electricity prediction results. It suggests that the PSO-ANFISFCM approach may offer advantages over the PSO-ANFISSC method in capturing the complex relationships inherent in electricity consumption or generation data.

Table 6 illustrates the performances of the premier sub-models corresponding to each clustering approach. The outcomes unveiled the exceptional performance of each optimal sub-model. Notably, the PSO-ANFISFCM1 model emerged as the most accurate forecaster among them. This outcome signifies a commendable level of concurrence in the comprehensive forecast, affirming that the FCM-clustered PSO hybrid model, particularly with a reduced number of clusters (specifically, two clusters), stands as a feasible neuro-fuzzy hybrid model for precise energy consumption predictions.

Table 6 Optimal sub-model comparison

Furthermore, the performance of the optimal models was also compared with the standalone ANFIS. It can be seen that just using ANFIS in standalone mode will not provide optimal results. This further justifies why hybrid models should be considered for accurate model prediction.

3.2 Comparison of optimal sub-model with PSO variants

The inertia weight plays a pivotal role in balancing exploration and exploitation, and hence in steering convergence, within the PSO algorithm. Shi and Eberhart augmented the traditional PSO algorithm with an inertia weight to balance local and global search [33]. Much subsequent work on refining the classical PSO algorithm has focused on the inertia weight (w), since the algorithm’s efficacy depends strongly on choosing w appropriately [48]. An excessively large w slows convergence, while an overly small value causes the swarm to settle prematurely on a local solution. Hence, careful selection of w is essential to the accuracy of PSO-ANFIS models. To address this, three PSO variants proposed by various researchers, each with different w values and cognitive coefficients (PSOvar1 [49], PSOvar2 [50], and PSOvar3 [51]), were used to construct different PSO-ANFIS models. The parameter configurations of the PSO variants are given in Table 7, and the equations defining w for each variant are given in Eqs. (20) to (22).

$$({\text{a}})\quad \omega = \left( \omega_{start} - \omega_{end} \right)\left( \frac{t_{max} - t}{t_{max}} \right) + \omega_{end} \times e^{ - \left( \frac{t}{t_{max}/4} \right)^{2} }$$
(20)
$$\begin{aligned} ({\text{b}})\quad & \omega = \omega_{max} - (\omega_{max} - \omega_{min}) \times \frac{4}{\pi}\tan^{-1}\left( \frac{t}{t_{max}} \right) \\ & c_{1} = c_{1max} - (c_{1max} - c_{1min}) \times \frac{4}{\pi}\tan^{-1}\left( \frac{t}{t_{max}} \right) \\ \end{aligned}$$
(21)
$$\begin{aligned} ({\text{c}})\quad & z_{k+1} = \mu \times z_{k} \times \left( 1 - z_{k} \right) \\ & \omega_{t} = \left( \omega_{start} - \omega_{end} \right)\left( \frac{t_{max} - t}{t_{max}} \right) + \omega_{end} \times z_{k+1} \\ \end{aligned}$$
(22)
Table 7 Configuration of parameters for the PSO variants
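The three schedules in Eqs. (20) to (22) can be written compactly in Python, as sketched below. The formulas follow the equations as printed; every numerical default (`w_start`, `w_end`, `w_max`, `w_min`, `c1_max`, `c1_min`, `mu`, `z0`) is an illustrative assumption and not a value taken from Table 7, and the function names are hypothetical.

```python
import math

def inertia_var_a(t, t_max, w_start=0.9, w_end=0.4):
    """Eq. (20): linear decay plus an exponentially decaying term."""
    linear = (w_start - w_end) * (t_max - t) / t_max
    return linear + w_end * math.exp(-((t / (t_max / 4.0)) ** 2))

def inertia_and_c1_var_b(t, t_max, w_max=0.9, w_min=0.4, c1_max=2.5, c1_min=0.5):
    """Eq. (21): arctangent decay of the inertia weight and of c1."""
    s = (4.0 / math.pi) * math.atan(t / t_max)  # grows from 0 to 1 over the run
    return w_max - (w_max - w_min) * s, c1_max - (c1_max - c1_min) * s

def inertia_var_c(t_max, w_start=0.9, w_end=0.4, mu=4.0, z0=0.3):
    """Eq. (22): linear decay plus a logistic-map (chaotic) term.

    Returns the list of weights for t = 0, 1, ..., t_max.
    """
    z = z0
    weights = []
    for t in range(t_max + 1):
        z = mu * z * (1.0 - z)  # logistic map update z_{k+1}
        weights.append((w_start - w_end) * (t_max - t) / t_max + w_end * z)
    return weights

# Evaluate each schedule over a hypothetical run of 100 iterations.
t_max_demo = 100
schedule_a = [inertia_var_a(t, t_max_demo) for t in range(t_max_demo + 1)]
schedule_b = [inertia_and_c1_var_b(t, t_max_demo) for t in range(t_max_demo + 1)]
schedule_c = inertia_var_c(t_max_demo)
```

Schedules (a) and (b) decrease smoothly from their starting values, whereas (c) superimposes a chaotic perturbation on the linear decay, which is one way such variants trade off exploration against exploitation.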

The optimal model from the preceding section was compared with the new variants, and the best-performing model was selected. Table 8 presents a comparative analysis of the PSO-ANFIS variants employed. As can be seen in Table 8, the forecast accuracy in increasing order is PSO-ANFISvar1 (88.2%), PSO-ANFISvar3 (90.6%), PSO-ANFISvar2 (92.1%), and PSO-ANFISFCM1 (92.2%). Although all the models produced commendable results, PSO-ANFISFCM1 remained the best performer. The findings suggest that the inertia weight setting (\(\omega\)) used in PSO-ANFISFCM1 is well suited to this problem. An advantage of the PSO-ANFISFCM1 model lies in its use of a damping factor (\(\upomega _{damp}\)), which regulates the balance between the particles’ exploratory and exploitative abilities and governs how much positional information is carried over from the previous state [52, 53]. According to the findings, PSO-ANFISFCM1 is the most effective of the methods considered for predicting power consumption.
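The effect of a damping factor on the inertia weight can be illustrated with a small, generic PSO loop in Python. This is only a sketch under stated assumptions: the initial weight (0.9), damping factor (0.99), acceleration coefficients (1.5 each), swarm size, search bounds, and the sphere test function are all hypothetical choices, and the loop optimizes a toy cost rather than ANFIS parameters.

```python
import numpy as np

def pso_minimize(cost, dim, n_particles=20, n_iter=100,
                 w=0.9, w_damp=0.99, c1=1.5, c2=1.5, seed=0):
    """Generic PSO with a multiplicatively damped inertia weight (illustrative)."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-5.0, 5.0, size=(n_particles, dim))
    vel = np.zeros_like(pos)

    # Personal and global bests.
    pbest_pos = pos.copy()
    pbest_cost = np.apply_along_axis(cost, 1, pos)
    g = int(np.argmin(pbest_cost))
    gbest_pos, gbest_cost = pbest_pos[g].copy(), float(pbest_cost[g])
    history = [gbest_cost]

    for _ in range(n_iter):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        # Inertia term carries over the previous velocity, scaled by w.
        vel = w * vel + c1 * r1 * (pbest_pos - pos) + c2 * r2 * (gbest_pos - pos)
        pos = pos + vel

        costs = np.apply_along_axis(cost, 1, pos)
        improved = costs < pbest_cost
        pbest_pos[improved] = pos[improved]
        pbest_cost[improved] = costs[improved]

        g = int(np.argmin(pbest_cost))
        if pbest_cost[g] < gbest_cost:
            gbest_pos, gbest_cost = pbest_pos[g].copy(), float(pbest_cost[g])
        history.append(gbest_cost)

        # Damping: shrink the inertia weight so the swarm gradually shifts
        # from exploration toward exploitation.
        w *= w_damp

    return gbest_pos, gbest_cost, history

def sphere(x):
    """Toy cost function with its minimum at the origin."""
    return float(np.sum(x ** 2))

best_pos, best_cost, history = pso_minimize(sphere, dim=2)
print("best cost found:", best_cost)
```

Because `w` is multiplied by `w_damp` after every iteration, less of the previous velocity is retained as the run proceeds, which is the mechanism the paragraph above attributes to the damping factor.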

Table 8 Comparison of the optimal sub-model with other PSO variants and methods

This study has shed light on the crucial role that the choice of data clustering technique, together with other important parameters, plays in determining the accuracy of ANFIS modeling. The results clearly demonstrate that different clustering techniques can yield different levels of precision and effectiveness in ANFIS modeling. Additionally, the impact of these parameters on model accuracy cannot be overstated, as they can significantly influence the performance and reliability of the ANFIS system. These findings emphasize the need for careful selection of the data clustering technique and other relevant parameters to achieve optimal accuracy in ANFIS modeling.

In addition, the results of this analysis emphasize the improved precision and reduced error obtained with the FCM clustering technique. This finding aligns with prior observations that FCM is preferred among clustering methods because of its fast processing and its distinctive feature of allowing items to belong to multiple groups, which distinguishes it from alternative clustering algorithms [54]. Moreover, FCM is robust to ambiguity and can retain substantially more information than hard clustering methods [55]. However, the performance of an FCM-clustered ANFIS model does not necessarily improve simply by increasing the number of clusters. It is therefore crucial to carry out multiple experiments to identify the optimal number of clusters for a specific model.
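The property that an item can belong to several clusters at once is visible in the standard FCM membership formula, sketched below in Python. The fuzzifier value m = 2, the function name, and the toy points and centers are assumptions for illustration; the sketch performs only the membership step, not the full iterative FCM algorithm.

```python
import numpy as np

def fcm_memberships(X, centers, m=2.0):
    """Standard FCM membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))."""
    # Distance from every point i to every center j.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.fmax(d, 1e-12)  # avoid division by zero when a point sits on a center
    # ratio[i, j, k] = (d_ij / d_ik) ** (2 / (m - 1))
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)

# Hypothetical data: two points on the centers and one exactly halfway between.
X_demo = np.array([[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]])
centers_demo = np.array([[0.0, 0.0], [1.0, 1.0]])
U_demo = fcm_memberships(X_demo, centers_demo)
print(U_demo)
```

Each row of the membership matrix sums to one, so a point lying between two centers shares its membership between them instead of being forced into a single cluster, as a hard clustering method would do.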

4 Conclusion

Predicting future energy use is essential for effective power system management and planning. Precise energy prediction has the potential to enhance energy utilization, minimize expenses, and enhance energy efficacy. Machine learning (ML) has become a potent technique for energy prediction, especially when used in conjunction with metaheuristic algorithms (MAs). The present study has examined the importance of hyperparameter tuning in hybrid neuro-fuzzy models for achieving optimal model building in the context of electricity consumption forecasting. The study has focused on selected districts in Lagos, Nigeria as a case study. The dataset was divided into a training set (70%) and a testing set (30%) to assess the accuracy and competency of the model. The PSO algorithm was employed as a means to efficiently explore the optimal values of the ANFIS hyperparameters. In light of the significance of the clustering technique and other hyperparameters in the operation of the ANFIS, this study sought to examine the effects of various clustering methods and their respective variables on the proposed PSO-ANFIS framework. The primary objective was to identify the most effective combination that yields superior prediction accuracy. Multiple sub-models were developed, analyzed, and compared using widely recognized statistical metrics such as MAPE, MAE, CVRMSE, and RMSE. Additionally, the robustness of the optimal sub-model was tested against various variants of the PSO algorithm. The experimental results revealed that the PSO-ANFIS with FCM clustering technique and 2 clusters outperformed other configurations, exhibiting the lowest MAPE (7.7778%), MAE (712.6094), CVRMSE (9.5464), and RMSE (909.4998). These findings highlight the superior accuracy and reduced error achieved by employing the FCM clustering technique in this analysis.

In further research, it could be worthwhile to consider expanding both the amount of experimental data and the number of input variables. Additionally, the impact of key parameters critical to the efficacy of the PSO algorithm, such as cognitive and social learning rate, neighborhood topology, swarm size, and velocity limit, can be examined in the hybrid PSO-ANFIS.