Abstract
During the development cycle of software projects, numerous defects and challenges are identified, leading to prolonged project durations and escalated costs. As a result, both product delivery and defect tracking have become increasingly complex, expensive, and time-consuming. Since identifying every software defect is infeasible, it is crucial to foresee potential consequences and strive for the production of high-quality products. The goal of software defect prediction (SDP) is to identify problematic locations within software code. This study presents the first experimental investigation utilizing the turbulent flow of water optimization (TFWO) in conjunction with the adaptive neuro-fuzzy inference system (ANFIS) to enhance SDP. The TFWO_ANFIS model is designed to address the uncertainties present in software features and predict defects with feasible accuracy. Data are divided randomly at the beginning of the model into training and testing sets to avoid local optima and overfitting issues. The TFWO approach adjusts the ANFIS parameters during the SDP process. The proposed model, TFWO_ANFIS, outperforms other optimization algorithms commonly used in SDP, such as particle swarm optimization (PSO), gray wolf optimization (GWO), differential evolution (DE), ant colony optimization (ACO), standard ANFIS, and the genetic algorithm (GA). This superiority is demonstrated through various evaluation metrics on four datasets: standard deviation (SD) scores (0.3307, 0.2885, 0.3205, and 0.2929), mean square error (MSE) scores (0.1091, 0.0770, 0.1026, and 0.0850), root-mean-square error (RMSE) scores (0.3303, 0.2776, 0.3203, and 0.2926), mean bias error (MBE) scores (0.1281, 0.0860, 0.0931, and 0.2310), and accuracy scores (87.3%, 90.2%, 85.8%, and 89.2%), respectively, for the datasets KC2, PC3, KC1, and PC4. These datasets, with different instances and features, are obtained from an open platform called OPENML.
Additionally, multiple evaluation metrics such as precision, sensitivity, confusion matrices, and specificity are employed to assess the model’s performance.
1 Introduction
Defects are among the most significant problems in software projects, and forecasting them is a difficult process. The presence of a bug or defect increases the likelihood that the project will fail; consequently, it may result in a drop in project quality as well as an increase in time and cost. Finding these problems early in the software development life cycle (SDLC) therefore reduces both the time and financial costs of the project as a whole. Defect prediction thus plays an essential role in the development and testing phases and contributes to the success of the entire project. Defects should be anticipated at the beginning of the SDLC. For this reason, a variety of SDP models have been developed to help professionals locate the modules initially identified as defective [1, 2]. To meet user goals in a constrained amount of time, software engineering requires excellent quality and stability. Quality assurance teams can efficiently allocate their limited resources using SDP models to inspect and test software products [3, 4].
Initially, software businesses relied on manual testing, which consumed 27% of the project’s time and could not address all software defects. Typically, these businesses lack the resources and time to resolve every issue before product release, resulting in harm to their reputation and product value. SDP models provide a solution, allowing businesses to prioritize critical issues and allocate resources efficiently to the most defect-prone code [5].
Machine learning (ML) is one of the promising methods having a major impact on prediction. ML is concerned with creating algorithms that recognize patterns in known data to build models and then use those models to predict outcomes from unknown data, especially when combined with data mining methods [6, 7]. As a result, deep learning (DL) and ML approaches have been widely used in SDP to enhance its performance.
Various methods, including support vector machine (SVM) [8], bagging [9], Naïve Bayes (NB) [10], boosting [11], C4.5 [12], random forest (RF) [13], artificial neural network (ANN) [14], and K-nearest neighbor (KNN) [15], have been used in SDP. Although these individual nonlinear machine learning algorithms outperform conventional models in SDP, they have issues with the accuracy of handling uncertainty in SDP and with overfitting and parameter optimization [7]. As a result, composite algorithms have been developed to improve prediction accuracy and address the shortcomings of single models [16,17,18]. Moreover, metaheuristic algorithms have been used in SDP to enhance prediction accuracy due to their ability to reduce the complexity of real-life problems, find the best solution, and search globally [7]. Every instance in the population offers a potential solution, and compared with other traditional approaches currently in use, metaheuristics are more popular because of their simplicity and efficiency [19, 20]. According to the no free lunch (NFL) theorem [21], no single metaheuristic method can solve all optimization problems; a specific metaheuristic algorithm may produce good results in some situations but perform poorly in others.
In the context of SDP, addressing uncertainty is crucial. ANFIS, a form of soft computation, combines ANN capabilities with fuzzy inference processes. ANFIS offers strong adaptation abilities and a rapid, precise learning process [22, 23]. However, a significant challenge in realworld applications is training ANFIS parameters. Researchers prioritize adjusting these parameters for improved precision and accuracy. Various training techniques have emerged, typically categorized as probabilistic and deterministic methods.
The least square estimator (LSE) and gradient descent (GD) [24,25,26] are two examples of deterministic categories; they are slow and occasionally fail to converge. Additionally, because the chain rule is applied at each step of the gradient computation, conventional ANFIS learning systems that employ the GD algorithm are prone to a large number of local optima. In contrast, this paper employs a novel optimization technique based on TFWO to optimize the ANFIS parameters. The random and natural behavior of vortices in oceans, rivers, and seas served as the inspiration for this technique [27].
The contributions of the study include the enhanced handling of uncertainty with greater accuracy in SDP through the proposed TFWO_ANFIS model. This model leverages the advantages of TFWO for adapting the ANFIS model’s parameters: the ANFIS training process uses the TFWO technique for parameter adaptation. The adaptive parameters are located in the fuzzification and defuzzification layers (the premise and consequent parameters). Four datasets were used with various evaluation criteria, such as RMSE, MSE, SD, and accuracy, to assess the effectiveness of the proposed TFWO algorithm for adapting ANFIS parameters. TFWO_ANFIS outperformed standard ANFIS and specific optimization techniques [28], such as GA, DE [29], ACO, PSO [30], and GWO [31,32,33].
Given the rapid adoption of ML and artificial intelligence (AI)-based software-intensive systems in semi-autonomous automobiles, recommendation systems, and various real-world applications, there are concerns about the outcomes of their use, especially when these systems have the potential to affect the environment or people, as in the case of self-driving cars or the medical field. In such situations, addressing these uncertainties is crucial [34]. The developed model is used to predict defects in software with higher accuracy under uncertainty. The outcomes show that the recommended model, TFWO_ANFIS, outperformed the alternative optimization techniques in terms of the ANFIS’s training and testing error rates.
This research highlights the presence of uncertainty in software features, which leads to adverse outcomes in SDP, including low product quality, increased defects during the SDLC, and extended delivery time and costs. To address this issue, a solution lies in combining the capabilities of an ANN with a fuzzy inference system, known as ANFIS. The research proposes an enhanced variant of ANFIS, trained with the turbulent flow of water optimization algorithm (TFWO), which increases ANFIS’s overall optimization performance. The proposed upgrade focuses on training ANFIS parameters with a novel optimization technique, as opposed to LSE and GD, which are time-consuming, prone to a large number of local optima, and sometimes fail to converge. The TFWO_ANFIS model aims to better manage software metric uncertainty and predict defects with higher accuracy. Improving software performance, meeting customer needs in a short period of time, and assisting quality control teams in effectively allocating their limited assets during software system evaluation are the motivating factors behind handling uncertainty in SDP and obtaining higher accuracy in the suggested model.
The following are the benefits of treating uncertainty in SDP:

1. Models become more dependable when uncertainty is considered during software development. Additionally, appropriate software model validation helps reduce uncertainty in later phases of development.

2. Applying software uncertainty modeling can improve decision-making during the development process.
The major contributions of this research can be summarized as follows:

(1) Four datasets from NASA, named KC2, PC3, KC1, and PC4, with different instances and features are utilized; they are obtained from the open platform OPENML.

(2) A novel model is proposed for predicting software defects with higher accuracy in uncertain environments.

(3) The TFWO algorithm is utilized to adapt ANFIS’s parameters rather than traditional algorithms.

(4) The suggested TFWO_ANFIS is compared with conventional ANFIS, ACO_ANFIS, DE_ANFIS, PSO_ANFIS, GWO_ANFIS, and GA_ANFIS.

(5) The suggested TFWO_ANFIS is evaluated with relevant metrics in SDP such as SD, MSE, RMSE, MBE, and accuracy.
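For concreteness, the evaluation metrics listed above can be computed as follows. This is an illustrative sketch: the variable names and the 0.5 decision threshold are our assumptions, not the study's implementation.

```python
import math

def evaluate(actual, predicted, threshold=0.5):
    """Compute SD, MSE, RMSE, MBE, and accuracy for defect predictions
    (illustrative sketch; not the study's exact implementation)."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n            # mean square error
    rmse = math.sqrt(mse)                           # root-mean-square error
    mbe = sum(errors) / n                           # mean bias error (signed)
    sd = math.sqrt(sum((e - mbe) ** 2 for e in errors) / n)  # SD of errors
    # Accuracy after thresholding the continuous model output
    hits = sum(1 for a, p in zip(actual, predicted)
               if (p >= threshold) == (a >= threshold))
    return {"SD": sd, "MSE": mse, "RMSE": rmse,
            "MBE": mbe, "accuracy": hits / n}
```

For example, `evaluate([0, 1, 1, 0], [0.1, 0.9, 0.4, 0.2])` yields an MSE of 0.105 and an accuracy of 0.75, since one of the four modules is misclassified after thresholding.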
The rest of this paper is structured as follows: Sect. 2 presents the related works on software defect prediction, the optimization process of ANFIS, and uncertainty analysis. Section 3 shows methods and materials. Section 4 presents the results and discussion. Finally, Sect. 5 presents conclusions and future work.
2 Related works
The related literature is organized into three subsections to precisely cover the essential topics in this research and present the latest findings in each field. First, software defect prediction is the process of identifying and rectifying flaws. In the realm of developing embedded software, this task is particularly time-consuming and expensive due to the complex infrastructure, large scale, time constraints, and cost considerations. Measuring and achieving quality becomes a significant challenge, especially in automated systems. Second, the optimization process of ANFIS, where the ANFIS model offers the advantage of integrating linguistic and numerical expertise. Additionally, ANFIS harnesses the data categorization and pattern recognition capabilities of artificial neural networks (ANN). This organization aims to provide a comprehensive understanding of the critical aspects of this research. The ANFIS architecture is less prone to memorization problems and is clearer to the user than the ANN.
As a result, ANFIS has a number of benefits, such as adaptability, nonlinearity, and quick learning [35, 36]. Third, uncertainty analysis in SDP, especially in software features, is handled in this research by adapting the parameters of the ANFIS architecture. It is therefore important to study the related work of these subsections in detail separately.
2.1 Software defect prediction (SDP)
Software testing is a crucial phase in the software development life cycle, as it identifies defects in the system and ensures that the software passes input test cases. Testing is not only time-consuming but also costly. While some automated technologies can help reduce testing effort, their high maintenance costs often contribute to increased expenses. Early software defect prediction greatly decreases effort and budget without compromising constraints. It highlights the modules that are more prone to defects and need more thorough testing. The difficulties of dimensionality reduction and class imbalance in SDP demand a realistic and efficient defect prediction technique. Recently, ML has become a potent method for making decisions in this area [37]. SDP primarily relies on prediction models to anticipate software defects. Although various strategies and algorithms have been employed to enhance performance, the fundamental processes of SDP are illustrated in Fig. 1 [38]:
(1) Accumulate clean and flawed code sample data from software systems; (2) collect characteristics to create a dataset; (3) adjust the source data if it is unstable; (4) train an SDP model on a set of data; (5) forecast the flawed parts for a dataset obtained from new software; and (6) assess the accuracy of the SDP model. This process involves iterations.
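The six steps above can be sketched end to end as a toy pipeline. The synthetic dataset, the three metric features, and the nearest-centroid learner below are illustrative stand-ins for the study's actual data and models, chosen only to make the flow concrete.

```python
import random

random.seed(0)

# (1)-(2) Accumulate labeled samples and collect characteristics (features):
# each module is a toy vector of three code metrics plus a defect label.
def make_module(defective):
    base = 5.0 if defective else 2.0        # defective modules trend larger
    return ([random.gauss(base, 1.0) for _ in range(3)], int(defective))

data = [make_module(i % 4 == 0) for i in range(200)]   # ~25% defective

# (3) Adjust unstable source data: naive oversampling of the minority class.
minority = [m for m in data if m[1] == 1]
data += minority * 2

# (4) Train an SDP model: a toy nearest-centroid classifier.
def centroid(rows):
    return [sum(r[i] for r in rows) / len(rows) for i in range(3)]

c_clean = centroid([x for x, y in data if y == 0])
c_defect = centroid([x for x, y in data if y == 1])

def predict(x):
    d0 = sum((a - b) ** 2 for a, b in zip(x, c_clean))
    d1 = sum((a - b) ** 2 for a, b in zip(x, c_defect))
    return int(d1 < d0)

# (5)-(6) Forecast flawed parts of new data and assess model accuracy.
test = [make_module(i % 4 == 0) for i in range(100)]
acc = sum(predict(x) == y for x, y in test) / len(test)
print(f"accuracy: {acc:.2f}")
```

In a real study, steps (1)-(2) would draw on repositories such as NASA or PROMISE and steps (4)-(6) would use the actual learner and evaluation metrics; the iteration mentioned above corresponds to repeating this loop as new data arrive.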
The process begins with gathering samples of both clean and flawed codes, as shown in Fig. 1. There are numerous formats in which software data are available, including commit messages, source codes, defect files, and other software artifacts. Typically, these data are taken from repositories and archives.
The feature extraction (collecting characteristics) phase of SDP is the next stage. Software artifacts, source codes, messages, and commit logs, among others, are transformed into metrics at this phase and utilized as input data for training models. The feature extraction stage depends heavily on the type of input data, which can include McCabe metrics [39], Chidamber and Kemerer (CK) metrics [40], modification histories, assembly code, and source code. A number of DL algorithms today offer automatic feature extraction from more complicated, high-dimensional data in addition to metric-based data. Defect data from well-known open defect repositories, such as the NASA [41] and PROMISE [42] databases, have been used in studies in the literature.
The next stage is usually optional. Since defect datasets often include far fewer faulty parts than non-faulty ones, this phase entails balancing the data. This class imbalance issue affects the majority of SDP approaches, as it produces misleading results for various metrics used to assess SDP performance [43]. The problem can be resolved, and SDP performance improved, by a number of methods, such as oversampling.
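As a minimal illustration of the balancing step, random oversampling duplicates minority-class modules until the classes are even. This helper is our own sketch, a stand-in for more sophisticated schemes such as SMOTE-style synthesis.

```python
import random

def random_oversample(samples, labels, seed=42):
    """Duplicate minority-class samples until all classes are balanced.
    A minimal stand-in for the oversampling step; synthetic-sample methods
    such as SMOTE are a common alternative."""
    rng = random.Random(seed)
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    target = max(len(rows) for rows in by_class.values())
    out_s, out_y = [], []
    for y, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for s in rows + extra:
            out_s.append(s)
            out_y.append(y)
    return out_s, out_y
```

For example, a dataset with three clean modules and one defective module comes back with three of each, so the class imbalance no longer skews the metrics computed later.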
The fourth phase in the SDP process involves finding defective software components. A key consideration at this stage is identifying suitable DL techniques, which can encompass various topologies such as convolutional neural networks, and ML types, whether supervised or not. Additionally, it is crucial to determine the granularity of the defective sections to be identified, which may range from file and module levels to function, class, or even statement levels.
The following phase involves utilizing the trained model from the previous stage to forecast the flawed portions of new (test) data. The final phase of the SDP steps uses the prediction made here as its input.
The final stage of the SDP process involves evaluating the created model. Two commonly used metrics for assessing the SDP model are the area under the curve (AUC) and the F-measure. These metrics are employed when evaluating prediction models and making comparisons with other relevant studies.
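Both metrics are simple to compute from a model's outputs. The helpers below are our own sketch (not from the study); the AUC uses the rank-sum (Mann-Whitney) formulation, which counts how often a defective module is scored above a clean one.

```python
def f_measure(y_true, y_pred):
    """F-measure (F1): harmonic mean of precision and recall on binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

def auc(y_true, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) formulation:
    the fraction of (defective, clean) pairs ranked correctly, ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For instance, `auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75 because three of the four defective/clean score pairs are ordered correctly.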
Tang et al. [44] applied a swarm intelligence optimization technique to offer the model’s ideal parameters in an effort to enhance SDP. This study suggested an adaptive variable sparrow search algorithm (AVSSA) focused on different logarithmic spirals and variable hyperparameters. This work conducted AVSSA investigations on eight benchmark functions and received positive results.
Elsabagh et al. [5, 45] suggested an innovative classifier based on the spotted hyena optimizer algorithm (SHO) to anticipate defects in both single and cross-projects. SHO acts as a classifier by identifying the most suitable rules among populations. To find the optimal classification criteria, confidence and support are used as a multi-objective fitness function. These classification criteria are applied to other projects with incomplete data or new projects to forecast faults. Four software datasets from NASA were used for experiments.
Kakkar et al. [46] proposed a novel approach that relies on ANFIS optimized by PSO. The PSO-ANFIS method integrates the adaptability of the ANFIS model with PSO’s optimization capability for improved performance. The presented model is tested on datasets from various-sized open-source Java projects. The suggested PSO-ANFIS-based SDP model provides software engineers with the number of defects as an output, which engineers can then use to allocate their limited resources, such as time and labor, more effectively. The PSO-ANFIS findings were excellent, and it can also be inferred that the size of the projects may affect how well the PSO-ANFIS-based SDP model performs.
In response to the class imbalance issue, Somya Goyal [15] proposed the novel neighborhood undersampling (NUS) approach. This work aims to demonstrate the effectiveness of the NUS approach in accurately predicting defective modules. NUS samples the dataset to enhance the visibility of minority data points while minimizing the removal of majority data points to avoid information loss.
Nasser et al. [47] offered robust-tuned KNN (RT-KNN), an ML method for SDP based on the K-nearest neighbors classifier. Their work is summarized as follows: (1) tuning KNN and determining the ideal value of k in both the testing and training stages to produce accurate prediction outcomes; (2) rescaling the many independent inputs using the robust scaler.
Lei Qiao et al. [48] proposed a new strategy that uses DL methods to forecast the occurrence of defects. First, they refine an openly accessible dataset by performing data normalization and log transformation. Next, they undertake data modeling to build the input for the DL method. Third, they feed the generated data to an algorithm based on a deep neural network that was specifically created to forecast the number of faults. The following table presents a comparative study of SDP and illustrates the contributions of the most common literature and the future possibilities for improving the SDP field.
2.2 Optimization process of ANFIS
ANFIS offers all the advantages of fuzzy systems and neural networks. However, when used in real-world applications, one of the major issues is learning the ANFIS parameters. The problem of ANFIS learning has been addressed in numerous prior studies using methods based on various algorithms, including PSO, GWO, and GA.
Hasanipanah et al. [54] proposed a contemporary method for predicting rock fragmentation using the PSO method for parameter optimization in conjunction with ANFIS learning. Their model has shown efficacy when compared to SVM and multiple regression (MR) techniques.
Lin et al. [55] developed a method for learning ANFIS parameters based on PSO. The system concentrated on applying quantum-behaved PSO (QPSO) to set the parameters of ANFIS. While the premise parameters were adjusted using the QPSO algorithm, the LSE was used to determine the consequent parameters.
Rahnama et al. [56] utilized ANFIS fuzzy c-means, ANFIS subtractive clustering, ANFIS grid partitioning, and radial basis function (RBF) to anticipate the sodium adsorption rate of different areas in Iran. Also, Asadollahfardi et al. [57] used the GA algorithm to detect the optimal combination for optimizing the tracking stations of water quality. Asadollahfardi et al. [58] applied three models: fuzzy regression analysis, ANFIS, and RBF to predict the reactor efficiency of eliminating acid red 14.
In rainfall-gage-only areas, Aghelpour et al. [59] developed an efficient ANFIS method for agricultural drought detection, utilizing a minimal number of variables. They applied ANFIS in conjunction with bio-inspired optimization methods, including ANFIS-PSO, ANFIS-GA, and ANFIS-ACO. Among these, GA and ACO proved to be the most effective algorithms for ANFIS optimization.
On the other hand, much research has gone into describing how the GA adjusts ANFIS parameters. For predicting rainfall on a river, Panda et al. [60] presented and applied the MR and ANFIS methods. Both methods have been used as learning models to predict the outcome. To obtain the hydrological parameter condition, the GA is then coupled with the MR training technique. The optimal control factor value of the goal function is obtained via the GA. A novel modified GA was developed by Sarkheyli et al. [61] using various population structures to improve the parameters of ANFIS’s fuzzy membership functions and rules.
Raftari et al. [31] calculated the friction strength ratio using a technique that employed two parameter-optimization methods, GA and PSO. Dehghani et al. [62] created a method for forecasting and simulating short- to long-term inflow rates. To anticipate the quick, short, and long flow rates, ANFIS and GWO were combined, with GWO optimizing and modifying each parameter of ANFIS.
Maroufpoor et al. [63] created a method that combined ANFIS with GWO. The method outperformed the SVM, neural network, and standard ANFIS methods in terms of performance. A strategy for compressive strength forecasting in terms of energy, expense, and timeframe was presented by Golafshani et al. [64]. They employed the GWO and ANFIS methodologies to modify the ANN’s initial weights and parameters. A whale optimization algorithm (WOA) method that used 28 days for the assessment of the compressive strength of concrete was proposed by Bui et al. [65]. The WOA is used in conjunction with a neural network (NN) to optimize its computational parameters.
2.3 Uncertainty analysis
In risk evaluation, currently available information is gathered and used to inform judgments about the risk connected to a specific stressor, such as a physical, biological, or chemical factor. Risk assessment decisions are generally not made with complete clarity, which leads to confusion and uncertainty. Risk assessment includes a section called uncertainty analysis, which concentrates on the assessment’s uncertainties. Crucial elements of uncertainty analysis are the qualitative analysis that detects the uncertainties, the quantitative analysis that examines how the uncertainties affect the decision-making process, and the communication of the uncertainty. The problem determines how to analyze the uncertainty [66]. How a scientist views uncertainty frequently differs by field. A risk manager often perceives uncertainty as part of a decision-making process, assessing the costs and errors of actions; uncertainty is perceived as a bothersome element that impairs decisions.
Kläs et al. [34] proposed three effective categories for identifying the primary sources of uncertainty in practice: model fit, data quality, and scope compliance. They emphasize the significance of these categories in the context of AI and ML model development and testing by establishing connections with specific tasks and methods for assessing and addressing these uncertainties.
One of the hardest issues in medical image analysis is accurate automated medical image classification, covering segmentation and classification. DL techniques have recently achieved success in the classification and segmentation of medical images, emerging as state-of-the-art techniques. However, most of these techniques are frequently overconfident and unable to offer uncertainty quantification (UQ) for their results, which can have severe effects. To solve this problem, Bayesian DL (BDL) techniques can be employed to quantify the uncertainty of conventional DL techniques. Three strategies for identifying uncertainty are used by Abdar et al. [67] to address uncertainty in the classification of skin cancer images: ensemble Monte Carlo (EMC) dropout, deep ensemble (DE), and Monte Carlo (MC) dropout. They offered a novel hybrid dynamic BDL method that accounts for uncertainty and relies on the three-way decision (TWD) theory to address the ambiguity or uncertainty that remains after using the MC, EMC, and DE approaches.
Walayat et al. [68] introduced a novel predictive model based on fuzzy time series, weighted averages (WA), and induced ordered weighted averages (IOWA).
A recent development in water engineering is fuzzy logic, a soft computing approach of AI. It is an excellent mathematical tool for dealing with system uncertainty brought on by fuzziness or ambiguity. Bisht et al. [69] applied fuzzy logic modeling and ANFIS as soft computing methodologies. These systems start with some fundamental guidelines that define the procedure. To predict the elevation of the groundwater table, two methods using fuzzy rules and two methods using ANFIS were created. Of all the generated methods, ANFIS produced the best results based on performance criteria [69].
Finally, based on the literature review, traditional techniques such as LSE and GD have been employed to modify the parameters of ANFIS [24,25,26] to handle uncertain environments. However, these techniques are often slow and may fail to converge. Furthermore, using the chain rule in conventional ANFIS learning systems, which employ the GD algorithm, can result in many local optima. Consequently, optimizing ANFIS parameters becomes a significant issue in real-world applications for handling uncertainty and improving accuracy. Hence, there is a growing demand to learn ANFIS parameters in SDP and to choose the appropriate optimization algorithm for their management. In this study, the TFWO algorithm is selected to fine-tune ANFIS parameters due to its stable architecture, enhanced convergence capability, and effectiveness in addressing the control parameter selection issue. TFWO is inspired by the random and natural behavior of vortices in oceans, rivers, and seas.
3 Methods and materials
In this research, the methods and materials for handling uncertainty in SDP are organized into three subsections: (1) ANFIS, which represents human reasoning to address uncertainty problems; fuzzy logic is used by ANFIS to turn information connections and fully integrated components of NN inputs into the desired output. (2) The TFWO algorithm, used to modify the parameters of the ANFIS during the SDP process due to its efficiency and reliability. (3) Adaptation of ANFIS utilizing TFWO: this subsection demonstrates the configuration of ANFIS with TFWO. The ANFIS system is trained using the TFWO algorithm to optimize its parameters. This adaptation is illustrated through the flowchart of TFWO in Fig. 5, Algorithm 1, and the architecture of the TFWO_ANFIS model in Fig. 6.
3.1 ANFIS: adaptive neurofuzzy inference system
Jang [70] introduced ANFIS, an AI technique that emulates human thought processes to address inaccuracies. ANFIS utilizes fuzzy logic to process inputs from integrated neural network components and information links to produce appropriate outputs. This method is a straightforward approach to learning from data. ANFIS combines fuzzy logic and ANN, making it capable of handling complex nonlinear problems, imprecise data, and human cognitive uncertainty within a single structure [71]. ANFIS is a widely used approximator in which the relationship between the input and output dimensions of the problem is represented as a collection of if–then rules.
The Mamdani fuzzy technique and the Takagi–Sugeno (T–S) fuzzy technique are two popular fuzzy rule-based inference systems [71]. The Mamdani fuzzy technique has several benefits: it is intuitive, it is widely accepted, and it is compatible with human cognition [72,73,74].
The T–S system ensures output surface continuity and performs well with linear techniques [75, 76]. However, it faces challenges in handling multiparameter synthetic assessment and weighing each input while applying fuzzy rules. On the other hand, the Mamdani system is known for its readability and understandability to a broad audience. In this work, we employ the Mamdani system, which proves beneficial in output expression.
It is necessary to designate a function for each of the following operators to fully describe the behavior of a Mamdani system:

1. An AND operator (often a T-norm) for computing the firing strength of a rule with AND’ed premises.

2. An OR operator (often a T-conorm) for computing the firing strength of a rule with OR’ed premises.

3. An implication operator (often a T-norm) for computing qualified consequent membership functions (MFs) based on the given firing strength.

4. An aggregation operator (typically a T-conorm) for combining qualified consequent MFs to produce an overall output MF.

5. A defuzzification operator that extracts a single crisp output value from the overall output MF.
The following theorem is derived if the AND operation and the implication operation are product, the aggregation operation is sum, and the defuzzification operation is centroid of area (COA) [77]. Implementing such composite inference has the benefit of allowing the Mamdani ANFIS to learn, owing to its differentiability during processing (Table 1).
The following theorem [78] is provided by the sum–product composition; see Eqs. 1 and 2. When centroid defuzzification is used, the final crisp result equals the weighted average of the centroids of the consequent MFs:

\({z}_{\mathrm{COA}}={\sum }_{i}\psi \left({r}_{i}\right){z}_{i}\) (1)

where \(\psi \left({r}_{i}\right)=\omega \left({r}_{i}\right){a}_{i}/{\sum }_{j}\omega \left({r}_{j}\right){a}_{j}\) is the factor weight of \({r}_{i}\); \(i\) indexes the rules; \(\omega \left({r}_{i}\right)\) is the firing strength of rule \({r}_{i}\); and \({a}_{i}\) is the area of the MF in the consequent part of rule \({r}_{i}\). Equivalently,

\({z}_{\mathrm{COA}}={\sum }_{i}{\omega }_{i}{a}_{i}{z}_{i}/{\sum }_{i}{\omega }_{i}{a}_{i}\) (2)

where \({a}_{i}\) is the area and \({z}_{i}\) is the center of the consequent MF \(\mu {c}_{i}(z)\). Using Eqs. 1 and 2, we obtain the corresponding Mamdani ANFIS, as in Fig. 2.
Rule (1): If \(x\) is \({A}_{1}\) and \(y\) is \({B}_{1}\) then \({f}_{1}={\omega {\prime}}_{1}{a}_{1}\cdot {z}_{1}\)
Rule (2): If \(x\) is \({A}_{2}\) and \(y\) is \({B}_{2}\) then \({f}_{2}={\omega {\prime}}_{2}{a}_{2}\cdot {z}_{2}\)
where \({A}_{1}\) and \({A}_{2}\) are sets of fuzzy for input \(x\); \({B}_{1}\) and \({B}_{2}\) are sets of fuzzy for input \(y\).
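Under the product AND and the weighted average of consequent centroids (Eq. 2), the two rules above can be evaluated directly. The Gaussian premise parameters and the consequent areas and centers in the sketch below are illustrative values of our own choosing, not taken from the study.

```python
import math

def gauss(x, d, sigma):
    # Gaussian membership function with center d and width sigma
    return math.exp(-((x - d) ** 2) / (2 * sigma ** 2))

def mamdani_two_rules(x, y):
    """Two-rule Mamdani inference with product AND and centroid-weighted
    defuzzification (Eq. 2); all parameter values are illustrative."""
    # Premise MFs A1, A2 (for x) and B1, B2 (for y); firing strengths by product
    w1 = gauss(x, 0.0, 1.0) * gauss(y, 0.0, 1.0)
    w2 = gauss(x, 1.0, 1.0) * gauss(y, 1.0, 1.0)
    # Consequent MFs C1, C2 summarized by area a_i and centroid z_i
    a1, z1 = 1.0, 0.2
    a2, z2 = 1.0, 0.8
    # Eq. 2: crisp output = weighted average of consequent centroids
    return (w1 * a1 * z1 + w2 * a2 * z2) / (w1 * a1 + w2 * a2)
```

At the input (0, 0), rule 1 fires strongly and the crisp output lands near its centroid 0.2; at (1, 1) the output is pulled symmetrically toward 0.8, illustrating how the firing strengths interpolate between the consequent centers.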
The outcome of each layer in the five-layer Mamdani ANFIS design is as follows [64, 71, 79,80,81].
Layer (1) Create the membership degrees \({\mu }_{A},{\mu }_{B}\)
The MF is the generalized Gaussian function, which is described by two parameters (\(d\), \(\sigma \)):

\({\mu }_{A}\left(x\right)=\mathrm{exp}\left(-{\left(x-d\right)}^{2}/2{\sigma }^{2}\right)\)
The center \(d\) and width \(\sigma \) govern the Gaussian MF and are referred to as the premise parameters.
Layer (2) Layer of rules

The product approach generates the firing strength \({\omega }_{i}\).

Layer (3) Layer of implication

Product is the implication operator.

Layer (4) Layer of aggregation

Sum is the aggregation operator, where the consequent parameters \({a}_{i}\) and \({z}_{i}\) are, respectively, the area and center of the resulting MFs.

Layer (5) Layer of defuzzification

The COA defuzzification approach yields the crisp output.
As shown in Fig. 3, a general MANFIS system can be generated.
Rule (1): If \(x\) is \({A}_{1}\) and \(y\) is \({B}_{1}\) then \(Z={C}_{1}\)
Rule (2): If \(x\) is \({A}_{2}\) and \(y\) is \({B}_{2}\) then \(Z={C}_{2}\)
The outcome of each layer in the five-layer general MANFIS design is as follows.
Layer (1) Layer of fuzzification
The MF is a generalized Gaussian function described by two parameters (\(d\), \(\sigma \)):
Layer (2) Layer of rules
The product approach generates the firing strength \({\omega }_{i}\).
Layer (3)
Product is the implication operator.
Layer (4) Layer of aggregation
Sum is the aggregation operator, and \({C}_{i}\) denotes the consequent parameters.
Layer (5) Layer of defuzzification
The defuzzification approach COA yields a crisp or sharp output.
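The five layers above can be traced in a minimal Python sketch (a two-rule, two-input toy of our own construction, not the paper's implementation; the parameter names `d`, `sigma`, `a`, `z` follow Eq. 4 and the layer descriptions):

```python
import numpy as np

def manfis_forward(x, y, d, sigma, a, z):
    """One forward pass through the five MANFIS layers for two rules.
    d, sigma: Gaussian premise parameters per (input, rule);
    a, z: area and center of each rule's consequent MF (assumed names)."""
    # Layer 1 (fuzzification): Gaussian membership degrees mu_A, mu_B.
    mu_A = np.exp(-((x - d[0]) ** 2) / (2 * sigma[0] ** 2))
    mu_B = np.exp(-((y - d[1]) ** 2) / (2 * sigma[1] ** 2))
    # Layer 2 (rules): product firing strengths.
    w = mu_A * mu_B
    # Layer 3 (implication): product implication scales each consequent area.
    scaled = w * a
    # Layers 4-5 (aggregation + defuzzification): sum, then centroid of area.
    return float(np.sum(scaled * z) / np.sum(scaled))

d = np.array([[-1.0, 1.0], [-1.0, 1.0]])   # per-input, per-rule centers
sigma = np.ones((2, 2))
out = manfis_forward(0.0, 0.0, d, sigma,
                     a=np.array([1.0, 1.0]), z=np.array([-2.0, 2.0]))
print(out)  # symmetric inputs fire both rules equally -> 0.0
```

Because the two rules fire equally for the symmetric inputs, the COA output lands midway between the consequent centers, which matches the weighted-average behavior of Layer 5.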
The ANFIS training process employs both forward and backward passes to update its parameters. ANFIS improves its parameters to reduce the error between the predicted and target outcomes by using a hybrid of GD (gradient descent) and an LSE (least squares error) estimator, as shown in Table 2.
In the forward pass of the learning method, node outputs progressed from layers 1 to 4, and the consequential parameters were chosen and updated using the LSE. In the backward pass, GD updated the premise parameters as error signals propagated backward from the output to the input. The NN learned and trained to select parameter values that best fit the training data.
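A loose sketch of this hybrid scheme, under simplifying assumptions of our own (one input, two rules, and a composite consequent coefficient `c` standing in for the per-rule \(a_i \cdot z_i\) product): the forward pass solves the consequent coefficients by least squares, and the backward pass takes a numerical gradient-descent step on the Gaussian premise parameters.

```python
import numpy as np

# Toy data: one input feature and a smooth target (stand-in for a defect score).
x = np.linspace(-2.0, 2.0, 60)
t = np.tanh(x)

# Two Gaussian premise MFs with parameters (center d, width sigma), as in Eq. 4.
d = np.array([-1.0, 1.0])
sigma = np.array([1.0, 1.0])

def normalized_strengths(x, d, sigma):
    """Layers 1-3: Gaussian membership, product firing, normalization."""
    w = np.exp(-((x[:, None] - d) ** 2) / (2.0 * sigma ** 2))
    return w / w.sum(axis=1, keepdims=True)

def model_mse(d, sigma, c):
    return float(np.mean((t - normalized_strengths(x, d, sigma) @ c) ** 2))

c = np.zeros(2)                   # composite consequent coefficients (our c_i ~ a_i*z_i)
initial_mse = model_mse(d, sigma, c)

lr, eps = 0.05, 1e-5
for epoch in range(30):
    # Forward pass: with the premise fixed, the output is linear in c -> LSE.
    phi = normalized_strengths(x, d, sigma)
    c, *_ = np.linalg.lstsq(phi, t, rcond=None)
    # Backward pass: numerical gradient-descent (GD) step on the premise parameters.
    for p in (d, sigma):
        for i in range(p.size):
            p[i] += eps; hi_err = model_mse(d, sigma, c)
            p[i] -= 2 * eps; lo_err = model_mse(d, sigma, c)
            p[i] += eps
            p[i] -= lr * (hi_err - lo_err) / (2 * eps)

final_mse = model_mse(d, sigma, c)
print(final_mse < initial_mse)  # the hybrid loop reduces the training error
```

The split mirrors the paragraph above: the linear (consequent) part is solved exactly per pass, while the nonlinear membership-function parameters move by gradient steps on the propagated error.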
3.2 TFWO: turbulent flow of water optimization
This paper utilizes the turbulent flow of water optimization (TFWO), a novel and effective optimization technique inspired by the random, natural behavior of vortices in oceans, rivers, and seas. TFWO is selected for its stable structure, which increases convergence power and overcomes the issue of determining control parameters, and it is used to locate global solutions in various dimensions [27]. In addition, TFWO has been applied to two real-world engineering optimization challenges: reliability–redundancy allocation optimization for the overspeed protection system of a gas turbine and several kinds of nonlinear economic load dispatch problems in energy systems. The outcomes demonstrate the TFWO algorithm's superiority and reliability in contrast with other metaheuristic optimization techniques.
3.2.1 The whirlpool concept: an introduction to turbulent water flow
A whirlpool forms when water moves turbulently in a narrow, circular path, typically around a submerged obstacle like a rock. The gravitational force influences this circular motion, causing the water to follow a downwardspiraling pattern. As the water spirals, it accelerates, creating a small hole at its center, which further increases the flow speed. The formation of a whirlpool occurs as water is drawn into this central hole, causing a spinning motion [27].
3.2.2 TFWO algorithm
Whirlpools occur in seas, rivers, and oceans as a random act of nature. The middle of a whirlpool functions as a sucking hole, pulling the surrounding particles and objects toward its core by exerting centripetal force on them. In reality, a whirlpool is a body of moving water mostly caused by ocean tides. Whirlpools can emerge where a few small ridges sit next to one another on a streamlet's surface. These ridges deflect the rushing water, which then circles back around itself, so the water progressively amalgamates around this circuit and forms a funnel as it passes in a restricted path around the ridges. Centrifugal force causes the water to flow in this way. Whirlpools near one another sometimes interact, in addition to affecting the particles and objects in their immediate surroundings, as shown in the next subsections [27].
3.2.2.1 The impacts of whirlpools on its set of objects and other whirlpools
The starting population \({X}^{o}\), consisting of \({N}_{p}\) members, is distributed equally among \({N}_{wh}\) whirlpool sets; the strongest object of each set (the one with the best objective value \(f\)) is taken as the whirlpool that pulls the remaining \({N}_{p}-{N}_{wh}\) objects \((X)\).
Every whirlpool \((wh)\) functions as a sucking hole or well and, by exerting centripetal force on the particles in its set \((X)\), tends to pull their locations toward the well's center. Because of this, the \(j\)th whirlpool tends to make the position of the \(i\)th object \(({X}_{i})\) equal to its own position, i.e., \({X}_{i}={wh}_{j}\). However, the other whirlpools impose certain deviations \((\Delta {X}_{i})\) that depend on their objective values \((f)\) and their distances from \({wh}_{j}\). The updated position of the \(i\)th object is then \({X}_{i}^{new}={wh}_{j}-{\Delta X}_{i}\). The objects \((X)\) rotate around the center of their whirlpool at their own angle \((\theta )\), which changes with every iteration of the algorithm, as shown in Fig. 4.
To compute and determine \(\Delta {X}_{i}\), the farthest and closest whirlpools, or the whirlpools with the most and least weighed distance from all particles, are computed as Eq. (16), then \(\Delta {X}_{i}\) is computed as Eq. (17). To update the object’s position, apply Eq. (18).
where \({wh}_{w}\) and \({wh}_{f}\) are the whirlpools with the highest and lowest values of \({\Delta }_{t}\), respectively. \({\theta }_{i}\) is the \(i\)th particle’s angle.
3.2.2.2 Centrifugal power \(({\mathbf{F}\mathbf{E}}_{\mathbf{i}})\)
While centripetal force attracts moving objects toward the whirlpool’s center, centrifugal force pushes them away from that center, as represented in Eq. (19). If this force exceeds a randomly generated number between 0 and 1, the centrifugal operation is performed on the randomly chosen dimension according to Eq. (20).
3.2.2.3 The whirlpools’ interactions
Whirlpools interact with and move around one another in a manner similar to how a whirlpool acts on the particles in its surroundings, as shown in Eqs. 21, 22, and 23.
where \({\theta }_{j}\) is the angle of the \(j\)th whirlpool opening.
Finally, if one of the new particles in a whirlpool's set is stronger than its whirlpool (i.e., its objective value is lower), it is used as the new whirlpool in the following iteration. All the previous steps are summarized briefly in Fig. 5.
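The behavior described in Sects. 3.2.2.1–3.2.2.3 can be caricatured in a short Python sketch. It follows only the verbal description (objects pulled toward whirlpools with an angle-dependent deviation, occasional centrifugal escape on a random dimension, and greedy replacement of a weaker whirlpool), not the exact Eqs. (16)–(23); all constants and the toy objective are our own choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def fitness(x):                       # toy objective to minimize (sphere function)
    return float(np.sum(x ** 2))

dim, n_wh, n_obj, iters = 5, 3, 10, 200
lb, ub = -10.0, 10.0

wh = rng.uniform(lb, ub, (n_wh, dim))            # one whirlpool per set
objs = rng.uniform(lb, ub, (n_wh, n_obj, dim))   # the objects each whirlpool pulls
theta = rng.uniform(0, 2 * np.pi, (n_wh, n_obj))
best0 = min(min(fitness(w) for w in wh),
            min(fitness(o) for s in objs for o in s))

for _ in range(iters):
    f_wh = np.array([fitness(w) for w in wh])
    for j in range(n_wh):
        for i in range(n_obj):
            theta[j, i] += 0.1 * rng.uniform(0, 2 * np.pi)   # angle drifts each pass
            # Deviation toward another whirlpool (loose stand-in for Eqs. 16-17).
            others = wh[np.arange(n_wh) != j]
            pull = others[rng.integers(len(others))]
            delta = np.abs(np.cos(theta[j, i]) * rng.random(dim)) * (wh[j] - pull)
            cand = np.clip(wh[j] - delta, lb, ub)            # Eq. 18-style move
            if rng.random() < 0.1:                           # centrifugal escape
                cand[rng.integers(dim)] = rng.uniform(lb, ub)
            if fitness(cand) < fitness(objs[j, i]):          # keep only improvements
                objs[j, i] = cand
            if fitness(objs[j, i]) < f_wh[j]:                # a stronger object
                wh[j] = objs[j, i].copy()                    # becomes the whirlpool
                f_wh[j] = fitness(objs[j, i])

best1 = min(fitness(w) for w in wh)
print(best1 < best0)  # greedy replacement only ever improves the best solution
```

The greedy acceptance at the bottom mirrors the replacement rule above: an object displaces its whirlpool only when its objective value is lower, so the best solution found never degrades across iterations.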
3.3 Adaptation of ANFIS utilizing TFWO
In this study, both the consequent and antecedent (premise) parameters of the ANFIS model are adjusted using the TFWO algorithm. Conventional ANFIS training employs the hybrid optimization technique GD_LSE, which combines GD and LSE: LSE modifies the consequent parameter values in the forward pass, and GD modifies the membership-function parameters in the backward pass, similar to backpropagation (as shown in Table 2). Table 2 is updated in accordance with the proposed model, as shown in Table 3.
Traditional mathematical programming methods often fail to provide optimal solutions for real-world optimization problems because of the large number of parameters involved [27, 82]. GD and LSE are deterministic methods that are slow and occasionally fail to converge, and a major criticism of GD is that it tends to get stuck in local minima, which TFWO avoids. Compared with GD, TFWO learns the ANFIS parameters more quickly and flexibly because it is computationally less expensive. The total number of adjustable ANFIS parameters is a crucial element in the development of an ANFIS network because of the processing effort required for the adaptation process, so attention should be paid to the choice of membership function. The Gaussian function is preferable to other membership functions because it requires only two parameters, the center and the width, as illustrated in Eq. 4.
The complete TFWO cycles with ANFIS are depicted in Fig. 6 and Algorithm 1. They outline the steps of the proposed TFWO_ANFIS as follows:

1.
Data are divided at the beginning of the model into training and testing sets, chosen at random to avoid local optima and overfitting issues; 70% of each dataset is used for training. This maintains the proper level of population variety and increases the global search capability.

2.
Create the initial ANFIS model utilizing fuzzy C-means clustering (FCM) to find the degrees of membership. The ANFIS model contains a set of premise and consequent parameters that describe the membership functions in the two parts of the if–then rules, and it is created using the equations described for layers 1 to 5 in Sect. 3.1.

3.
Feed the ANFIS parameters (premise and consequent parameters) to the TFWO algorithm together with the training data.

4.
Initialization: create the initial population randomly, assess the fitness (MSE) of the initialized population, and split the population into \({N}_{wh}\) sets of whirlpools.

5.
The TFWO algorithm iteratively exploits its strengths to reach the best whirlpool, modifying the ANFIS parameters based on the MSE fitness function of each whirlpool.

6.
After the maximum number of decades (MaxDecades), the best ANFIS model is returned and its result is computed on the training data.

7.
Compute the result of the best ANFIS on the remaining 30% of the data (the testing set).

8.
Performance is evaluated using SD, RMSE, MSE, MBE, and accuracy.
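Step 1 amounts to a seeded random 70/30 hold-out split; a minimal sketch (the instance count of 522 is assumed here purely for illustration, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(42)

# Random 70/30 train/test split; random selection helps avoid
# overfitting and local optima, per the model description.
n = 522                        # number of instances (assumed for illustration)
idx = rng.permutation(n)       # shuffle all instance indices
cut = int(0.7 * n)             # 70% boundary
train_idx, test_idx = idx[:cut], idx[cut:]
print(len(train_idx), len(test_idx))  # -> 365 157
```

Because the permutation covers every index exactly once, the two subsets are disjoint and together cover the whole dataset.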
The fitness function, which compares the target value to the actual output, is the mean square error (MSE) shown in Eq. 23.
where \({out}_{m}\) is the target (desired outcome), \({out}_{m}^{\Lambda }\) is the predicted outcome, and \(K\) is the size of the data.
As depicted in Fig. 6, the initial stage of the model involves randomly splitting the data into training and testing sets to avoid issues such as local optima and overfitting. This random selection maintains population diversity and enhances the global search capability. The ANFIS design then uses fuzzy C-means clustering (FCM) to identify the degrees of membership; the resulting model contains a set of premise and consequent parameters that describe the membership functions in the two parts of the if–then rules. These ANFIS parameters serve as the input to the TFWO algorithm, which creates an initial population randomly, assesses the fitness of the initialized population by MSE, and splits the population into \({N}_{wh}\) sets of whirlpools. These processes are repeated until the maximum number of iterations is reached. Afterward, the best ANFIS is evaluated on the testing data, and its performance is assessed using SD, RMSE, MSE, and accuracy.
4 Results and discussion
This section details the evaluation of TFWO_ANFIS efficiency. The experiment assesses the effectiveness and efficiency of the TFWO_ANFIS model in addressing uncertainty in the SDP field with higher accuracy and achieving the lowest error on four datasets obtained from OPENML [83]. This experiment marks the first use of TFWO with ANFIS to enhance SDP. The TFWO_ANFIS model is designed to better manage software metrics’ uncertainty and predict defects with higher accuracy. We compare TFWO_ANFIS with conventional ANFIS, ACO_ANFIS, DE_ANFIS, PSO_ANFIS, GWO_ANFIS, and GA_ANFIS. The evaluation of TFWO_ANFIS against recent relevant studies in SDP demonstrates its superior performance over all other techniques.
4.1 Evaluation performance
To evaluate the effectiveness of the proposed TFWO_ANFIS technique and the performance of the results, various metrics are employed. These metrics are as follows:

1.
MSE:
$$ {\text{MSE}} = \frac{{\mathop \sum \nolimits_{m = 1}^{K} \left( {out_{m} - out_{m}^{{\Lambda }} } \right)^{2} }}{K} $$(25)
where \({out}_{m}\) is the target (desired output), \({out}_{m}^{\Lambda }\) is the predicted output, and \(K\) is the size of the data.

2.
RMSE:
$$ {\text{RMSE}} = \sqrt {\frac{{\mathop \sum \nolimits_{m = 1}^{K} \left( {out_{m} - out_{m}^{{\Lambda }} } \right)^{2} }}{K}} $$(26)

3.
SD:
$$ {\text{SD}} = \sqrt {\frac{{\mathop \sum \nolimits_{m = 1}^{K} \left( {X_{m} - \mu } \right)^{2} }}{K}} $$(27)
where \({X}_{m}\) is each value in the population, \(\mu \) is the mean, and \(K\) is the population size.

4.

Mean bias error (MBE):
$$ {\text{MBE}} = \frac{1}{K}\mathop \sum \limits_{m = 1}^{K} \left| {\frac{{out_{m} - out_{m}^{{\Lambda }} }}{{out_{m}^{{\Lambda }} }}} \right| $$(28)

5.
Accuracy (ACC):
$$ {\text{ACC}} = ({\text{TP}} + {\text{TN}})/({\text{TP}} + {\text{TN}} + {\text{FP}} + {\text{FN}}) $$(29)
6.
Specificity (SP):
$$ {\text{SP}} = {\text{TN}}/\left( {{\text{TN}} + {\text{FP}}} \right) $$(30)
7.
Sensitivity (S):
$$ S = {\text{TP}}/ \left( {{\text{TP}} + {\text{FN}}} \right) $$(31) 
8.
Precision (P):
$$ P = {\text{TP}}/\left( {{\text{TP}} + {\text{FP}}} \right) $$(32)
The model is considered suitable for training when MBE equals zero. A negative MBE suggests an underestimated model, while a positive MBE indicates overestimations during the training phase [58].
where TP, TN, FP, and FN are shown in the confusion matrix (Table 4).
A common way to display the efficiency of a classification technique is by using a confusion matrix [84]. This matrix includes both the predicted class value and its corresponding actual class. These values are employed to assess the classifier’s performance, as shown in Table 4.
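The confusion-matrix metrics of Eqs. 29–32 follow directly from the four counts; a small Python helper (the function name and the example counts are our own, for illustration only):

```python
def classification_metrics(tp, tn, fp, fn):
    """Eqs. 29-32: accuracy, specificity, sensitivity, and precision
    computed from the confusion-matrix counts of Table 4."""
    return {
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),   # Eq. 29
        "specificity": tn / (tn + fp),                    # Eq. 30
        "sensitivity": tp / (tp + fn),                    # Eq. 31
        "precision":   tp / (tp + fp),                    # Eq. 32
    }

# Hypothetical counts, not taken from the paper's tables.
m = classification_metrics(tp=40, tn=50, fp=5, fn=5)
print(m["accuracy"])  # -> 0.9
```

Accuracy summarizes overall correctness, while specificity and sensitivity separate performance on the non-defective and defective classes, which matters for the imbalanced datasets used here.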
4.2 TFWO_ANFIS evaluation
4.2.1 Tools and environment
This subsection uses four software defect datasets obtained from OPENML [83] to evaluate the effectiveness and efficiency of the proposed technique (TFWO_ANFIS) in addressing uncertainty issues in the field of software defect prediction (SDP). These datasets were selected for their variation in sample sizes, features, and numbers of defects, reflecting the diversity needed for the study's accuracy. They include essential information for SDP and were made publicly available to support the development of reliable, repeatable, verifiable, and improvable software defect prediction models. The datasets originate from the McCabe and Halstead source-code feature extractors, which define code aspects related to software quality such as lines of code, cyclomatic complexity, volume, Halstead's line count, unique operators, and operands. Detailed characteristics of these datasets are presented in Table 5.
In this experiment, the TFWO_ANFIS model is tested against various metaheuristic methods, including ACO, PSO, GWO, standard ANFIS, DE [85], and GA. The dataset is split into 70% for training and 30% for testing. Parameters for each algorithm are found in Table 6. The experiments were conducted on a system running Windows 10 Pro (64bit) with an Intel(R) Core(TM) i5 CPU and 4 GB of RAM. MATLAB (R2016a) [86] was used for all implementations.
All optimization techniques use the following parameters: maximum decades (iterations) = 100 and population size = 93, according to the TFWO equation \({N}_{pop}={N}_{wh}+{N}_{wh}\times {N}_{obw}\) with \({N}_{wh}=3\) and \({N}_{obw}=30\); the upper and lower bounds are 10 and − 10, respectively (Fig. 7).
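The stated population sizing can be checked directly from the equation:

```python
# N_pop = N_wh + N_wh * N_obw with N_wh = 3 whirlpools and N_obw = 30 objects each.
n_wh, n_obw = 3, 30
n_pop = n_wh + n_wh * n_obw
print(n_pop)  # -> 93, matching the stated population size
```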
4.2.2 Output of experiment
The experiment used common metrics such as accuracy, RMSE, precision, SD, specificity, sensitivity, and MSE to evaluate the TFWO_ANFIS model's performance in optimizing the ANFIS parameters. Average results over the ten experiments are presented in Tables 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, and 16 and Figs. 8, 9, 10, and 11. These figures and tables demonstrate that TFWO_ANFIS outperforms the other algorithms across all four datasets, and Tables 8, 10, 12, 14, and 16 show that TFWO_ANFIS achieved the highest accuracy among all algorithms on every dataset used in this experiment. This validates the effectiveness and efficiency of the proposed model, which can enhance ANFIS parameter tuning. Additionally, the convergence rate is a typical metric for optimization techniques [87]. As shown in Fig. 7, convergence describes how a solution progresses through the iterations toward an acceptable point in less time. In this study, TFWO converges within about 3% of the total number of iterations, so the best-fit individual is reached more quickly.
Table 7 and Fig. 9 show the MSE metric, calculated according to Eq. 25; they compare our proposed TFWO_ANFIS with common metaheuristic optimization techniques from the literature, such as PSO, GA, GWO, DE, ACO, and standard ANFIS. The MSE scores of the proposed TFWO_ANFIS are 0.1091, 0.0770, 0.1026, and 0.0850 for the KC2, PC3, KC1, and PC4 datasets, respectively. Table 8 and Fig. 10 present the RMSE metric, computed according to Eq. 26; the proposed model (TFWO_ANFIS) achieves the lowest results, 0.3303, 0.2776, 0.3203, and 0.2926, in terms of RMSE for the KC2, PC3, KC1, and PC4 datasets, respectively. The SD metric, presented in Table 9 and Fig. 11 and calculated as shown in Eq. 27, is also used to compare the proposed TFWO_ANFIS with the other techniques; the SD of the proposed model is 0.3307, 0.2885, 0.3205, and 0.2929, respectively. From Tables 7, 8, and 9 and Figs. 9, 10, and 11, the MSE, RMSE, and SD of TFWO_ANFIS are the lowest, so the proposed model has the best performance.
Table 10 displays the confusion matrix results for the TFWO_ANFIS applied to the KC2 dataset. From this table, evaluation metrics such as P, SP, S, and ACC can be calculated using Eqs. 29, 30, 31, and 32. Accuracy is one of the most important metrics in this study, and the proposed TFWO_ANFIS achieves the highest accuracy of 87.3%, outperforming other techniques.
Tables 12 and 13 present the confusion matrix and the comparison between the proposed TFWO_ANFIS and other metaheuristic techniques on the PC3 dataset. TFWO_ANFIS achieves the best accuracy score of 90.2%.
Table 15, derived from Table 14, provides a comparison between TFWO_ANFIS and other techniques using the KC1 dataset; the confusion matrix for the tested KC1 dataset is presented in Table 14. When TFWO_ANFIS is applied to the test data, it achieves the highest accuracy among all techniques, scoring 85.8%.
Finally, TFWO_ANFIS is applied to the PC4 dataset. The results are shown in Tables 16 and 17: Table 16 presents the confusion matrix from applying TFWO_ANFIS to the tested PC4 dataset, and Table 17 describes the comparative analysis between the proposed TFWO_ANFIS and the other techniques, showing that TFWO_ANFIS has better accuracy than the others, with a score of 89.2%.
Table 18 presents the most common metrics for evaluating the model, such as MSE, MBE, RMSE, and SD, across the different datasets used in this research (KC2, PC3, KC1, and PC4) with their numbers of instances and features, respectively. The table confirms the outperformance of TFWO_ANFIS.
4.2.3 Result discussion
The research results offer several advantages in the field of software defect prediction (SDP) and related areas. First, compared with optimization algorithms such as PSO, GWO, DE, ACO, standard ANFIS, and GA, the TFWO_ANFIS model demonstrates superior accuracy in predicting software defects. This enhanced accuracy is valuable for software development teams and organizations because it enables them to identify and address potential issues early, improving software quality and reliability. Second, thanks to the underlying TFWO algorithm, the TFWO_ANFIS model provides stability and convergence power, ensuring consistent performance across various datasets and instances; this stability makes it a reliable choice for real-world applications. Furthermore, the proposed model effectively handles uncertainty in software features, a common issue in real-world software engineering, by offering more accurate defect prediction that allows quality assurance teams to allocate their resources and efforts effectively. Finally, the use of publicly available datasets from platforms such as OPENML improves the practical usefulness of the findings: the model's performance and accuracy can be verified and extended to other software development scenarios and contexts using real-world datasets.
Four datasets, described in Table 5, namely KC2, PC3, KC1, and PC4, with different numbers of instances and features, are used in our experiment to examine and assess the effectiveness and efficiency of the proposed TFWO_ANFIS in handling uncertainty in software features. On every tested dataset, TFWO_ANFIS produced good results.
Case KC2 TFWO_ANFIS results in 87.3%, 0.1091, 0.1281, 0.3303, and 0.3307 in terms of accuracy, MSE, MBE, RMSE, and SD, respectively.
Case PC3 TFWO_ANFIS achieves 90.2%, 0.0770, 0.0860, 0.2776, and 0.2885 in terms of accuracy, MSE, MBE, RMSE, and SD, respectively.
Case KC1 TFWO_ANFIS achieves 85.8%, 0.1026, 0.0931, 0.3203, and 0.3205 in terms of accuracy, MSE, MBE, RMSE, and SD, respectively.
Case PC4 TFWO_ANFIS obtains 89.2%, 0.0850, 0.2310, 0.2926, and 0.2929 in terms of accuracy, MSE, MBE, RMSE, and SD, respectively.
These cases conclude that the TFWO_ANFIS outperformed the traditional ANFIS model and other metaheuristic optimization techniques such as GA, PSO, GWO, ACO, and DE in terms of training and testing accuracy while also having the lowest error rate. The outcomes show that the suggested TFWO_ANFIS performed better than all of them in terms of accuracy, MSE, SD, and RMSE.
This study has significant theoretical and practical implications. Theoretical implications arise from addressing the limitations of conventional methods such as LSE and GD when optimizing ANFIS parameters in uncertain scenarios. The research enhances optimization strategies for handling uncertainty and improving software defect prediction (SDP) accuracy through the introduction of the TFWO algorithm.
In practical terms, the TFWO_ANFIS model offers valuable applications. Its improved convergence power and stable architecture allow for efficient adjustment of ANFIS parameters, resulting in enhanced SDP accuracy. The model proves its utility in practice by outperforming alternative optimization algorithms across various evaluation measures. The study also emphasizes the importance of effective algorithm selection and parameter optimization in SDP. However, it is crucial to be aware of practical considerations, such as the additional time required for configuration and the complexity of implementing the suggested algorithm. These insights provide valuable guidance for those considering the use of the TFWO_ANFIS model in software defect prediction and related fields. This research contributes to the fields of software engineering and optimization, highlighting both theoretical advancements and their practical applications. It holds value for both researchers and professionals in the industry.
To summarize, the characteristics of the ANFIS, which is used to anticipate software defects, are the primary parameters the proposed research attempted to enhance in the suggested study. The ANFIS system combined the interpretability of fuzzy logic with the learning powers of NNs. To achieve accurate predictions in conventional ANFIS learning systems, characteristics such as membership function shapes, the number of fuzzy rules, and consequent parameters are essential. Nevertheless, there is a big issue in optimizing these parameters when there is uncertainty and when SDP is involved. The proposed research aimed to use TFWO to enhance the parameters of ANFIS in SDP, increasing accuracy and handling uncertainty.
The no free lunch theorem asserts that no single optimization algorithm can effectively address every optimization problem, so the TFWO algorithm may not be suitable for all optimization issues. Additionally, the proposed model may require additional iterations during the training process. While TFWO with ANFIS demonstrates efficiency, implementing the proposed algorithm can be quite complex, and configuring it may take more time.
5 Conclusions and future work
This study introduces a model called TFWO_ANFIS to address uncertainty in SDP with improved accuracy. Unlike traditional methods such as GD and LSE, TFWO_ANFIS leverages the turbulent flow of water optimization (TFWO) to optimize parameters in the adaptive neurofuzzy inference system (ANFIS), including membership function shapes, fuzzy rule numbers, and consequent parameters.
The proposed TFWO_ANFIS outperformed other optimization algorithms and recent SDP literature, including particle swarm optimization (PSO), gray wolf optimization (GWO), differential evolution (DE), ant colony optimization (ACO), standard ANFIS, and the genetic algorithm (GA), in terms of standard deviation (SD), mean square error (MSE), mean bias error (MBE), root-mean-square error (RMSE), and accuracy. Four datasets with different instances and features from OPENML, an open platform for publishing datasets, are utilized. The proposed TFWO_ANFIS achieved accuracies of 87.3%, 90.2%, 85.8%, and 89.2% for the datasets KC2, PC3, KC1, and PC4, respectively. Moreover, additional evaluation metrics are utilized, such as precision, sensitivity, confusion matrices, and specificity.
The results indicate that TFWO_ANFIS outperformed the other algorithms across all four datasets in terms of accuracy and the other evaluation metrics. This experiment validates the effectiveness and efficiency of the recommended model, which can be used to enhance ANFIS parameter tuning and handle uncertainty in SDP with higher accuracy.
Future research is expected to enhance the described TFWO_ANFIS model by incorporating additional realworld fields and datasets. Addressing software feature uncertainty in SDP with alternative methods is also considered a critical challenge.
Data availability
The datasets created during the ongoing work are accessible from the open platform for publishing datasets, called OPENML, https://www.openml.org/search?type=data.
References
Pavana MS, Pushpalatha MN, Parkavi A (2022) Software fault prediction using machine learning algorithms. In: Sengodan T, Murugappan M, Misra S (eds) Advances in electrical and computer technologies. ICAECT 2021. Lecture Notes in Electrical Engineering, vol 881. Springer, Singapore. https://doi.org/10.1007/9789811911118_16
Wahono RS, Suryana N (2013) Combining particle swarm optimization based feature selection and bagging technique for software defect prediction. Int J Softw Eng Appl 7:153–166
Nam J (2014) Survey on software defect prediction. Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Tech Rep
Raukas H. Some approaches for software defect prediction
Elsabagh MA, Farhan MS, Gafar MG (2021) Metaheuristic optimization algorithm for predicting software defects. Expert Syst 38:e12768
Kuang B, Tekin Y, Mouazen AM (2015) Comparison between artificial neural network and partial least squares for online visible and near infrared spectroscopy measurement of soil organic carbon, pH and clay content. Soil Tillage Res 146:243–252
ElHasnony IM, Barakat SI, Mostafa RR (2020) Optimized ANFIS model using hybrid metaheuristic algorithms for Parkinson’s disease prediction in IoT environment. IEEE Access 8:119252–119270
Goyal S (2022) Effective software defect prediction using support vector machines (SVMs). Int J Syst Assur Eng Manag 13:681–696
Kuncheva LI, Skurichina M, Duin RPW (2002) An experimental study on diversity for bagging and boosting with linear classifiers. Inf Fusion 3:245–258
Okutan A, Yıldız OT (2014) Software defect prediction using Bayesian networks. Empir Softw Eng 19:154–181
Aljamaan HI, Elish MO (2009) An empirical study of bagging and boosting ensembles for identifying faulty classes in object-oriented software. In: 2009 IEEE symposium on computational intelligence and data mining. IEEE, pp 187–194
Li B, Shen B, Wang J, et al (2014) A scenariobased approach to predicting software defects using compressed C4. 5 model. In: 2014 IEEE 38th annual computer software and applications conference. IEEE, pp 406–415
Alshammari FH (2022) Software Defect prediction and analysis using enhanced random forest (extRF) technique: a business process management and improvement concept in IOTbased application processing environment. Mob Inf Syst
Khan MA, Elmitwally NS, Abbas S et al (2022) Software defect prediction using artificial neural networks: a systematic literature review. Sci Program
Goyal S (2022) Handling classimbalance with KNN (neighbourhood) undersampling for software defect prediction. Artif Intell Rev 55:2023–2064
Khosravi K, Daggupati P, Alami MT et al (2019) Meteorological data mining and hybrid dataintelligence models for reference evaporation simulation: a case study in Iraq. Comput Electron Agric 167:105041
Yaseen ZM, Mohtar WHMW, Ameen AMS et al (2019) Implementation of univariate paradigm for streamflow simulation using hybrid datadriven model: case study in tropical region. IEEE Access 7:74471–74481
Yaseen ZM, Ebtehaj I, Kim S et al (2019) Novel hybrid dataintelligence model for forecasting monthly rainfall with uncertainty analysis. Water 11:502
Dhiman G, Kumar V (2017) Spotted hyena optimizer: a novel bioinspired based metaheuristic technique for engineering applications. Adv Eng Softw 114:48–70
Funding
Open access funding provided by The Science, Technology & Innovation Funding Authority (STDF) in cooperation with The Egyptian Knowledge Bank (EKB). This study received no specific research funding and is submitted in partial fulfilment of the requirements for a PhD degree.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Elsabagh, M.A., Emam, O.E., Gafar, M.G. et al. Handling uncertainty issue in software defect prediction utilizing a hybrid of ANFIS and turbulent flow of water optimization algorithm. Neural Comput & Applic 36, 4583–4602 (2024). https://doi.org/10.1007/s00521-023-09315-0