Abstract
Artificial neural network (ANN), an information processing technique developed by modeling the nervous system of the human brain, is one of the most powerful learning methods available today. One of the factors behind the success of an ANN is its training algorithm. In this paper, an improved butterfly optimization algorithm (IBOA), based on the butterfly optimization algorithm, is proposed for training feedforward artificial neural networks. The IBOA incorporates chaotic maps, which help optimization algorithms explore the search space more dynamically and globally. In the experiments, ten chaotic maps were used. The success of the IBOA was tested on 13 benchmark functions that are well known to researchers working on global optimization and are frequently used for testing and analyzing optimization algorithms. The Tent-mapped IBOA outperformed the other algorithms on most of the benchmark functions. Moreover, the success of the IBOA-MLP algorithm was also tested on five classification datasets (xor, balloon, iris, breast cancer, and heart), and the IBOA-MLP was compared with four algorithms from the literature. According to the statistical performance metrics (sensitivity, specificity, precision, F1-score, and the Friedman test), the IBOA-MLP outperformed the other algorithms and proved successful in training feedforward artificial neural networks.
1 Introduction
Today, computers and computer systems are intertwined with our lives and have become an inseparable part of them. Computers are used in almost every aspect of our lives. While computers once only performed calculations or data transfers, over time they have turned into more effective machines that can analyze large amounts of data and draw inferences about events from that data. This development continues today, and computers have gained the ability both to make decisions about events and to learn the relationships between them. Thus, problems that could not be formulated mathematically and could not be solved have begun to be solved by computers. One of the most important drivers of this advancement in computing is the field of artificial intelligence, and artificial neural networks (ANNs) are among the most heavily researched subjects in this field.
The first artificial neural network model was put forward by Warren McCulloch and Walter Pitts in 1943. The ANN is one of the most important inventions in its field, developed by taking the workings of the human brain as a role model. ANNs are used in many areas such as classification, recognition, prediction (Tümer et al. 2020), and optimization (Madenci and Gülcü 2020). The ANN, developed by imitating the human brain, has proven its success in the field of optimization as in many other fields.
The ANN, one of the most powerful learning methods today, is used to predict and classify unknown functions. The most commonly used ANN architecture for these tasks is the multilayer perceptron (MLP). The ability of an ANN to give more accurate results and make more successful classifications depends on updating its bias and weight values in the most appropriate way. An ANN trained with optimum values can reach more accurate classification results. Many researchers have proposed algorithms in the literature to train the multilayer perceptron. However, the gradient-based techniques among these algorithms often encounter problems when solving real-world optimization problems: they get stuck in local optima and produce poor-quality solutions (Gülcü 2022b; Mirjalili 2015; Tang et al. 2018). To overcome these difficulties, metaheuristic algorithms have been used to train ANNs (Gülcü 2022a; Jaddi and Abdullah 2018). Metaheuristic algorithms, designed by taking the biological behaviors of living things such as hunting, reproduction, and feeding as role models, aim to find the optimum solution to problems in a reasonable time. One of these metaheuristic algorithms is the butterfly optimization algorithm (BOA), which is based on swarm intelligence. The BOA was developed by Arora and Singh (2019), inspired by nature, and was designed by modeling the mating and foraging behaviors of butterflies, which communicate with each other through the scent they emit. Butterflies locate food and mating partners using their senses of smell, sight, taste, touch, and hearing. These senses are also useful for migrating from place to place, escaping from predators, and laying eggs in suitable places. Among all these senses, smell is the most important one, helping a butterfly find food, such as nectar, even from long distances.
The BOA has shown excellent performance on several continuous, discrete, single-objective, and multi-objective optimization problems compared to several state-of-the-art metaheuristic and evolutionary algorithms. Owing to this success, the BOA has been applied to many different optimization problems, including engineering design problems (Arora and Anand 2018; Sharma et al. 2021), feature selection (Long et al. 2021), power plant management (Dey et al. 2020), reliability optimization problems (Sharma 2021), and healthcare systems (Dubey 2021). This motivated our attempts to improve the butterfly optimization algorithm and to employ it for training feedforward artificial neural networks.
In this study, the IBOA algorithm was developed for solving single-objective optimization problems, a sub-branch of continuous optimization problems. The IBOA improves the BOA by adding parameter analysis and chaotic maps. While developing the IBOA, 10 chaotic maps (Chebyshev, Circle, Gauss, Iterative, Logistic, Piecewise, Sine, Singer, Sinusoidal, and Tent) were used to update and optimize the switch probability p, one of the most important parameters of the BOA. The developed IBOA was tested on 13 benchmark functions. These functions are well known to researchers working on global optimization and are frequently used for testing optimization algorithms. The results of the BOA and the results of the IBOA with the 10 different maps were compared; according to the results, the Tent-mapped IBOA is the most successful. In the second part of this study, the training of the ANN was carried out using the IBOA, and this proposed new algorithm is named IBOA-MLP. The IBOA tries to find the optimum bias and weight values of the MLP. In the experimental study, five datasets (xor, iris, heart, balloon, breast cancer) taken from the UCI machine learning repository were used. The results obtained were compared with the results of four different MLP-training algorithms from the literature. According to this comparison, the IBOA-MLP algorithm achieved good performance.
The main contributions of this article are: (1) A new improved butterfly optimization algorithm (IBOA) is proposed. (2) The IBOA is applied to training ANNs and optimizes the weights and biases of the ANN. (3) The IBOA-MLP algorithm has the ability to escape from local optima. (4) The initial parameters and positions do not affect the performance of the IBOA-MLP algorithm. (5) The features of the IBOA-MLP algorithm are simplicity, requiring only a few parameters, solving a wide array of problems, and easy implementation.
This study is organized as follows: In the first section, the history of the ANN is briefly explained, the problem is stated, and the main contributions of this article are emphasized. In the second section, current studies in the literature on the training of ANNs are examined. In the third section, detailed information about the ANN and the BOA is given; then, the developed IBOA and the training of the ANN by the IBOA (IBOA-MLP) are explained in detail. In the fourth section, the experimental results of the developed algorithms are presented. First, the benchmark functions used in the experimental studies are introduced, and the experimental results of the IBOA with the 10 different maps on these benchmark functions are given. The results of the IBOA and the BOA are compared in terms of solution quality and computational time, and it is shown that the IBOA is more successful. In the second part of this section, five classification datasets are introduced and the experimental results of the IBOA-MLP algorithm are presented. The experiments are carried out on classification problems, and the results of the IBOA-MLP are compared with the results of algorithms from the literature. According to the statistical performance metrics, the IBOA-MLP is more successful than the other algorithms on most of the classification problems. Finally, in the fifth section, the general results obtained in the study are summarized and suggestions for future work are given.
2 Related works
A detailed review of the literature shows that many metaheuristic algorithms have been used for the training of ANNs. Within the scope of the literature review of this article, some of the most important studies were examined and evaluated. A summary of studies on the training of ANNs is presented in Table 1.
Zhang et al. (2007) proposed a hybrid algorithm based on the particle swarm optimization (PSO) algorithm and the traditional backpropagation algorithm. The proposed algorithm was applied to the training of ANNs on a three-bit parity problem, a function approximation problem, and a classification problem. According to the experimental results, the performance of the proposed algorithm was better than that of the two algorithms used for comparison, and the proposed algorithm obtained satisfactory results in terms of convergence speed.
Özbakir et al. (2009) developed a new metaheuristic algorithm based on ant colony optimization (ACO) to detect relations among classification data and extract rules. The proposed algorithm trained an ANN, and its performance was measured on benchmark classification datasets such as ECG, iris, Ljubljana breast cancer, nursery, pima, and Wisconsin breast cancer. The algorithm was compared with the NBTree, DecisionTable, Part, and C4.5 approaches. The experimental results were quite successful.
Zamani and Sadeghian (2010) used the PSO algorithm for training artificial neural networks. The proposed approach for classification was tested on the iris, wine, heart, and ionosphere datasets. The effects and successes of different parameters for PSO and ANN were investigated. According to the experimental results, the PSO algorithm obtained satisfactory results in training ANN.
Zanchettin et al. (2011) developed a hybrid approach (GaTSa) combining the genetic algorithm (GA), Tabu search (TS), and simulated annealing (SA) for ANN training. The performance of the approach was compared with that of five algorithms. To measure the performance of the algorithms, an artificially generated dataset and ten datasets frequently handled in the literature (iris, diabetes, thyroid, card, cancer, glass, heart, horse, soybean, and mglass) were used. The authors stated that their proposed approach produced more successful and satisfactory results than the other algorithms.
Kulluk et al. (2012) discussed the application of harmony search (HS) algorithms to the supervised training of feedforward ANNs, which are frequently used for classification problems. In the study, special attention was paid to the self-adaptive global best harmony search (SGHS) algorithm, and five variants of the harmony search algorithm were examined. The proposed approach was tested on six benchmark classification datasets (glass, ionosphere, iris, thyroid, wine, Wisconsin breast cancer) and a real-world dataset based on the classification of quality defects common in textiles. According to the experimental results, the proposed algorithm showed a very successful and competitive performance for the training of ANNs.
Pereira et al. (2014) trained an ANN using the social spider algorithm. The proposed approach was used to classify different datasets (ionosphere, satimage, diadol, mea, and spiral) in the field of medicine. The performance of the approach was compared with that of five different algorithms (ABC, CSS, FFA, PSO, and SGHS); although it did not achieve high success, it showed competitive results against most of the algorithms and achieved an average level of success.
Chen et al. (2015) successfully combined the PSO and cuckoo search (CS) algorithms to create a hybrid model (PSO-CS) and used it for ANN training. The proposed approach combined and developed the successful aspects of both algorithms. The algorithm was applied to a mathematical function estimation problem and the iris classification dataset. The performance of the algorithm was compared with that of its components, the PSO and CS algorithms. According to the experimental results, the model surpassed the other two algorithms.
Mirjalili (2015) used the gray wolf optimization (GWO) algorithm for multilayer perceptron (MLP) training. The experiments were performed on five classification datasets (xor, balloon, iris, breast cancer, and heart) and three standard functions (sigmoid, cosine, and sine) to measure the performance of the proposed method. The proposed algorithm was compared with five metaheuristic algorithms (PSO, GA, ACO, ES, and PBIL) that are frequently used in the literature. According to the results, the algorithm obtained competitive results. In addition, it achieved a high level of success in classification.
Al Nuaimi and Abdullah (2017) proposed a new hybrid algorithm combining PSO and ABC algorithms and applied the proposed algorithm for the training of ANN. Four benchmark classification datasets (iris, cancer, diabetes, and glass) were used to evaluate the algorithms. The approach was compared with the PSO and ABC algorithms. The proposed approach achieved more successful results than the other two algorithms.
Dash (2018) used an improved shuffled frog leaping algorithm (SFLA) for ANN training. The proposed algorithm was applied to datasets on forecasting the US dollar against three different exchange rates. The performance of the proposed algorithm was compared with that of the SFLA, PSO, and DE algorithms. According to the results, the algorithm outperformed the other algorithms.
Jaddi and Abdullah (2018) used the kidney-inspired algorithm, modeled on the behavior of the kidneys in the human body, to optimize the ANN parameters. By varying the α value between its minimum and maximum values, the exploration and exploitation capabilities of the developed algorithm were strengthened, which had a significant impact on ANN training. The proposed method was applied to different benchmark classification datasets (iris, diabetes, thyroid, cancer, card, glass, mglass, and gas furnace) and a real-world problem (rainfall forecasting). The algorithm was compared with four algorithms, and the results showed that the proposed algorithm was promising.
Aljarah et al. (2018) trained a feedforward neural network by using the whale optimization algorithm (WOA). The proposed model was tested on twenty different benchmark classification datasets with different difficulty levels. The performance of the algorithm was compared with the performance of seven different algorithms (BP, GA, PSO, ACO, DE, ES, and PBIL) that are frequently used in the literature. The qualitative and quantitative results proved that the proposed trainer was able to outperform seven algorithms on the majority of datasets in terms of both local optima avoidance and convergence speed.
Ghaleini et al. (2019) proposed a model in which the ANN was trained by the ABC algorithm to predict the safety factors of retaining walls. The weight and bias values of the ANN were optimized by the ABC algorithm to get higher accuracy and performance estimation in safety factors. The proposed approach was analyzed by different ANN models with a different number of hidden layers. According to the results, the network performance was strengthened with the model proposed.
Gülcü (2022a) developed a new metaheuristic approach, animal migration optimization with the Levy flight feature, and used it for training ANNs. The proposed hybrid algorithm is named IAMO-MLP. Thirteen benchmark functions, five classification datasets, and one real-world problem in civil engineering were used in the experiments. It was observed that the initial positions of the individuals did not affect the performance of the developed algorithm and that the IAMO-MLP algorithm successfully escaped from local optima.
Gullipalli (2021) performed ANN training using the CS algorithm. The aim was to increase the convergence capability of the algorithm by applying different modifications to the CS algorithm. The proposed approach was tested on eight classification datasets (car, germancredit, hypothyroid, mfeat, nursery, pageblocks, segment, sick) from the UCI machine learning repository. The performance of the proposed algorithm was compared with that of the Hoeffding tree and CS forest approaches. According to the experimental results, the algorithm was sufficient and competitive in terms of performance.
Erdogan and Gulcu (2021) proposed a new hybrid algorithm, CSA-MLP, for training ANNs. CSA-MLP is based on the crow search algorithm, a population-based metaheuristic inspired by the behavior of crows, which store their excess food in hiding places and retrieve it when needed. The experimental results showed that the crow search algorithm is a reliable approach for training ANNs.
In summary, many researchers have proposed algorithms to train the multilayer perceptron, but gradient-based techniques often get stuck in local optima and produce poor-quality solutions on real-world optimization problems (Gülcü 2022b; Mirjalili 2015; Tang et al. 2018). Metaheuristic algorithms have therefore been used to train ANNs (Gülcü 2022a; Jaddi and Abdullah 2018). Given the success of the BOA on many different optimization problems, this motivated our attempts to improve the butterfly optimization algorithm and to employ it for training feedforward artificial neural networks.
3 Materials and methods
In this section, artificial neural networks, multilayer perceptron, butterfly optimization algorithm, improved butterfly optimization algorithm, and training of artificial neural networks using improved butterfly optimization algorithm are explained in detail.
3.1 Artificial neural network
The artificial neural network is an information processing technique developed by modeling the nervous system of the human brain. In other words, it is the transfer of synaptic connections between neurons in the human brain to a digital platform. Biological nervous system elements and their task equivalents in artificial neural networks are shown in Table 2 (Koç et al. 2004).
An artificial nerve cell was developed by imitating the human nerve cell. An artificial neuron is shown in Fig. 1. To obtain the net input of the neuron, the inputs coming to the neuron are multiplied by their connection weights and then combined with the aggregate function. The result is processed with the activation function and thus the net output of the neuron is calculated.
ANNs are formed by the combination and grouping of artificial nerve cells. This integration consists of layers, and as a result, ANN consists of more than one interconnected layer. ANN consists of three layers: input layer, hidden layer, and output layer. However, in some cases, the number of hidden layers may be more than one. In many ANN models, processes run sequentially. In short, the hidden layer receives the data from the previous input layer, processes it, and forwards it to the next output layer. The features of the three layers that make up the ANN can be summarized as follows:

Input layer: It is the layer where information input is made. There is no operation in this layer, the information coming to the layer is transmitted directly to the hidden layers. Each node has only one input and one output.

Hidden layers: These are the layers where outputs will be produced by mathematical operations according to the inputs. They provide communication between the input layer and the output layer and transfer information. An ANN can have more than one hidden layer.

Output layer: It is the layer that takes the result produced in the hidden layers, processes it, and creates the output by transferring it to the outside of the system.
As mentioned above, activation functions are used to obtain the net output of the neuron. The activation function is the function that maps inputs to outputs and establishes the connection between them. Without an activation function, an ANN would reduce to a first-degree polynomial; since such a network is linear, its learning capability would be very limited. If the neural network is intended to learn nonlinear relationships, using an activation function is essential. There are many activation functions in the literature, and choosing the right one is crucial for learning ability. The mathematical representation of the sigmoid activation function, one of the most frequently used activation functions in the literature, is presented in Eq. (1).
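As a concrete illustration, the sigmoid function, commonly written as \(f(x) = 1/(1 + e^{-x})\), can be implemented in a few lines. The sketch below is illustrative and is not taken from the paper's implementation.

```python
import math

def sigmoid(x):
    """Sigmoid activation: squashes any real input into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))
```

For example, `sigmoid(0.0)` returns exactly 0.5, while large negative and positive inputs saturate toward 0 and 1, respectively.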
The most preferred type of ANN is the multilayer perceptron (MLP). A multilayer perceptron is a feedforward network structure with one or more hidden layers between the input layer and the output layer. The backpropagation algorithm is generally used as the learning algorithm in MLP (Turkoglu and Kaya 2020). Figure 2 shows the fundamental structure of MLPs. Turkoglu and Kaya (2020) stated that an MLP consists of the following components: artificial neuron, layers, aggregate function, activation function, error function, and learning algorithm.
The artificial nerve cell, which consists of inputs, an aggregate function, and an activation function, was developed by imitating the human nervous system. In an artificial neuron, the input values are multiplied by their connection weights and sent to the aggregate function, which combines the weighted inputs into the net input. The result returned from the aggregate function is sent to the activation function and thereby converted to the net output of the artificial neuron. The activation function is also called a squashing or threshold function in the literature because it limits the output signals to the range [0, 1] or [−1, 1]. In this study, the sigmoid function was chosen as the activation function because it is differentiable and its success has been proven by frequent use in the literature.
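The weighted-sum-then-activate computation described above can be sketched as follows; the function names are illustrative, not from the paper's code.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def neuron_output(inputs, weights, bias):
    """Aggregate function (weighted sum plus bias) followed by the sigmoid activation."""
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(net)
```

With all-zero weights and bias, the net input is 0 and the neuron outputs 0.5, the midpoint of the sigmoid's range.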
The ANN is a learning-based system, and an objective function is defined to quantify the error in this system. In statistics, the mean squared error (MSE) of an estimator gives the mean of the squares of the errors; that is, it measures the mean squared difference between the estimated values and the actual values. The MSE formula is shown in Eq. (2). Since there may be more than one neuron in the output layer of the MLP, the MSE formula used in this study is shown in Eq. (3). MSE measures the performance of a machine learning model; it is always non-negative, and a model with an MSE value close to zero performs better.
where N stands for the number of samples. \({g}_{i}\) and \({t}_{i}\) represent the actual value and the predicted value for the sample i, respectively.
where N stands for the number of samples, and K stands for the number of neurons in the output layer in the MLP. \({g}_{j}^{i}\) and \({t}_{j}^{i}\) represent the actual value and predicted value of the neuron j in the output layer for the sample i, respectively.
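The multi-output MSE of Eq. (3) can be sketched as below. The code assumes the error is averaged over both the N samples and the K output neurons; this should be checked against the exact form of Eq. (3).

```python
def mse(actual, predicted):
    """Mean squared error over N samples, each with K output-neuron values.
    `actual` and `predicted` are lists of equal-length lists."""
    n = len(actual)
    k = len(actual[0])
    total = sum((g - t) ** 2
                for sample_g, sample_t in zip(actual, predicted)
                for g, t in zip(sample_g, sample_t))
    return total / (n * k)
```

A perfect prediction yields an MSE of exactly zero, which is why metaheuristic trainers minimize this value.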
Finally, regarding the learning algorithm, the last component of the MLP: MLPs are mostly trained using the backpropagation algorithm. Training an MLP by backpropagation takes place in three stages: (1) the forward pass of the network input from the input layer to the output layer, (2) the calculation of the error at the output neurons, and (3) the backward propagation of this error and the updating of the weights accordingly (Turkoglu and Kaya 2020). After the training of the network is complete, the MLP operates in feedforward mode.
3.2 Butterfly optimization algorithm
In the Linnaean classification of the animal kingdom, butterflies belong to the order Lepidoptera. There are more than 18,000 species of butterflies in the world. The reason for their survival for millions of years lies in their senses (Saccheri et al. 1998). Butterflies use their senses of smell, sight, taste, touch, and hearing to find food and mating partners. These senses also help in migrating from one place to another, escaping from predators, and laying eggs in suitable places. Among all these senses, smell is the most important sense, helping the butterfly find food even from long distances (Blair and Launer 1997). Butterflies use sensory receptors to detect smell and find the source of nectar; these receptors are distributed over body parts such as the antennae, legs, and palps of the butterfly. These receptors are nerve cells on the body surface of the butterfly and are called chemoreceptors. The chemoreceptors also guide the butterfly in finding the best mating partner to maintain a strong genetic line. A male butterfly can identify a female by means of pheromones, scent secretions that the female butterfly emits to elicit certain reactions (Arora and Singh 2019). Based on scientific observations, it has been found that butterflies perceive the source of an odor very accurately (Raguso 2008). They can also distinguish different odors and sense their intensity (Wyatt 2003).
The butterfly optimization algorithm (BOA), which is based on swarm intelligence, was developed by Arora and Singh (2019), inspired by nature, to solve global optimization problems. The BOA is mainly based on the movement strategy of butterflies, which use their sense of smell to locate nectar or mating partners. Butterflies detect and analyze odor with the sensory receptors noted above to determine the likely direction of a nectar source or mating partner, and the BOA mimics this behavior to find the optimum in the search space. In the BOA, butterflies are the search agents of the optimization. Each butterfly produces scent with an intensity associated with its fitness; as a butterfly moves from one location to another, its fitness changes. The scent spreads over distance, other butterflies can sense it, and the butterflies thereby share their personal information and form a collective social information network. When a butterfly detects the scent of another butterfly, it moves toward it; this step is called the global search in the proposed algorithm (Arora and Singh 2019).
The BOA was developed by taking the following three features as role models: (i) all butterflies are expected to emit a scent that attracts the other butterflies; (ii) each butterfly moves either randomly or toward the best butterfly, which emits the most scent; (iii) the stimulus intensity of a butterfly is determined by the value of its objective function.
In the BOA, which is based on the behavior of butterflies, the three stages can be explained as follows: (1) Initialization phase: the parameters are determined and an initial population is generated; the positions of the butterflies are assigned randomly and their fragrance values are calculated. (2) Iteration phase: this is where the main processing is carried out; at each iteration, all butterflies in the search space move to new positions according to the BOA-specific parameters, and their fitness values are then evaluated. (3) Final phase: the stopping criterion is met, and the optimum, or the solution closest to it, is reported.
Understanding the BOA relies on three key concepts: the sensory modality (c), the stimulus intensity (I), and the power exponent (α). The sensory modality refers to the raw input used by the sensors to measure the form of sensory energy and process it in similar ways. The stimulus intensity parameter I is raised to an exponent. According to previous scientific studies, this is because as a stimulus gets stronger, insects respond to it intensely at first and eventually become less sensitive to it; the parameter α models this effect. The parameter α is the power exponent dependent on the modality (smell in the BOA). If α = 1, there is no scent absorption; that is, the amount of scent emitted by a particular butterfly is perceived by the other butterflies at full strength. This drives the swarm toward a single solution, usually the optimum. If α = 0, the scent emitted by any butterfly cannot be perceived by the other butterflies, which supports the local search. Both α and c take values in the range [0, 1], f represents the perceived magnitude of the scent, and I represents the stimulus intensity. There are two important phases in the algorithm: the local search and the global search. The global search is shown in Eq. (5) and the local search in Eq. (6).
where \({g}^{*}\) represents the best solution among all the solutions in the current iteration, x_{i}^{t} and x_{k}^{t} represent butterflies in the search space, r is a random number in the range [0, 1], and f_{i} represents the perceived scent of butterfly i.
where, unlike the global search, the butterfly x_{i}^{t} does not move toward the global best (\({g}^{*}\)), but instead exhibits a random walk in the search space.
Searching for food and mating partners with butterflies can occur on both the local and global scales. Thus, a switching probability p is used in the BOA to switch between the global search and the local search. The switching probability decides whether a butterfly will move to the best butterfly or randomly. The flowchart of the BOA algorithm is shown in Fig. 3.
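One BOA iteration, combining the fragrance formula of Eq. (4) with the global and local moves of Eqs. (5) and (6) as given by Arora and Singh (2019), could be sketched as follows. The parameter defaults (c = 0.01, a = 0.1, p = 0.8) and the assumption of non-negative stimulus intensities are taken from common BOA descriptions, not verified against this paper's equations.

```python
import random

def boa_step(population, intensities, g_star, c=0.01, a=0.1, p=0.8):
    """One iteration of BOA: with probability p each butterfly moves toward
    the best solution g_star (global search, Eq. 5); otherwise it takes a
    random walk guided by two randomly chosen butterflies (local search, Eq. 6)."""
    new_population = []
    for x, intensity in zip(population, intensities):
        f = c * (intensity ** a)               # fragrance, f = c * I^a (Eq. 4)
        r = random.random()
        if random.random() < p:                # global search (Eq. 5)
            new_x = [xi + (r * r * gi - xi) * f for xi, gi in zip(x, g_star)]
        else:                                  # local search (Eq. 6)
            j, k = random.sample(range(len(population)), 2)
            new_x = [xi + (r * r * xj - xk) * f
                     for xi, xj, xk in zip(x, population[j], population[k])]
        new_population.append(new_x)
    return new_population
```

In a full run, this step would be repeated until a stopping criterion is met, re-evaluating fitness and updating `g_star` after every iteration.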
3.3 Improved butterfly optimization algorithm
We propose an improved butterfly optimization algorithm that uses chaotic maps to address the premature convergence of the butterfly optimization algorithm and its tendency to get stuck in local optima. A wide variety of chaotic maps are used in the optimization field; in this study, the ten most frequently used chaotic maps in the literature were selected. The word chaotic derives from chaos, the formless and disordered state of the universe before it came into order. The word map here means associating some parameters, through behavior that can be described as chaotic, with the algorithms used. Chaotic maps are therefore maps that exhibit the complex, dynamic behavior found in nonlinear systems (Pecora and Carroll 1990). In recent years, chaotic maps have been widely appreciated in the optimization field for their dynamic behavior, which helps optimization algorithms explore the search space more dynamically and globally. A chaotic system is deterministic given its initial conditions, yet its subsequent behavior appears random.
Chebyshev map: The formula for the Chebyshev map is shown in Eq. (7).
Circle map: The formula for the Circle map is shown in Eq. (8).
Gauss map: The formula for the Gauss map is shown in Eq. (9).
Iterative map: The formula for the Iterative map is shown in Eq. (10).
Logistic map: The formula for the Logistic map is shown in Eq. (11).
Piecewise map: The formula for the Piecewise map is shown in Eq. (12).
where 0 < P ≤ 0.5.
Sine map: The formula for the Sine map is shown in Eq. (13).
Singer map: The formula for the Singer map is shown in Eq. (14).
where P is the control parameter and takes values between 0.9 and 1.08.
Sinusoidal map: The formula for the Sinusoidal map is shown in Eq. (15).
Tent map: The formula for the Tent map is shown in Eq. (16).
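The maps listed above can be written as one-line recurrences. The forms below are the variants most commonly used in the chaotic-metaheuristics literature; the paper's exact formulations are the ones given in Eqs. (7)–(16), so these Python definitions should be read as illustrative sketches, not as the paper's equations.

```python
import math

# Common one-dimensional chaotic maps (illustrative variants).
def logistic(x, a=4.0):
    """Logistic-map-style recurrence: x' = a * x * (1 - x)."""
    return a * x * (1.0 - x)

def sine(x, a=4.0):
    """Sine-map-style recurrence: x' = (a / 4) * sin(pi * x)."""
    return (a / 4.0) * math.sin(math.pi * x)

def tent(x):
    """Tent-map-style piecewise-linear recurrence on [0, 1]."""
    return x / 0.7 if x < 0.7 else (10.0 / 3.0) * (1.0 - x)

def chaotic_sequence(step, x0=0.7, n=5):
    """Iterate a map from seed x0 and return the first n values."""
    seq, x = [], x0
    for _ in range(n):
        x = step(x)
        seq.append(x)
    return seq
```

All three recurrences keep their iterates inside [0, 1], which is what allows them to stand in directly for a probability such as p.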
The key element of the BOA that establishes the balance between Eqs. (5) and (6) is the parameter p. In the BOA algorithm, this parameter p is fixed at 0.8. In IBOA, however, the parameter p is adjusted dynamically using chaotic maps. In the experimental studies, the IBOA algorithm with ten chaotic maps was tested on 13 different benchmark functions, and the IBOA proved successful.
At the beginning of the IBOA algorithm, the butterfly population is randomly generated. Each member of the population is represented as x_{i}. First, the odor value of each butterfly, which models its sense of smell, is calculated using Eq. (4). Then, the value of the key probability p, which controls the global and local search capabilities, is calculated by the chaotic map. Thus, the value of p is no longer fixed at 0.8 but is updated by the chaotic map in each iteration, so that diversity is created. The flowchart and the pseudocode of the IBOA algorithm are shown in Figs. 4 and 5.
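The whole procedure can be sketched as follows. The sketch assumes the standard BOA update rules and uses the Logistic map to drive p; the function name `iboa`, the greedy replacement step, and all parameter defaults are illustrative, not taken from the paper's implementation.

```python
import random

def iboa(fitness, dim, pop_size=50, iterations=200, c=0.01, a=0.1,
         lower=-10.0, upper=10.0):
    """Sketch of IBOA: BOA with the switching probability p driven by a
    chaotic (here Logistic) map instead of the fixed value 0.8."""
    pop = [[random.uniform(lower, upper) for _ in range(dim)]
           for _ in range(pop_size)]
    best = min(pop, key=fitness)
    p = 0.7                                   # chaotic seed
    for _ in range(iterations):
        p = 4.0 * p * (1.0 - p)               # chaotic update of p per iteration
        for i, x in enumerate(pop):
            f = c * (abs(fitness(x)) ** a)    # fragrance, Eq. (4)-style
            r = random.random()
            if r < p:                         # global search, Eq. (5)
                x_new = [xi + (r * r * bi - xi) * f
                         for xi, bi in zip(x, best)]
            else:                             # local random walk, Eq. (6)
                xj, xk = random.sample(pop, 2)
                x_new = [xi + (r * r * xji - xki) * f
                         for xi, xji, xki in zip(x, xj, xk)]
            if fitness(x_new) < fitness(x):   # keep only improvements
                pop[i] = x_new
        best = min(pop, key=fitness)
    return best
```

Greedy replacement (keeping a new position only when it improves the fitness) is one common convention in metaheuristic implementations; the paper's own code may handle acceptance differently.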
3.4 Training multilayer perceptron using IBOA
An ANN learns from inputs and outputs. Therefore, the values of the weights and biases in the ANN are updated according to the inputs and outputs (Kiranyaz et al. 2009). The ability of an ANN to give accurate results and to make successful classifications depends on updating the values of the weights and biases in the most appropriate way. In the literature, researchers have proposed various algorithms to train the multilayer perceptron (MLP). Two popular families of methods are gradient-based techniques and metaheuristic algorithms. Gradient-based techniques often encounter problems in solving this optimization problem (training the MLP), the most important of which are getting stuck in local optima and producing poor-quality solutions. To overcome these difficulties, metaheuristic algorithms are used to train the MLP (Gulcu 2020). In this study, a hybrid IBOA-MLP algorithm was developed to optimize the values of the weights and biases. The IBOA algorithm, a metaheuristic algorithm, was used for the first time in MLP training. The optimization process of the weights and biases is shown in Fig. 6. To optimize the weights and biases in the MLP, they must first be represented by the butterflies in the IBOA-MLP algorithm. For this purpose, a representation vector consisting of the weights and biases is used. This vector is shown in Eq. (17) according to the MLP structure in Fig. 6.
where \({w}_{i,j}\) represents the values of the weights between the input layer and the hidden layer, and \({w}_{j,k}\) represents the values of the weights between the hidden layer and the output layer. \({v}_{0,j}\) represents the bias values between the input layer and the hidden layer, and \({v}_{0,k}\) represents the bias values between the hidden layer and the output layer. The \(\sqcup \) notation represents the union of two sets.
At the beginning of the IBOA-MLP algorithm, the butterfly population is randomly generated. Each butterfly represents a different MLP, namely the representation vector in Eq. (17). Then, using the training dataset, the odor of each butterfly is calculated and the best butterfly in the population is found. Next, the position of each butterfly, namely the values of the weights and biases in the MLP, is updated using Eqs. (5)–(6) according to the parameter p. The smaller the value of the parameter p, the more likely Eq. (5) is to be selected; the larger the value of the parameter p, the more likely Eq. (6) is to be selected. The important point here is to tune the value of the parameter p so as to ensure the balance between these two equations. The chaotic maps were used for this purpose, and the training process was started. This process continues until the termination criteria are met.
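The encoding of Eq. (17) and the MSE fitness can be sketched in Python as follows. The helper names are illustrative, and the layout of the vector (input-to-hidden weights, hidden-to-output weights, then the two bias groups) is one consistent reading of Eq. (17); the resulting vector sizes match the dimensions reported later for the datasets (75, 210, and 1081).

```python
import math

def vector_size(n, h, m):
    """Length of the representation vector in Eq. (17):
    input-to-hidden weights + hidden-to-output weights + both bias groups."""
    return n * h + h * m + h + m

def mlp_forward(vec, x, n, h, m):
    """Run one input vector x through the MLP encoded in vec
    (sigmoid activations, as in the paper)."""
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    w1 = vec[:n * h]                              # weights w_{i,j}
    w2 = vec[n * h:n * h + h * m]                 # weights w_{j,k}
    b1 = vec[n * h + h * m:n * h + h * m + h]     # biases v_{0,j}
    b2 = vec[-m:]                                 # biases v_{0,k}
    hidden = [sig(sum(w1[j * n + i] * x[i] for i in range(n)) + b1[j])
              for j in range(h)]
    return [sig(sum(w2[k * h + j] * hidden[j] for j in range(h)) + b2[k])
            for k in range(m)]

def mse(vec, samples, n, h, m):
    """Mean squared error of the encoded MLP over (input, target) pairs:
    the fitness that the optimizer minimizes."""
    err = 0.0
    for x, t in samples:
        y = mlp_forward(vec, x, n, h, m)
        err += sum((yi - ti) ** 2 for yi, ti in zip(y, t))
    return err / len(samples)
```

Each butterfly is simply one such `vec`; evaluating its odor on the training set means calling `mse` on it.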
4 Experimental results
In this section, the experimental results of the IBOA and IBOA-MLP algorithms developed in this study are presented. The success of the IBOA algorithm was tested on both benchmark functions and classification datasets; therefore, this section is divided into two parts (benchmark functions and classification datasets). The hardware and software used in the experiments were as follows: Microsoft Windows 10, Intel i5-3470 3.20 GHz, 6 GB memory. The algorithms were coded and run in MATLAB R2018a. All statistical analyses in this study were performed with Microsoft Excel 2013.
4.1 Benchmark functions
In this section, the experimental results of the IBOA algorithm on 13 benchmark functions are presented. Table 3 shows the formulas, dimensions, global minimums, and search ranges of these 13 benchmark functions. These functions are well known to those working on global optimization and are frequently used for testing and analysis of optimization algorithms. The f_{1}–f_{5} functions are unimodal functions. f_{6} is a discontinuous step function with a single minimum. The f_{7} function is a noisy quartic function. f_{8}–f_{13} are multimodal test functions; the number of their local minima increases exponentially with the problem dimension, which places them in the most difficult problem class for most optimization algorithms. The names of the f_{1}–f_{13} functions in the literature are as follows, respectively: sphere, schwefel 2.22, schwefel 1.2, schwefel 2.21, rosenbrock, step, quartic with noise, schwefel, rastrigin, ackley, griewank, penalized1, penalized2. These 13 benchmark functions with different difficulty levels were used in the experiments to test the performance of the IBOA algorithm and to compare the IBOA with other algorithms. The dimension of these benchmark functions was taken as 30. All functions except f_{8} have a global minimum value of zero. Because the functions have different difficulty levels, the search intervals differ, as shown in Table 3.
In the experiments, the BOA algorithm and the IBOA algorithm with 10 different maps were compared. To fairly compare the algorithms, each algorithm performed the same number of fitness function evaluations (FEs) on each run: 75,000 FEs for f_{1}, f_{6}, f_{12}, and f_{13}; 100,000 FEs for f_{2} and f_{11}; 150,000 FEs for f_{7}, f_{8}, and f_{9}; 250,000 FEs for f_{3}, f_{4}, and f_{5}. The population size was set to 50 in the algorithms. Algorithms were run independently 30 times on each function.
Table 4 shows the average results of the BOA algorithm and the IBOA algorithm with 10 different maps. The best results in the table are shown in bold. When the results are examined, it is seen that the IBOA algorithm with the Tent map achieves better results than other methods. Also, in Table 5, the average computational times of the algorithms are shown in seconds.
According to the results in Table 4, it is clear that the Tent-mapped IBOA algorithm outperforms the other algorithms in most of the benchmark functions. The Tent-mapped IBOA algorithm achieved the best results in seven benchmark functions and the third-best result in the benchmark function f_{10}. According to Table 4, it is clear that the proposed IBOA algorithm with the Tent map performs better than the classical BOA algorithm.
4.2 Classification datasets
To test the performance of the IBOA-MLP algorithm and compare it with other studies, five classification datasets with different numbers of training/test samples and different difficulty levels were used in the experiments. These five datasets are xor, balloon, heart, breast cancer, and iris. The balloon, heart, breast cancer, and iris datasets were taken from the well-known UCI repository and are widely used for testing machine learning algorithms.
In the literature, there are studies on dimensionality reduction and on extracting the most important features of datasets (Gundluru et al. 2022; Lakshmanna et al. 2022). However, neither dimensionality reduction nor feature extraction was applied to the datasets in this study. To fairly compare the algorithms, all algorithms used the same training and test subsets taken from www.seyedalimirjalili.com. All datasets were normalized using the min–max normalization function given in Eq. (18) to eliminate the effect of attributes that may influence the classification at different rates.
where x′ is the normalized value of x, which lies in the range [x_{min}, x_{max}]; the normalized value x′ will be in the range [0, 1].
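Eq. (18) is the standard min–max rule, x′ = (x − x_min)/(x_max − x_min). A minimal sketch (the guard for constant attributes is our addition, not part of Eq. (18)):

```python
def min_max_normalize(values):
    """Min-max normalization: x' = (x - x_min) / (x_max - x_min)."""
    lo, hi = min(values), max(values)
    if hi == lo:                 # constant attribute: map everything to 0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

Applying this per attribute brings every feature of a dataset into [0, 1] before training.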
Table 6 shows the characteristics of the datasets. The xor dataset is the easiest problem, with 8 training and 8 test examples, 3 attributes, and 2 classes. The balloon dataset has 4 features, 20 training examples, 20 test examples, and 2 classes. The iris dataset has 4 features, 150 training samples, 150 test samples, and 3 classes. The breast cancer dataset includes 9 features, 599 training samples, 100 test samples, and 2 classes. In the heart dataset, there are 22 features, 80 training samples, 187 test samples, and 2 classes. These training and test datasets were taken from www.seyedalimirjalili.com and were also used in the (Mirjalili 2015) study. As can be seen, datasets with different numbers of training/test samples and different difficulty levels were selected to verify the success of the IBOA-MLP algorithm.
To verify the success of IBOA-MLP, the IBOA-MLP algorithm was compared with the BOA-MLP (Irmak and Gülcü 2021) based on the butterfly optimization algorithm (Arora and Singh 2019), the BAT-MLP based on the bat optimization algorithm (Yang 2010), the SMS-MLP (Gulcu 2020) based on the states of matter search algorithm (Cuevas et al. 2014), and the BP (Hecht-Nielsen 1992) algorithms from the literature. For all datasets, the initial values of the biases and weights were randomly generated in the range of [−10, 10]. The parameters (number of training/test samples, number of classes, MLP structure, and vector size) of the algorithms used for comparison are shown in Table 6. In this study, we did not focus on finding the optimal number of neurons in the hidden layer. In the literature, the number of neurons in the hidden layer is usually calculated with the formula \((2\times n)+1\), where n is the number of neurons in the input layer; therefore, in this study, we obtained the number of neurons in the hidden layer with this formula. Table 6 shows the MLP structure used for each dataset. As the activation function, the sigmoid function, which is the most commonly used in the literature, was chosen.
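The \((2\times n)+1\) rule can be checked in a few lines; the function names below are illustrative:

```python
def hidden_neurons(n_inputs):
    """Rule of thumb used in the paper: (2 * n) + 1 hidden neurons,
    where n is the number of input neurons."""
    return 2 * n_inputs + 1

def mlp_structure(n_inputs, n_outputs):
    """(inputs, hidden, outputs) triple for a dataset."""
    return (n_inputs, hidden_neurons(n_inputs), n_outputs)
```

The triples (4, 9, 3), (9, 19, 1), and (22, 45, 1) produced by this rule match the MLP structures reported below for the iris, breast cancer, and heart datasets.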
In Table 7, the MSE results of the IBOA-MLP algorithm are compared with the MSE results of four algorithms (BOA-MLP, BAT-MLP, SMS-MLP, and BP) from the literature. The results of the BOA-MLP, BAT-MLP, SMS-MLP, and BP algorithms were taken from the study (Irmak and Gülcü 2021). For this comparison, the Sine map, which produced the most successful results among the ten maps, was selected. When Table 7 is examined, it is observed that the IBOA-MLP algorithm surpasses the BAT-MLP, SMS-MLP, and BP algorithms. The IBOA-MLP algorithm surpasses the BOA-MLP algorithm on the xor and iris datasets and exhibits competitive results on the other datasets. The best results in the table are shown in bold.
Table 8 shows the average classification accuracy of the IBOA-MLP, BOA-MLP, BAT-MLP, SMS-MLP, and BP algorithms on the test data. When the results in Table 8 are examined, it is seen that the IBOA-MLP is successful. Table 9 shows the average computation times of the algorithms. According to Table 9, the fastest algorithm is the BP algorithm.
The statistical performance metrics used to measure the success of the algorithms in the experiments are the sensitivity, specificity, precision, and F1-score. Sensitivity measures how accurately a test reports the presence of a condition; its formula is shown in Eq. (19). Specificity measures how accurately a test reports the absence of a condition; its formula is shown in Eq. (20). Precision shows how many of the values predicted as positive are actually positive; its formula is shown in Eq. (21). The F1-score is the harmonic mean of the precision and the sensitivity; its formula is shown in Eq. (22).
where TP, FN, TN, and FP represent the true positive, the false negative, the true negative, and the false positive, respectively.
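Eqs. (19)–(22) reduce to a few lines of arithmetic on the binary confusion matrix. A sketch with illustrative naming (the zero-denominator guards are our addition):

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity (Eq. 19), specificity (Eq. 20), precision (Eq. 21),
    and F1-score (Eq. 22) from TP, FN, TN, FP counts."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    # F1 is the harmonic mean of precision and sensitivity
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1}
```

For the multi-class iris dataset, the same formulas are applied per class by treating that class as positive and the rest as negative.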
Table 10 shows the sensitivity, specificity, precision, and F1-score values obtained by the IBOA-MLP, BOA-MLP, BAT-MLP, SMS-MLP, and BP algorithms on the xor dataset. According to Table 10, the IBOA-MLP algorithm surpasses the BAT-MLP, SMS-MLP, and BP algorithms and exhibits the same success rate as the BOA-MLP algorithm.
Table 11 shows the sensitivity, specificity, precision, and F1-score values obtained by the IBOA-MLP, BOA-MLP, BAT-MLP, SMS-MLP, and BP algorithms on the balloon dataset. When Table 11 is examined, it is observed that the IBOA-MLP algorithm surpasses the BAT-MLP, SMS-MLP, and BP algorithms and exhibits the same success rate as the BOA-MLP algorithm. Another point to note here is that the sensitivity, precision, and F1-score values of the BP algorithm are low.
For the iris dataset, the MLP structure has 4 neurons in the input layer, 9 neurons in the hidden layer, and 3 neurons in the output layer, and the representation vector has 75 dimensions. The sensitivity, specificity, precision, and F1-score values of the IBOA-MLP, BOA-MLP, BAT-MLP, SMS-MLP, and BP algorithms on the iris dataset are shown in Table 12. According to Table 12, the IBOA-MLP surpasses the BAT-MLP, SMS-MLP, and BP algorithms and shows results competitive with the BOA-MLP algorithm.
Another challenging dataset that has an important place in the literature is the breast cancer dataset. For the breast cancer dataset, the MLP consists of 9 input neurons, 19 hidden-layer neurons, and 1 output neuron, and the representation vector has 210 dimensions. The sensitivity, specificity, precision, and F1-score values of the IBOA-MLP, BOA-MLP, BAT-MLP, SMS-MLP, and BP algorithms on this dataset are shown in Table 13. According to Table 13, the IBOA-MLP surpasses all the other algorithms and is successful.
For the heart dataset, the MLP consists of 22 input neurons, 45 hidden-layer neurons, and one output neuron, and the representation vector has 1081 dimensions. The sensitivity, specificity, precision, and F1-score values of the IBOA-MLP, BOA-MLP, BAT-MLP, SMS-MLP, and BP algorithms on this dataset are shown in Table 14. When Table 14 is examined, it is observed that the IBOA-MLP surpasses all the other algorithms and is successful. In addition, the low success of the BP algorithm on this dataset is remarkable.
Figure 7 shows the convergence graphs of the algorithms. When the convergence graphs are examined, it is seen that the IBOA-MLP and BOA-MLP algorithms give better results on the xor and balloon datasets, and that the IBOA-MLP, BOA-MLP, and BP algorithms give better results on the iris dataset. On the breast cancer dataset, all algorithms show competitive results, but the IBOA-MLP, BOA-MLP, and BAT-MLP algorithms are more persistent in seeking the global optimum. On the heart dataset, the IBOA-MLP and BOA-MLP algorithms give better results. In general, it can be said that the IBOA-MLP algorithm does not get stuck at local optima and persists in searching for the global optimum.
Figure 8 shows the boxplots of the classification accuracy results of the algorithms. According to the boxplots, the IBOA-MLP appears to have been successful. In Table 15, the Friedman test results of the IBOA-MLP, BOA-MLP, BAT-MLP, SMS-MLP, and BP algorithms are presented. The Friedman test is a statistical analysis technique used to make meaningful comparisons between dependent groups when the assumption of normality does not hold. According to Table 15, the best result belongs to the IBOA-MLP algorithm; another remarkable point is the low performance of the SMS-MLP and BP algorithms. According to the Friedman test results, the IBOA-MLP algorithm ranks higher than the other algorithms with a ranking score of 4.7, the BOA-MLP algorithm ranks second with a ranking score of 4.3, the BAT-MLP algorithm ranks third with a ranking score of 2.4, and the SMS-MLP and BP algorithms rank jointly fourth with a ranking score of 1.8.
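Average ranking scores of the kind reported in Table 15 can be computed from an accuracy matrix by Friedman-style average ranking. The sketch below follows the paper's apparent convention that a higher ranking score is better (the best algorithm, IBOA-MLP, has the highest score, 4.7) and handles ties by sharing the mean rank; the function name and the convention choice are ours.

```python
def friedman_ranks(scores):
    """Average rank per algorithm over several datasets.
    scores: rows = datasets, columns = algorithms, higher score = better.
    The best algorithm in a row receives the highest rank; tied scores
    share the mean of the positions they would occupy."""
    n = len(scores[0])
    totals = [0.0] * n
    for row in scores:
        order = sorted(range(n), key=lambda j: row[j])  # worst ... best
        ranks = [0.0] * n
        i = 0
        while i < n:
            j = i
            # extend j over a run of tied scores
            while j + 1 < n and row[order[j + 1]] == row[order[i]]:
                j += 1
            shared = (i + j) / 2.0 + 1.0   # mean of tied 1-based positions
            for k in range(i, j + 1):
                ranks[order[k]] = shared
            i = j + 1
        totals = [t + r for t, r in zip(totals, ranks)]
    return [t / len(scores) for t in totals]
```

A chi-squared statistic and p-value can then be derived from these average ranks in the usual Friedman-test fashion.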
5 Conclusions
In this study, the IBOA algorithm was proposed to train the ANN by optimizing its weights and biases. The IBOA algorithm is an improved version of the BOA algorithm obtained by utilizing chaotic maps: the key parameter p of the IBOA algorithm, which establishes the balance between the local search and the global search, is updated using chaotic maps.
The main contributions of this article are as follows. A new improved butterfly optimization algorithm (IBOA) is proposed. The IBOA algorithm is applied to training the ANN and optimizes the weights and biases of the ANN. The IBOA-MLP algorithm has the ability to escape from local optima, and its performance is not sensitive to the initial parameters and positions. The features of the IBOA-MLP algorithm are its simplicity, its small number of parameters, its applicability to a wide array of problems, and its easy implementation.
In the experiments, the success of the IBOA algorithm was verified on 13 benchmark functions, where the IBOA algorithm with the Tent map outperformed the other algorithms. Afterward, the proposed IBOA-MLP algorithm was used to optimize the values of the biases and weights in the ANN, with the aim of increasing the learning ability of the ANN. The IBOA-MLP algorithm was tested on the classification datasets (xor, iris, balloon, heart, and breast cancer) from the literature and compared with four algorithms (BOA-MLP, BAT-MLP, SMS-MLP, and BP). According to the results, the IBOA-MLP algorithm outperformed the BOA-MLP, BAT-MLP, SMS-MLP, and BP algorithms on most of the datasets. In addition, the IBOA-MLP algorithm ranked first according to the Friedman test results. It was concluded that the IBOA algorithm is successful in optimizing the biases and weights and is therefore suitable for training the MLP. In conclusion, the IBOA and IBOA-MLP algorithms can escape from local optima thanks to the chaotic maps.
This study has some limitations. On smaller datasets, the IBOA-MLP algorithm tends to overfit: it memorizes the training data and does not generalize well to new examples. The IBOA-MLP algorithm needs more extensive datasets for training and therefore requires high computation power and computational resources.
In future work, the IBOA-MLP algorithm can be applied to different datasets, such as COVID-19 data. The IBOA-MLP algorithm can also be hybridized with another metaheuristic, such as particle swarm optimization or the crow search algorithm, to increase its performance. Further research regarding the role of the activation function and the parameters of the butterfly optimization algorithm would be worthwhile.
Data availability
Enquiries about data availability should be directed to the authors.
References
Al Nuaimi ZNAM, Abdullah R (2017) Neural network training using hybrid particle-move artificial bee colony algorithm for pattern classification. J Inf Commun Technol 16(2):314–334
Aljarah I, Faris H, Mirjalili S (2018) Optimizing connection weights in neural networks using the whale optimization algorithm. Soft Comput 22(1):1–15
Arora S, Anand P (2018) Learning automata-based butterfly optimization algorithm for engineering design problems. Int J Comput Mater Sci Eng 7(04):1850021
Arora S, Singh S (2019) Butterfly optimization algorithm: a novel approach for global optimization. Soft Comput 23(3):715–734. https://doi.org/10.1007/s0050001831024
Blair RB, Launer AE (1997) Butterfly diversity and human land use: species assemblages along an urban gradient. Biol Cons 80(1):113–125. https://doi.org/10.1016/S0006-3207(96)00056-0
Chen JF, Do QH, Hsieh HN (2015) Training artificial neural networks by a hybrid PSOCS algorithm. Algorithms 8(2):292–308. https://doi.org/10.3390/a8020292
Cuevas E, Echavarría A, RamírezOrtegón MA (2014) An optimization algorithm inspired by the States of Matter that improves the balance between exploration and exploitation. Appl Intell 40(2):256–272
Dash R (2018) Performance analysis of a higher order neural network with an improved shuffled frog leaping algorithm for currency exchange rate prediction. Appl Soft Comput 67:215–231. https://doi.org/10.1016/j.asoc.2018.02.043
Dey PP, Das DC, Latif A, Hussain SS, Ustun TS (2020) Active power management of virtual power plant under penetration of central receiver solar thermal-wind using butterfly optimization technique. Sustainability 12(17):6979
Dubey AK (2021) Optimized hybrid learning for multi disease prediction enabled by lion with butterfly optimization algorithm. Sādhanā 46(2):1–27
Erdogan F, Gulcu S (2021) Training of the artificial neural networks using crow search algorithm. Int J Intell Syst Appl Eng 9(3):101–108. https://doi.org/10.18201/ijisae.2021.237
Ghaleini EN, Koopialipoor M, Momenzadeh M, Sarafraz ME, Mohamad ET, Gordan B (2019) A combination of artificial bee colony and neural network for approximating the safety factor of retaining walls. Eng Comput 35(2):647–658
Gulcu Ş (2020) Training of the artificial neural networks using states of matter search algorithm. Int J Intell Syst Appl Eng 8(3):131–136
Gullipalli TR (2021) An improved under sampling approaches for concept drift and class imbalance data streams using improved cuckoo search algorithm. Turk J Comput Math Educ (turcomat) 12(2):2267–2275. https://doi.org/10.17762/turcomat.v12i2.1945
Gundluru N, Rajput DS, Lakshmanna K, Kaluri R, Shorfuzzaman M, Uddin M, Rahman Khan MA (2022) Enhancement of detection of diabetic retinopathy using Harris Hawks optimization with deep learning model. Comput Intell Neurosci
Gülcü Ş (2022a) An improved animal migration optimization algorithm to train the feedforward artificial neural networks. Arab J Sci Eng 47(8):9557–9581. https://doi.org/10.1007/s1336902106286z
Gülcü Ş (2022b) Training of the feed forward artificial neural networks using dragonfly algorithm. Appl Soft Comput 1:1. https://doi.org/10.1016/j.asoc.2022.109023
Hecht-Nielsen R (1992) Theory of the backpropagation neural network. In: Neural networks for perception. Academic Press, pp 65–93. https://doi.org/10.1016/B9780127412528.500108
Irmak B, Gülcü Ş (2021) Training of the feedforward artificial neural networks using butterfly optimization algorithm. Manas J Eng 9(2):160–168. https://doi.org/10.51354/mjen.917837
Jaddi NS, Abdullah S (2018) Optimization of neural network using kidneyinspired algorithm with control of filtration rate and chaotic map for realworld rainfall forecasting. Eng Appl Artif Intell 67:246–259
Kiranyaz S, Ince T, Yildirim A, Gabbouj M (2009) Evolutionary artificial neural networks by multidimensional particle swarm optimization. Neural Netw 22(10):1448–1462. https://doi.org/10.1016/j.neunet.2009.05.013
Koç ML, Balas CE, Arslan A (2004) Preliminary design of rubble mound breakwaters by using artificial neural networks. Tech J Turk Chamber Civ Eng 15(74):3351–3375
Kulluk S, Ozbakir L, Baykasoglu A (2012) Training neural networks with harmony search algorithms for classification problems. Eng Appl Artif Intell 25(1):11–19
Lakshmanna K et al (2022) A Review on Deep Learning Techniques for IoT Data. Electronics 11(10):1604
Long W, Jiao J, Liang X, Wu T, Xu M, Cai S (2021) Pinhole-imaging-based learning butterfly optimization algorithm for global optimization and feature selection. Appl Soft Comput 103:107146
Madenci E, Gülcü Ş (2020) Optimization of flexure stiffness of FGM beams via artificial neural networks by mixed FEM. Struct Eng Mech 75(5):633–642. https://doi.org/10.12989/sem.2020.75.5.633
Mirjalili S (2015) How effective is the Grey Wolf optimizer in training multilayer perceptrons. Appl Intell 43(1):150–161
Özbakir L, Baykasoğlu A, Kulluk S, Yapıcı H (2009) TACO-miner: an ant colony based algorithm for rule extraction from trained neural networks. Expert Syst Appl 36(10):12295–12305. https://doi.org/10.1016/j.eswa.2009.04.058
Pecora LM, Carroll TL (1990) Synchronization in chaotic systems. Phys Rev Lett 64(8):821. https://doi.org/10.1103/PhysRevLett.64.821
Pereira LA, Rodrigues D, Ribeiro PB, Papa JP, Weber SA (2014) Social-spider optimization-based artificial neural networks training and its applications for Parkinson's disease identification. In: 2014 IEEE 27th international symposium on computer-based medical systems, 2014, pp 14–17. https://doi.org/10.1109/CBMS.2014.25
Raguso RA (2008) Wake up and smell the roses: the ecology and evolution of floral scent. Annu Rev Ecol Evol Syst 39:549–569. https://doi.org/10.1146/annurev.ecolsys.38.091206.095601
Saccheri I, Kuussaari M, Kankare M, Vikman P, Fortelius W, Hanski I (1998) Inbreeding and extinction in a butterfly metapopulation. Nature 392(6675):491–494
Sharma TK (2021) Enhanced butterfly optimization algorithm for reliability optimization problems. J Ambient Intell Humaniz Comput 12(7):7595–7619
Sharma TK, Sahoo AK, Goyal P (2021) Bidirectional butterfly optimization algorithm and engineering applications. Mater Today: Proc 34:736–741
Tang R, Fong S, Deb S, Vasilakos AV, Millham RC (2018) Dynamic group optimisation algorithm for training feedforward neural networks. Neurocomputing 314:1–19
Turkoglu B, Kaya E (2020) Training multilayer perceptron with artificial algae algorithm. Eng Sci Technol Int J 23(6):1342–1350. https://doi.org/10.1016/j.jestch.2020.07.001
Tümer A, Edebali S, Gülcü Ş (2020) Modeling of removal of chromium (VI) from aqueous solutions using artificial neural network. Iran J Chem Chem Eng (IJCCE) 39(1):163–175. https://doi.org/10.30492/ijcce.2020.33257
Wyatt TD (2003) Pheromones and animal behaviour: communication by smell and taste. Cambridge University Press
Yang XS (2010) A new metaheuristic bat-inspired algorithm. In: Nature inspired cooperative strategies for optimization (NICSO 2010). Springer, pp 65–74
Zamani M, Sadeghian A (2010) A variation of particle swarm optimization for training of artificial neural networks. In: Computational intelligence and modern heuristics. IntechOpen. https://doi.org/10.5772/7819
Zanchettin C, Ludermir TB, Almeida LM (2011) Hybrid training method for MLP: optimization of architecture and training. IEEE Trans Syst Man Cybern Part B (cybernetics) 41(4):1097–1109. https://doi.org/10.1109/TSMCB.2011.2107035
Zhang JR, Zhang J, Lok TM, Lyu MR (2007) A hybrid particle swarm optimization–backpropagation algorithm for feedforward neural network training. Appl Math Comput 185(2):1026–1037. https://doi.org/10.1016/j.amc.2006.07.025
Funding
The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.
Ethics declarations
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Irmak, B., Karakoyun, M. & Gülcü, Ş. An improved butterfly optimization algorithm for training the feedforward artificial neural networks. Soft Comput 27, 3887–3905 (2023). https://doi.org/10.1007/s00500-022-07592-w
Keywords
 Artificial neural networks
 Butterfly optimization algorithm
 Chaos
 Multilayer perceptron
 Training artificial neural networks