1 Introduction

Agriculture has always been based on mass labor and intensive physical work, with the results heavily dependent on weather and climate conditions [42]. Despite the everlasting uncertainty of crop yields, the increasing demand for food due to rapid population growth, Fig. 1, has provided farmers with a stable income and thus made agriculture the largest workforce absorber. Even nowadays, despite modern technology and automation of agricultural production, a third of the world’s economically active population derives its income from agriculture [22].

Fig. 1

Global population growth with key milestones, based on [48, 66]

Fig. 2

Global cereal yield and production versus population and land used for production [50]

New technologies facilitated and mechanized work in agriculture, reducing the number of required farmers [24]. In the second half of the twentieth century, a significant effort known as the Green Revolution [17] was made to increase the production of high-yielding cereals, especially wheat and rice, to suppress hunger and increase yield per plant, Fig. 2. Food production transitioned from being local in character, i.e., farmers producing food for their families or communities, to global food trade, which has made diets around the world more diverse and brought new business opportunities to farmers and processing industries [49]. This process was not interrupted even by major events such as the 2007–2008 Global Financial Crisis (GFC), the recent COVID-19 pandemic, or the ongoing conflict in Ukraine.

It is necessary to emphasize the fact that the problem of hunger in the world has not been solved yet. Many countries still have a significant percentage of the population that cannot meet their nutritional energy needs on a regular basis [36].

Fig. 3

Total area versus arable land (left) and percentage of global arable land (right), based on [6, 65]

Fig. 4

Arable land versus population versus cereal yield per country on a global scale, the most populated countries, based on [6, 64]

Fig. 5

Arable land versus population versus cereal yield per country on global scale, countries with prevalence of undernourishment (% of population), based on [6, 36, 64]

Arable land comprises only a small part of the total area of each country. According to currently available statistical data, \(\sim\) 15.83 million square kilometers (\(\text{mkm}^{2}\)) are cultivated on the global level [6], i.e., just under 11% of the total land mass, and yet only 10 countries control more than half of the globally available arable land, Fig. 3.

A simplified assessment of how effectively the most populous countries currently address food security can be made by comparing the population-to-available-arable-land ratio and the cereal yield, both against the global average. Among the ten most populous countries, Fig. 4, the US, Brazil, and Russia have the most favorable population-to-arable-land ratios (< 1), yet only the US and China achieve significantly above-average cereal yields, \(\sim\) 200% and \(\sim\) 150% of the global average, respectively. Countries with a prevalence of undernourishment, Fig. 5, mostly have an unfavorable ratio of population to arable land (ratio > 2) and cereal yields significantly below the global average (\(\sim\) 40%).

Although an evident increase in crop yield was achieved, nutritional quality failed to keep pace. Modern cereals suffer from deficiencies such as low-quality proteins and a lack of essential amino acids, vitamins, and minerals [53]. The so-called ancient grains and heirloom varieties became popular in the early twenty-first century, but their lower yield per plant may present a problem in resource-poor areas, where crops used by producers to feed their own families and livestock in subsistence agriculture (food crops) are being replaced by crops grown for profit (cash crops) [18].

Intensive farming with synthetic fertilizers and pesticides increased crop productivity but also increased environmental pollution and its impact on the quality of life; growing public awareness of these effects revived interest in organic, regenerative, and sustainable agriculture. The European Union was a pioneer in this field, introducing certification of organic food in 1991 [20]. Research interest in alternative technologies was reestablished, primarily in pest management, selective breeding, and controlled environment agriculture [60].

Until recently, it was believed that the only way to survive in agriculture was to increase holdings and productivity, with the mandatory (mis)use of chemical fertilizers and appropriate machinery, but as mentioned before, this approach has brought a whole new set of challenges. Research most often considers the scenario of industrial agricultural production, while individual households and small farms are classified as a recurrence of the pre-industrial form of production [22]. However, during the COVID-19 pandemic, a significant number of urban households classified themselves as food insecure due to occasional food shortages caused by disruptions in supply chains and began producing food for their own needs through gardening, fishing, backyard livestock, etc. [40]. Trends of exurbanization and counter-urbanization gained popularity, with part of the urban population trading city life for life in smaller but healthier environments. These people are mostly digitally savvy, work remotely, and are not interested in industrial agriculture but may produce organic food for their own needs [11].

Crop yield forecasting could help alleviate many uncertainties associated with food production. However, as many factors influence production, this is not a straightforward task. Weather fluctuations, unpredictable disease outbreaks, natural disasters, and many other factors can severely and unforeseeably impact yields. To help tackle this ever-pressing challenge, robust techniques are needed, capable of responding to an ever-changing environment.

One possible approach comes from the application of artificial intelligence (AI). These algorithms mathematically mimic behaviors observed in biological brains and, given enough computational time and data, can adjust their behavior to suit a specific problem without being explicitly programmed to do so. By applying powerful AI algorithms to the task of crop yield prediction, nonlinear relations between various factors can be observed and leveraged to produce more accurate forecasts. These can become a valuable tool for both farmers and policy-makers, allowing preemptive measures to be taken to prevent crop failure and even famine. Preceding works have explored the potential of powerful machine learning (ML) techniques for incorporating emerging technologies into agricultural systems [10], as well as for other difficult challenges in computing [3, 47, 59].

A notably interesting emerging class of AI algorithms is that of neuroevolutionary algorithms, inspired by evolution and the connective structures of the brain, as well as the processes that shape these connections. While more traditional algorithms rely on simulated predefined structures divided into layers and organized into a network, neuroevolutionary algorithms are capable of evolving the structure of the network to better suit the specific problem. The process of selection is similar to that of the genetic algorithm (GA) [38], where each potential solution is assigned a describing genome that is altered, mutated, and combined with other solutions to attain an optimum. This can often result in simpler and smaller networks, requiring less computational power to execute. Furthermore, while traditional networks heavily depend on weights and biases, the approach of using weight agnostic neural networks (WANN) [19] simplifies the process of selecting weights, pushing the responsibility for addressing the problem toward the network architecture.

The motivation behind employing WANN in this research lies in the intriguing concept that a network’s architecture can be dynamically evolved to address specific challenges, diverging from the conventional emphasis solely on weights and biases. This relatively unexplored avenue within computer science offers a novel approach to problem-solving. Additionally, WANN’s distinct advantage in mitigating the computational complexity associated with traditional backpropagation methods serves as a key motivation. By streamlining the network optimization process during training, WANN provides an efficient and promising framework for enhancing crop yield forecasting models, prompting further exploration and investigation within the scope of this work.

The performance of AI algorithms is heavily connected to parameter values that define behavior. Often referred to as hyperparameters, these are usually defined by default to ensure a good general performance of the algorithm. Additional tuning is often required for the algorithm to address a specific problem effectively. This task was traditionally tackled via trial and error; however, with the increasing number of hyperparameters present in newer algorithms, automated processes are needed.

Hyperparameter tuning can often be considered an NP-hard task. Therefore, to tackle it effectively, algorithms capable of resolving NP-hard problems are required. One notably interesting subgroup, known for the ability to tackle NP-hard problems with reasonable computational resources and within realistic time frames, is that of swarm intelligence algorithms. These algorithms simulate cooperative populations usually observed in nature through a set of simple rules. By following these rules, individuals allow complex behaviors to emerge on a global scale, enabling effective optimization to take place.

To effectively tackle the pressing issue of crop yield forecasting, several related problems need to be addressed as well. To apply WANN efficiently, adequate parameter selections need to be made. Furthermore, the shared weights need to be adjusted to attain satisfactory results when the approach is applied to this specific problem.

This work proposes a two-layer cooperative framework based on metaheuristic optimization algorithms. The first layer (L1) is tasked with optimizing the WANN hyperparameters to efficiently select the best possible network architecture suited to crop yield forecasting. The best models created by L1 are passed on to layer two (L2), where the shared weights are further optimized. Additionally, a modified version of the recently introduced reptile search algorithm (RSA) is developed and applied within the context of the optimization framework. This approach has been evaluated on two real-world datasets, and the results compared with several WANNs tuned by state-of-the-art metaheuristics, as well as several standard ML and AI models applied to the same problem. Finally, the best-performing models have been subjected to statistical evaluations to determine the significance of the attained improvements, followed by a Simulator for Autonomy and Generality Evaluation (SAGE) [21] analysis to determine which features have the highest impact on the model predictions.

The scientific contributions of this work may be summarized as the following:

  • A proposal of a novel WANN-based approach for forecasting crop yields.

  • An introduction of a cooperative two-layer framework for WANN structure optimization and shared weights tuning.

  • An introduction of a modified version of the RSA metaheuristics which is incorporated into the proposed framework.

The remainder of this work is structured according to the following: Sect. 2 presents preceding research that has contributed to the conducted work. In Sect. 3, the proposed method is described in detail, and the logic behind the modified metaheuristic is elaborated. The experimental setup and developed two-layer experimental framework, as well as the two utilized datasets, are described in Sect. 4, and the attained results are presented and discussed in Sect. 5. Finally, Sect. 6 gives a few concluding words on the work and presents proposals for future research.

2 Background and related work

The concept of precision agriculture implies the application of Information and Communication Technologies (ICT) to provide better situational awareness on the farm, and therefore provide the possibility of making more effective decisions. Smart farms are systems with a multi-layered structure that allows individual components to be added or removed according to specific needs. IoT enables the collection, transmission, and exchange of information between components, while AI brings automation of system management through autonomous decision-making.

Optimum crop management requires prior soil analysis and assessment of irrigation needs. Then, the estimate of actual crop growth and yield can be compared with projections, taking into account weather conditions and other factors. Deep convolutional neural networks (CNN or DCNN) can recognize objects by shape, color, and texture. Computer vision (CV) and the YOLOv3 algorithm may be successfully used, with a precision of over 92%, by harvesting robots for fruit detection and yield counting, as described in [32]. AI, CV, and YOLO can also help in the real-time detection of crop diseases, e.g., early blight disease in potato fields, apple scab and rust, or grapevine disease, to name a few [51]. Existing algorithms for fruit recognition are the basis for creating a new generation of robots capable of solving the problem of labor shortages for fruit harvesting.

Precise crop yield estimation may prove to be difficult due to complex, interrelated environmental factors. Significant variations in the assessment can be influenced by weather changes in different stages of plant growth, spatial variability of soil properties, crop rotation, fertilization, irrigation, etc. There are two basic approaches to estimating crop yields: crop growth models and data-driven models.

The existing literature on crop yield forecasting has witnessed advancements in various methodologies; however, a notable research gap exists concerning the exploration and optimization of WANN in this context. While the potential of WANNs to generate lightweight networks with shared weights has been acknowledged, their application and optimization for crop yield forecasting remain underexplored. The majority of studies in this domain focus on traditional approaches or lack comprehensive investigations into leveraging the capabilities of WANNs. This research aims to bridge this gap by introducing a novel two-layer cooperative framework and a modified metaheuristic to optimize WANN parameters for enhanced crop yield forecasting accuracy. The proposed methodology addresses the current gap by providing a systematic exploration of WANNs in the specific context of crop yield prediction, offering insights that contribute to the advancement of predictive modeling in agriculture.

Various mathematical models, the so-called crop growth models can be used to simulate the interaction of plant physiological processes with the environment. For the model to work, it is necessary to provide real data on the type of soil, solar radiation, precipitation, temperature changes, adopted management practices, etc. Semi-empirical crop models may provide fair results [25, 46, 61], but they are expensive in terms of time and money and impractical for mass applications and agricultural planning.

The empirical approach is more practical and easier to use than the crop growth model. Here, yield data from the recent past is used, and a set of the most influential parameters on yield variation is determined. Accepting these parameters as independent, and the harvest yield as the dependent variable, empirical equations are formed to calculate the coefficients of these parameters, which are then used for the final estimation of the crop yield. This approach is economically more viable and easier to implement and does not require prior information about the physiological processes involved in plant growth or a predefined model structure [39].

Modern ICT provided the basis for agriculture to become more efficient, first by massively embracing web technologies in the early twenty-first century. A decade later, the emergence of affordable sensors, microcontrollers, single-board computers (SBC), and eventually wireless sensor networks (WSN), has inevitably led to the ever-expanding use of modern electronics in agriculture, both industrial and subsistence, with the aim of data collection, transfer, aggregation, and analytics, all toward tasks automation and increased productivity. Internet of Things (IoT), fog computing (FC), and cloud computing (CC) have become indispensable components of modern farming (Fig. 6).

Fig. 6

Data flow in agriculture—Active/passive remote sensing, IoT and fog/cloud computing

Metaheuristic algorithms have proven to be powerful optimization algorithms, with the ability to address even NP-hard problems. Swarm intelligence algorithms are notably interesting for their ability to tackle these problems using a relatively simple set of rules imposed on a population. By following these rules, overarching behaviors occur on a global scale leading the algorithm toward promising areas in the search space and eventually optimal solutions.

Some notably powerful algorithms are inspired by nature such as the artificial bee colony (ABC) [15, 29] algorithm, firefly algorithm (FA) [30, 67], particle swarm optimization (PSO) [62] algorithm, bat algorithm (BA) [68], Harris hawks optimization algorithm (HHO) [23], and whale optimization algorithm (WOA) [37]. More novel algorithm examples include the reptile search algorithm (RSA) [1] and the chimp optimization algorithm (ChOA) [31]. Finally, while demonstrating admirable performance, these algorithms do not come without shortcomings, and one notable approach for improving performance comes from algorithm hybridization.

Hybrid algorithms have been applied to several real-world problems and demonstrated admirable performance. Some notable examples come from health care [8, 70]. Ways of tackling computer security issues have also been improved through the use of hybrid algorithms [26, 69]. Forecasting has also been improved with hybrid algorithms as demonstrated on crude oil [27], stock prices [4, 28], and energy prediction [45, 56].

3 Methods

The following section presents an overview of WANN principles. After this, the original RSA is presented followed by the introduced modifications. Finally, the proposed two-layer framework is presented and discussed.

3.1 Weight agnostic artificial neural networks

Weight agnostic neural network is a neural architecture search (NAS) technique and an evolutionary strategy for developing neural networks in which the model weights are not trained [19]. The goal is to find the smallest neural network architecture capable of coping with several reinforcement learning (RL) tasks without training weights. WANNs are inspired by animals whose newborns come with innate reflexes (e.g., walking, swimming, and hiding from predators), not having to acquire them through trial and error, i.e., training. The task of WANN is to provide satisfactory performance with a single common weight, even when randomly assigned.

WANNs imply smaller network sizes, with fewer connections between nodes and an optimized overall architecture. Performance depends exclusively on the network architecture and can prove inferior to other methods, depending on the given scenario. The model weights are not optimized for the given task and are therefore underutilized, and there are no clear stopping criteria since the search space is unbounded. A low number of iterations will lead to poor network performance, while too many iterations will take more computing resources than necessary [35].

The WANN algorithm can be described through the following steps:

  1. Initialize. Generate an initial population of minimal neural network topologies, consisting of input and output layers only.

  2. Evaluate. Assess each network's performance, with a different common weight value assigned in each pass. Using a fixed sequence, e.g., − 2, − 1, − 0.5, + 0.5, + 1, and + 2, helps reduce the variance between evaluations; values greater than 2 lead to similar behavior due to saturation of the activation functions.

  3. Rank. After evaluation, the networks are ranked based on achieved performance and simplicity. When two network models show similar performance, the one with the simpler structure is chosen.

  4. Vary. Create new network topologies by mutating existing simple networks: adding connections, inserting nodes, or changing activation functions. The best topologies are chosen through tournament selection.

At this point, the algorithm may return to Step 2, and the process repeats. The outcome is a weight agnostic topology of gradually increasing complexity and enhanced performance in each successive generation. When the number of iterations hits the allowed maximum, the algorithm stops.
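The steps above can be condensed into a toy Python sketch (illustrative only, not the authors' implementation): a "network" is reduced to a chain of activation nodes, and fitness is averaged over the fixed shared-weight sequence from Step 2; all function names and the toy fitness target are our assumptions.

```python
import math
import random

SHARED_WEIGHTS = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]  # fixed sequence from Step 2
ACTIVATIONS = {
    "linear": lambda x: x,
    "tanh": math.tanh,
    "relu": lambda x: max(0.0, x),
}

def evaluate(genome, weight, x=1.0):
    # Propagate one input through the node chain using a single shared weight.
    for act in genome:
        x = ACTIVATIONS[act](weight * x)
    return x

def fitness(genome, target=0.5):
    # Mean error across shared weight values (lower is better), plus a small
    # complexity penalty so simpler topologies rank higher (Step 3).
    err = sum(abs(evaluate(genome, w) - target) for w in SHARED_WEIGHTS)
    return err / len(SHARED_WEIGHTS) + 0.01 * len(genome)

def mutate(genome, rng):
    # Step 4: insert a node or change an activation function.
    g = list(genome)
    if rng.random() < 0.5:
        g.append(rng.choice(list(ACTIVATIONS)))
    else:
        g[rng.randrange(len(g))] = rng.choice(list(ACTIVATIONS))
    return g

rng = random.Random(0)
population = [["linear"] for _ in range(8)]            # Step 1: minimal topologies
for _ in range(30):                                    # Steps 2-4 repeated
    ranked = sorted(population, key=fitness)
    parents = ranked[: len(population) // 2]           # truncation-style selection
    population = parents + [mutate(rng.choice(parents), rng) for _ in parents]
best = min(population, key=fitness)
```

Because ranked parents survive unchanged, the best fitness in the population never worsens across generations, mirroring the gradually improving topologies described above.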

WANNs are capable of learning abstract associations, without the need for encoding explicit relationships between inputs. The use of learned features can be evaluated through the execution of continuous control tasks, e.g., CartPoleSwingUP, BipedalWalker, and CarRacing-v0 tasks as described in [5, 19].

Unlike conventional fixed topology networks, which require extensive tuning to produce the desired behavior, WANNs may accomplish this with random shared weights due to an architecture strongly biased toward the solution. Although the magnitude of the weights may not be crucial, their relative values and consistency of sign are, and thus WANNs can fail with randomly assigned individual weights. Finally, the use of a single shared weight is much simpler compared to the use of gradient-based methods.

Besides RL tasks, WANNs may also be used to solve high-dimensional classification tasks, e.g., image classification, as demonstrated on the MNIST dataset in [19, 34]. Restricted to a single weight value, WANNs performed MNIST digit classification as well as a single-layer neural network with thousands of weights, and yet the WANN structure remains flexible enough to allow further weight training and accuracy improvements.

The WANN structure provides different predictions at each weight value, each of which may be treated as a distinct classifier. This makes it possible to use a single WANN with multiple weight values as a self-contained ensemble. Conversely, as WANNs are optimized to perform well using a shared weight over a range of values, this single parameter can be tuned to increase network performance, which may prove useful in few-shot learning [16] and continual learning [43].
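The self-contained ensemble idea can be sketched as follows (illustrative only, not the paper's implementation); `tiny_wann` is a hypothetical stand-in for an evolved topology, and the weight sequence reuses the fixed values mentioned earlier.

```python
import math

def tiny_wann(x, weight):
    # Hypothetical stand-in for an evolved topology: two nodes, one shared weight.
    hidden = math.tanh(weight * x)
    return weight * hidden

def ensemble_predict(x, weights=(-2.0, -1.0, -0.5, 0.5, 1.0, 2.0)):
    # Each shared weight value yields a distinct predictor; average their outputs.
    return sum(tiny_wann(x, w) for w in weights) / len(weights)
```

Each weight value defines a different input-output mapping over the same architecture, so averaging them acts as an ensemble without storing more than one network.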

3.2 Original reptile search algorithm (RSA)

Inspired by the social, hunting, and encircling behaviors of crocodiles, the RSA is a novel gradient-free and population-based optimization algorithm originally introduced by [1]. By mathematically simulating these processes, the RSA can address complex tasks, and simulating agent cooperation further augments its robustness. The algorithm comprises several stages, described below.

3.2.1 Initialization stage

The first step in the optimization procedure is creating a population of agents (X) as per Eq. 1 that represents potential solutions. These solutions are created through a stochastic process. The best-attained solution is treated as optimal through subsequent iterations.

$$\begin{aligned} X = \begin{bmatrix} x_{1,1} & \cdots & x_{1,j} & \cdots & x_{1,n} \\ x_{2,1} & \cdots & x_{2,j} & \cdots & x_{2,n} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ x_{i,1} & \cdots & x_{i,j} & \cdots & x_{i,n} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ x_{N,1} & \cdots & x_{N,j} & \cdots & x_{N,n} \end{bmatrix} \end{aligned}$$
(1)

in this context, X refers to a collection of potential solutions that are generated randomly using Eq. 2. Here \(x_{i,j}\) represents the value at the j-th position of the i-th solution. N represents the total number of potential solutions in the set X, and n represents the dimension size of the given problem.

$$\begin{aligned} x_{i,j} = \text{rand} \times (\text{UB} - \text{LB}) + \text{LB}, \quad j = 1, 2, \ldots , n \end{aligned}$$
(2)

in which the term rand refers to a randomly generated value. The lower and upper bounds of the given problem are represented by LB and UB, respectively.
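The initialization of Eqs. (1) and (2) can be sketched in Python as follows (a list-based illustrative version with scalar bounds; the function name is ours, not from the original paper):

```python
import random

def init_population(N, n, LB, UB, seed=None):
    """Generate N candidate solutions of dimension n within [LB, UB] (Eqs. 1-2)."""
    rng = random.Random(seed)
    # Each component follows x_ij = rand * (UB - LB) + LB
    return [[rng.random() * (UB - LB) + LB for _ in range(n)] for _ in range(N)]
```

In the general case, LB and UB may be per-dimension vectors; scalars are used here for brevity.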

3.2.2 Encircling (exploration) stage

The algorithm employs two distinct search stages: exploration and exploitation. The transition between behaviors is determined by four variables, which involve separating iterations into four segments. During the exploration phase, the RSA deploys different search strategies to explore the search space and approach a better solution. These strategies include the high walking strategy and the belly walking strategy.

The current stage of searching is governed by two conditions: the high walking movement strategy is triggered for t values less than or equal to \(\frac{T}{4}\), and the belly walking movement strategy is activated for t values between \(\frac{T}{4}\) and \(\frac{T}{2}\). These conditions together cover the first half of the total iterations, with the high walking and belly walking strategies utilized in their respective quarters. Both of these approaches involve exploration-based search techniques. Additionally, to generate diverse solutions and explore varied regions, a stochastic scaling coefficient is considered for each element. This coefficient follows a simple rule that mimics the encircling behavior of crocodiles. The position-updating equations for the exploration stage are outlined in Eq. 3.

$$\begin{aligned} x_{i,j}(t+1) = {\left\{ \begin{array}{ll} \text {Best}_j(t) \times -\eta _{(i,j)}(t) \times \beta - R_{(i,j)}(t) \times \text{rand}, &{} t \le \frac{T}{4} \\ \text {Best}_j(t) \times x_{(r_1,j)} \times \text {ES}(t) \times \text{rand}, &{} \frac{T}{4} < t \le \frac{T}{2} \end{array}\right. } \end{aligned}$$
(3)

where the j-th position in the best solution obtained so far is represented by \(\text {Best}_j(t)\). The variable rand represents a random number ranging from 0 to 1. The value of t indicates the current iteration number, while T represents the maximum iteration count. The operator \(\eta _{(i,j)}\) is used for hunting and corresponds to the j-th position in the i-th solution; it is computed by applying Eq. (4). The parameter \(\beta\), fixed at 0.1, determines the explorative accuracy during the encircling stage and governs the high walking behavior over the iterations. The reduction function, \(R_{(i,j)}\), is used to decrease the search area and is calculated using Eq. (5). The position of the i-th solution is denoted by \(x_{(r_1,j)}\), where \(r_1\) is a random integer between 1 and N, and N represents the population size. The probability ratio, \(\text{ES}(t)\), takes decreasing random values in the range \([-2, 2]\) over the iterations and is calculated using Eq. (6).

$$\begin{aligned} \eta _{(i,j)} = \text {Best}_j(t) \times P_{(i,j)}, \end{aligned}$$
(4)
$$\begin{aligned} R_{(i,j)} = \frac{\text {Best}_j(t) - x_{(r_2, j)}}{\text {Best}_j(t) + \epsilon }, \end{aligned}$$
(5)
$$\begin{aligned} \text {ES}(t) = 2 \times r_3 \times \left( 1 - \frac{t}{T} \right) , \end{aligned}$$
(6)

in the given equations, \(\epsilon\) is a small value, and \(r_2\) is a random integer from the range [1, N]. The coefficient 2 in Eq. (6) generates values decreasing from 2 toward 0, while \(r_3\) represents a random integer value in the range \([-1, 1]\). The percentage difference between the j-th location of the best solution obtained so far and the j-th position of the current solution is indicated by \(P_{(i,j)}\). This value is computed with Eq. (7)

$$\begin{aligned} P_{(i,j)} = \alpha + \frac{x_{(i,j)} - M(x_i)}{\text {Best}_j(t) \times (\text{UB}_{(j)} - \text{LB}_{(j)}) + \epsilon } \end{aligned}$$
(7)

in which \(M(x_i)\) represents the average position of the i-th agent, which can be determined using Eq. (8). The upper and lower boundaries of the j-th position are represented by \(\text{UB}_{(j)}\) and \(\text{LB}_{(j)}\), respectively. The parameter \(\alpha\), fixed at 0.1, represents a sensitive control value that determines the exploration accuracy, i.e., the difference between agent fitness during hunting through the run.

$$\begin{aligned} M(x_i) = \frac{1}{n} \sum _{j=1}^{n} x_{(i,j)}. \end{aligned}$$
(8)
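Assuming simple list-based agents and scalar bounds, the encircling-stage update of Eqs. (3)-(8) can be sketched in Python; the function names and the final bound-clamping step are our assumptions, not part of the original formulation:

```python
import random

ALPHA, BETA, EPS = 0.1, 0.1, 1e-10   # fixed parameters and small epsilon

def mean_position(xi):
    # Eq. (8): average position of agent i.
    return sum(xi) / len(xi)

def evolutionary_sense(t, T, rng):
    # Eq. (6): decreasing random values in [-2, 2]; r3 is an integer in [-1, 1].
    r3 = rng.choice([-1, 0, 1])
    return 2 * r3 * (1 - t / T)

def exploration_update(X, best, t, T, LB, UB, rng):
    """One encircling-stage pass (Eq. 3) over every agent and dimension."""
    N, n = len(X), len(X[0])
    new_X = []
    for i in range(N):
        M = mean_position(X[i])
        row = []
        for j in range(n):
            P = ALPHA + (X[i][j] - M) / (best[j] * (UB - LB) + EPS)  # Eq. (7)
            eta = best[j] * P                                        # Eq. (4)
            r2 = rng.randrange(N)
            R = (best[j] - X[r2][j]) / (best[j] + EPS)               # Eq. (5)
            if t <= T / 4:                                           # high walking
                x = best[j] * -eta * BETA - R * rng.random()
            else:                                                    # belly walking
                r1 = rng.randrange(N)
                x = best[j] * X[r1][j] * evolutionary_sense(t, T, rng) * rng.random()
            row.append(min(max(x, LB), UB))                          # clamp to bounds
        new_X.append(row)
    return new_X
```

A full RSA run would call this for all iterations with \(t \le \frac{T}{2}\), updating the best solution after each pass.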

3.2.3 Hunting (exploitation) stage

This section describes the hunting (exploitation) behavior of the RSA, modeled on two crocodile strategies: hunting coordination and hunting cooperation. These strategies utilize different intensification techniques, which focus on exploiting local search areas. With this intensified approach, crocodiles can approach their target prey more easily than with the encircling mechanisms. As a result, the exploitation stage can identify a near-optimal solution, although it may require multiple attempts. In addition, during this stage, exploitation mechanisms are utilized to conduct a more focused search close to the optimal solution, while also emphasizing communication between the mechanisms.

The RSA’s exploitation mechanisms use two primary search strategies (hunting coordination and hunting cooperation) to explore potential solutions and locate the optimal solution. These strategies are represented mathematically in Eq. (9). During this phase, the search is guided by specific conditions: The hunting coordination strategy is employed when t is between \(\frac{T}{2}\) and \(\frac{3T}{4}\), while the hunting cooperation strategy is used when t is between \(\frac{3T}{4}\) and T. Additionally, stochastic methods are used to generate denser solutions and focus locally on promising regions. To simulate the hunting behavior of crocodiles, the authors employed a simple rule. The position-updating equations for the exploitation phase are given in Eq. (9).

$$\begin{aligned} x_{i,j}(t+1) = {\left\{ \begin{array}{ll} \text {Best}_j(t) \times P_{(i,j)}(t) \times \text{rand}, &{} \frac{T}{2} < t \le \frac{3T}{4} \\ \text {Best}_j(t) - \eta _{(i,j)}(t) \times \epsilon - R_{(i,j)}(t) \times \text{rand}, &{} \frac{3T}{4} < t \le T \end{array}\right. } \end{aligned}$$
(9)

The variable \(\text{Best}_j(t)\) represents the j-th location in the best solution found up to the current time step t. \(\eta _{(i,j)}\) refers to the hunting operator for the j-th location in the i-th solution and is determined by Eq. (4), with \(\epsilon\) denoting a small constant value. The variable \(P_{(i,j)}\) represents the percentage difference between the j-th location in the best solution and the j-th location in the current agent and is computed using Eq. (7). Finally, \(R_{(i,j)}\) is applied to shrink the search space and is computed using Eq. (5).

The exploitation search mechanisms, including hunting coordination and cooperation, aim to avoid being stuck in local optima. The mechanisms help the exploration search find the optimal agent and maintain diversity among candidate agents. The authors designed two parameters, \(\beta\) and \(\alpha\), to generate a stochastic variable following every iteration, which facilitates exploration during the early iterations and the later ones. This aspect of the search is particularly useful when faced with local stagnation, especially in the final iterations.
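Under the same list-based assumptions as the exploration sketch, the hunting-stage update of Eq. (9) can be written as follows; again, the function name and the bound clamping are ours:

```python
import random

EPS = 1e-10    # small epsilon from Eqs. (5), (7), (9)
ALPHA = 0.1    # sensitive control parameter from Eq. (7)

def exploitation_update(X, best, t, T, LB, UB, rng):
    """One hunting-stage pass (Eq. 9): coordination for T/2 < t <= 3T/4,
    cooperation for 3T/4 < t <= T. Positions are clamped to [LB, UB]."""
    N, n = len(X), len(X[0])
    out = []
    for i in range(N):
        M = sum(X[i]) / n                                            # Eq. (8)
        row = []
        for j in range(n):
            P = ALPHA + (X[i][j] - M) / (best[j] * (UB - LB) + EPS)  # Eq. (7)
            if t <= 3 * T / 4:                                       # hunting coordination
                x = best[j] * P * rng.random()
            else:                                                    # hunting cooperation
                eta = best[j] * P                                    # Eq. (4)
                r2 = rng.randrange(N)
                R = (best[j] - X[r2][j]) / (best[j] + EPS)           # Eq. (5)
                x = best[j] - eta * EPS - R * rng.random()
            row.append(min(max(x, LB), UB))
        out.append(row)
    return out
```

Note how the cooperation branch perturbs the best solution directly, producing the focused local search described above.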

Algorithm 1
figure a

Original RSA pseudocode

3.3 Multi-swarm RSA (MSRSA)

Metaheuristic algorithms rely on an effective balance between their two primary mechanisms: exploration and exploitation. Exploration helps algorithms locate promising areas, while exploitation focuses on promising regions, helping locate near-optimal (sub-optimal) solutions within a smaller region. While these mechanisms help metaheuristics overcome many difficult and even NP-hard tasks, it is also important to note that, as per the no free lunch (NFL) theorem of optimization [63], no single metaheuristic is equally suited to all problems. All metaheuristics have certain advantages as well as limitations, and constant experimentation with and improvement of existing metaheuristics are essential for determining the most suitable tools for tackling emerging challenges.

One promising approach for tackling deficiencies present in certain metaheuristics is hybridization. By combining attributes of compatible algorithms, the resulting approach can overcome the deficiencies of the originals and even produce results that are more than the sum of their parts. While this may slightly increase complexity, hybrid algorithms often demonstrate performance that justifies the increase. Two major approaches to hybridization exist today: low-level hybridization (LLH) and high-level hybridization (HLH). In LLH, the search mechanisms of one algorithm are replaced or augmented by mechanisms of another, while in HLH the combined algorithms remain self-contained, with one guiding or refining the other.

The introduced metaheuristic takes the LLH approach, introducing mechanisms of two well-known metaheuristics, the ABC and FA, into the robust base of the novel RSA. The FA is well known for its powerful exploitation mechanism, while the ABC possesses a powerful exploration mechanism. The ABC and FA are, therefore, used to boost the exploration and exploitation of the original RSA, respectively.

The initialization procedure for the introduced method incorporates random population generation as well as two additional mechanisms to boost initial population diversity: chaotic map initialization and quasi-reflection-based learning (QRL). Given a population of size N, an initial portion of \(\frac{N}{2}\) is randomly initialized within the constraints of the given search space. The remaining half is divided into two parts: one \(\frac{N}{4}\) is generated by applying chaotic maps, while the final \(\frac{N}{4}\) is created through the use of the QRL mechanism. These mechanisms are further described in the following.

Applying chaotic maps can aid the search procedures of metaheuristics. Several options for chaotic maps exist; however, empirical experimentation suggests that the logistic map is best suited to this application.

In the initialization stage, a pseudo-random number \(\theta _0\) is used to seed a chaotic sequence as per Eq. 10

$$\begin{aligned} \theta _{i+1} = \mu \theta _i \times (1-\theta _i),\; i=0,1,\ldots ,N-2 \end{aligned}$$
(10)

in this context, N refers to the size of the population, i represents the sequence number, and \(\mu\) is a control parameter for a chaotic sequence with an empirically selected value of 4. The value of \(\theta _0\) falls between 0 and 1, but is not equal to 0.25, 0.5, 0.75, or 1.

Each potential agent is mapped based on the generated chaotic sequence as demonstrated in Eq. 11.

$$\begin{aligned} X^c_i = \theta _i X_i \end{aligned}$$
(11)

where the variable \(X^c_i\) represents the updated position of individual i following chaotic disturbances.
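Taken together, Eqs. 10 and 11 amount to scaling each randomly generated agent by successive terms of a logistic sequence. A minimal sketch, assuming agent positions are plain Python lists (the function name is hypothetical):

```python
import random

def chaotic_init(population, mu=4.0, theta0=None):
    """Perturb randomly generated agents with a logistic chaotic
    sequence (Eqs. 10-11): X^c_i = theta_i * X_i, where
    theta_{i+1} = mu * theta_i * (1 - theta_i)."""
    if theta0 is None:
        # seed in (0, 1), avoiding the fixed points 0.25, 0.5, 0.75
        theta0 = random.random()
        while theta0 in (0.0, 0.25, 0.5, 0.75):
            theta0 = random.random()
    theta = theta0
    chaotic = []
    for x in population:
        chaotic.append([theta * xi for xi in x])  # Eq. 11
        theta = mu * theta * (1.0 - theta)        # Eq. 10
    return chaotic
```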

The QRL method involves producing quasi-reflexive-opposite solutions by following the principle that if an individual is located far away from the optimal solution, there is a higher likelihood that the opposite solution could be situated closer to the optimum.

When implementing the QRL process as described previously, the quasi-reflexive-opposite individual \(X^{\text{qr}}\) of the solution X can be generated using Eq. 12 for each component j of the X solution:

$$\begin{aligned} X^{\text{qr}} = \text{rnd}\bigg (\frac{\text{LB} + \text{UB}}{2}, X\bigg ) \end{aligned}$$
(12)

where \(\text{rnd}\bigg (\frac{\text{LB} + \text{UB}}{2}, X\bigg )\) denotes the generation of a random value from a uniform distribution between \(\frac{\text{LB} + \text{UB}}{2}\) and X, with LB and UB denoting the lower and upper boundaries, respectively. This initialization procedure is used for every individual in the population.
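Under these definitions, the QRL step of Eq. 12 can be sketched as follows; `quasi_reflect` is a hypothetical helper name, and positions and bounds are plain lists:

```python
import random

def quasi_reflect(x, lb, ub):
    """Generate the quasi-reflexive-opposite solution X^qr (Eq. 12):
    each component is drawn uniformly between the search-space centre
    (lb + ub) / 2 and the component's current value."""
    out = []
    for xj, l, u in zip(x, lb, ub):
        centre = (l + u) / 2.0
        lo, hi = min(centre, xj), max(centre, xj)  # order the interval
        out.append(random.uniform(lo, hi))
    return out
```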

The population is divided into a pair of subpopulations, each of which applies an LLH version of the RSA. One subpopulation leverages the ABC algorithm hybridized into the RSA for a boost in exploration, while the second utilizes the FA introduced into the RSA to focus on exploitation.

The utilized mechanisms from the ABC algorithm are described with a set of equations. The scouting phase is described in Eq. 13.

$$\begin{aligned} x_{i,j} = \text{lb}_j + \text{rand}(0,1)\times (\text{ub}_j - \text{lb}_j) \end{aligned}$$
(13)

in which \(x_{i,j}\) represents the j-th parameter of bee i from the population, \(\text{rand}(0,1)\) denotes a random value from a uniform distribution between 0 and 1, and \(\text{lb}_j\) and \(\text{ub}_j\) represent the lower and upper bounds of parameter j.

The bee and onlooker formulas are given in Eq. 14.

$$\begin{aligned} v_{i,j} = {\left\{ \begin{array}{ll} x_{i,j} + \phi \times (x_{i,j} - x_{k,j}), &{} R_j < \text{MR}\\ x_{i,j}, &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
(14)

in which \(x_{i,j}\) denotes the j-th element of the previous solution i, \(x_{k,j}\) the j-th element of a neighboring solution k, \(\phi\) a random value in range [0, 1], \(R_j\) a uniform random number drawn per parameter, and MR the modification rate.
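The scout and employed/onlooker updates of Eqs. 13 and 14 can be illustrated with the following sketch. The function names are hypothetical, and the neighboring solution k is assumed to be selected elsewhere:

```python
import random

def abc_scout(lb, ub):
    """Scout phase (Eq. 13): re-seed a solution uniformly in bounds."""
    return [l + random.random() * (u - l) for l, u in zip(lb, ub)]

def abc_onlooker(x_i, x_k, mr=0.8):
    """Employed/onlooker update (Eq. 14): each parameter j is moved
    relative to neighbour k with probability MR, else kept as-is."""
    v = list(x_i)
    for j in range(len(x_i)):
        if random.random() < mr:           # R_j < MR
            phi = random.random()          # phi in [0, 1] per the text
            v[j] = x_i[j] + phi * (x_i[j] - x_k[j])
    return v
```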

The primary search mechanism for the FA algorithm is described in Eq. 15.

$$\begin{aligned} x_i^{t+1} = x_i^{t} + \beta _0 * e^{-\gamma r^2_{i,j}}(x^t_j - x^t_i) + \alpha ^t(k-0.5) \end{aligned}$$
(15)

where \(\textbf{x}_i\) and \(\textbf{x}_j\) are the positions of the i-th and j-th fireflies, \(r_{i,j}\) is the distance between them, \(\beta _0\) and \(\gamma\) are the parameters controlling the attractiveness, \(\alpha ^t\) is the step size at iteration t, and k is a random number drawn from a uniform distribution.
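Eq. 15 can be illustrated as a single movement step. This is a sketch rather than the original FA implementation, with the random term drawn uniformly per dimension:

```python
import math
import random

def firefly_move(x_i, x_j, beta0=1.0, gamma=1.0, alpha=0.2):
    """Move firefly i toward the brighter firefly j (Eq. 15):
    attraction decays with squared distance, plus a random step."""
    r2 = sum((a - b) ** 2 for a, b in zip(x_i, x_j))  # r_{i,j}^2
    beta = beta0 * math.exp(-gamma * r2)              # attractiveness
    return [a + beta * (b - a) + alpha * (random.random() - 0.5)
            for a, b in zip(x_i, x_j)]
```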

The hybrid search mechanisms for the LLH ABC and FA subpopulation are shown in Algorithm 2 and Algorithm 3, respectively.

Algorithm 2
figure b

ABC-RSA hybrid search process pseudocode

Algorithm 3
figure c

FA-RSA hybrid search process pseudocode

The described ABC and FA have been carefully chosen for their complementary characteristics: the ABC boosts the hybrid algorithm’s exploratory power, while the FA’s exceptionally powerful exploitation mechanism strengthens its exploitation.

One additional mechanism inspired by the GA, referred to as transfer learning, is introduced. This mechanism considers the best-performing individuals from each subpopulation as potential parents for new agents. The new agents adopt a certain set of traits from both parents, representing a combination of the two; this is simulated using a uniform crossover between agent traits. To govern this process, an additional control parameter representing population crossover, denoted as PC, is introduced; it has been empirically determined to give the best results when \(\text{PC} = 0.1\). When applying uniform crossover, two parents are selected from a population and offspring are generated from the parents’ values; these offspring replace the worst-performing agents in each respective population. The fitness of the newly generated agents is not evaluated after generation, so the computational complexity is maintained.
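Assuming PC acts as a crossover-trigger probability and each subpopulation is kept sorted best-first (both assumptions of this sketch, not statements from the original), the transfer mechanism might look as follows:

```python
import random

def uniform_crossover(parent_a, parent_b):
    """Uniform crossover: each trait is inherited from either parent
    with equal probability."""
    return [a if random.random() < 0.5 else b
            for a, b in zip(parent_a, parent_b)]

def transfer(pop_a, pop_b, pc=0.1):
    """With probability PC, replace the worst agent of each
    subpopulation (assumed sorted best-first) with crossover offspring
    of the two subpopulation leaders. Offspring fitness is not
    evaluated here, keeping the complexity unchanged."""
    if random.random() < pc:
        pop_a[-1] = uniform_crossover(pop_a[0], pop_b[0])
        pop_b[-1] = uniform_crossover(pop_a[0], pop_b[0])
    return pop_a, pop_b
```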

Finally, a high-level overview of the complete proposed MS-RSA algorithm can be seen in Algorithm 4.

Algorithm 4
figure d

Introduced MSRSA algorithm high-level pseudocode

3.4 Experimental framework

The experimental framework comprises two layers, each tasked with handling one specific task. The layers are labeled Layer 1 (L1) and Layer 2 (L2). Both evaluated datasets have been subjected to both stages of the framework to select the optimal models suited to forecasting yields. A visualization of the framework is shown in Fig. 7.

Fig. 7
figure 7

Structure of the two-layer framework utilized for research

3.5 Metaheuristic optimizers

Metaheuristic algorithms present a popular choice for selecting near-optimal parameters of baseline algorithms. Contemporary ML and AI algorithms are usually designed with good general performance in mind. However, while showing good general performance, algorithms usually require adaptation for a specific problem. This is done through a set of exposed parameters that are available to programmers. The process of selection is not without its challenges, as the number of combinations often makes this an NP-hard problem. This work evaluates several contemporary optimizers alongside the introduced algorithms. Brief descriptions of these algorithms, with their respective inspirations and basic search strategies, are discussed below.

The recently introduced ChOA [31] is inspired by the individual intelligence and mating motivation observed in chimpanzees during group hunting. The algorithm is designed to address two common issues in optimization problems: slow convergence speed and getting trapped in local optima, especially in high-dimensional scenarios. The algorithm incorporates a mathematical model representing diverse intelligence and mating motivation in chimps. Four types of simulated chimps—attacker, barrier, chaser, and driver—are utilized to capture the range of intelligence observed in chimpanzee groups. The hunting behavior is divided into four main steps: driving, chasing, blocking, and attacking.

Several well-known algorithms have also been included in the comparisons, such as the GA [38], a search and optimization method inspired by natural selection. It operates with a population of potential solutions, represented as chromosomes. The algorithm involves the selection of individuals based on their fitness, crossover (genetic recombination) to create offspring, and mutation to introduce random changes. Through iterations, the population evolves, mimicking the process of natural evolution. GAs are versatile and effective for solving complex optimization problems across various domains due to their ability to explore vast solution spaces and handle multiple local optima simultaneously.

The PSO [62] is an optimization algorithm based on the collective behavior of social organisms, particularly the movement patterns of bird flocks or fish schools. In PSO, a population of potential solutions is represented as particles, each navigating through the solution space. These particles adjust their positions based on their own experience (local best) and the shared knowledge of the entire swarm (global best). The movement is influenced by both the particle’s current velocity and the historical best positions. This collaborative exploration and exploitation process encourages convergence toward optimal solutions. PSO is known for its simplicity and efficiency in finding solutions to optimization problems, particularly in continuous and high-dimensional spaces. The algorithm’s ability to balance exploration and exploitation makes it suitable for various applications, including engineering, finance, and ML.
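The velocity and position update described above can be shown in a minimal single-step sketch; the parameter values here are illustrative defaults, not those used in the experiments:

```python
import random

def pso_step(pos, vel, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    """One PSO update: inertia (w) plus stochastic attraction toward
    the particle's own best (pbest) and the swarm's best (gbest)."""
    new_vel = [w * v
               + c1 * random.random() * (pb - x)   # cognitive term
               + c2 * random.random() * (gb - x)   # social term
               for x, v, pb, gb in zip(pos, vel, pbest, gbest)]
    new_pos = [x + v for x, v in zip(pos, new_vel)]
    return new_pos, new_vel
```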

The ABC [29] algorithm is a technique inspired by the foraging behavior of honeybees. In ABC, the population consists of artificial bees, and the algorithm is designed to mimic the food source exploration process observed in a bee colony. The optimization process involves three main components: employed bees, onlooker bees, and scout bees. Employed bees explore the solution space, representing potential solutions. Onlooker bees select solutions based on the employed bees’ performance and exploit these solutions for further exploration. Scout bees, in turn, introduce randomness by exploring new solutions when the algorithm stagnates or fails to improve. The iterative nature of ABC allows for the continuous refinement of solutions, making it suitable for various optimization problems, especially in domains such as engineering, logistics, and data analysis.

The FA [67] is inspired by the light-flashing behavior of fireflies in nature. In FA, potential solutions to an optimization problem are represented as fireflies, and the algorithm seeks to improve these solutions iteratively. The attractiveness of a firefly is determined by its brightness, influenced by both its distance from other fireflies and their respective brightness levels. Brighter fireflies attract others, and the algorithm simulates this process to converge toward optimal solutions.

The main steps of the firefly algorithm include the initialization of firefly positions, the computation of their attractiveness, the movement toward brighter fireflies, and the updating of the solution space. FA effectively balances exploration and exploitation, making it suitable for various optimization problems. It has been applied in fields such as engineering, finance, and image processing, demonstrating its versatility and effectiveness in finding solutions to complex optimization challenges.

The BA [68] is a metaheuristic optimization method inspired by bats’ echolocation behavior. It utilizes a population of virtual bats for solving optimization problems, incorporating both local and global search strategies. BA represents solutions as bat positions, adjusting their emission rates (loudness) for exploration–exploitation trade-offs. The algorithm’s random walks facilitate global exploration. Known for its simplicity and effectiveness, BA has been successfully applied to diverse optimization domains, including engineering, finance, and data science.

The HHO [23] algorithm is a nature-inspired optimization technique based on the cooperative hunting behavior of Harris’s Hawks. In HHO, potential solutions to an optimization problem are represented as hawks, and the algorithm mimics their collaboration during hunting. The optimization process involves exploration, exploitation, and communication among the hawks to improve solutions iteratively.

The key features of HHO include the representation of solutions as hawk positions, the integration of explorative and exploitative movements, and the adoption of a leader-follower strategy. The leader attracts followers based on their fitness, and the followers adjust their positions accordingly. HHO has shown promise in solving various optimization problems due to its ability to balance exploration and exploitation inspired by the collaborative hunting nature of Harris’s hawks.

The WOA [37] is a method based on the cooperative hunting behavior of humpback whales. It represents potential solutions as whale positions and incorporates exploration and exploitation strategies. WOA has demonstrated effectiveness in solving diverse optimization problems, making it applicable to fields such as engineering, finance, and data science.

4 Experimental setup

During experimentation, WANNs are assigned the task of predicting crop yields across two distinct datasets. Metaheuristics are employed to optimize the hyperparameters of the WANN within a two-layer framework to enhance performance. The first layer involves architecture selection, and in the second layer, shared weights are optimized. Subsequently, the outcomes undergo meticulous validation and interpretation using the SAGE method.

4.1 Employed datasets

The wild blueberry (Vaccinium angustifolium Aiton) dataset, available on Kaggle, concerns a crop whose yield depends on bee-mediated cross-pollination [2, 12]. Yield is therefore affected by current bee density [2], but also by other factors including weather, soil fertility, pests, and disease. To produce useful results, ML algorithms for crop yield prediction generally require large amounts of data, and the availability of training data of sufficient quality and quantity can be a problem. Wild blueberry predictive yield models require data that sufficiently characterize the influence of the spatial characteristics of plants, bees, and weather conditions on production. Experiments may therefore be performed on a calibrated version of a blueberry simulation model: the simulated dataset is examined with proper feature selection and afterward used to build four ML-based prediction models, which may be compared with real-life data acquired in the fields. The wild blueberry yield prediction dataset [41, 54] was generated by a wild blueberry pollination model, a spatially explicit simulation model validated by field observations and experimental data collected in Maine (USA) over the past three decades.

Another crop yield prediction dataset [44], also available on Kaggle, provides yield data on the most common agricultural cultures per country per year (maize, potatoes, rice, wheat, sorghum, soybeans, yams, cassava, sweet potatoes, and plantains), combined with data on average rainfall, temperature, and pesticide use. The data sheets were compiled from publicly available datasets of the Food and Agriculture Organization (FAO) and World Data Bank.

Both datasets underwent pre-processing, involving the transformation of categorical features, notably the crop species, using the one-hot encoding technique to enhance forecasting accuracy. This approach, however, expands the feature set of each dataset. Specifically, the crop yield dataset was transformed from the original seven features to a total of 116 features, while the blueberry dataset went from the initial 18 features to 17 used as inputs in this specific test case.
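One-hot encoding, as applied to the categorical crop-species feature, can be illustrated with a small self-contained sketch (the function name is hypothetical):

```python
def one_hot(values):
    """One-hot encode a categorical column: each distinct category
    becomes its own binary feature, expanding the feature set."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = [[1 if index[v] == i else 0 for i in range(len(categories))]
            for v in values]
    return rows, categories
```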

4.2 Evaluation metrics

To ensure a thorough examination, several metrics have been utilized during experimentation. The utilized metrics include the mean square error (MSE) described in Eq. 16, root-mean-square error (RMSE) described in Eq. 17, mean absolute error (MAE) described in Eq. 18, as well as the coefficient of determination \(R^2\) described in Eq. 19. It is important to note that the \(R^2\) metric is utilized as the primary objective function guiding the network architecture selection process in L1 of the framework, while MSE is used as the objective function when optimizing shared weights in L2.

$$\begin{aligned} \text{MSE}= & {} \frac{1}{n}\sum _{i=1}^{n}(y_i - \hat{y_i})^2 \end{aligned}$$
(16)
$$\begin{aligned} \text{RMSE}= & {} \sqrt{\frac{1}{n}\sum _{i=1}^{n}(y_i - \hat{y_i})^2} \end{aligned}$$
(17)
$$\begin{aligned} \text{MAE}= & {} \frac{1}{n}\sum _{i=1}^{n}|y_i - \hat{y_i}| \end{aligned}$$
(18)
$$\begin{aligned} R^2= & {} 1 - \frac{\sum _{i=1}^{n}(y_i - \hat{y_i})^2}{\sum _{i=1}^{n}(y_i - \bar{y})^2} \end{aligned}$$
(19)

where n is the total number of observations, \(y_i\) is the actual value of the i-th observation, \(\hat{y_i}\) is the predicted value of the i-th observation, and \(\bar{y}\) is the mean value of all observations.
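Eqs. 16-19 translate directly into code; a dependency-free sketch:

```python
def regression_metrics(y_true, y_pred):
    """MSE, RMSE, MAE, and R^2 as defined in Eqs. 16-19."""
    n = len(y_true)
    sse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    mse = sse / n                                            # Eq. 16
    rmse = mse ** 0.5                                        # Eq. 17
    mae = sum(abs(y - p) for y, p in zip(y_true, y_pred)) / n  # Eq. 18
    y_bar = sum(y_true) / n
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    r2 = 1.0 - sse / ss_tot                                  # Eq. 19
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "R2": r2}
```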

4.3 Experimental setup and framework adjustments

The initial step in the framework involves selecting a suitable network architecture. To accomplish this, a population of potential architectures is created and evolved through a predefined number of iterations. For the blueberry dataset, a population of 200 individuals was used with 400 generations allocated for improvement. Due to the larger number of input parameters, the crop yield dataset was assigned a larger population of 300 individuals, with 800 generations allocated for improvement. Metaheuristics were leveraged to select control parameters of the WANN. The parameters selected for optimization and their respective ranges are shown in Table 1. It is also important to note that apart from hyperparameters, activation functions for each node also needed to be selected. The considered functions include sigmoid, tanh, gauss, relu, sin, inv, and identity. Due to extensive computational demands, each metaheuristic was assigned a population size of 6 and given 15 iterations to attain optimal performance. Furthermore, to provide grounds for a fair comparison, the evaluations have been repeated over 20 independent runs to account for the randomness inherent in this class of algorithms.

Table 1 Hyperparameters optimized by metaheuristics in L1 with their respective ranges

After each iteration, individual architectures are evaluated using the \(R^2\) metric. A selection of shared weights is assigned and used to construct and evaluate networks. Population fitness is assessed based on the mean \(R^2\) value of all individuals in a population, tested with each possible shared weight value. This process allows the network architecture to grow as needed, introducing new neurons and connections so the network can evolve to better address the given task. The best-performing architecture is passed to layer two of the framework.

During L2 optimization, each metaheuristic was assigned a population size of 40 and given 30 iterations to improve network weights to attain optimal performance. Furthermore, to provide grounds for a fair comparison, the evaluations have been repeated over 30 independent runs to account for the heuristics inherent in this class of algorithms.

In the first stage, metaheuristic algorithms are tasked with selecting optimal hyperparameters for the evolving neural networks. In the second, they were tasked with optimizing the values of the shared weights to boost network performance. Several metaheuristics have been considered for tackling this task, and their performance has been subjected to a comparative analysis with the introduced MSRSA. The evaluated algorithms include the original RSA [1] as well as the novel ChOA [31]. Several well-known algorithms have also been compared including the GA [38], PSO [62], ABC [29], FA [67], BA [68], HHO [23], and WOA [37]. For each of the utilized metaheuristics, the parameters used during the optimizations are the values suggested in the works that originally introduced them. The control parameter of the proposed metaheuristics PC was set to \(\text{PC}=0.1\) as this value has been empirically determined to give the best results.

5 Results and discussion

The following section demonstrates the results attained in each layer of the framework individually. Following the presentation, the results are discussed in detail.

5.1 L1 observed outcomes

In L1, metaheuristics algorithms were used to select optimal control parameters for evolving WANN architectures suited for yield forecasting. The results attained by networks optimized in L1 are shown and discussed in the following segment.

5.1.1 Wild blueberry optimal network architecture

During the architecture selection process, fitness metrics were monitored and tracked to determine the influence of the competing metaheuristics on the optimization process. Table 2 demonstrates the results attained during the best, worst, median, and mean runs, as well as the standard deviation and results variance. Furthermore, fitness convergence rates are shown in Fig. 8.

Table 2 Fitness results attained by each metaheuristic-optimized population for the best, worst, median, and mean execution for blueberry dataset

As demonstrated, the networks tuned by the introduced metaheuristic attained the best results compared to all other metaheuristic-optimized networks.

Fig. 8
figure 8

Population fitness convergence rates attained by each metaheuristic for the blueberry dataset

Fitness distributions are observed in Fig. 9.

Fig. 9
figure 9

Population fitness distributions attained by each metaheuristic for the blueberry dataset

As can be observed, the proposed MSRSA attained the lowest variation in results across several runs, demonstrating the highest reliability and stability compared to other algorithms. Furthermore, it attained significantly better results than the unoptimized WANN, which with default parameters obtained a fitness value of 0.056813. The optimal network architecture evolved in L1 of the framework is shown in Fig. 10.


Fig. 10
figure 10

Best selected WANN model architecture for the wild blueberry yield forecasting

The selected hyperparameters for the best-performing WANN model are given in Table 3.

Table 3 Best selected WANN hyperparameters for the blueberry dataset

The relative simplicity of the constructed network is worth emphasizing, with a total of only 14 neuron nodes and 61 active weighted connections. Furthermore, even though the mean fitness of the best population in L1 was only 0.381873, and despite the network’s simplicity, the optimal genome attained an admirable \(R^2\) score of 0.884506 with a shared weight of 1. This network was subjected to further optimization in L2, where further improvements were attained.

5.1.2 Crop yield optimal network architecture

While network architectures were evolved, the fitness metrics were recorded to better understand the effects each evaluated metaheuristic has on the optimization process. Table 4 demonstrates the results attained during the best, worst, median, and mean runs, as well as the standard deviation and variance of the results. Additionally, fitness convergence is demonstrated in Fig. 11.

Table 4 Fitness results attained by each metaheuristic-optimized population for the best, worst, median, and mean execution for crop yield dataset

As demonstrated, the networks tuned by the introduced metaheuristic attained the best results compared to all other metaheuristic-optimized networks. However, it is also important to note that the performance of the population remains relatively poor regardless of the optimization algorithm, with the average population fitness being a net negative value. This is likely due to the relatively high complexity of the crop yield dataset and the large number of available features, making it harder for lighter network structures to determine adequate architectures with fewer connections. An unoptimized version of the WANN was also applied to this specific task under identical test conditions. The resulting average population fitness was only \(-0.3793214\), significantly worse than even the worst-performing optimized version (Fig. 11).

Fig. 11
figure 11

Population fitness convergence rates attained by each metaheuristic for the crop yield dataset

Fitness distributions are observed in Fig. 12.

Fig. 12
figure 12

Population fitness distributions attained by each metaheuristic for the crop yield dataset

As can be observed, the proposed MSRSA attained the lowest variation in results across several runs, suggesting that it demonstrates the highest reliability and stability compared to other algorithms. The optimal evolved network architecture by L1 of the framework is shown in Fig. 13.

Fig. 13
figure 13

Best selected WANN model architecture for the crop yield forecasting

The relatively simple architecture evolved by the WANN failed to sufficiently address this task in L1, with most populations having a very low \(R^2\) score. The best genome attained an \(R^2\) value of \(-\,0.103878\); the selected architecture contained a total of 15 nodes with 259 connections and a shared weight value of \(-\,1.9854476\). The poor performance is very likely due to the high complexity of the crop yield dataset, with significantly more inputs compared to the previous dataset. However, by optimizing shared weights, performance can be significantly improved in L2 of the framework.

5.2 L2 observed outcomes

In L2, metaheuristics are tasked with selecting optimal shared weights within the already constructed network architecture. The results attained by networks optimized in L2 are shown and discussed in the following segment.

5.2.1 Wild blueberry dataset results

In the second stage of the optimization, several state-of-the-art algorithms were tasked with fine-tuning the shared network weight. The experiments were carried out over 20 independent runs to account for the randomness inherent to this class of algorithms. The resulting models’ performance was recorded, and the results are demonstrated below. The objective function used in L2 is MSE, and the results attained by each metaheuristic-optimized WANN in the best, worst, mean, and median run are demonstrated in Table 5. The already decent performance attained in L1 is further improved by the metaheuristics in L2 of the framework through shared weight fine-tuning (Table 6).

Table 5 Overall objective function results for each evaluated metaheuristic-optimized network for the blueberry dataset

As can be observed, the introduced metaheuristic obtained the best results compared to all other evaluated algorithms. However, to provide further insight into the improvements made by the introduced modifications, the best-performing models have been evaluated using additional metrics. The results are demonstrated in Table 6, with normalized values given in Table 7.

Table 6 Best-performing metaheuristics optimized models detailed evaluation results for blueberry dataset

It can be observed that the introduced metaheuristic attained the best MSE results as this was the optimization target. Several metaheuristics share the first place for \(R^2\) and R, while the FA-optimized WANN attained the best MAE. This is in line with the NFL theorem, which states that no single approach is the best for all problems.

Table 7 Best-performing metaheuristics optimized models detailed evaluation results for blueberry dataset normalized

Convergence rates of each optimized WANN through each metaheuristic iteration are shown in Fig. 14, alongside distribution plots of the results.

Fig. 14
figure 14

Objective and \(R^2\) metrics convergence and distribution plots for blueberry dataset

It can be observed that the introduced metaheuristic improved convergence rates in comparison with the original algorithm. Furthermore, the result distributions show that the introduced algorithm has the narrowest spread of results, suggesting the highest level of robustness and reliability. This is further reinforced by the KDE plots shown in Fig. 15.

Fig. 15
figure 15

Compared model KDE plots for the objective and \(R^2\) function for blueberry dataset

Finally, the predictions cast by the best-performing metaheuristic-optimized WANN in comparison with actual yields are shown in Fig. 16.

Fig. 16
figure 16

Forecasts made by the best-performing model compared to actual values for blueberry dataset

5.2.2 Crop yield dataset results

Similarly, the second optimization layer applied several state-of-the-art algorithms tasked with fine-tuning the shared network weight for forecasting crop yield. The experiments were carried out over 20 independent runs to account for the randomness inherent to this class of algorithms. The resulting models’ performance was recorded, and the results are demonstrated below. The objective function used in L2 is MSE, and the results attained by each metaheuristic-optimized WANN in the best, worst, mean, and median run are demonstrated in Table 8. Additionally, it is important to note that, since the networks evolved in the first layer of the framework attained quite modest results, the application of metaheuristics to shared weight fine-tuning demonstrated a significant improvement in performance compared to the initial results.

Table 8 Overall objective function results for each evaluated metaheuristic-optimized network for the crop yield dataset

The detailed metrics for each of the best runs are shown in Table 9. The results indicate that several metaheuristics share the best \(R^2\) score, while the newly introduced metaheuristic attained the best MSE and RMSE scores. Nevertheless, the FA once again demonstrated the best MAE result, further reinforcing the NFL theorem (Table 10).

Table 9 Best-performing metaheuristic-optimized models' detailed evaluation results for the crop yield dataset
Table 10 Best-performing metaheuristic-optimized models' detailed evaluation results for the crop yield dataset (normalized)

Convergence rates and final result distributions are shown in Fig. 17, followed by KDE diagrams in Fig. 18.

Fig. 17
figure 17

Objective and \(R^2\) metrics convergence and distribution plots for crop yield dataset

Fig. 18
figure 18

Compared model KDE plots for the objective and \(R^2\) function for crop yield dataset

Finally, the forecasts of the best-performing model optimized by metaheuristics compared to actual values are shown in Fig. 19.

Fig. 19
figure 19

Forecasts made by the best-performing model compared to actual values for crop yield dataset

The selected hyperparameters in L1 for the best-performing WANN architecture are given in Table 11.

Table 11 Best selected WANN hyperparameters for the crop yield dataset

Considering both steps in the optimization process, the role of metaheuristic optimization cannot be overstated. While in the first step the WANN attained more modest outcomes, likely due to the increased data complexity of the crop yield dataset coupled with the relative simplicity of the evolved networks, the major improvements made by shared weight tuning in L2 can to a degree mitigate the shortcomings of L1.

5.3 Comparative analysis with other well-known ML and ANN models

The proposed approach has also been compared with several well-known and well-performing ML and ANN models. The compared methods include eXtreme Gradient Boosting (XGBoost) [7], support vector machines (SVM) [57] with various kernel functions, and several traditional ANN architectures with one, two, and three hidden layers.

For the SVM, popular kernel functions have been considered. The results attained using the radial basis function kernel are marked as SVM (RBF), those attained using a polynomial kernel as SVM (poly), and those attained using a linear kernel as SVM (linear).

The network architectures marked ANN1, ANN2, and ANN3 have different structures for each dataset. For the blueberry dataset, ANN1 has one hidden layer with 16 neurons, ANN2 has two hidden layers with 32 and 16 neurons, and ANN3 consists of three hidden layers with 32, 16, and 8 neurons, respectively. For the crop yield dataset, ANN1 has one hidden layer with 230 neurons, ANN2 has two hidden layers with 230 and 115 neurons, and ANN3 consists of three hidden layers with 230, 115, and 58 neurons, respectively. All networks utilized the Adam optimizer and the ReLU activation function.
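The baselines described above could be instantiated as in the following sketch using scikit-learn; the paper does not specify which implementation library was used, so this is an assumption, and XGBoost is omitted since it requires a separate package. Layer sizes follow the blueberry dataset configuration stated in the text:

```python
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor

# Hypothetical instantiation of the compared baselines (scikit-learn assumed).
# SVM is applied as a regressor (SVR) since crop forecasting is regression.
baselines = {
    "SVM (RBF)":    SVR(kernel="rbf"),
    "SVM (poly)":   SVR(kernel="poly"),
    "SVM (linear)": SVR(kernel="linear"),
    # ANN1-ANN3 for the blueberry dataset, Adam optimizer and ReLU activation
    "ANN1": MLPRegressor(hidden_layer_sizes=(16,), activation="relu", solver="adam"),
    "ANN2": MLPRegressor(hidden_layer_sizes=(32, 16), activation="relu", solver="adam"),
    "ANN3": MLPRegressor(hidden_layer_sizes=(32, 16, 8), activation="relu", solver="adam"),
}
```

Each model would then be fit and evaluated over 20 executions, as described below.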

Due to the stochastic nature of the training process, experiments have been carried out over 20 executions. Mean results attained over the 20 runs of each method applied to the blueberry dataset are shown in Table 12, while the results for the crop yield dataset are shown in Table 13. As crop forecasting is a regression problem, the SVM is applied as a support vector regressor (SVR). It is also important to note that during experimentation one-hot encoding was not used for XGBoost and SVM, giving these methods a slight advantage due to the lower number of input features.

Table 12 Comparison of contemporary ML and ANN with the proposed metaheuristic-optimized WANN on the blueberry dataset
Table 13 Comparison of contemporary ML and ANN with the proposed metaheuristic-optimized WANN on the crop yield dataset

From the presented results, several interesting deductions can be drawn. Firstly, the methods that do not use one-hot encoding, and thus work with fewer features, have an advantage. While state-of-the-art techniques such as XGBoost perform best, helped by the reduced number of features, the ANN models display admirable performance as well.

It is also important to note that the goal of this comparison was not to prove that no method outperforms the WANN. Even when optimized through metaheuristics, these networks have their limitations. However, their advantage lies in their lightweight architectures, which require less computation to produce a prediction.

One especially interesting observation is that the proposed WANNs, despite having significantly simpler structures with only 14 and 15 neurons, respectively, attained better performance than most of the more complex ANN architectures. This is a significant advantage when working with systems that have limited computational power.

5.4 Findings validation and best model interpretation

An important part of modern computer science research is determining whether the improvements made are statistically significant. Outcomes alone are insufficient to establish an advantage of one algorithm over others. In this work, nine established methods were evaluated alongside the proposed MSRSA based on their ability to optimize WANN performance for crop yield forecasting. The comparison was conducted over two datasets and two problems, the L1 and L2 parts of the framework, which address WANN structure and shared weight tuning, respectively, yielding a total of four different experiments.

As suggested in [9], statistical evaluation in these scenarios should be preceded by adequate sampling of each method, determining objective averages through multiple independent executions for each problem. This approach can be inconclusive, or even produce misleading conclusions, in cases where the samples do not follow a normal distribution. It is also worth noting that researchers remain divided on whether taking the average objective function value for statistical tests is appropriate when comparing stochastic methods [14, 52]. Nevertheless, the objective function values over 20 independent runs for each of the four problems are considered in this research.

To determine the statistical significance of the obtained results, the best values from each of the 20 runs of every algorithm were selected for both the wild blueberry and crop yield prediction dataset instances. These values were then treated as data series. However, before choosing between parametric and non-parametric statistical tests, the conditions for safe usage of parametric tests, namely the independence, normality, and homoscedasticity of the data variances [33], need to be investigated.

The independence condition is satisfied because each run is executed separately with its own pseudo-random number seed. However, the normality requirement is not satisfied, since the acquired samples do not originate from a normal distribution. This can be observed from the KDE plots and is further reinforced by the Shapiro–Wilk test for single-problem analysis [55]. By conducting the Shapiro–Wilk test, p-values are computed for every method–problem pair; these results are shown in Table 14.

Table 14 Shapiro–Wilk test scores for the single-problem analysis

At both standard threshold values \(\alpha = 0.05\) and \(\alpha = 0.1\), the null hypothesis (H0) can be rejected, leading to the conclusion that none of the samples (for any problem–method pair) comes from a normal distribution. Therefore, since the normality condition for safe usage of parametric tests was not satisfied, there was no need to verify the homoscedasticity constraint, and the statistical analysis continued with non-parametric tests.

Due to the limited number of problems addressed in the study (four in total), a multi-problem analysis was not conducted, as an insufficient number of samples can produce misleading results. The analysis therefore proceeded with a pair-wise non-parametric test, with the introduced MSRSA designated as the control method.

The non-parametric Wilcoxon signed-rank test [58] was conducted between the introduced MSRSA and every other method for each of the problems addressed in each framework layer. The results of this analysis are summarized in Table 15, where generated p-values higher than the threshold of \(\alpha =0.05\) are marked in bold.
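The overall statistical procedure, a Shapiro–Wilk normality check per sample followed by the pairwise Wilcoxon signed-rank test against the control method, can be sketched as follows, assuming SciPy is available; the objective values below are synthetic stand-ins, not results from the experiments:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic objective (MSE) samples over 20 runs for the control method
# and one compared method; real samples would come from the experiments.
msrsa = rng.exponential(0.010, size=20)
other = rng.exponential(0.014, size=20)

# Shapiro-Wilk: a small p-value rejects H0 that the sample is normal
_, p_norm = stats.shapiro(msrsa)

# Pairwise non-parametric comparison between the two paired run series
_, p_wilcoxon = stats.wilcoxon(msrsa, other)

print(f"Shapiro-Wilk p={p_norm:.4f}, Wilcoxon p={p_wilcoxon:.4f}")
```

A Wilcoxon p-value below the chosen \(\alpha\) indicates a statistically significant difference between the control method and the compared method on that problem.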

Table 15 Wilcoxon signed-rank test findings

The Wilcoxon signed-rank test p-values shown in Table 15 exhibit that, in the scenario of the L1 experiments (WANN structure tuning), the proposed MSRSA significantly outperformed all other methods, yielding p-values substantially lower than 0.05. Therefore, in this extremely computationally intensive experiment, the MSRSA proved to be a robust and efficient optimizer.

Although in the L2 simulations (WANN shared weight tuning) the MSRSA outperformed most of the other methods according to the Wilcoxon signed-rank analysis, there are some instances where other approaches showed competitive performance. More precisely, in the case of L2 blueberry yield prediction, the MSRSA did not show a significant improvement over the ABC approach when the threshold of 0.05 is taken into account. Similarly, in the L2 crop yield prediction simulations, according to the generated p-values, the RSA and HHO showed performance similar to the MSRSA at both threshold values, 0.1 and 0.05.

Nonetheless, as an overall conclusion of the statistical analysis, the MSRSA exhibited performance in both experiments that is statistically significantly better than that of most of the other metaheuristics included in the analysis.

Finally, the best model generated by the proposed MSRSA in the L2 experiment for the wild blueberry dataset is taken, and the SAGE [13] method is applied to observe feature importance. SAGE provides a global, Shapley-value-based measure of how much each individual feature contributes to the model's predictive performance. By evaluating the model's response to changes in each variable, conclusions can be drawn about the relations the model has extracted and how well it exploits them. The SAGE method is applied to determine the importance of each feature for the best-constructed model, providing potentially very useful information for further studies, as it indicates parameters that researchers may want to focus on in order to attain accurate forecasts using the proposed model, as well as lighter models if needed.

Since the reference SAGE library does not natively support WANNs, the code for generating SAGE values was written from scratch in Python. However, since the crop yield dataset undergoes one-hot encoding of categorical variables in the pre-processing phase, running SAGE for the best-generated model on this dataset would not yield interpretable per-feature importances, and it was therefore omitted.
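The from-scratch implementation is not reproduced here, but the underlying idea, how much the model's loss degrades when a feature's information is removed, can be illustrated with a simpler permutation-based proxy for global importance. The model and data below are synthetic, and permutation importance is only a cheap approximation of the Shapley-based SAGE values:

```python
import numpy as np

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Average MSE increase when one feature column is shuffled,
    a rough proxy for Shapley-based global importance (illustrative)."""
    rng = np.random.default_rng(seed)
    base = np.mean((model(X) - y) ** 2)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])  # destroy feature j's information
            scores[j] += np.mean((model(Xp) - y) ** 2) - base
    return scores / n_repeats

# Toy model where feature 0 dominates (a hypothetical stand-in for the
# dominant feature found by the SAGE analysis); feature 2 is irrelevant
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
model = lambda X: 3.0 * X[:, 0] + 0.5 * X[:, 1]
y = model(X)

imp = permutation_importance(model, X, y)
print(imp)  # feature 0 scores highest, feature 2 near zero
```

An unused feature scores exactly zero under this proxy, since shuffling it leaves the predictions unchanged.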

The feature importance bar plot for the wild blueberry best model is depicted in Fig. 20.

Fig. 20
figure 20

The SAGE feature importance bar plot for wild blueberry simulations

According to the SAGE analysis outcomes, the fruit set feature has the highest impact on model predictions, followed by the seeds feature and fruit mass.

6 Conclusion

The presented work put forth a novel two-layer framework for constructing optimized network architectures based on neuroevolutionary algorithms. By using WANNs, much of the training required by traditional neural networks can be avoided. Additionally, the responsibility of the network is shifted from fine-tuned weights and biases toward the network architecture. One result of taking this approach is the generation of simpler and tighter neural networks. However, the performance of these networks is heavily dependent on adequate parameter selection. This work introduces a novel MSRSA algorithm that is applied to both aspects of network optimization and demonstrates the best performance in both.

The proposed framework possesses a two-layer structure. Optimal network architectures are evolved in the first layer, while the shared weights are further optimized in the second layer to improve performance. The approach has been tested on two real-world datasets, blueberry and crop yield, with mixed results. Depending on data complexity, L1 attained decent results on the simple blueberry dataset, while attaining noticeably more modest results on the more complex crop yield dataset. However, following shared weight optimization by metaheuristic algorithms in L2, the networks saw a significant improvement. The best-performing model for the blueberry dataset has been subjected to SAGE analysis to determine the features that have the highest impact on model forecasts. Additionally, the performance of WANNs has been compared to several proven AI and ML methods to determine their relative effectiveness. While the proposed approach did not match the performance of state-of-the-art methods, it nonetheless proves promising, outperforming significantly more computationally demanding network architectures.

One notable advantage of this approach is the relative simplicity of the final models, featuring few nodes and connections in comparison with traditional networks handling similar tasks. This feature can be a significant advantage when creating models for systems with limited computational power. This process is somewhat balanced by the demanding process of generating optimized models. Nevertheless, by introducing metaheuristic optimization, generation times can be significantly reduced and model performance improved.

As with any study, certain limitations are present within this work. The high cost of the comparative analysis between optimization algorithms limits population sizes and allocated iterations, parameters that might improve the performance of slower-converging algorithms or improve the exploratory power of fast-converging ones. Furthermore, only a limited number of optimization algorithms have been considered. Future works hope to address some of the stated limitations and will focus on exploring the potential of WANNs for tackling pressing real-world problems. Additionally, the potential of the introduced MSRSA algorithm will be explored in other optimization areas.