1 Introduction

The world's food supply is under increasing threat from climate change, natural disasters, drought, resource instability, and a growing human population [1]. By 2050, the global population is expected to grow substantially, which will lead to increased levels of hunger and food insecurity [2]. Food can either be produced locally or imported from other countries [3]. Knowing how much food can be grown in a given area is crucial for crafting effective food security policy. As a result, progress in developing reliable prediction models is viewed as crucial for governance and commercial planning [4]. Using accurate food forecast models, governments may reevaluate the size and cost of their yearly food imports [5]. Further, understanding the worth of food production helps disadvantaged populations cope with food insecurity and manage poverty [6].

Distributing the many foodstuffs produced by farmers is the ultimate outcome of farming, and it is common knowledge that humans cannot survive without it. The goods produced by the food industry are essential to any nation [7]. The industry is also crucial to the growth of the national economy and the global economy as a whole. As a consequence, there is a pressing need for high-quality, safe food items that can be widely distributed. New technologies, such as artificial intelligence (AI), have been very successful in recent decades at accomplishing such goals [8]. Therefore, studying the smart-agriculture and cutting-edge food-business facets of artificial intelligence is crucial. Artificial intelligence is a form of technology that mimics human intelligence and aptitude using machines, primarily computers, robots, and electronic devices [9]. Natural language processing (NLP) is one AI application that allows machines to understand spoken human language in real time; computer vision allows machines to interpret analogue-to-digital conversions such as videos; speech recognition and expert systems replicate human judgement; and so on [10]. Learning (obtaining data and then developing algorithms to convert it into usable information), reasoning (deciding which algorithm to use to achieve a desired outcome), and self-correction (constantly modifying designed algorithms to ensure the most accurate results) are the three pillars upon which AI is built [11]. Agriculture and food production are just some of the fastest-growing industries where AI is being employed. One of AI's main branches, machine learning (ML), allows humans to be more productive and creative in their work [12]. Machine learning uses statistical and mathematical techniques to learn from datasets and make judgements based on the collected information. A variety of approaches may be used, and two broad categories can be distinguished: symbolic techniques (in which the generated rules and instances are represented explicitly) and subsymbolic approaches (artificial neural networks, ANN) [13].

There are three main categories of ML methodology: supervised, unsupervised, and reinforcement learning. The goal of supervised learning [14] is to establish a connection between the input variables and the desired output variable. Using the labelled data and prior knowledge of the input and target output variables, a predictive model is developed. Many algorithms, including regression and classification analyses, fall under the umbrella of supervised learning. Unsupervised learning employs unlabeled datasets and methods such as artificial neural networks, clustering, genetic algorithms, and deep learning [15]. Dimensionality reduction and exploratory data analysis are two of the most common applications of this type of unsupervised machine learning technique [16]. Q-learning and deep Q-learning are only two examples of the many algorithms used for machine skill acquisition, robot navigation, and in-the-moment decision-making that fall within the third class of ML tasks, known as reinforcement learning [17]. In this class of ML task, the learner's interaction with the environment for data collection is coupled with the training/testing stage. The learner receives reinforcement for its interactions with the environment, creating a trade-off between exploration and exploitation: learners cannot rely only on what they already know; they have to venture into uncharted territory [18].

The potential for AI applications in the agriculture and food industries has recently expanded. AI techniques contribute significantly to model identification and to procedures serving the many agri-food applications and supply chain stages; AI tools address issues in agriculture such as the identification of pests and suitable methods of treatment, as well as the management of crop yields and resource use [19]. Abiotic and biotic factors must be measured with remote sensing and sensors for agricultural and animal management optimisation [20]. Furthermore, there are significant benefits to AI deployment and applications that may completely alter the agri-food industry and associated businesses.

There is an issue with drought in Iran, and the literature cites a number of causes, including climate change and insufficient agricultural irrigation infrastructure. In Iran, food security is at risk due to the severe obstacles posed by the drought. Because of the wide range of climates around the country, many different types of crops may be grown in Iran [22]. But the nation's food security has been threatened by drought and a growing population. Foreign farming is seen as a potential answer to the problem of Iran's food insecurity. Machine learning (ML) is a cutting-edge technique that has been shown to successfully integrate disparate data sources, both structured and unstructured, in order to anticipate food safety concerns [23]. Machine learning allows computers to "learn" from the information they are given. Inputs of experience (such as historical data) are transformed into outputs of knowledge (such as categorisation and prediction) through the learning process (using a learning algorithm). Multiple studies have employed ML models to track and forecast food safety, showing that this technique has great potential as a means of tackling this challenging problem.

It is not common practice to employ ML models for assessing and forecasting food safety, despite the fact that they may greatly facilitate these processes. This is due to several factors, including, but not limited to, the limited availability of food safety records.

As a result, DenseNet-LSTM, a hybrid deep learning model, is introduced in this study. Compared to a plain CNN, the Dense Convolutional Network (DenseNet) is a more beneficial design: it is better at learning from little input and avoids issues such as vanishing gradients and parameter growth as the CNN layers deepen. The LSTM architecture is well suited to detecting temporal characteristics in time-series data and is intended to tackle the long-term dependence problem of Recurrent Neural Networks (RNNs). The work combines two optimisation techniques to provide a notable improvement in the LSTM through the optimisation of its large number of weighting factors; the resulting hybrid heuristic-based approach is called A-ROA. The validation effectiveness of the suggested model is demonstrated experimentally.

The remaining sections of the paper are structured as follows: the literature review is presented in Sect. 2, and the research gap motivating the suggested model is identified in Sect. 3. The dataset is detailed in Sect. 4, and the suggested model is explained in Sect. 5. In Sect. 6, the validation analysis and discussion are presented. Section 7 concludes with recommendations for future work.

2 Related Works

Over 73% of the food supply in the United States is ultra-processed, according to a machine learning technique introduced by Menichetti et al. [24]. They demonstrate that a greater dependence on ultra-processed food is associated with an increased risk of metabolic syndrome, diabetes, angina, and raised age, and that it also decreases vitamin bioavailability. Last but not least, they discover that switching to less processed alternatives can greatly lessen the negative effects of ultra-processed meals, indicating that providing consumers with information on the degree of processing, which is currently unavailable to them, might enhance population health.

Using Landsat satellite images from 1996 to 2021, Kafy et al. [25] evaluate and forecast BT's susceptibility to drought (DS). The indices considered include soil moisture content (SMC), the normalised difference vegetation index (NDVI), the modified NDVI (MNDVI), the temperature condition index (TCI), the vegetation condition index (VCI), and the vegetation health index (VHI). Cellular Automata (CA)-Artificial Neural Network (ANN) techniques were applied to the VHI in order to detect and forecast DS in 2026 and 2031 according to the VCI and TCI features. From 1996 to 2021, the decline of healthy vegetation (19%) and surface water bodies (26%) and increased temperatures (> 5 °C) were all factors in the acceleration of DS trends. Furthermore, the VHI findings represent a dramatic rise in extreme drought circumstances between 1996 (2% of the year) and 2021 (7% of the year). According to the DS forecast, the likelihood of extreme drought has increased. Planners and decision-makers will be better able to improve communities' readiness to cope with drought vulnerability if they have a better grasp of the potential effects of drought.

Vignesh et al. [26] propose a Discrete Deep Belief Network classification technique that, combined with a modified chicken swarm optimisation technique, provides more accurate estimates of agricultural output. The data parameters were input into the network's sequentially stacked layers, and the network architecture is used to build a prediction environment for crop production based on the input parameters. The best data are selected using the modified chicken swarm optimisation method, and the resulting data are then fed into the classification procedure. To categorise the data and predict agricultural output, a Discrete Deep Belief Network equipped with a Visual Geometry Group Net classifier is deployed. The proposed model outperforms its counterparts, retaining the standard data distribution while accurately forecasting crop yield with 97% confidence.

Khan et al. [27] propose and assess ML-based techniques to forecast process kinetics in a variety of food processing operations, including canning, extrusion, encapsulation, and fermentation. Detailed instructions for creating a model using machine learning and putting it into practice are provided. They cover the limitations and significant issues of neural network training and testing algorithms to help readers choose the appropriate algorithms for addressing problems in the food processing industry. The study also discusses ML-based approaches in hybrid settings within the food processing industry, and possible uses of physics-guided ML modelling techniques in this industry are examined as well. This work has the potential to contribute significantly to the expansion of ML-based technologies for use in the food processing industry.

A literature overview of ML applications in food safety monitoring and prediction is presented by Wang et al. [28]. The article explores and classifies the many forms of data used for ML modelling, suggests additional data variables, and summarises existing ML applications in this area. Scopus, CAB Abstracts, and the IEEE database were used to compile the review, and research papers written in English between 2011 and April 1, 2021, are included. According to the findings, all of the ML models examined in the relevant investigations were validated to have excellent prediction accuracy. Based on the surveyed ML applications, the article suggests data sources and input factors for future food safety prediction.

Three crop prediction models, including Support Vector Machines, are proposed by Aworka et al. [29]. To create a decision-making system based on cutting-edge machine learning models, they merge information on weather, agricultural yields, and pesticide applications. Despite the scarcity of agricultural data in Africa, they propose a decision system that can accurately estimate crop yields across 14 East African nations. Their experiments validate the superiority of the three presented machine learning models for analysing agricultural data. Due to the high accuracy of the agricultural forecast values, the root mean percentage error of the models is rather small. Predictions of agricultural output in East Africa may be safely made using the presented models.

The Food Consumption Score (FCS) and the Household Dietary Diversity Score (HDDS) have been highlighted as essential indices of food security by Deléglise et al. [30]. In light of the high cost of producing such indicators, they propose a framework which uses state-of-the-art machine and deep learning models to estimate FCS and HDDS from heterogeneous, publicly available data. Indicators calculated using data from Burkina Faso's Permanent Agricultural Survey, which ran from 2009 to 2018, are considered. The estimates are derived from an assortment of static sources (such as rasters, GPS points, World Bank variables, and meteorological data) and time series (such as Smoothed Brightness Temperature (SMT), rainfall estimates, and maize prices). Their experimental results reveal that the framework outperforms the state-of-the-art data science tools now in use, and this opens the door to the creation of cutting-edge food security prediction systems.

Using a systematic literature review approach, Kler et al. [31] examine publications on scholarly databases that connect artificial intelligence (AI) and supply chains on the one hand and the food business on the other. According to this study's findings, artificial intelligence and machine learning technologies are still in their infancy, but there is considerable potential for them to improve the performance of the food industry (FI). The application of AI and ML in FI networks gives competitive advantages for development, since several researchers have developed and tested AI- and ML-related models that have been shown to be beneficial in optimising FI. Some in the academic community argue that AI and ML are already contributing; others maintain that these technologies are still underutilised; and still others argue that their tools and techniques may unlock the full potential of the food industry. The research indicates that using AI and ML might help the food business cut down on wasteful spending and improve its agility and responsiveness.

When primary data are unavailable, Martini et al. [32] suggest a machine-learning technique to estimate the prevalence of people with insufficient food consumption and of those adopting crisis or above-crisis food-based coping. The suggested models leverage a one-of-a-kind global dataset to account for up to 81% of the variance in insufficient consumption and up to 73% of the variance in crisis or above-crisis coping. They also demonstrate that the presented models can predict the food security situation in near real time, and they propose a method for determining which factors are responsible for shifts in trend forecasts, which is essential for making forecasts useful to decision-makers.

Two machine-learning models for forecasting agricultural output have been proposed by Nosratabadi et al. [33]. The prediction models are built with adaptive network-based fuzzy inference system (ANFIS) and multilayer perceptron (MLP) techniques. The analysis focuses on two sources of food production: animal production and agricultural production. Livestock production was measured in terms of yield, the number of living animals, and the number of animals slaughtered; crop output was measured in terms of yields and losses. The focus of the investigation is Iran, so FAOSTAT time series data on livestock and agricultural outputs in Iran from 1961 to 2017 were collected. First, ANFIS and MLP were trained using 70% of the data, and then the models were tested using the remaining 30%. Predictions of food production were shown to be most accurate using the ANFIS model equipped with generalised bell-shaped (Gbell) membership functions. The results of this study may be used by policymakers to properly prepare for the future food supply by making predictions with the model used in this work.

A unique method for predicting fruit production using deep neural networks is presented by Khan et al. [34], who use this method to construct a quick and accurate prediction system for agricultural output. Apple, banana, citrus, pear, grape, and total fruit production statistics were evaluated and analysed, and their expected future output was forecast using deep neural networks. The figures come from the Pakistani government's official statistics agency and reflect the harvest of the country's most popular fruits. Three techniques were used to forecast future fruit yields: Levenberg–Marquardt optimisation (LM) achieved 65.6% accuracy, scaled conjugate gradient achieved 70.2%, and Bayesian regularisation backpropagation (BR) achieved 76.3%. Because these techniques can compare productivity with growing populations, they may be used to inform new strategies for expanding fruit production in developing nations. According to these estimates, the government of Pakistan has to both boost fruit output and strengthen policy for farmers in order to meet demand.

An LSTM for demand prediction in a Physical Internet supply chain network is proposed by Kantasa-Ard et al. [35]. A hybrid of a genetic algorithm and scatter search is used to automatically tune the hyperparameters of the LSTM. A real-world case study of agricultural products in a Thai supply chain was used to assess the merits of the proposed method. Accuracy and the coefficient of determination were used to evaluate the proposed method against pre-existing supervised learning methods such as ARIMAX, Support Vector Regression, and Multiple Linear Regression. The findings demonstrate that the LSTM method performs better than the others when demand is less volatile, demonstrating its superiority in prediction efficiency. Hybrid metaheuristics were also shown to outperform trial-and-error tuning approaches. In conclusion, the model's findings may be used to reliably anticipate the transportation and storage costs associated with the Physical Internet distribution process.

Bennett-Lenane et al. [36] investigated two ML algorithms for their ability to predict potential food effects (FE). Drugs licensed between 2016 and 2020 were compiled into a database and then sorted into three categories: positive, negative, and no FE. Training the predictive models required computing more than 250 pharmacological attributes for each drug. In terms of FE classification, the ANN beat the SVM in both the training and testing phases (82% vs. 72%). Both models showed improved FE prediction accuracy compared to the Biopharmaceutics Classification System (BCS). Since the octanol-water partition coefficient, the number of hydrogen bond donors (HBD), and the dose (mg) were all relevant for prediction, this preliminary investigation shed fresh light on the relationship between FE and pharmacological characteristics. The study showed that ML can be useful for anticipating food effects of drugs during their pre-clinical stages.

An innovative portable method (termed NIR-Spoon) for concurrently analysing the mixing fractions of multi-mixture powdered food was presented by Zhou et al. [37]. For processing the spectra, the researchers proposed a convolutional neural network with feature selection (CNN-FS). The NIR-Spoon was used to analyse samples of powdered food containing many ingredients, ranging from milk to rice to corn to wheat. The partial least squares regression (PLSR) model for estimating the mixture fraction achieved an RMSE of 0.059 and an R2 of 0.938. With an RMSE of 0.035 and an R2 of 0.976, the suggested CNN-MR is an improvement over the standard PLSR approach. Even with only 25 features chosen by the CNN-FS process, the CNN-MR maintained an R2 of 0.970. In addition, the percentages could be converted to the weight of each component by using the built-in load sensor. The NIR-Spoon incorporated the necessary hardware and software.

3 Research Gap

The literature contains sophisticated and reliable techniques for extrapolating patterns from the past. Artificial intelligence models may acquire knowledge from data and make very accurate predictions of non-linear processes, and AI methods like neural networks have been shown to be effective in predicting time series data. Existing algorithms forecast food quality using current data, but no studies have concentrated on forecasting future food production in an effort to eliminate shortages. This motivates the present research, which uses data from Iran to make long-term projections about food production. Furthermore, Iran was selected since it is one of the primary countries experiencing increased shortages.

4 Materials

4.1 Dataset Description

A food production forecasting model for Iran over the next decade is the focus of this research. Agricultural output and animal output were taken into account as independent variables in this analysis of food production [38]. Livestock yield, the number of animals kept alive, and the number of animals slaughtered are the three main indicators of livestock production. The yields and losses of agricultural output have also been taken into account in this analysis. The study's conceptual framework is shown in Fig. 1. For the purposes of this analysis, the agricultural output of Iran is defined as the cultivation of barley, beans, dates, rye, and olives. The model assesses two input variables of agricultural production: harvests and losses of the aforementioned items. The arrows indicating losses are directed outward since they represent lost outputs. For livestock production, statistics were gathered from the FAO database known as FAOSTAT at http://www.fao.org/faostat/en/#data (retrieved on September 20, 2020). The years 1961–2017 are represented in the data set.

Fig. 1
figure 1

The proposed model of the study for food production in Iran

Figure 1 depicts the quantity of domestic animal output and the quantity of domestic agricultural production in Iran, both of which are taken into account when estimating the country's potential food production. Agricultural output was evaluated and quantified using the two variables yield and losses, whereas livestock output was quantified using the three variables living animals, livestock yield, and slaughtered animals.

4.2 Preprocessing

Before the values can be utilised in data processing, they need to be standardised. In order to make the values comparable with those of another variable, certain normalisation techniques only require a rescaling operation. Knowing the characteristics of the crop population allows the necessary, straightforward adjustments to be made. Once errors have been fixed, the population values can be normalised instead of being left arbitrarily distributed. As a first step in normalisation, the Z-score is calculated using the following equation:

$$X=\left[\left(Z-\mu \right)/\sigma \right]$$
(1)

When the population mean and standard deviation are known, the standardisation of Eq. (1) can be applied directly. If they are not known, the sample mean and sample standard deviation are used instead, as in Eq. (2).

$$X=\frac{Z-\overline{Z}}{S }$$
(2)

where \(\overline{Z }\) is the sample mean and S is the sample standard deviation. During normalisation, regression analysis is used to correct errors and provide results that are comparable to the input values. Equation (3) depicts a simple linear regression model Y that may be utilised for this.

$$Y={a}_{0}+{a}_{1}X+\epsilon$$
(3)

The random model is in the form of,

$${Y}_{i}={a}_{0}+{a}_{1}{X}_{i}+{\epsilon }_{i}$$
(4)

where \({\epsilon }_{i}\) are the independent error terms, with variance \({\sigma }^{2}\).

The fitted residuals \({\widehat{\epsilon }}_{i}\) satisfy the following constraints:

$$\sum_{i=1}^{n}{\widehat{\epsilon }}_{i}=0$$
(5)
$$\sum_{i=1}^{n}{\widehat{\epsilon }}_{i}{z}_{i}=0$$
(6)

The hat matrix H may then be determined using Eq. (7):

$$H=Z{\left({Z}^{T}Z\right)}^{-1}{Z}^{T}$$
(7)

The variance of the residuals, expressed through the hat matrix, is given in Eqs. (8) and (9):

$$Var\left({\widehat{\epsilon }}_{i}\right)={\sigma }^{2}\left(1-{h}_{ii}\right)$$
(8)
$$Var\left({\widehat{\epsilon }}_{i}\right)={\sigma }^{2}\left(1-\frac{1}{n}-\frac{{\left({z}_{i}-\overline{z}\right)}^{2}}{\sum_{j=1}^{n}{\left({z}_{j}-\overline{z}\right)}^{2}}\right)$$
(9)

The next step is to use the standard deviation to make the variable's variation comparable to other variables. The standardised moment of order k may be calculated using the following formula:

$$K=\frac{{\mu }^{k}}{{O}^{k}}$$
(10)

In Eq. (10), k signifies the moment order; the kth central moment \({\mu }^{k}\) is given by Eq. (11):

$${\mu }^{k}=E\left[{\left(z-\mu \right)}^{k}\right]$$
(11)

where \(z\) is a random variable and \(E\) is the expected value; the scaling denominator is given by Eq. (12):

$${O}^{k}={\left(\sqrt{E\left[{\left(z-\mu \right)}^{2}\right]}\right)}^{k}$$
(12)

In order to obtain a more uniform distribution, the coefficient of variation is also used; this is especially applicable to the normally distributed case.

$${C}_{v}=\frac{S}{\overline{z} }$$
(13)

where \({C}_{v}\) is the coefficient of variation shown in Eq. (13). The final form of the normalised data, \({z}{\prime}\), is given by Eq. (14):

$${z}{\prime}=\frac{z-{Z}_{min}}{{Z}_{max}-{Z}_{min}}$$
(14)

After the data are normalised, their range and variability are standardised and equalised. In most situations, redundant data should be minimised or deleted. The normalised data can then serve as input for further operations.
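As a concrete illustration of the pipeline above, the following minimal NumPy sketch applies the standardisation of Eq. (2) and the min-max rescaling of Eq. (14); the sample series is hypothetical and stands in for a FAOSTAT-style yearly indicator.

```python
import numpy as np

def z_score(z):
    """Standardise a series with the sample mean and standard deviation, Eq. (2)."""
    z = np.asarray(z, dtype=float)
    return (z - z.mean()) / z.std(ddof=1)

def min_max(z):
    """Rescale a series to [0, 1] using its range, Eq. (14)."""
    z = np.asarray(z, dtype=float)
    return (z - z.min()) / (z.max() - z.min())

# Hypothetical yearly livestock-yield series (illustrative values only).
raw = np.array([1.2e6, 1.5e6, 1.1e6, 1.9e6, 2.3e6])
print(z_score(raw))
print(min_max(raw))
```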

5 Methods

5.1 System Model

The data undergoes preprocessing before being fed into the model. The output of the preprocessing phase is fed into DenseNet, and the resulting feature map is ultimately fed into the LSTM. The proposed model thus uses food production data for training and then predicts future food production.

5.2 Deep Learning Architecture

5.2.1 DenseNet

One potential issue with increasing network depth is that input or gradient information may be lost by the time it reaches the output nodes. Several investigations have addressed this, and they all share the same characteristic of skipping intermediate layers on the way to the final one. In [39], a novel concept was introduced to convolutional networks, proposing dense connectivity with a reduced number of parameters. A dense connection is presented in Fig. 2 as a means of strengthening the information flow between layers by repeatedly linking layers.

Fig. 2
figure 2

An example of dense interconnection. The square indicates multi-channel input feature maps. The line is a channel-wise concatenation of feature maps from preceding and succeeding layers

The DenseNet neural network comprises dense block components and transition layers, and the dense block includes a growth-rate hyperparameter. DenseNet's feature maps across layers are joined by channel-wise concatenation, although this approach can result in excessively large network parameters, which in turn reduces computation speed. The authors of DenseNet therefore introduced a bottleneck layer, and the DenseNet structure was subjected to a nonlinear transformation consisting of Batch Normalisation (BN) > Rectified Linear Unit (ReLU) > Conv(3×3). Figure 3a depicts the bottleneck layer. It is a tool for streamlining computation by decreasing the number of feature maps required as input.

Fig. 3
figure 3

a Bottleneck Layer b Transition Layer

If the input is positive, the ReLU's output is that positive number, and if it is negative, the function returns zero. This allows for independent batch normalisation of each feature map. To boost computational performance, a 1×1 convolution is employed to minimise the number of feature maps.

As can be seen in Fig. 3b, the transition layer is responsible for decreasing the feature maps' width, height, and total number. It comprises the steps BN > ReLU > Conv(1×1) > AvgPool(2×2), and it is linked behind the dense block. A hyperparameter between 0 and 1, called the compression factor, is used at this point to determine how much to compress the feature map; if this value is 1, there is no change in the total number of feature maps. DenseNet also uses the composite layer function, composed of the order BN > ReLU > Conv, based on the efficiency findings of the experiments in [40].
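The following minimal sketch, written against the tensorflow.keras API used in Sect. 6.1, illustrates one plausible reading of the bottleneck and transition layers of Fig. 3; the growth rate, compression factor, and layer count are illustrative assumptions rather than the paper's exact configuration.

```python
from tensorflow.keras import layers

def bottleneck(x, growth_rate):
    """BN > ReLU > Conv(1x1) > BN > ReLU > Conv(3x3), as in Fig. 3a."""
    y = layers.BatchNormalization()(x)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(4 * growth_rate, 1, use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(growth_rate, 3, padding="same", use_bias=False)(y)
    # Channel-wise concatenation: each layer sees all preceding feature maps.
    return layers.Concatenate()([x, y])

def dense_block(x, num_layers, growth_rate):
    for _ in range(num_layers):
        x = bottleneck(x, growth_rate)
    return x

def transition(x, compression=0.5):
    """BN > ReLU > Conv(1x1) > AvgPool(2x2), as in Fig. 3b."""
    channels = int(x.shape[-1] * compression)  # compression factor in (0, 1]
    y = layers.BatchNormalization()(x)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(channels, 1, use_bias=False)(y)
    return layers.AveragePooling2D(2)(y)
```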

5.2.2 Long Short-Term Memory (LSTM)

LSTM is a specialised structure of RNN and a solution to the long-term dependence problem [41]. Long-term dependence arises when knowledge from the distant past fails to be passed forward to the final output. Since LSTM avoids these pitfalls, it excels not just at analysing and predicting short sequences but also at handling longer data types, including audio, video, and time series (Figs. 4, 5, 6, 7, 8).

Fig. 4
figure 4

Graphical Comparison of proposed pre-trained models

Fig. 5
figure 5

FPR Analysis

Fig. 6
figure 6

Representation of the proposed model in terms of F1-score

Fig. 7
figure 7

Graphical comparison of Proposed Hybrid Optimization based on various DL models

Fig. 8
figure 8

F1-score analysis of various classifiers based on AROA

The LSTM's "cell state" is its most fundamental component. Information is added and subtracted at the gate and sent on to the next level as the cell state moves like a conveyor belt. It also allows the past to have an immediate impact on the results to come. There are essentially four stages to LSTM. Equation (15) describes the first forget gate layer. The sigmoid layer uses this step to choose what data to discard. The LSTM unit takes in a vector with the form x_tR_d as input and outputs a vector with the form h_(t-1)(0,1) as the prior hidden state. During model training, it is also necessary to optimise the forget layer's weight matrices (W_fRd) and bias vector parameters (b_fRd). The LSTM unit will choose a value from the range 0–1 from the sigmoid function's return (y-axis), where s is a sigmoid function. The input gate layer (16) and the tanh layer (17) make up the second stage. The tanh layer generates a new contender value of C _t, which is a cell input activation vector based on the values selected for updating by the input gate layer. At last, the cell state is updated with the sum of the two layers' values. Third, using the previous state as a starting point and updating it in accordance with Eq. (18), a new cell state is created. The data that was chosen to be forgotten first goes through the forget gate, and then the data that was selected for addition is attached. Using the output gate layer of Eq. (19), the final step is to select a value to be output. To calculate the output, first, identify the region of the cell state that will be exported then multiply this value by the one acquired from the cell state, as indicated in Eq. (20).

$${f}_{t}=\sigma \left({W}_{f}.\left[{h}_{t-1},{x}_{t}+{b}_{f}\right]\right)$$
(15)
$${i}_{t}=\sigma \left({W}_{i}.\left[{h}_{t-1},{x}_{t}+{b}_{i}\right]\right)$$
(16)
$${\widetilde{C}}_{t}=tanh\left({W}_{c}.\left[{h}_{t-1}{x}_{t}\right]+{b}_{c}\right)$$
(17)
$${C}_{t}=\left({f}_{t}\times {C}_{t-1}+\left({i}_{t}\times {\widetilde{C}}_{t}\right)\right)$$
(18)
$${O}_{t}=\sigma \left({W}_{o}.\left[{h}_{t-1},{x}_{t}\right]+{b}_{o}\right)$$
(19)
$${h}_{t}={o}_{t}\times tanh\left({C}_{t}\right)$$
(20)
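A minimal NumPy sketch of a single LSTM step implementing Eqs. (15)-(20) is given below; the dimensions and random weights are illustrative only (d = 5 loosely mirrors the five inputs of Table 4).

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step, Eqs. (15)-(20); W maps the concatenated [h_{t-1}, x_t]."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate, Eq. (15)
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate, Eq. (16)
    c_tilde = np.tanh(W["c"] @ z + b["c"])   # candidate state, Eq. (17)
    c_t = f_t * c_prev + i_t * c_tilde       # cell-state update, Eq. (18)
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate, Eq. (19)
    h_t = o_t * np.tanh(c_t)                 # hidden state, Eq. (20)
    return h_t, c_t

# Illustrative dimensions: input size d = 5, hidden size 10.
d, hdim = 5, 10
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(hdim, hdim + d)) for k in "fico"}
b = {k: np.zeros(hdim) for k in "fico"}
h, c = lstm_step(rng.normal(size=d), np.zeros(hdim), np.zeros(hdim), W, b)
```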

In this research work, the weights of the LSTM model are optimally selected by the hybrid optimisation model described below.

5.2.2.1 Mathematical Model of Hybrid Optimization

The conventional ROA and AOA algorithms, as well as the suggested hybrid technique, A-ROA, are discussed here.

5.3 Rider Optimization Algorithm

The concept of ROA [42] is built on a race to a finish line among a group of riders. The solution is updated using four rider groups: "bypass riders," "followers," "overtakers," and "attackers." The total number of riders is divided into these four groups, each of which has its own unique winning strategy. The bypass rider looks for the most direct route to the destination. The followers trail the leading rider. The overtaker does not depend on anyone else to reach the destination; it proceeds on its own. The attacker's top speed is calculated based on the rider's position relative to the destination. There are seven stages to the ROA process, and they are outlined here.

Initialise the rider group and parameters: the four groups of riders and their order within the ROA are selected at random. Equation (21) depicts the group's initialisation.

$${B}_{n}=\left\{{B}_{n}(x,y)\right\};1\le x\le E;1\le y\le F$$
(21)

Here, E represents the total number of riders, F represents the dimension of the optimisation problem, n represents a single instant in time, and the location of the xth rider at that instant is represented by \(B_n\). As shown in Eq. (22), the number of riders in the race is the sum of the riders in each category.

$$E=L+F+O+A$$
(22)

The variables \(L,F,O\) and A are used in Eq. (22) to represent the numbers of "bypass riders," "followers," "overtakers," and "attackers," respectively. After the groups have been set up, the riders' "steering," "brake," "accelerator," and "gear" characteristics are collected. S denotes the steering angle at time n, as calculated by Eq. (23).

$${S}_{n}=\left\{{S}_{x,y}^{n}\right\};1\le x\le E;1\le y\le F$$
(23)

In Eq. (23), the steering angle of the xth rider in the yth coordinate is given by \({S}_{x,y}^{n}\). The initial steering angle is calculated at the 0th instant using Eq. (24).

$${S}_{x,y}=\left\{\begin{array}{ll}{\theta }_{x}& if\, y=1\\ {S}_{x,y-1}+\phi & if\, y\ne 1\, and\, {S}_{x,y-1}+\phi \le 360\\ {S}_{x,y-1}+\phi -360& otherwise\end{array}\right.$$
(24)

The term \({\theta }_{x}\) denotes the xth rider's position angle, and \(\phi\) is the coordinate angle.

Measure the success rate: after the group and rider parameters have been initialised, the success rate of each rider may be calculated from its distance to the target, as shown in Eq. (25).

$${s}_{x}=\frac{1}{\Vert {B}_{x}-{J}_{Q}\Vert }$$
(25)

Here, the target position is denoted by \({J}_{Q}\), and the xth rider's position by \({B}_{x}\).

Declare the leading rider: here, the success rate is crucial for determining the leading rider. A rider is regarded as having the "highest success rate" and becomes the leading rider if he or she travels the shortest distance between the starting point and the final destination. This frontrunner shifts and evolves over time, so its behaviour cannot be predicted in advance.

1. Update the location of the bypass rider: because the bypass rider ignores the typical route, the location update is performed randomly, as demonstrated in Eq. (26).

$${B}_{n+1}^{M}\left(x,y\right)=\delta \left[{B}_{n}\left(\eta ,y\right)*\beta \left(y\right)+{B}_{n}\left(\xi ,y\right)*\left[1-\beta (y)\right]\right]$$
(26)

Here, the random value ξ lies in the interval 1 to E, the random value β lies in the interval 0 to 1, the random value δ lies in the interval 0 to 1, and the value η lies in the interval 1 to E. Next, the follower updates its position with respect to the leading rider, as expressed in Eq. (27).

$${B}_{n+1}^{N}\left(x,cs\right)={B}^{J}\left(J,cs\right)+\left[cos\left({S}_{x,cs}^{n}\right)*{B}^{J}\left(J,cs\right)*{u}_{x}^{n}\right]$$
(27)

Here, the coordinate selector is indicated by cs, \({B}^{J}\) is the leading rider's position, \({u}_{x}^{n}\) is the distance travelled by the xth rider, and J is the index of the leading rider.

2. Update the position of the overtaker: the "direction indicator," "coordinate selector," and "relative success rate" are the three most important components in updating the overtaker's position. The formula for the overtaker's position update is given in Eq. (28).

$${B}_{n+1}^{V}\left(x,cs\right)={B}_{n}\left(x,cs\right)+\left[{di}_{n}^{k}\left(x\right)*{B}^{J}(J,cs)\right]$$
(28)

3. Update the location of the attacker: the attacker attempts to seize the leading rider's position, so its location update follows the leader's position, as given in Eq. (29).

$${B}_{n+1}^{M}\left(x,y\right)={B}^{J}\left(J,cs\right)+\left[cos\left({S}_{x,cs}^{n}\right)*{B}^{j}\left(J,cs\right)*{u}_{x}^{n}\right]$$
(29)

Here, the term \({S}_{x,cs}^{n}\) denotes the steering angle of the xth rider in the selected coordinate, and \({B}_{n+1}^{M}\left(x,y\right)\) denotes the updated position of the attacker.

  • Once all riders' positions have been updated and a new leader has been identified, the success rate is recalculated.

  • Update the rider parameters for an improved solution: in order to obtain the optimal solution, it is necessary to update the rider parameters. The activity counter must be updated together with the gear values and steering angle to meet the new limits.

  • End process: to determine the winner of the race, the process is repeated until the off-time (T_off) has passed.
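The following condensed NumPy sketch illustrates one iteration of the rider updates under the equations above: the success rate of Eq. (25), the bypass update of Eq. (26), and the follower update of Eq. (27). The group size, bounds, and target position are illustrative assumptions, and the remaining update rules follow the same pattern.

```python
import numpy as np

rng = np.random.default_rng(1)
E, F = 20, 5                       # riders and problem dimension (illustrative)
target = rng.uniform(-1, 1, F)     # stand-in for the target position J_Q
B = rng.uniform(-5, 5, (E, F))     # rider positions

def success_rate(B, target):
    """Eq. (25): inverse distance to the target."""
    return 1.0 / np.linalg.norm(B - target, axis=1)

leader = int(np.argmax(success_rate(B, target)))

# Bypass rider update, Eq. (26): random mixture of two random riders.
eta, xi = rng.integers(E), rng.integers(E)
beta, delta = rng.random(F), rng.random()
bypass = delta * (B[eta] * beta + B[xi] * (1.0 - beta))

# Follower update, Eq. (27): move along the leader on a chosen coordinate.
cs = int(rng.integers(F))                 # coordinate selector
steer = np.deg2rad(rng.uniform(0, 360))   # steering angle S
dist = rng.random()                       # distance travelled u
follower = B[0].copy()                    # rider 0 plays the follower here
follower[cs] = B[leader, cs] + np.cos(steer) * B[leader, cs] * dist
```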

5.4 Arithmetic Optimization Algorithm

AOA is inspired by the arithmetic operators used to address mathematical problems [43]. The method uses the basic arithmetic operations of addition, subtraction, division, and multiplication. It is a mathematical model, and the optimisation process begins with a randomly generated collection of candidate solutions, denoted by the matrix B. The solutions are initialised with the help of Eq. (30).

$$B=\left[\begin{array}{ccccc}{b}_{1,1}& \cdots & {b}_{1,i}& \cdots & {b}_{1,m}\\ {b}_{2,1}& \cdots & {b}_{2,i}& \cdots & {b}_{2,m}\\ \vdots & & \vdots & \ddots & \vdots \\ {b}_{M-1,1}& \cdots & {b}_{M-1,i}& \cdots & {b}_{M-1,m}\\ {b}_{M,1}& \cdots & {b}_{M,i}& \cdots & {b}_{M,m}\end{array}\right]$$
(30)

After that, the search (exploration) phase or the exploitation phase is selected using the calculated "Math Optimizer Accelerated (MOA)" coefficient, which is determined using Eq. (31).

$$moa\left({cr}_{itn}\right)=min+{cr}_{itn}\times \left(\frac{max-min}{{MX}_{itn}}\right)$$
(31)

The minimum value of the accelerated function is denoted by min, and the maximum by max. The current iteration, given by \({cr}_{itn}\), lies between 1 and \({MX}_{itn}\), where \({MX}_{itn}\) is the maximum number of iterations, and \(moa({cr}_{itn})\) is the value of the function at the current iteration. Next, as part of the exploration phase, the arithmetic operators indicated in Eq. (32) are used to update the candidate's location.

$${b}_{x,y}\left({cr}_{itn}+1\right)=\left\{\begin{array}{ll}BS\left({b}_{y}\right)\div \left(mop+\varepsilon \right)\times \left(\left({ub}_{y}-{lb}_{y}\right)\times \mu +{lb}_{y}\right)& {R}_{2}<0.5\\ BS\left({b}_{y}\right)\times mop\times \left(\left({ub}_{y}-{lb}_{y}\right)\times \mu +{lb}_{y}\right)& otherwise\end{array}\right.$$
(32)

Here, the term \({b}_{x,y}\left({cr}_{itn}\right)\) indicates the yth position of the xth solution at the current iteration, \({b}_{x,y}\left({cr}_{itn}+1\right)\) indicates the yth position of the xth solution at the next iteration, and \(BS\left({b}_{y}\right)\) is the yth position of the best solution obtained so far. \({ub}_{y}\) stands for the upper bound and \({lb}_{y}\) for the lower bound of the yth position, \(\mu\) is a control parameter, and \(\varepsilon\) is a small integer number. The Math Optimizer Probability (MOP) is denoted by mop, which is an analysed control parameter, and Eq. (33) is used to calculate mop.

$$mop\left({cr}_{itn}\right)=1-\frac{{cr}_{itn}^{1/a}}{{MX}_{itn}^{1/a}}$$
(33)

The term \(mop\left({cr}_{itn}\right)\) is the function value of the coefficient, \({cr}_{itn}\) indicates the current iteration, and \({MX}_{itn}\) expresses the limit on the possible number of iterations. The letter a denotes a sensitive parameter that defines the exploitation accuracy over the iterations. The search enters the exploration phase of Eq. (32) when the condition \({R}_{1}<moa(c{r}_{itn})\) is not met. Otherwise, the exploitation phase takes place, and two operators, subtraction and addition, are used in the search space to obtain a better solution, as given in Eq. (34).

$${b}_{x,y}\left({cr}_{itn}+1\right)=\left\{\begin{array}{c}BS\left({b}_{y}\right)-\left(\left({ub}_{y}-{lb}_{y}\right)\times \mu +{lb}_{y}\right) {R}_{3}<0.5\\ BS\left({b}_{y}\right)+mop\times \left(\left({ub}_{y}-{lb}_{y}\right)\times \mu +{lb}_{y}\right) otherwise\end{array}\right.$$
(34)

Here, the final update takes place to procure the ideal solution.
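A compact NumPy sketch of one AOA position update over Eqs. (31)-(34) follows. The bounds, the MOA limits (min = 0.2, max = 0.9), and the control parameters (μ = 0.499, a = 5) are assumptions borrowed from common AOA settings rather than values stated in this paper, and mop is applied in both exploitation operators.

```python
import numpy as np

rng = np.random.default_rng(2)
m, lb, ub = 5, -10.0, 10.0            # dimension and bounds (illustrative)
mu, alpha, eps = 0.499, 5.0, 1e-12    # assumed control parameters
best = rng.uniform(lb, ub, m)         # BS(b), best solution so far
x = rng.uniform(lb, ub, m)            # current candidate

def update(x, best, it, max_it):
    moa = 0.2 + it * (0.9 - 0.2) / max_it                      # Eq. (31)
    mop = 1.0 - (it ** (1 / alpha)) / (max_it ** (1 / alpha))  # Eq. (33)
    scale = (ub - lb) * mu + lb
    new = x.copy()
    for y in range(m):
        if rng.random() > moa:          # exploration, Eq. (32)
            if rng.random() < 0.5:
                new[y] = best[y] / (mop + eps) * scale  # division operator
            else:
                new[y] = best[y] * mop * scale          # multiplication operator
        else:                           # exploitation, Eq. (34)
            if rng.random() < 0.5:
                new[y] = best[y] - mop * scale          # subtraction operator
            else:
                new[y] = best[y] + mop * scale          # addition operator
    return np.clip(new, lb, ub)

x = update(x, best, it=1, max_it=100)
```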

5.5 Hybrid A-ROA

The standard AOA is built from mathematical operations such as addition, subtraction, division, and multiplication. This approach uses a small set of parameters while avoiding the issue of getting stuck in local optima. However, it cannot handle problems with more than two binary or discrete objectives in the optimisation setting. A hybridised approach may therefore be employed to improve the effectiveness of the proposed model. Consequently, a hybrid optimisation approach known as A-ROA is created by uniting the well-known ROA and AOA optimisation methods. ROA is incorporated into AOA due to its many benefits, including the high accuracy that results from the rider positions being randomly distributed in the search space. The traditional AOA's method of updating solutions is based on the arithmetic operators throughout exploration and exploitation; this is where the proposed A-ROA differs. The proposed A-ROA updates the position of the solution using the rule phases of the conventional AOA if the condition \({R}_{2}<0.5\) is fulfilled; otherwise, the update of ROA is employed. Pseudo-code for the proposed A-ROA may be found in Algorithm 1.

figure a
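The switching rule of the proposed A-ROA can be summarised in a short schematic sketch: a random draw R2 decides whether a rider is updated by the AOA rule or by the ROA rider update. The callables aoa_update and roa_update are placeholders standing for the update steps sketched in the two previous listings.

```python
import numpy as np

def a_roa_step(population, best, it, max_it, aoa_update, roa_update, rng):
    """One A-ROA iteration: AOA rule if R2 < 0.5, ROA rider update otherwise."""
    new_pop = []
    for rider in population:
        if rng.random() < 0.5:                            # condition R2 < 0.5
            new_pop.append(aoa_update(rider, best, it, max_it))
        else:
            new_pop.append(roa_update(rider, best))
    return np.asarray(new_pop)
```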

5.5.1 Hybrid Model

The research suggests a hybrid model that takes elements from both DenseNet and the modified LSTM (MLSTM). The foundation of the suggested model is DenseNet. In order to incorporate sequence information into the feature map, the hybrid model employs a sigmoid function for classification and uses the feature map as input data for the MLSTM. The first Conv layer processes the incoming data and generates a feature map with twice the growth rate. Next, in order to maintain the same overall size of the feature map, Conv(3×3) performs 1-pixel zero-padding in all three dense blocks. The transition layer is employed following each dense block; it average-pools the feature map and reduces its size using Conv(1×1). Finally, global pooling is employed to produce the feature map as a 1-dimensional vector rather than using a fully connected layer, which would lead to excessive parameter inflation. The data are then reshaped into an MLSTM-friendly input format before being fed into the network. Finally, the sigmoid function is used to map the MLSTM-generated features to the output categories. Table 1 displays the organisational details.

Table 1 Construction of DenseNet-LSTM
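Read together with Table 1, the ordering described above can be sketched as follows in tensorflow.keras; the input shape, block depth, and LSTM width are illustrative assumptions, dense_block and transition refer to the sketch in Sect. 5.2.1, and the A-ROA weight selection is omitted.

```python
from tensorflow.keras import layers, models
# Assumes dense_block and transition are defined as in the Sect. 5.2.1 sketch.

def build_densenet_mlstm(input_shape=(32, 32, 1), growth_rate=32):
    inp = layers.Input(shape=input_shape)
    # First Conv layer: feature map with twice the growth rate.
    x = layers.Conv2D(2 * growth_rate, 3, padding="same", use_bias=False)(inp)
    x = dense_block(x, num_layers=3, growth_rate=growth_rate)
    x = transition(x, compression=0.5)
    x = layers.GlobalAveragePooling2D()(x)
    # Reshape the pooled feature vector into an LSTM-friendly sequence.
    x = layers.Reshape((1, -1))(x)
    x = layers.LSTM(10)(x)                  # 10 neurons, cf. Table 5
    out = layers.Dense(1, activation="sigmoid")(x)
    return models.Model(inp, out)
```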

6 Experimentation, Results and Discussion

6.1 Experimental Setup

The workstation setup, DenseNet-MLSTM's hyperparameters, experimental approaches, and evaluation metrics are all outlined here. The CPU was an AMD Ryzen 7 3700X, and the RAM was 64 GB, as stated in Table 2.

Table 2 Workstation Conformation

A GeForce RTX 2080 Ti was used as the GPU in the suggested model's training. Python 3.6, Tensorflow 1.14, and Keras 2.2.4 were used in the experimentation. DenseNet-MLSTM's hyperparameters were adjusted so that its growth rate was 32 and its compression factor was 0.5. ReLU was utilised as the activation function, and a learning rate of 0.01 was used.

6.2 Parameter Evaluation

Accuracy, sensitivity, specificity, FPR, and F1-score, derived as given in Table 3, are utilised as performance indicators to evaluate the model's food prediction performance. Accuracy refers to the percentage of a dataset that was properly labelled and measures how well actual values match predictions. Sensitivity is the fraction of positive data correctly predicted as positive. The false positive rate (FPR) is the proportion of actual negatives incorrectly predicted as positive, and specificity is the proportion of actual negatives correctly predicted as negative. The F1-score is the harmonic mean of precision and recall.

Table 3 Metrics for evaluation
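The definitions above translate directly into code; the following sketch computes all five indicators from hypothetical confusion-matrix counts.

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity, FPR, and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)        # recall / true positive rate
    specificity = tn / (tn + fp)
    fpr = fp / (fp + tn)                # false positive rate = 1 - specificity
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, sensitivity, specificity, fpr, f1

# Hypothetical counts for illustration.
print(metrics(tp=95, tn=90, fp=5, fn=10))
```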

6.3 Validation Analysis

In this research, 70% of the data was used to train the models, and the model with the best predictive power was then selected. Following the training phase, the remaining 30% of the data was used to assess the predictive performance of the models, and the models' accuracy was quantified and compared using accuracy measures. According to Table 4, the livestock production inputs are represented by the variables xt1, xt2, and xt3, which stand for the number of living animals, the number of slaughtered animals, and the livestock yield, while the agricultural production inputs are represented by the variables xt4 and xt5, which stand for the quantity of crops harvested and the quantity of crops lost. The model's outputs are, correspondingly, livestock and agricultural production.

Table 4 The prepared dataset for the time-series forecast

6.3.1 Training Results

As noted above, 70% of the data is used for model training. The trained models were evaluated three separate times with varying numbers of neurons, since the MLSTM model may be fine-tuned to provide the most accurate model by adjusting the neuron count. As shown in Table 5, the MLSTM model was trained using 10, 14, and 18 neurons. Models with ten neurons performed best for forecasting livestock output, while models with eighteen neurons performed best for predicting agricultural production, because their respective RMSEs were lower.

Table 5 RMSE values for the proposed MLSTM models with different numbers of neurons in the training step

6.3.2 Comparison of Proposed Hybrid Model

The existing pre-trained techniques [44,45,46,47,48] considered different data for food prediction; therefore, these models were implemented using the present dataset, and the results were averaged in Tables 6, 7 and 8.

Table 6 Analysis of Pre-trained Classifier
Table 7 Analysis of Proposed Hybrid Model without A-ROA
Table 8 Analysis of the proposed hybrid model with A-ROA

Table 6 above presents the analysis of the pre-trained classifiers. The LeNet model reached an accuracy of 89.89, a sensitivity of 80.79, a specificity of 98.98, an FPR of 0.10, and an F1-score of 0.877. The ResNet model reached an accuracy of 91.47, a sensitivity of 82.94, a specificity of 100, an FPR of 0.15, and an F1-score of 0.897. The VGGNet model reached an accuracy of 90.46, a sensitivity of 89.8, a specificity of 91.11, an FPR of 0.089, and an F1-score of 0.9. The AlexNet model reached an accuracy of 91.01, a sensitivity of 99.05, a specificity of 82.97, an FPR of 0.17, and an F1-score of 0.933. The DarkNet model reached an accuracy of 93.23, a sensitivity of 95.72, a specificity of 90.73, an FPR of 0.092, and an F1-score of 0.936. Finally, the DenseNet model reached an accuracy of 96.6, a sensitivity of 95.41, a specificity of 97.79, an FPR of 0.022, and an F1-score of 0.963.

The other DL models from [49,50,51,52,53] are used for food prediction and are compared with the LSTM and MLSTM models.

The analysis of the proposed hybrid model without A-ROA is shown in Table 7 above. The MLP model achieved an accuracy of 80.54, a sensitivity of 81.97, a specificity of 79.12, an FPR of 0.208, and an F1-score of 0.817. The AE model achieved an accuracy of 94.32, a sensitivity of 97.82, a specificity of 90.83, an FPR of 0.091, and an F1-score of 0.946. The DBN model achieved an accuracy of 90.78, a sensitivity of 98.54, a specificity of 98.99, an FPR of 0.03, and an F1-score of 0.987. The CNN model achieved an accuracy of 97.29, a sensitivity of 96.56, a specificity of 98.02, an FPR of 0.02, and an F1-score of 0.972. The RNN model attained an accuracy of 95.91, a sensitivity of 94.39, a specificity of 97.43, an FPR of 0.025, and an F1-score of 0.953. The LSTM model achieved an accuracy of 92.69, a sensitivity of 95.41, a specificity of 97.79, an FPR of 0.022, and an F1-score of 0.963. Finally, the DenseNet-MLSTM model achieved an accuracy of 95.41, a sensitivity of 93.81, a specificity of 100, an FPR of 0.03, and an F1-score of 0.952.

Table 8 above shows the investigation of the recommended hybrid model with A-ROA. The MLP model yielded an accuracy of 91.05, a sensitivity of 88.19, a specificity of 93.9, an FPR of 0.106, and an F1-score of 0.901. The AE model achieved an accuracy of 94.32, a sensitivity of 97.82, a specificity of 90.83, an FPR of 0.091, and an F1-score of 0.946. The DBN model attained an accuracy of 98.76, a sensitivity of 98.54, a specificity of 98.99, an FPR of 0.03, and an F1-score of 0.987. The CNN model achieved an accuracy of 97.29, a sensitivity of 96.56, a specificity of 98.02, an FPR of 0.02, and an F1-score of 0.972. The RNN model attained an accuracy of 95.91, a sensitivity of 94.39, a specificity of 97.43, an FPR of 0.025, and an F1-score of 0.953. The LSTM model achieved an accuracy of 96.6, a sensitivity of 95.41, a specificity of 97.79, an FPR of 0.022, and an F1-score of 0.963. Finally, the DenseNet-MLSTM model achieved an accuracy of 99.82, a sensitivity of 99.65, a specificity of 100, an FPR of 0.01, and an F1-score of 0.998.

7 Conclusions and Future Work

With a growing global population comes a greater need for food, and every day more people are forced to go hungry. The food and farming sector is one of the most important industries for mankind. Food security for future generations is an issue that governments and organisations working in the food business are planning and preparing for. Food is mostly supplied through domestic production and imports to accomplish food security goals. Therefore, the first step in ensuring a nation's population has access to nutritious food is to assess the nation's agricultural potential. Predicting food production helps policymakers and activists in the agriculture and food sectors make more informed decisions in the long and near term. This work aimed to fill that gap by offering a model with strong predictive ability for forecasting agricultural output. This research looked ahead 10 years and made projections for Iran's agricultural and livestock output. The results indicate that in the coming decade, Iran will produce more crops and livestock than it has in the past.

Planning of production volumes, budgets, agricultural subsidies, and the number of people actively employed in agriculture and livestock can all benefit from this study's conclusions. Further, with the help of the projections, policymakers may arrange for the import of essential food products and the export of local surplus production. To solve issues in the food and agriculture industries, such as the prediction of crop yields, experts have turned to deep learning; however, studies forecasting food production do not exist. This research utilised deep learning algorithms to make predictions about Iranian food and livestock goods. The DenseNet method is used in the hybrid model to boost computational efficiency, improve information flow inside the network, and address the CNN limitations discussed in this research. LSTM models are utilised for long-term sequences, with the weights being chosen using a hybrid optimisation approach called A-ROA, created by combining the two popular algorithms AOA and ROA.

This research aids future food security studies by offering a reusable resource for forecasting agricultural and animal output. The model may be used by researchers and policymakers to predict a region's future food security. Accordingly, the present study suggests utilising the presented model to anticipate food production in various nations and to derive suitable methods to prevent food insecurity in future research. Forecasts for agricultural and animal output in this study are limited by the fact that they are based only on historical data, while external factors like climate, policies, and technological advancements are assumed to remain constant.