Temporal fusion transformer-based prediction in aquaponics

Aquaponics offers a soilless farming ecosystem that merges hydroponics with aquaculture. Fish food is the system's primary input, and the ammonia excreted by the fish is converted into nitrate, an essential nutrient for the plants, by specialized bacteria. Fluctuations in ammonia levels alter the nitrate produced and thereby influence farm yields. Sensor-based autonomous control of aquaponics is a highly rewarding approach that can enable far more efficient ecosystems, while manual control of the whole operation remains prone to human error. Artificial Intelligence-powered Internet of Things solutions can reduce human intervention to a certain extent, realizing more scalable environments for food production. In this research, an attention-based Temporal Fusion Transformer deep learning model is proposed and validated for forecasting nitrate levels in an aquaponics environment. An aquaponics dataset with temporal features and a large number of input lines is employed for validation and extensive analysis. Experimental results demonstrate significant improvements of the proposed model over baseline models in terms of MAE, MSE, and Explained Variance for one-hour sequences. Utilizing the proposed solution can help enhance the automation of aquaponics environments.


Introduction
The increasing global population has led to rising food demand, putting a strain on traditional agriculture. This demand is further exacerbated by climate change, which creates challenges in water availability [1]. The adoption of chemical fertilizers and pesticides in traditional agriculture has helped increase crop yields but has also harmed the environment and human health. Their excessive use has led to soil degradation, water pollution, and health problems, prompting worldwide debate about the sustainability of agriculture. Organic agriculture proposes a healthier alternative that reduces the impact of pesticides but yields less food than traditional agriculture [2,3]. In recent years, there has been growing interest in sustainable farming practices that can address these challenges, such as aquaponics.
Aquaponics is a farming method that has gained popularity due to its benefits over traditional agriculture. It eliminates the need for synthetic fertilizers and pesticides, making it an organic farming method. The aquaponics technique proposes a more complex yet highly rewarding alternative to conventional agriculture by combining hydroponics, fish farming, and bacteria. In aquaponics, fish waste provides nutrients to the plants, which in turn filter the water and return clean water to the fish. Fish waste produces ammonia, which is toxic to fish but can be converted into nitrate by beneficial bacteria. Figure 1 depicts a sample aquaponics ecosystem. Hydroponic beds need nutrient-rich water for the plants to grow; the ammonia in fish waste is circulated through water pumps and converted into nitrate by beneficial bacteria.
The only input to the aquaponics ecosystem is fish food. Many existing aquaponics ecosystems require careful maintenance through human intervention, which can be time-consuming and labor-intensive. Keeping pH within certain limits, feeding the fish on schedule, and maintaining temperature [4] must all be checked periodically. This manual maintenance becomes even more challenging for larger aquaponics systems that require more frequent monitoring. With developments in Internet of Things (IoT) technology, there is now an opportunity to reduce human intervention and improve food yields by implementing a smart environment capable of mass production.
Accurately predicting nitrate levels in an aquaponics environment requires highly accurate prediction models. While several models can be built with traditional machine learning algorithms, their performance leaves room for improvement. Therefore, this study utilized recent machine learning and deep learning algorithms, including Long Short-Term Memory (LSTM), Encoder-Decoder LSTM, Attention LSTM, Extreme Learning Machine (ELM), and the Temporal Fusion Transformer (TFT). Since LSTM models are effective at capturing the temporal dependencies often present in time series, they have been successfully used for several time-series problems [5]. More recently, a new deep learning architecture named the Transformer has been developed, and several variations of it have been implemented [6]. For longer sequences with complex dependencies, transformer-based models perform relatively better than LSTM models. To the best of our knowledge, Temporal Fusion Transformers have not been applied to this problem before; therefore, we aimed to utilize these algorithms in this research. Since the main objective is a highly accurate prediction model, state-of-the-art machine learning algorithms were applied and compared.
The contributions of the study are as follows:
• A novel deep learning model, comprising a TFT-based solution, has been developed for forecasting in aquaponics.
• The TFT network improves hourly forecasting performance for the aquaponics environment over previous works in terms of Mean Absolute Error (MAE) and Explained Variance.
• The high forecasting accuracy provides opportunities for automated processes.
The paper is organized as follows: Sect. 2 provides the related work. Section 3 explains the methods, including the ELM, LSTM, Encoder-Decoder networks, the TFT technique, and the evaluation metrics. Section 4 presents the dataset and the experimental results. Section 5 includes the discussion, and Sect. 6 concludes the paper.

Related work
An earlier solution analyzed nutrient levels in aquaponics water [9]. The solution only analyzes calcium, sulfate, and phosphate and can be expanded to include heavy metals such as iron, copper, and zinc. Another limitation of that work was that, although the feature space was high-dimensional, the number of observations was small, reducing prediction performance. Advancements in modern cameras have improved industrial image quality over the last decade and enabled more powerful image-processing techniques in the management of aquaponics. Handling these images through inference remains challenging, as such systems consist of several low-capacity IoT devices; an alternative is server-based handling with fast communication technologies. With the adoption of 5G, improved data transfer speeds provide better opportunities to establish autonomous aquaponics systems. Kumar et al. provided an end-to-end aquaponics system to detect anomalies in the physical conditions of fish [10]. The fish tanks are periodically imaged and classified with a Bayesian classifier, and the tanks are connected through 6LoWPAN, which provides the bandwidth for image transfer. Another advantage of vision systems is that the growth stage becomes monitorable; the study by Lauguico et al. has shown that crop yield can be assisted by Machine Learning (ML)-based algorithms [11]. That analysis, however, lacks a sufficient number of observations and does not provide an autonomous handling and resolution strategy.
In recent years, studies have shown great opportunities for handling aquaculture, hydroponics, and aquaponics anomalies through time-series analysis and forecasting. Cardenas et al. proposed a solution based on Recurrent Neural Networks (RNNs) to forecast sudden changes in pH [12]. Thai-Nghe et al. conducted univariate time-series analysis to monitor water quality in real time [13]; their study showed that the LSTM algorithm produces better results than baseline ML methods on univariate representations. Liu et al. implemented a water quality forecasting framework using a Bi-directional Stacked Simple Recurrent Unit (Bi-S-SRU) [14]. Compared with a vanilla RNN, the Bi-S-SRU framework improves forecasting accuracy on longer sequences while also providing good inference time. Both the fish and the plants need certain conditions for a healthy aquaponics environment, so nutrient-based analysis to detect input anomalies in the environment is an important problem. Dhal et al. provided much-needed research with proportional nutrient data analysis [15]. The researchers deployed an IoT-based aquaponics laboratory and collected inputs with a high-dimensional feature space; still, the research lacked a sufficient number of observations and relied on data aggregation techniques to enable AI-based assistance. The drawback of small datasets has been investigated in other studies using baseline ML algorithms [16,17]. While those studies offered a benchmarking viewpoint on small datasets, the general applicability of their results remained low.
Table 1 presents the relevant studies alongside the current research. Most papers used proprietary datasets instead of public ones.

Materials and methods
In this section, the proposed TFT model as well as the baseline techniques leading to its development (LSTMs, encoder-decoder networks, and the attention concept) are briefly presented. The proposed model is also compared with the Extreme Learning Machine (ELM) as another baseline method. ELM is a simple and fast-converging algorithm that can excel at representing complex datasets; still, the need for a high number of neurons slows it down at inference time. Liu et al. demonstrated the power of ELM against LSTM in the estimation of photovoltaic power, where the ELM algorithm was both more accurate and computationally more efficient [18]. LSTM is appropriate for modeling sequences with long-term dependencies, whereas encoder-decoder networks are appropriate for modeling complicated sequences with variable-length input and output. Encoder-decoder networks perform better when attention mechanisms are used, and the Temporal Fusion Transformer is a neural network architecture specifically designed for time series forecasting tasks. It combines the strengths of LSTMs, attention mechanisms, and transformers to produce accurate and robust predictions for multivariate time series data. The inefficiency of simpler models such as LSTM or GRU compared with more sophisticated models like the TFT has also been demonstrated in research estimating energy usage [19].
The quality of the data and how it is represented determine how well time series analysis turns out. The characteristics of the dataset and the procedures used to prepare it for the forecasting models are described in the dataset and data preparation sections. The section ends with an explanation of the evaluation metrics applied in the study. The entire methodological process is illustrated in the workflow diagram in Fig. 2.
Aquaponics environments, when integrated with several sensors, produce a series of time-stamped measurements such as nitrite, ammonia, and pH levels. A fusion-based transformer deep learning model is proposed to precisely forecast nitrate levels for the upcoming time window using historical data. From a machine learning perspective, the problem is a regression problem. Classic machine learning techniques are inadequate for modeling its complexity due to the high input dimension; more complex models are required.

Dataset
This study utilized the sensor-based aquaponics dataset proposed by Ogbuokiri et al. [20]. This dataset was selected because it is recent, has high-quality data points with reliable sensor measurements, includes several relevant parameters, is easily accessible, and has a suitable data-collection frequency. The prediction of nitrate levels in aquaponic systems was performed for the first time on a time-stamped dataset with a high-dimensional feature space. The dataset contains six water-quality sensor parameters (temperature, turbidity, dissolved oxygen, pH, ammonia, nitrate), time, and the physical conditions of the fish (length, width, population). The dataset's default data-collection interval is five seconds, and it contains sensor data for nine freshwater catfish ponds. Each sensor was initially calibrated in accordance with industry standards before being tested [20]. The trustworthiness of the data and the lack of a prior time-stamped investigation led to the selection of this dataset. Basic statistics of the dataset parameters are given in Table 2.

Data preparation
Normalization is required when features have drastically different value ranges. The aquaponics dataset shows high variation between parameters, which makes some features dominant, as shown in Fig. 3. The features are normalized to similar scales because they should be equally important when estimating the nitrate level in an aquaponic system. Without normalization, training could blow up with NaNs if a gradient update is too large. Optimizers such as Adagrad and Adam protect against this problem by defining a unique effective learning rate for each feature. Still, ELM is not a gradient-based algorithm, so this optimizer functionality cannot be utilized. Unlike in gradient-based models, high-variance input causes input saturation in the ELM model, where the activation function saturates at spiked values, limiting the model's capacity to capture the underlying patterns in the data. This can be addressed by applying normalization to the dataset (Fig. 3 shows a box plot of the aquaponics dataset before normalization). According to studies [21,22], min-max normalization performs better than its counterparts in time-series analysis. Each input is scaled to a value between 0 and 1, which minimizes the impact of noise and guarantees that neural networks update parameters effectively, accelerating training. Therefore, min-max normalization, given in Eq. 1, is utilized in this study.
X_norm = (X − min) / (max − min)    (1)

Here X is the input variable, min and max are the lowest and highest values present in the series, and X_norm is the normalized value. The dataset also contained missing values, around 0.001% of entries, which were removed. The original dataset reports at an interval of 20 s, which leads to a noisy structure; thus, an interval length of 60 s was selected for the study, merging each group of three sensor reports with the arithmetic mean. There are 421,140 data points in chronological order. The data are split into three subsets: 90% training data, 8% validation data, and 2% test data.
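As a concrete illustration of the preparation steps above, the following sketch resamples a toy 20-second sensor stream to 60-second means and applies min-max normalization (Eq. 1). The column name and timestamps are illustrative assumptions, not the dataset's actual schema.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the sensor stream: 20-second readings of a nitrate-like signal.
rng = np.random.default_rng(0)
idx = pd.date_range("2022-01-01", periods=9, freq="20s")
df = pd.DataFrame({"nitrate": rng.uniform(50.0, 180.0, size=len(idx))}, index=idx)

# Merge each group of three 20-second reports into one 60-second mean sample.
df_60s = df.resample("60s").mean()

# Min-max normalization (Eq. 1): scale the feature into [0, 1].
feat_min, feat_max = df_60s["nitrate"].min(), df_60s["nitrate"].max()
df_60s["nitrate_norm"] = (df_60s["nitrate"] - feat_min) / (feat_max - feat_min)
```

In practice the saved `feat_min`/`feat_max` from the training split would be reused on the validation and test splits to avoid leakage.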

LSTM
RNNs differ from ELM in that they are better at handling sequence data, and they can be trained with backpropagation. The gradient's role is to update the recurrent network's weight values. If the weights are kept too small, the gradient vanishes and the hidden layer's ability to learn diminishes; if they are too large, the gradient explodes. RNNs also include feedback connections in their hidden-layer units, which allows them to process temporal information and learn sequences. The hidden layer functions as a memory with the capacity to store sequential data.
The LSTM method was developed as an improved version of the RNN in which the vanishing gradient problem is addressed [23]. The cell is called "gated" because it can choose whether to retain or discard stored information. An LSTM comprises three gates: an input gate, a forget gate, and an output gate (Fig. 4).
The forget gate selectively decides what information from earlier time steps should be retained, as given in Eq. 2:

f(t) = σ(W_f · [h(t − 1), x(t)] + b_f)    (2)
To determine which information should be retained in the LSTM memory, this gate employs a sigmoid function, whose output depends largely on h(t − 1) and x(t). f(t) produces values between 0 and 1: values close to 0 denote total loss of the previously acquired information, while values close to 1 preserve all of it.

The input gate determines which information from the most recent time step should be added. As shown in Eqs. 3, 4, and 5, this gate is made up of a sigmoid layer and a hyperbolic tangent (tanh) layer:

i1(t) = σ(W_i1 · [h(t − 1), x(t)] + b_i1)    (3)
i2(t) = tanh(W_i2 · [h(t − 1), x(t)] + b_i2)    (4)
i(t) = i1(t) ⊙ i2(t)    (5)
i2 represents a vector of new candidate values that will be added to the LSTM memory, and i1 determines whether each value needs to be modified. Element-wise multiplication is then applied to the tanh and sigmoid outputs. The cell state, which carries information through the entire sequence (Eq. 6), serves as a representation of the network's memory.
First, the forget gate's output is multiplied element-wise by the cell state from the previous time step, which allows values in the cell state to be discarded when multiplied by values close to 0. Next, the input gate's output is added element-wise, producing the new cell state in Eq. 7:

c(t) = f(t) ⊙ c(t − 1) + i(t)    (7)

The output gate decides the value of the output at the current time step.
This gate first determines which parts of the LSTM memory contribute to the output using a sigmoid layer. The information the hidden state should contain is then determined by multiplying the tanh of the cell state by the sigmoid output, as in Eq. 8:

o(t) = σ(W_o · [h(t − 1), x(t)] + b_o)
h(t) = o(t) ⊙ tanh(c(t))    (8)
The new hidden state is the output.
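The gate computations above can be traced in a minimal NumPy sketch of a single LSTM time step. This is an illustrative re-implementation (weight shapes, scales, and the stacked-gate layout are assumptions), not the networks trained in the paper, which were built in Keras.

```python
import numpy as np

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step (Eqs. 2-8): gates stacked as [forget, input, candidate, output]."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    d = h_prev.shape[0]
    z = W @ x + U @ h_prev + b           # pre-activations for all four gates
    f = sigmoid(z[0:d])                  # forget gate (Eq. 2)
    i = sigmoid(z[d:2*d])                # input gate sigmoid branch (Eq. 3)
    g = np.tanh(z[2*d:3*d])              # candidate values, tanh branch (Eq. 4)
    o = sigmoid(z[3*d:4*d])              # output gate
    c = f * c_prev + i * g               # new cell state (Eqs. 5-7)
    h = o * np.tanh(c)                   # new hidden state (Eq. 8)
    return h, c

rng = np.random.default_rng(1)
n_in, n_hid = 6, 4                       # e.g. six water-quality features
W = rng.normal(scale=0.1, size=(4 * n_hid, n_in))
U = rng.normal(scale=0.1, size=(4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), W, U, b)
```

Running the step over a sequence, feeding each (h, c) back in, yields the recurrent behavior described above.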

Encoder-decoder networks
The standard Encoder-Decoder model [24] is generally incapable of accurately handling long input sequences. The encoder processes the input sequence and compresses the information into a fixed-length context vector; only the last hidden state of the encoder RNN is used as the context vector for the decoder. This representation is expected to be a good summary of the full input sequence, but in practice the early part is largely forgotten by the time the entire input has been processed. In the Encoder-Decoder model, an encoder reads the input, a sequence of vectors x = (x_1, ..., x_T), into a vector c. At each time step t, the hidden state h_t of the RNN is updated using Eqs. 9 and 10, where f and q are nonlinear activation functions:

h_t = f(x_t, h_{t−1})    (9)
c = q({h_1, ..., h_T})    (10)
The decoder is trained to produce the output sequence by predicting the next symbol y_t given the hidden state h_t. Both y_t and h_t also depend on y_{t−1} and on the summary c of the input sequence.
Consequently, the decoder's hidden state at time t is computed as

h_t = f(h_{t−1}, y_{t−1}, c)    (11)

Attention mechanism
Attention was initially introduced to solve this core problem of the Encoder-Decoder model, and it achieved great success. It was presented by Bahdanau et al. [25] and went on to revolutionize deep learning. The central idea of the layer is as follows: each time the model predicts an output word, it uses only the parts of the input where the most relevant information is concentrated, rather than the whole sequence. The encoder maps the input sentence to a sequence of annotations (h_1, ..., h_T), which determine the context vector. Although each annotation h_i contains details on the whole input sequence, a specific portion of the input is highlighted. The context vector is then calculated as a weighted sum of the annotations, with weights derived from the scores

e_{t,i} = a(h_{t−1}, h_i)    (12)

where h_{t−1} is the decoder's previous hidden state and h_i an encoder annotation.

This is known as the alignment model: based on how closely the input at position i and the output at position t match, it gives a score e_{t,i}. The weights a_{t,i} are computed in Eq. 13 by applying a softmax to the alignment scores, and at each time step the decoder receives a distinct context vector c_t, computed in Eq. 14 as the weighted sum of all the hidden states of the encoder:

a_{t,i} = exp(e_{t,i}) / Σ_k exp(e_{t,k})    (13)
c_t = Σ_i a_{t,i} h_i    (14)
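The additive attention computation just described (alignment scores, softmax weights, weighted-sum context vector) can be sketched in NumPy as follows. The particular alignment function and matrix shapes are assumptions for illustration.

```python
import numpy as np

def bahdanau_context(s_prev, H, Wa, Ua, va):
    """Additive alignment (Eq. 12), softmax weights (Eq. 13), context vector (Eq. 14)."""
    # e_{t,i} = va^T tanh(Wa s_{t-1} + Ua h_i) for every encoder annotation h_i
    e = np.array([va @ np.tanh(Wa @ s_prev + Ua @ h_i) for h_i in H])
    a = np.exp(e - e.max())
    a = a / a.sum()                      # softmax over alignment scores (Eq. 13)
    c = (a[:, None] * H).sum(axis=0)     # weighted sum of annotations (Eq. 14)
    return a, c

rng = np.random.default_rng(2)
T, d = 5, 3                              # five annotations of dimension three
H = rng.normal(size=(T, d))
a, c = bahdanau_context(rng.normal(size=d), H,
                        rng.normal(size=(d, d)), rng.normal(size=(d, d)),
                        rng.normal(size=d))
```

Note that the weights sum to 1, so the context vector is a convex combination of the annotations, concentrating on the most relevant inputs.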

The temporal fusion transformer
The TFT is a neural network architecture that combines the workings of a number of existing neural components, such as LSTM layers, encoder-decoders, and the attention heads used in transformers, as shown in Fig. 5 [26]. The transformer consists primarily of an encoder and a decoder: the encoder takes the time series data as input, and the decoder produces context-aware embeddings to predict future values. LSTM encoders and decoders summarize shorter patterns, whereas long-range relationships are left to the attention heads. The temporal multi-head attention block finds and prioritizes the most important long-range patterns the time series may contain, and each attention head can focus on a different temporal pattern.
The context vector is supplied to the Gate layer and then to the Add & Norm layer. The dropout layer is used only during training and helps prevent overfitting by randomly eliminating some weights at a rate set by the user. The gated layer regulates the bandwidth of information flow within a particular neuron, while self-attention gathers information across multiple neurons. The Add & Norm layer first combines the weights from the gated layer with the residual-connection weights, then normalizes each input across all features, eliminating the dependency on batches. This property makes layer normalization well suited to sequence models such as transformers and recurrent neural networks.
In processing and predicting time series data, TFT models have proven more sophisticated than conventional LSTM models. By taking advantage of self-attention, the model offers a novel multi-head attention mechanism that, when analyzed, sheds further light on feature significance. In contrast to other deep neural networks, these features are therefore no longer regarded as a black box. TFT's primary components are listed below.

Gated residual network (GRN)
GRNs are used to eliminate unnecessary and unimportant inputs. To avoid over-fitting, nodes can be dropped arbitrarily; a more sophisticated model does not always produce greater prediction performance. The ELU (Exponential Linear Unit) and GLU (Gated Linear Unit) activation functions assist the network in determining which input transformations are straightforward and which require more complex modeling. The output is layer-normalized before being emitted. Additionally, the GRN has a residual connection, which enables the network to learn, if required, to ignore the input. The GRN takes two inputs, a primary input p and an optional context vector c, as depicted in Eqs. 15-17:

GRN(p, c) = LayerNorm(p + GLU(η1))    (15)
η1 = W1 η2 + b1    (16)
η2 = ELU(W2 p + W3 c + b2)    (17)

Here ELU is the activation function, η1, η2 ∈ ℝ^{d_model} are intermediate layers, LayerNorm is standard layer normalization, and the index w indicates weight sharing. The GLU is described as follows: for an input γ, with σ the sigmoid activation function, W and b weights and biases, and ⊙ the element-wise Hadamard product,

GLU(γ) = σ(W4 γ + b4) ⊙ (W5 γ + b5)

Through the GLU, the GRN can manage the model's structure and disregard extra layers: because nonlinear contributions are suppressed when all the GLU's outputs are close to zero, the layer may be completely omitted if necessary (Fig. 6).
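As a concrete illustration, the GRN and GLU computations can be sketched in NumPy. This is an illustrative re-implementation under assumptions (square weight matrices, a simple layer-norm with a small epsilon, no dropout), not the paper's code.

```python
import numpy as np

def glu(gamma, W4, b4, W5, b5):
    """Gated Linear Unit: sigma(W4 g + b4) ⊙ (W5 g + b5); near-zero gates suppress the block."""
    return (1.0 / (1.0 + np.exp(-(W4 @ gamma + b4)))) * (W5 @ gamma + b5)

def grn(p, c, params):
    """Gated Residual Network (Eqs. 15-17): ELU feed-forward, GLU gate, residual + LayerNorm."""
    W1, b1, W2, W3, b2, W4, b4, W5, b5 = params
    eta2 = W2 @ p + W3 @ c + b2
    eta2 = np.where(eta2 > 0, eta2, np.exp(eta2) - 1.0)   # ELU (Eq. 17)
    eta1 = W1 @ eta2 + b1                                  # Eq. 16
    out = p + glu(eta1, W4, b4, W5, b5)                    # residual connection
    return (out - out.mean()) / (out.std() + 1e-6)         # layer normalization (Eq. 15)

rng = np.random.default_rng(3)
d = 8
params = (rng.normal(scale=0.1, size=(d, d)), np.zeros(d),
          rng.normal(scale=0.1, size=(d, d)), rng.normal(scale=0.1, size=(d, d)), np.zeros(d),
          rng.normal(scale=0.1, size=(d, d)), np.zeros(d),
          rng.normal(scale=0.1, size=(d, d)), np.zeros(d))
y = grn(rng.normal(size=d), rng.normal(size=d), params)
```

If the gating weights drive the sigmoid toward zero, the GLU output vanishes and the GRN reduces to a (normalized) pass-through of the residual input, which is exactly the skip behavior described above.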

Variable selection network (VSN)
TFT's variable selection networks decide which input variables are relevant at each time step, and the module can remove the impact of irrelevant variables to enhance forecast accuracy. TFT uses three instances of the Variable Selection Network (VSN), one per input modality, each with distinct weights. Categorical variables are represented with entity embeddings and continuous variables with linear transforms; a GRN is managed internally by the VSN for filtering. The VSN works as follows: the flattened vector Ξ_t of all inputs from the corresponding lookback period is fed through a GRN and a softmax at time t to produce a normalized vector of selection weights v_{χt}. The context vector c_s comes from a static covariate encoder. Each processed feature ξ̃(i)_t is the output of a gated residual network, calculated by feeding the transformed input ξ(i)_t through its own GRN.

Interpretable multi-head attention
The self-attention mechanism is used in this step to help the model learn long-range dependencies across time steps. Unlike the standard implementation, the novel multi-head attention mechanism proposed by the TFT provides feature interpretability. The original architecture projects the input into different representation subspaces through per-head Query, Key, and Value weight matrices; its disadvantage is that the weight matrices share no common ground, making them impossible to interpret. The TFT adds a new matrix so that the various heads share some weights, which can then be explained, for example, in seasonality analysis. In general, attention mechanisms scale values V ∈ ℝ^{N×d_v} according to relationships between keys K ∈ ℝ^{N×d_attn} and queries Q ∈ ℝ^{N×d_attn}:

Attention(Q, K, V) = A(Q, K) V

where A(·) is a normalization function. The scaled dot-product for attention values is typically given as

A(Q, K) = Softmax(Q Kᵀ / √d_attn)

To improve the model's capacity for fitting, the TFT employs a multi-head attention structure. However, attention weights alone would not be a good indicator of the significance of a particular feature because of the different values used in each head. Therefore, an interpretable multi-head attention technique shares values across the different heads and uses additive head aggregation:

Ã(Q, K) = (1/H) Σ_h A(Q W_Q^{(h)}, K W_K^{(h)})

This approach particularly improves the multi-feature representative capability of the proposed model. The result of interpretable multi-head attention is very similar to that of a single attention layer, the main distinction being the process used to produce the attention weights Ã(Q, K). While attending to a common set of input features V, each head can learn different temporal patterns A(Q W_Q^{(h)}, K W_K^{(h)}), so Ã(Q, K) can be understood as a simple ensemble over attention weights combined into one matrix. Compared with A(Q, K), the representation capacity of Ã(Q, K) is successfully enhanced.
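The head-averaging idea can be sketched in NumPy: each head produces its own softmax attention matrix, the matrices are averaged into one interpretable matrix, and a single shared value projection is applied. Shapes and the single value matrix are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def interpretable_mha(Q, K, V, WQ, WK, WV):
    """Heads share one value projection WV; per-head attention matrices are averaged
    into a single combined matrix A_tilde that can be inspected directly."""
    d_attn = WQ[0].shape[1]
    heads = [softmax((Q @ wq) @ (K @ wk).T / np.sqrt(d_attn)) for wq, wk in zip(WQ, WK)]
    A_tilde = np.mean(heads, axis=0)     # ensemble over per-head attention weights
    return A_tilde @ (V @ WV), A_tilde

rng = np.random.default_rng(4)
N, d, d_attn, n_heads = 6, 4, 4, 3
X = rng.normal(size=(N, d))              # self-attention: Q = K = V = X
WQ = [rng.normal(size=(d, d_attn)) for _ in range(n_heads)]
WK = [rng.normal(size=(d, d_attn)) for _ in range(n_heads)]
WV = rng.normal(size=(d, d))
out, A_tilde = interpretable_mha(X, X, X, WQ, WK, WV)
```

Because every head's rows sum to 1, so do the rows of the averaged matrix, and each row of `A_tilde` can be read directly as the importance each time step assigns to the others.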

ELM
Backpropagation is gradient-based. Its structural nature provides high computational capacity for modeling complex problems, but gradient-based neural networks also show a high predisposition toward local optima. Given the nature of the problem analyzed, a non-gradient-based algorithm is therefore also included in the study for comparative purposes.
The Extreme Learning Machine (ELM) offers a rapid and powerful alternative to both machine learning and deep learning solutions [27]. ELM is a training approach for a single-hidden-layer feed-forward neural network (SLFN) with three layers: an input layer, a hidden layer, and an output layer. The input weights and hidden-layer biases are determined at random and frozen during training; the ELM optimizes only the output-layer weights. A single training iteration with random hidden-layer weights enables fast convergence to the global optimum of the output-layer fit.
Mathematically, the ELM output can be formulated as

f(x) = Σ_{i=1}^{L} β_i g(w_i · x + b_i)

where L is the number of hidden neurons, g is the activation function, w_i and b_i are the random input weights and biases, and the output weights β are obtained in closed form as β = H†T, with H the hidden-layer output matrix, H† its Moore-Penrose pseudo-inverse, and T the target matrix.
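A minimal NumPy sketch of this training procedure follows: random frozen hidden weights, one pseudo-inverse solve for the output weights. The toy data, hidden size, and tanh activation are assumptions for illustration, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy regression data standing in for normalized sensor features -> nitrate target.
X = rng.uniform(size=(200, 6))
y = np.sin(X.sum(axis=1))

# Random, frozen hidden layer: only the output weights beta are fitted.
n_hidden = 100
W = rng.normal(size=(6, n_hidden))           # random input weights (never trained)
b = rng.normal(size=n_hidden)                # random hidden biases (never trained)
H = np.tanh(X @ W + b)                       # hidden-layer activations

# Single "training iteration": least-squares output weights via the pseudo-inverse.
beta = np.linalg.pinv(H) @ y
y_hat = H @ beta
train_mse = np.mean((y - y_hat) ** 2)
```

The single linear solve is what makes ELM training fast, while the large hidden layer needed for accuracy is what slows it down at inference time, as noted earlier.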

Experimental environment
The development language for both pre-processing and network implementation was Python version 3.7.The proposed LSTM networks were implemented using the Keras framework with version 2.9.0, and the TFT network was developed with Pytorch Lightning version 1.8.0.post1 and Pytorch Forecasting version 0.10.1.
Hyperparameter optimization was conducted at the B.T.U. High-Performance Clustering Laboratory (HPCLAB). The model was trained on Nvidia 3090 GPUs with CUDA version 8, with eight GPUs used in parallel to accelerate the overall training process.

Evaluation metrics
Three measures, namely Mean Absolute Error (MAE), Explained Variance Score, and Mean Square Error (MSE), were used to evaluate the proposed model's predictive ability. The MAE is the absolute difference between the expected and actual values; the MSE assesses the average squared difference between observed and predicted values, where squaring removes any negative signs. The error measures are defined in Eqs. 31 and 32:

MAE = (1/N) Σ |y_a − y_p|    (31)
MSE = (1/N) Σ (y_a − y_p)²    (32)
The Explained Variance Score measures the disparity between a model's predictions and the actual data; it is the portion of the total variance that is not attributable to error variance but is explained by factors actually present. Scores close to 1.0 are highly desirable. The measure is defined in Eq. 33, where Var(y_a − y_p) and Var(y_a) denote the variance of the prediction errors and of the actual values, respectively:

Explained Variance = 1 − Var(y_a − y_p) / Var(y_a)    (33)
Here N is the total number of forecast values, y_p is the predicted value, y_a is the original actual value, and y_average is the average of the actual values. Perfect forecasting yields a value of 1, whereas a value of 0 means the performance matches a simple model that always forecasts the mean of the data. A negative R² value indicates minimal association between the predictions and the dataset.
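The three metrics can be computed directly from Eqs. 31-33; the small example values below are illustrative.

```python
import numpy as np

def mae(y_a, y_p):
    return np.mean(np.abs(y_a - y_p))               # Eq. 31

def mse(y_a, y_p):
    return np.mean((y_a - y_p) ** 2)                # Eq. 32

def explained_variance(y_a, y_p):
    return 1.0 - np.var(y_a - y_p) / np.var(y_a)    # Eq. 33

y_a = np.array([1.0, 2.0, 3.0, 4.0])                # toy actual values
y_p = np.array([1.1, 1.9, 3.2, 3.8])                # toy predictions
```

A perfect forecast (`y_p == y_a`) gives MAE and MSE of 0 and an Explained Variance of exactly 1, matching the interpretation above.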

Hyperparameter optimization
The selection of hyperparameters has a significant impact on how well deep learning models perform, so fine-tuning becomes crucial in the training stage to produce a successful model. This work uses the Bayesian optimizer within the Keras-Tuner framework [28] together with a random search method for the LSTM models. The ELM hyperparameters are optimized with a hand-made search technique, and the hyperparameters of the proposed TFT network are determined with the Optuna optimizer [29] within the PyTorch framework. The resulting combinations are shown in Table 3. Random combinations of hyperparameters from the defined search spaces are generated, with the number of combinations depending on the maximum number of trials and the number of models trained per trial. A model is trained for each combination, and the best-performing one is saved as the best model.
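The sample-train-keep-best loop just described can be sketched as follows. The search space and the scoring function are hypothetical placeholders (the actual ranges tuned in the paper are those in Table 3, and scoring would involve training a model on the validation split).

```python
import random

random.seed(42)

# Hypothetical search space; the real ranges are given in Table 3.
search_space = {
    "hidden_size": [16, 32, 64, 128],
    "learning_rate": [1e-4, 3e-4, 1e-3],
    "dropout": [0.1, 0.2, 0.3],
}

def validation_mae(config):
    """Stand-in for training a model and scoring it on the validation split."""
    return (1.0 / config["hidden_size"]
            + config["dropout"] * 0.01
            + abs(config["learning_rate"] - 3e-4))

max_trials = 20
best_config, best_score = None, float("inf")
for _ in range(max_trials):
    # Draw one random combination from the search space.
    config = {name: random.choice(choices) for name, choices in search_space.items()}
    score = validation_mae(config)
    if score < best_score:                # keep only the best-performing model
        best_config, best_score = config, score
```

Bayesian optimizers such as those in Keras-Tuner and Optuna refine this loop by proposing the next combination based on the scores observed so far instead of sampling uniformly.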

Discussion of results
Table 4 and Fig. 7 present the error rates of the compared methods for the metrics RMSE, MAE, Explained Variance, and R². The proposed TFT algorithm demonstrates remarkable improvements over the baseline models on all metrics. Higher Explained Variance indicates a better association of variance between the original data space and the generated data space. The performance improvement is competitive in all time windows; although all measurements deteriorate as the forecasting window grows, the proposed model remains limited to a one-hour forecasting window. The proposed TFT algorithm performs better than all baseline methods over every period, while the forecasting performance of all models gradually deteriorates as the forecasting period lengthens. The proposed model also shows clear improvements in metric scores over previous works; Table 5 compares the proposed study with similar studies.
In the aquaponics nitrate forecasting study, abundant training data can improve accuracy and produce more effective outcomes. The training, validation, and test splits must be adjusted very carefully to prevent over-fitting; otherwise, the likelihood of inaccurate forecasts is high. Figure 7 displays the estimation graphs generated by each method over the complete set of hourly nitrate test data. The baseline models lack short-term representational strength, although Fig. 7 shows that they can pick up certain aspects of the long-term structure. Although the dataset's features have been normalized, its internal structure poses a complex modeling challenge for the baseline models. Because of the sudden jumps it contains, the basic models fail to capture the temporal representation; as a result, the look-ahead output is skewed and vulnerable to temporal drift. Shifts were always present in the forecasts, even though the ELM and LSTM models in Fig. 7 were able to detect the noise. Other algorithms, by contrast, produce forecasts with a smoother transition while not capturing the noisy data, which still yields accurate overall results. Fig. 7e makes clear that the TFT's predicted and actual nitrate levels overlap, with no excessive deviation or variance between them; the closeness of the values and the similarity of the directional breaks explain the method's success.
The results of the proposed model on the various temporal test sets are shown in Fig. 8. Even though the model does not match the data exactly, it estimates the noise with a smooth transition. The model is quite good at long-term forecasting, but because of the large noise in short-term projections it cannot capture those predictions accurately; nevertheless, it describes the short-term pattern sufficiently well.
The computational cost of the models can be examined through their parameter counts or the total time spent in the training phase [14]. In terms of both parameter count and total training time (Table 6), the proposed TFT model exceeds the baseline models. Its Encoder-Decoder layers operate with relatively few parameters while providing improved learning capacity, but the model needs more epochs to train properly and therefore requires more time in total. Its prediction times are the longest of all methodologies, yet still short enough for production settings. Overall, the proposed TFT model meets the performance requirements of real-world deployments.
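The two cost measures used in Table 6 can be sketched concretely. The dense-layer parameter formula and the tiny stand-in workload below are illustrative assumptions, not the paper's actual TFT or baselines:

```python
# Sketch of the two computational-cost measures in Table 6: trainable
# parameter count and wall-clock training time. Layer sizes and the timed
# workload are illustrative stand-ins, not the paper's models.
import time

def param_count(layer_sizes):
    # Dense-layer parameter count: (inputs + 1 bias) * outputs per layer.
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

small = param_count([8, 32, 1])        # baseline-sized network
large = param_count([8, 128, 64, 1])   # larger TFT-like stand-in

start = time.perf_counter()
_ = sum(i * i for i in range(100_000))  # placeholder for a training epoch
elapsed = time.perf_counter() - start   # wall-clock "training" time
```

Comparing models on both axes matters because a model can be small in parameters yet slow to train (many epochs), which is exactly the trade-off reported for the TFT above.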

Conclusions
Soil dependence in agriculture poses a great limitation on meeting the increasing demand for food. The soilless farming offered by aquaponics creates an ecosystem in which the only input is fish food and fertilizers are removed from the system. Autonomous handling of aquaponics with less human intervention provides increased efficiency, higher throughput, and lower maintenance costs.
The baseline solutions for simulating aquaponics environments include LSTM, Encoder-Decoder architectures, and Attention-based methods. As shown in Table 5, the proposed model achieved an MSE of 0.0322 for predicting the nitrate level; RMSE, MAE, and Explained Variance scores per method are given in Table 4. The TFT has a more intricate architecture and better learning capabilities than the other baseline deep learning algorithms. In addition, the TFT can take into account a variety of input types in datasets, including static, known, and observed inputs, whereas traditional deep learning architectures may overemphasize factors that are irrelevant to the target variable.
The TFT offers several improvements over these baseline methods by combining Encoder-Decoder LSTMs, which model short-term relations with good accuracy, with the feature-weighting mechanism of attention matrices to capture long-term relations. The Transformer, a recently introduced encoder-decoder model based on the attention mechanism, processes sequences accurately without any recurrent neural networks; as a result, fewer parameters are needed to produce significantly better results. The architecture also improves the predictive performance of multi-variate forecasting thanks to the masked multi-head attention used in the attentive layer. Thus, a transformer-based deep learning solution is employed to forecast nitrate levels from multiple input features in an aquaponics environment.
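The masking in the attentive layer is what keeps a forecast at time step t from attending to future values. A minimal single-head NumPy sketch of causal (masked) attention, with illustrative shapes and data, follows; the TFT's actual interpretable multi-head variant is more elaborate:

```python
# Minimal sketch of masked (causal) attention: each time step may attend
# only to itself and earlier steps, so forecasts never peek at the future.
# Single head, illustrative shapes; the TFT's multi-head version differs.
import numpy as np

def masked_attention(Q, K, V):
    T, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                  # (T, T) similarity scores
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[mask] = -np.inf                         # hide future positions
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(4, 8))  # 4 time steps, 8-dim features
out, w = masked_attention(Q, K, V)
```

Because the first time step can attend only to itself, its output equals its own value vector, and every attention row sums to one; the quadratic (T, T) score matrix is also the source of the memory cost discussed below for long sequences.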
The predictive performance of the proposed method shows clear improvements over the baseline models in terms of MAE, MSE, and Explained Variance when considering all sequences. The problem becomes impractical for sequences longer than one hour because of the memory needed to process the attention matrix, which renders the simulation infeasible with limited computational resources. Besides, multi-step forecasting performance deteriorates beyond a certain sequence length. The proposed method nevertheless offers increased sequence modeling capacity, enabling longer sequences to be represented. Employing auto-regressive models alongside the TFT would allow even better sequence handling, and Memory Transformers could be used in the multi-head attention architecture to lower the memory requirement of the overall approach. To further capture noisy data in short-term predictions, as the LSTM model does, additional LSTM or RNN layers could be added to the proposed model; short-term noise is expected to be learned properly in this manner. The proposed solution improves the overall handling of aquaponics with better simulation performance. The obtained results can be used to handle anomalies in the ecosystem, such as fish diseases, pump failures, or bacterial problems. Deploying real-life applications alongside the simulation can increase the effectiveness of the aquaponics architecture, and the improvements obtained bring the unmanned agriculture of aquaponics closer to reality.

Fig. 7
Fig. 7 All test data forecasting performance for methods: a ELM b LSTM c Encoder-decoder LSTM d Attention LSTM e temporal fusion transformer (TFT)

Fig. 8
Fig. 8 Performance prediction for the TFT model using test data in hourly forecasting: a 800 test data b 3500 test data c 8000 test data

In this section, previous studies and the current state of the literature on the maintenance of smart aquaponics systems based on Artificial Intelligence (AI) are discussed. Monitoring and maintaining self-sufficient smart aquaponics systems requires autonomous control through sensors. Arvind et al. developed a miniature smart aquaponics ecosystem through several IoT sensors and used the produced data to implement an AutoML regressor [7]. The regressor is then utilized to create autonomous anomaly signals, which can be used to reduce the maintenance burden of the proposed ecosystem. Mehra et al. proposed an artificial neural network (ANN) to classify several anomalies, such as lack of nutrients and changes in humidity or lighting levels, through sensor reports [8], though the system lacks reproducibility as the accuracy metrics are not provided. Hydroponics systems require a certain pH level to operate properly, and one of the main factors affecting the pH level is the presence of heavy metals in the ecosystem. Dhal et al. proposed a real-time machine learning-based solution, supported by a real-life application, to monitor and detect anomalies in heavy metal levels.

Table 1
The relevant studies

Table 2
Dataset statistics for sensor-based attributes

Table 4
Performance comparison of the proposed model with baseline methods in different time windows

Table 6
Computational complexity of models