1 Introduction

Agriculture, the most fundamental industry for humankind, faces a serious threat from climate change. Meteorological disasters account for over 70% of natural disasters globally and have caused serious economic losses (Qin et al. 2002). More severely, the global climate is changing dramatically due to the large amount of greenhouse gases produced by human activities. As the climate warms, droughts and floods are becoming more frequent and intense, and the harm to agricultural production is intensifying (IPCC 2012). From 2010 to 2017, global average annual economic losses due to drought reached US $23.125 billion, with annual grain production losses ranging from millions of tons to more than 30 million tons (Buda et al. 2018). Over the past 40 years, flooding events have likewise caused more than a trillion dollars in damage (UNDRR 2020). Monitoring and analysis of meteorological data can reduce food and economic losses (Ziolkowska Jadwiga and Jesus 2018).

To effectively guide agricultural production, meteorological monitoring usually covers parameters such as temperature, humidity, water vapor pressure, wind speed and sunshine. Moreover, research on climate forecasting requires long-term, large-scale and comprehensive climate data (Bonnet et al. 2020). Governments and scientific communities have been committed to building meteorological databases (Anderson et al. 2008), and a large number of professional meteorological monitoring stations have been established around the world, including in China. However, much historical data is missing because stations were established at different times, sensors fail, and for other reasons. It is therefore crucial to reconstruct complete meteorological monitoring data.

Researchers usually reconstruct missing meteorological data with interpolation methods combined with manual correction. This not only consumes a lot of manpower but, owing to the spatial variability of geographical conditions, also produces reconstructions that are too smooth and inaccurate (Yao et al. 2023). Machine learning is a better interpolation tool (Li et al. 2023), but it performs poorly on long-sequence missing-data scenarios. Deep neural networks, a simple and efficient approach to data reconstruction, have great potential for meteorological data reconstruction tasks. The neurons in the hidden layers of a neural network continually update their weights under the supervision of true values, learn high-dimensional associations among different data, and complete missing data more accurately (Rajaee et al. 2019). In the task of reconstructing monitoring data of turbomachinery particle flow, a deep learning method was more accurate than six commonly used interpolation methods (Ghasem and Nader 2022). In the field of weather forecasting, several deep learning models have already been published. Trained on large amounts of data, FourCastNet2 can calculate the next 24 h of climate for 100 sites in just 7 s (Jaideep et al. 2022), orders of magnitude faster than numerical weather prediction (NWP). The Pangu model proposed by the Huawei team can accurately and quickly predict the global climate by learning from global meteorological monitoring data of the past four decades (Bi et al. 2023).

Selecting and designing the neural network is a key step in reconstructing climate data. Ghose selected a recurrent neural network (RNN) for groundwater level prediction (Ghose et al. 2018). Vu reconstructed 50 years of groundwater level data in Normandy (France) using long short-term memory (LSTM) networks (Vu et al. 2020). Meteorological parameters, by contrast, are strongly spatially correlated and form a typical spatiotemporal sequence. A study published in Nature Geoscience used image inpainting technology combined with the HadCRUT4 global historical temperature grid dataset to reconstruct complete global monthly gridded temperatures, and the reconstructed series was highly correlated with the non-reconstructed data (Kadow and Ulbrich 2020). Temporal continuity and spatial correlation must therefore be considered simultaneously in data reconstruction. Most frontier studies on spatio-temporal prediction build models on graph neural networks (GNN) and Transformers (Zhisong and Li 2021), but these have high computational complexity and memory overhead. Although MLP is a relatively simple deep learning model, recent studies show that its spatiotemporal prediction ability is not inferior to that of complex models. The MLPST model shows that, compared with RNN, GNN and Transformer, a model built entirely on MLPs can still be very accurate (Zhang et al. 2023). Different types of time series data usually have different characteristics, and screening out some obvious characteristics can significantly improve model performance (Tang et al. 2024). Therefore, for specific tasks, feature engineering and dedicated model design are needed to improve the predictive performance of the model.

To meet the demand for fast reconstruction of meteorological data in agricultural production and to reduce the workload of meteorological data reconstruction, we designed a spatio-temporal MLP as a reconstruction tool. A total of 143 missing sequences (from 44 weather stations) in three study areas in Xinjiang were reconstructed, and the proposed model was compared with a general MLP, testing its availability while obtaining the reconstruction results. The reconstructed parameters include Max_T, Min_T, Ave_T, Ave_WVP, Ave_RH, 10 m Ave_WS, and Sun_H. The inputs consist of short-term, cyclical and long-term trend series together with same-time data from the weather stations with the highest sequence similarity; the length of the filled sequences ranges from one month to 38 years. The confidence of the results is measured by the correlation with the nearest adjacent station. Finally, an automatic dataset construction module, an automatic training module, an automatic missing-position query module and an automatic rolling prediction module are integrated, realizing end-to-end data reconstruction, and published as a microservice.

2 Study area and data

The study area is located in Xinjiang, northwest China, one of the most important cotton production bases in China and the region with the most developed dryland agricultural technology (Liu 2022). Located in the hinterland of Eurasia, with complex terrain and frequent weather system activity, drought is the main climatic feature of this region (Weiyi et al. 2008). The Tianshan Mountains cross the central region and divide Xinjiang into northern Xinjiang and southern Xinjiang. Water vapor can enter northern Xinjiang but hardly reaches southern Xinjiang, so the difference in drought degree between the north and the south is obvious (Wang 2023). The Yili River Valley, located west of the Tianshan Mountains and surrounded by mountains on three sides, has abundant precipitation and forms a special climate (Yan et al. 2017).

The 105 weather stations in this study are distributed across three areas: northern Xinjiang (A), southern Xinjiang (B) and Yili (C), with 48, 44 and 13 stations respectively, and have recorded meteorological data for nearly 62 years (Fig. 1). Among them, 44 stations have missing data to varying degrees, with different missing parameter types and durations. Table 1 lists the codes and parameters of the weather stations with missing data. Of these, 16 stations are missing Max_T and Min_T data; the corresponding number for Ave_T is 13, for Ave_WVP and Ave_RH 23, for 10 m Ave_WS 28, and for Sun_H 24. In total, we need to reconstruct 143 sequences, with time spans from 1961 to 2022. Figure 2, corresponding to Table 1, shows the specific missing periods; the missing segments are long and their locations differ, which increases the difficulty of the reconstruction.

Fig. 1
figure 1

Study area division and location of weather stations

Table 1 Meteorological parameter types of missing data (Number of weather stations and percentage missing)
Fig. 2
figure 2

Measurement time-window at 105 weather stations over 62 years from 1961 to 2022

3 Methodology and model design

Traditional meteorological data reconstruction methods are time-consuming and require professional, experienced personnel, which is obviously unrealistic for software engineers or other relevant personnel. The spatio-temporal MLP proposed in this study can mine rules from existing data, giving non-professional people the ability to reconstruct missing meteorological data.

3.1 MLP

MLP is the most classic deep neural network and is widely used to solve nonlinear classification and regression problems. Compared with other deep learning models, the network structure of MLP is very simple and its computation is very fast. Its structure (Fig. 3, lower right) includes an input layer, hidden layers and an output layer; each layer contains several neurons, and neurons in adjacent layers are fully connected to each other for information exchange (Benedict 1988). During training, the weight parameters of the neurons are constantly updated until a good fit is achieved. Each weight update requires one forward propagation and one backpropagation pass (Rumelhart et al. 1986). Forward propagation takes the outputs of the previous layer as the inputs of the next layer and calculates the outputs of the next layer according to the weights. Taking layer 1 and layer 2 as an example, the output of layer 2 is:

$${a}_{i}{}^{layer2}=\sigma \left({b}_{i}+\sum_{j} {w}_{j}{a}_{j}{}^{layer1}\right)$$

where σ is the activation function, which is the key for MLP to achieve nonlinear fitting. The most commonly used activation function, ReLU, is selected in this study (Glorot et al. 2011):

Fig. 3
figure 3

MLP and Model Framework

$$f(x)={\text{max}}(0,x)$$

The prediction error is measured by the cost function LOSS. The backpropagation process is based on the chain rule: the gradient of each layer's parameters is calculated to represent the influence of those parameters on the prediction error, and the weights are updated by multiplying the gradient by the learning rate α until the loss value no longer drops, at which point the MLP can be considered to have reached an optimal fit. The initial learning rate selected for this study was 0.001. The backpropagation update is:

$${w}_{jnew}={w}_{j}-\alpha \cdot \frac{\partial LOSS(y,\widehat{y})}{\partial {w}_{j}}$$
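As a minimal, hedged illustration of the forward pass, ReLU activation and gradient-descent update described above (a sketch, not the authors' implementation), the snippet below performs one weight update with TensorFlow's automatic differentiation; the layer sizes and random data are placeholders.

```python
import tensorflow as tf

# Toy data: 32 samples, 8 input features, 1 target value (placeholder shapes).
x = tf.random.normal((32, 8))
y = tf.random.normal((32, 1))

# One hidden layer (forward propagation: a2 = relu(b + W a1)) and a linear output.
W1 = tf.Variable(tf.random.normal((8, 16)) * 0.1)
b1 = tf.Variable(tf.zeros((16,)))
W2 = tf.Variable(tf.random.normal((16, 1)) * 0.1)
b2 = tf.Variable(tf.zeros((1,)))

alpha = 0.001  # initial learning rate used in this study

with tf.GradientTape() as tape:
    a1 = tf.nn.relu(tf.matmul(x, W1) + b1)       # forward propagation with ReLU
    y_hat = tf.matmul(a1, W2) + b2
    loss = tf.reduce_mean(tf.square(y - y_hat))  # cost function (MSE)

# Backpropagation: chain-rule gradients of the loss w.r.t. every weight,
# followed by the update w_new = w - alpha * dLOSS/dw.
grads = tape.gradient(loss, [W1, b1, W2, b2])
for var, g in zip([W1, b1, W2, b2], grads):
    var.assign_sub(alpha * g)
```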

3.2 Spatio-temporal MLP

Past studies have shown that climate exhibits short-term dependence, is cyclical, shows long-term trends (Kai et al. 2020), and is closely associated with data from adjacent stations. Based on these observations, we designed four modules on top of the MLP (Fig. 3): a Spatial MLP, a Short-term MLP, a Periodic MLP and a Trend MLP. Time series resampled at different time scales are fed separately into the Short-term, Periodic and Trend MLPs, which extract the short-term, cyclical and long-term trend characteristics of the historical data respectively. Monitoring values from nearby stations are fed into the Spatial MLP module to capture the spatial associations between them. The outputs of the four modules are combined as the input of the prediction head, two fully connected layers, which enables the spatio-temporal associations in the sequence to be captured.

Dataset size and input sequence length are negatively correlated and need to be balanced when designing the inputs. In our model, all input lengths are set to 8 to keep the input format unified. The inputs of the Short-term MLP are the values of the last 8 days. The inputs of the Periodic MLP and the Trend MLP are values resampled at intervals of 90 days and 365 days respectively. The inputs of the Spatial MLP are the same-day monitoring values of the eight stations with the highest similarity to the target sequence within the study region. A pyramid structure is used, in which the number of neurons in each layer is half that of the previous layer; such a structure is usually able to extract features at different scales more effectively (Yang et al. 2020). Rolling prediction means that each step predicts only one day; the predicted value is then used as part of the new input to predict the next day, gradually filling in all missing data. A minimal sketch of this architecture and input layout is given below.
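The following is one possible Keras realization of the four-branch layout described above, given as a sketch under stated assumptions rather than the authors' code: the four length-8 inputs and the two-layer prediction head follow the text, while the pyramid widths (64, 32, 16, 8), the 16-unit head layer and all variable names are our assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def pyramid_mlp(x, widths=(64, 32, 16, 8)):
    """Branch MLP whose layer width halves at every layer (widths are assumed)."""
    for w in widths:
        x = layers.Dense(w, activation="relu")(x)
    return x

# Four inputs of length 8, matching the unified input format of the model.
short_in  = layers.Input(shape=(8,), name="short_term")   # last 8 days
period_in = layers.Input(shape=(8,), name="periodic")     # 90-day resampling
trend_in  = layers.Input(shape=(8,), name="trend")        # 365-day resampling
space_in  = layers.Input(shape=(8,), name="spatial")      # 8 most-similar stations

branches = [pyramid_mlp(x) for x in (short_in, period_in, trend_in, space_in)]

# Prediction head: two fully connected layers on the combined branch outputs.
merged = layers.Concatenate()(branches)
head = layers.Dense(16, activation="relu")(merged)
output = layers.Dense(1, name="next_day_value")(head)

model = Model([short_in, period_in, trend_in, space_in], output)
```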

All meteorological data were standardized according to the following formula:

$${x}_{i}{}^{{}_{normal}}=\frac{{x}_{i}-{x}_{{\text{min}}}}{{x}_{{\text{max}}}-{x}_{{\text{min}}}}$$

where \({x}_{i}{}^{normal}\) is the normalized value, \({x}_{i}\) is the actual value, and \({x}_{\text{max}}\) and \({x}_{\text{min}}\) are the maximum and minimum values of the sequence, respectively. This min-max normalization maps the values into the range 0–1, which eliminates the effects of dimension and negative values on model fitting.

When predicting, we restore the results to their original scale and output the dimensional values:

$${y}_{i}{}^{pred}={\widehat{y}}_{i}\cdot ({x}_{{\text{max}}}-{x}_{{\text{min}}})+{x}_{{\text{min}}}$$

where \({y}_{i}{}^{pred}\) is the predicted value and \({\widehat{y}}_{i}\) is the output value of the neural network.
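For illustration, a minimal NumPy sketch of the normalization and restoration formulas above; the function names are ours.

```python
import numpy as np

def normalize(x):
    """Min-max normalization to [0, 1], as in the formula above."""
    x_min, x_max = np.nanmin(x), np.nanmax(x)
    return (x - x_min) / (x_max - x_min), x_min, x_max

def denormalize(y_hat, x_min, x_max):
    """Restore network outputs to their original physical scale."""
    return y_hat * (x_max - x_min) + x_min

# Example: scale a temperature series, then recover values in the original units.
series = np.array([12.3, 15.8, 9.4, 20.1])
scaled, lo, hi = normalize(series)
restored = denormalize(scaled, lo, hi)   # equals the original series
```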

3.3 Assessment methods

Meteorological similarity is commonly measured by the Euclidean distance between sequences, but the resulting values differ greatly between parameters. To standardize this index to the range 0–1, we define the similarity of two sequences based on their distance:

$$S{M}_{\text{mn}}=\frac{1}{{e}^{{\sum }_{{\text{i}}=1}^{\text{n}}\left|{y}_{i}-{{{\text{y}}}{^\prime}}_{i}\right|/(100n)}}$$

where SMmn represents the similarity between sequence m and sequence n, yi and y′i are the values of the two sequences at the same time, and n is the number of non-missing values. The closer SM is to 1, the more similar the sequences; the closer it is to 0, the lower the similarity. SM was used to select the inputs for the Spatial MLP module.
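A sketch of the SM computation is given below, assuming the exponent is scaled by 1/(100 n) as we read the formula; the function name and the handling of missing values are our choices.

```python
import numpy as np

def sequence_similarity(y_m, y_n):
    """SM index: exp(-sum(|y_m - y_n|) / (100 * n)) over co-observed days."""
    y_m, y_n = np.asarray(y_m, float), np.asarray(y_n, float)
    mask = ~np.isnan(y_m) & ~np.isnan(y_n)      # keep only non-missing pairs
    n = mask.sum()
    total_abs_diff = np.abs(y_m[mask] - y_n[mask]).sum()
    return float(np.exp(-total_abs_diff / (100.0 * n)))

# The 8 stations with the highest SM to the target sequence become the
# inputs of the Spatial MLP module.
```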

We used two common indicators, mean squared error (MSE) and mean absolute error (MAE), to evaluate the quality of the prediction:

$$\begin{array}{c}MAE=\frac{1}{{\text{n}}}{\sum }_{{\text{i}}=1}^{\text{n}}\left|{y}_{i}-{\widehat{y}}_{i}\right|\\ MSE=\frac{1}{{\text{n}}}{\sum }_{{\text{i}}=1}^{\text{n}}{({y}_{i}-{\widehat{y}}_{i})}^{2}\end{array}$$

where \({\text{y}}_{i}\) is the real measured value of the climate data and \({\widehat{y}}_{i}\) is the estimated value. MAE and MSE were used to select the best number of hidden layers and to evaluate the error of the reconstructed data.

When evaluating the credibility of the reconstructed data, in addition to MAE and MSE, we used the correlation coefficient as the evaluation index:

$$cor{r}_{m-n}=\frac{{\sum }_{i=1}^{n}({h}_{i}^{m}-{\overline{h} }^{m})({h}_{i}^{n}-{\overline{h} }^{n})}{\sqrt{{\sum }_{i=1}^{n}{({h}_{i}^{m}-{\overline{h} }^{m})}^{2}}\sqrt{{\sum }_{i=1}^{n}{({h}_{i}^{n}-{\overline{h} }^{n})}^{2}}}$$

where \({h}_{i}^{m}\) and \({h}_{i}^{n}\) are the values of sequence m and sequence n, and \({\overline{h} }^{m}\) and \({\overline{h} }^{n}\) are their average values, respectively. The closer the correlation coefficient is to 1, the more credible the reconstructed data; the closer it is to 0, the less reliable. For the reconstruction results, a larger average correlation coefficient means higher accuracy. Meanwhile, when comparing the distribution of correlation coefficients across tasks, the more concentrated the results, the better the robustness.
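For reference, a minimal NumPy sketch of the three evaluation indices (MAE, MSE and the correlation coefficient); the function names are ours.

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error between measured and estimated values."""
    return float(np.mean(np.abs(np.asarray(y) - np.asarray(y_hat))))

def mse(y, y_hat):
    """Mean squared error between measured and estimated values."""
    return float(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2))

def correlation(h_m, h_n):
    """Pearson correlation between a reconstructed series and its nearest station."""
    h_m, h_n = np.asarray(h_m, float), np.asarray(h_n, float)
    return float(np.corrcoef(h_m, h_n)[0, 1])
```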

4 Application instances of reconstruction of missing climate data

4.1 Sub-task division

The reconstruction task was divided into 21 scenarios and 143 sub-tasks depending on the region and the parameter. Climate region A contains 53 sub-tasks; among these, Max_T, Min_T and Ave_T account for 5 sub-tasks each, Ave_WVP and Ave_RH for 6 each, and Ave_WS and Sun_H for 9 and 17 respectively. Climate region B has 75 sub-tasks, with Max_T, Min_T, Ave_T, Ave_WVP, Ave_RH, Ave_WS and Sun_H accounting for 9, 9, 6, 15, 15, 15 and 6 sub-tasks respectively. The corresponding numbers in climate region C are 2, 2, 2, 2, 2, 4 and 1, for a total of 15 sub-tasks.

The SM of each pair of sequences was calculated and exported as an SM table. Because of the large amount of table data, Fig. 4 shows only the SM between each target station and the stations in the same region. The more pronounced the yellow, the higher the similarity; the more pronounced the blue, the lower the similarity; medium similarity is shown in green. Across all reconstruction scenarios, the SM of these sequences ranged between 70.6% and 99.48%. Based on SM, the 8 stations with the highest similarity to each target task were selected, 143 groups in total, and the model inputs were built following the spatiotemporal sampling method shown in Fig. 3.

Fig. 4
figure 4

Similarity between each target weather station and the related weather stations (under the sequence reconstruction scenarios of different regions and different parameters)

4.2 Determine the number of MLP hidden layers

The number of hidden layers is one of the most important parameters of the MLP. A suitable number of hidden layers can enhance the ability of the network to extract data features and thus improve prediction accuracy, while too many hidden layers greatly increase the number of model parameters and slow the model down. To determine the number of hidden layers, we randomly picked five datasets and measured the MSE, MAE and training time of the model. The number of hidden layers was set from 1 to 7 (training for 1000 epochs). Figure 5 shows that with 1–4 hidden layers, the MAE and MSE of the model decreased significantly as hidden layers were added, but with more than 4 hidden layers MAE and MSE showed little improvement and even rose on some datasets. This may be related to exploding gradients when the model becomes too deep.
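A hedged sketch of how such a layer-count sweep could be run is shown below; the model builder, synthetic data, layer widths and shortened epoch count are placeholders for illustration, not the authors' experiment code.

```python
import time
import numpy as np
import tensorflow as tf

def build_mlp(n_hidden, input_dim=32, width=64):
    """Plain MLP whose width halves per hidden layer (widths are assumptions)."""
    inputs = tf.keras.Input(shape=(input_dim,))
    x = inputs
    for k in range(n_hidden):
        x = tf.keras.layers.Dense(max(width // (2 ** k), 4), activation="relu")(x)
    outputs = tf.keras.layers.Dense(1)(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(0.001), loss="mse")
    return model

# Synthetic stand-in for one of the five randomly selected datasets.
x = np.random.rand(2000, 32).astype("float32")
y = x.mean(axis=1, keepdims=True)

for n_hidden in range(1, 8):          # 1 to 7 hidden layers, as in the experiment
    model = build_mlp(n_hidden)
    start = time.time()
    model.fit(x[:1600], y[:1600], epochs=10, verbose=0)   # 1000 epochs in the paper
    val_mse = model.evaluate(x[1600:], y[1600:], verbose=0)
    print(n_hidden, "hidden layers | MSE:", round(val_mse, 4),
          "| time:", round(time.time() - start, 1), "s")
```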

Fig. 5
figure 5

MAE and MSE trends as the number of hidden layers increases

Figure 6 shows that the time required to complete training increased significantly, almost linearly, with the number of hidden layers on the five randomly selected datasets. Considering the results of the above trials, the number of hidden layers chosen for the MLP is 4, and the longest training time for a single task is 519.35 s.

Fig. 6
figure 6

Training time trend as the number of hidden layers increases

4.3 Train model and reconstruct missing data

Figure 7 shows our prediction process. Using PyCharm 2022.1 for programming, we integrated multiple modules into an end-to-end program that automatically pre-processes data, trains models, detects missing data and reconstructs it. The patterns of missing data are complex, and constructing the datasets manually would be a large task. The automatic pre-processing module generates datasets by reading data sheets according to a task list and completes the normalization; the data are shuffled before being fed to the model. The automatic training module completes multiple tasks and saves each as a separate parameter file. At prediction time, the missing-data detection module locates the missing values, and the rolling prediction module then automatically predicts forward or backward according to the detected positions.
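A simplified sketch of the missing-data detection and rolling-prediction steps follows; `find_missing_runs`, `rolling_fill` and the one-step predictor interface are hypothetical names, and only the forward-filling direction is shown.

```python
import numpy as np
import pandas as pd

def find_missing_runs(series: pd.Series):
    """Return (start, end) index pairs of contiguous missing stretches."""
    missing = series.isna().to_numpy()
    runs, start = [], None
    for i, m in enumerate(missing):
        if m and start is None:
            start = i
        elif not m and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(missing) - 1))
    return runs

def rolling_fill(series: pd.Series, predict_one_day):
    """Fill each gap one day at a time; each prediction joins the next input.

    predict_one_day(history) stands in for the trained model's one-step call.
    """
    values = series.to_numpy(dtype=float)
    for start, end in find_missing_runs(series):
        for i in range(start, end + 1):           # forward rolling prediction
            values[i] = predict_one_day(values[:i])
    return pd.Series(values, index=series.index)
```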

Fig. 7
figure 7

Structure of the integrated prediction process

TensorFlow (Abadi et al. 2016) was selected as the development framework in this study. The Adaptive Moment Estimation (Adam) optimizer was used to improve learning efficiency (Kingma and Ba 2014); it automatically adjusts the learning rate according to historical gradient information. At the beginning of training, a larger learning rate helps the model converge quickly, while later the learning rate becomes smaller to improve model accuracy. Adam also normalizes the weight parameters, which alleviates overfitting. MSE was chosen as the loss function. As a training technique, Dropout layers can effectively prevent model overfitting (Srivastava et al. 2014); in this study, the Dropout rate was set to 0.5.
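Continuing the architecture sketch from Section 3.2, the snippet below shows the training configuration named here (Adam with an initial learning rate of 0.001, MSE loss, shuffled samples); the toy input arrays, batch size and the comment on Dropout placement are our assumptions.

```python
import numpy as np
from tensorflow.keras import optimizers

# `model` is the four-branch spatio-temporal MLP sketched in Section 3.2.
# Placeholder arrays standing in for the real spatiotemporal samples.
n = 1024
x_short, x_period, x_trend, x_space = (np.random.rand(n, 8).astype("float32")
                                       for _ in range(4))
y_next_day = np.random.rand(n, 1).astype("float32")

model.compile(optimizer=optimizers.Adam(learning_rate=0.001), loss="mse")
model.fit([x_short, x_period, x_trend, x_space], y_next_day,
          epochs=5, batch_size=64, shuffle=True, verbose=0)  # paper trains 1000 epochs

# Dropout layers with rate 0.5 would be inserted inside the branch MLPs or before
# the prediction head; their exact placement is not specified in the text.
```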

Figure 8 shows the reconstruction results for Max_T, Min_T, Ave_T, Ave_WVP, Ave_RH and Sun_H. Except for Sun_H, the reconstructed sequences of the other five parameters are visually indistinguishable from the real sequences. Sun_H represents the length of time during which solar radiation exceeds a certain intensity; it is influenced by many meteorological factors, especially changes in cloud cover. Our predicted values contain almost no zeros, while the measured series does contain some, so the model cannot capture the occurrence of zero values. Nevertheless, even without filtering, the proposed model clearly captures the cyclical and trend patterns, so the reconstructed Sun_H data are still useful. For Ave_WS (Fig. 9), the model has difficulty: although Ave_WS also shows annual periodic changes in the long term, it is almost unpredictable on smaller time scales.

Fig. 8
figure 8

Results of reconstructed sequences (except Ave_WS)

Fig. 9
figure 9

Result of reconstructed sequence (Ave_WS)

4.4 Evaluate model and quality of reconstructed data

We compared our designed model with LSTM and a general MLP model on five completely random reconstruction tasks (both the parameters and the weather stations were chosen at random).

As can be seen from Fig. 10, the prediction error of the spatio-temporal MLP is the smallest, LSTM follows, and the general MLP has the largest error. For both MAE and MSE, the designed model achieves lower errors than the general MLP in all tasks, showing that the spatio-temporal structure improves the predictive power of the MLP. Because the values of different meteorological parameters differ considerably, the percentage error reduction was used to measure the precision improvement. Compared with the second-ranked LSTM, across the five tasks the MSE per task decreased by 7.61% and the MAE by 4.80% on average.

Fig. 10
figure 10

Comparison between Spatio-temporal MLP and commonly used models

Compared with visual evaluation, evaluation indices provide more information about the reconstruction results. Assessing the quality of reconstructed data is difficult because the true past values cannot be traced. By calculating the correlation coefficient between the reconstructed sequences and the data of the nearest weather station, the credibility of the reconstructed meteorological data was evaluated scientifically; a higher correlation coefficient indicates higher confidence.

The correlation coefficients of all reconstructed sub-tasks are shown in Table 2. In terms of temperature reconstruction, our results closely approach those of Kadow et al. (0.9941) (Kadow and Ulbrich 2020) and even exceed them in 4 of the 45 temperature reconstruction tasks. More importantly, the data we reconstructed are at the daily scale, a finer time granularity than their monthly data. Our work demonstrates that an MLP with a dedicated spatiotemporal design can reconstruct climate data well.

Table 2 Correlation of reconstructed sequences and nearest neighbor sequences

The consistency of the evaluation indicators across tasks is also one of our goals, as it helps identify which parameter types the model is suited to reconstructing. Figure 11 shows the distribution of the correlation coefficients. It is easy to see that Max_T, Min_T, Ave_T and Ave_WVP perform excellently, with average correlation coefficients over 0.9 and very concentrated distributions; Ave_RH and Sun_H are unstable, with average correlation coefficients over 0.7 but dispersed distributions; and Ave_WS performs poorly, with an average correlation coefficient around 0.5 and a dispersed distribution. This shows that our model is very suitable for four parameters, Max_T, Min_T, Ave_T and Ave_WVP, while Ave_RH, Sun_H and Ave_WS are less suitable.

Fig. 11
figure 11

Distribution of evaluation indicators for the reconstruction of different parameter types

4.5 Release model

To provide convenient services for people in the agricultural field, the model developed in this study will be published as a tool on the Agricultural Smart Brain platform, which is developed by Beijing Lianchuang Siyuan Measurement and Control Technology Co., Ltd. and provides scientific research data, computing power and AI microservice publishing for researchers. Users can obtain the tool by purchasing access to the platform. The link to our microservice is as follows: http://192.168.50.201:15000/app/services/visionarytech/test-1/alg-26bc86f42a137f8f

5 Conclusion

To reconstruct long-term missing meteorological monitoring data for the agricultural field, we proposed an end-to-end rapid reconstruction method based on MLP.

Spatio-temporal datasets were built according to the similarity indicator SM, and the data were standardized to ensure good model performance. The backbone consists of four MLP modules, each with 4 hidden layers, which jointly learn short-term trends, periodicity, long-term trends and spatial associations. The prediction head consists of two fully connected layers. Modules for automatic preprocessing, automatic detection of missing-data locations, automatic model training and rolling prediction were then coded and integrated to realize end-to-end long-sequence reconstruction. Our model can complete a single reconstruction task within 10 min.

Compared with the general MLP and LSTM, the MSE and MAE obtained by our model were reduced by 7.61% and 4.80% respectively, indicating that our design effectively improves the MLP model and outperforms LSTM. Daily meteorological monitoring data of 44 stations (143 tasks) in Xinjiang from 1961 to 2022 were reconstructed using our method. The evaluation indices show that the average correlation coefficients of Max_T, Min_T, Ave_T and Ave_WVP are 0.969, 0.961, 0.971 and 0.942 respectively, showing high consistency and high credibility; those of Ave_RH and Sun_H are 0.720 and 0.789 respectively, showing low consistency and moderate credibility; and that of Ave_WS is 0.488, showing low consistency and low credibility. We therefore recommend using our method to reconstruct Max_T, Min_T, Ave_T and Ave_WVP, providing an important solution to the problem of missing data in the agrometeorological field.

Finally, we released our model on the Agricultural Smart Brain platform as a microservice, providing users with a data reconstruction tool.