1 Introduction, Motivation, and Goals

Under climate change scenarios, the rapid increase of the world population and industrial development is expected to raise the carbon dioxide concentration in the Earth’s atmosphere. At the same time, environments are predicted to become warmer and drier, conditions that favor the cultivation of crops with a C4 photosynthetic pathway over C3 crops [1,2,3]. Humans will, therefore, rely heavily on C4 crops like sorghum (Sorghum bicolor (L.) Moench). As sorghum is becoming a staple food worldwide and a dedicated biomass crop for the biofuel business, its cultivation and yields will have to be closely monitored and forecast for efficient management locally and globally.

Potato has been the major crop in the Netherlands for many years. Due to the reform of the CAP (Europe’s Common Agricultural Policy), the market is changing and farmers are urged to increase their yields, but in a sustainable way. This means they need to be more conscious of the energy and other resources they use in producing their crops. AVEBE is a cooperative of potato-growing farmers and supports its growers through an innovation program called “Towards 20-15-10”, which aims to realize in 2020 an average of 15 tons of starch per ha with a variable cost price of €10 per 100 kg starch. To monitor these objectives, farmers share data about their yields and farming practices in study groups. Crop yield forecasting is a key strategy in agriculture as it enables sustainable development and helps avoid famines and commodity shortages [4,5,6,7]. Crop monitoring and yield forecasting are a good source of actionable information that governmental institutions, companies, and farmers can use for price prediction and adjustment and for efficient agricultural trade. They simplify business operations through better planning of harvest, delivery of the produce, deployment of machinery, logistics, and the use of resources [8].

Conventionally, crop monitoring and yield forecasting rely on field surveys, censuses, and sampling in predefined locations (e.g., in potato), which are costly processes associated with high uncertainties [9]. Results are hard to relate to other fields that were not visited, making it difficult for farmers to objectively examine the status of their crop and for the processing industry to plan the logistics of transport and processing capacity at an early stage. Modern crop monitoring relying on remote and proximal sensing technologies has proved a superior solution [9,10,11,12,13,14,15]. This sensor-based monitoring depends on the differential reflectance of light by plants [16], which generally absorb light in the wavelength range of 400–700 nm (particularly in the blue, 440–510 nm, and red, 630–685 nm, wavelengths) and reflect light in the green and near-infrared portions of the spectrum. Crop monitoring technologies that exploit this phenomenon include satellites and hand-held sensors measuring light in narrow wavebands or wavelength intervals. Plant reflectance measurements have been used successfully in several instances, including the quantification of canopy vigor [17,18,19] and of nutrient and soil moisture stresses [20, 21], and the prediction of yields [8, 22]. However, in most studies, remote sensing-based biomass yield estimation or prediction makes use of low- or medium-resolution satellite images from sensors such as SPOT-VEGETATION or MODIS [8]. These satellite products have a coarser spatial resolution (250–1000 m) than the data collected from the two Sentinel-2 satellites in this work (10-m spatial resolution). Since the launch of the Sentinel-2 constellation, with an overpass frequency of five days (and locally even two to three days), the temporal resolution is nearly as good as that of the SPOT-VEGETATION and MODIS satellites (one to two days). The high spatial resolution of the Sentinel-2 images is a valuable asset when monitoring crops in agricultural regions characterized by many small fields, such as the Mediterranean region where the sorghum pilot was conducted.

Deriving yield information from satellite imagery has shown promising results, but this technology is not yet widely applied by farmers or across crop species worldwide [8, 22]. In the sorghum pilot, we developed models for in-season prediction of annual and perennial sorghum biomass yields in Emilia-Romagna, Italy, based on the fraction of absorbed photosynthetically active radiation (fAPAR) derived from Sentinel-2A and Sentinel-2B satellite images of 42 mostly full-fledged commercial sorghum fields. Unlike other crops in which the yield is directly correlated with the aboveground biomass, potatoes follow a different pattern in the growth of the productive yield (Fig. 17.1). Crop growth models simulate both the aboveground dry matter and the tuber dry matter and can help to estimate the yield gap and the yield at an early stage.

Fig. 17.1 Difference in the vegetative and reproductive growth stages of potato. Adapted from [23]

In the potato pilot, we therefore used imagery from the Sentinel-2 satellites to provide a semi-continuous flow of data about the development of the potato crop, and the WOFOST [24] crop model, driven by local weather data, to provide field-specific yield information. Sample data were used to calibrate the remotely sensed data.

In the pilots implemented in this work, we used machine learning algorithms to create yield prediction equations. These equations can be implemented in decision support systems to allow farmers and/or farming stakeholders to predict biomass yields from sorghum fields of interest early in the cropping season. This information is very helpful for efficiently scheduling fleets of harvesting machinery, transport vehicles, and storage facilities. The fAPAR-derived predictive models for biomass yields can also be used by extension services and policy-makers for several purposes, including anticipating potential biomass availability and planning ahead to avoid specific crises such as fuel shortages. The potato pilot’s goal was an online decision support system (DSS) for potato farmers that would provide them with objective information about the yield gap and yield potential of their fields given the actual weather conditions. The online platform gave the farmers more insight by benchmarking their crops during the growth period against crops in the region and/or previous growing seasons. These new insights will improve farm management decisions on timely and more efficient location-specific treatment of the crops.

2 Pilot Set-Up

The sorghum pilot involved private farmers and/or farming cooperatives. During the 2017 and 2018 cropping seasons, 43 sorghum pilots were run covering 240 ha. Access to the EO platform was through “WatchITGrow” (VITO, Vlaamse Instelling voor Technologisch Onderzoek N.V., Mol, Belgium), which was also the end-to-end backbone of the technical pipeline used in this pilot. The fields were geolocated, and the geolocation data were saved as KML files before being integrated into the WatchITGrow application, where the coordinates were used for site-specific monitoring of the fAPAR index throughout the cropping season. The fAPAR estimates were generated at decametric spatial resolution (10 m pixel size) and at a temporal resolution of 5 days, or 2–3 days in areas where different satellite overpasses overlapped. Spatial resolution refers to the surface area on the ground represented by an individual pixel, while temporal resolution is the time, expressed in days, that elapses before a satellite revisits a particular point on the Earth’s surface. For each experimental field, fAPAR or “greenness” maps were produced (Fig. 17.2), and a growth curve was built showing the evolution of the fAPAR values throughout the cropping season. To correct for artifacts in the curve (such as abnormally low fAPAR values due to undetected clouds, shadows, or haze) and to interpolate fAPAR values between subsequent acquisition dates, a Whittaker smoothing filter was applied to the curve. Finally, the fAPAR values from the curves were used for further analytics.
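The smoothing and gap-filling step can be illustrated with a minimal sketch of a weighted Whittaker smoother with a second-order difference penalty. This is not the WatchITGrow implementation: the function names, the penalty order, the value of `lam`, and the sample fAPAR series are all assumptions made for illustration. Dates without a usable observation get weight 0, so the smoother interpolates across them.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def whittaker_smooth(values, weights, lam=10.0):
    """Minimise sum(w * (y - z)^2) + lam * sum((second difference of z)^2).

    values  : observations on a regular time grid (None where missing)
    weights : 1 for a trusted observation, 0 for a missing or masked one
    lam     : smoothing strength (hypothetical default, to be tuned)
    """
    n = len(values)
    a = [[0.0] * n for _ in range(n)]
    rhs = [0.0] * n
    for i in range(n):
        if weights[i] > 0:
            a[i][i] = float(weights[i])
            rhs[i] = weights[i] * values[i]
    stencil = (1.0, -2.0, 1.0)  # one row of the second-difference operator D
    for row in range(n - 2):    # accumulate lam * D'D
        for p in range(3):
            for q in range(3):
                a[row + p][row + q] += lam * stencil[p] * stencil[q]
    return solve_linear_system(a, rhs)


# Made-up fAPAR series on a regular grid; None marks a date with no image.
fapar = [0.10, 0.18, None, 0.45, 0.20, 0.70, 0.80, 0.78]
trusted = [0 if v is None else 1 for v in fapar]
print([round(z, 3) for z in whittaker_smooth(fapar, trusted, lam=5.0)])
```

A larger `lam` gives a stiffer curve that damps isolated dips (such as the 0.20 value above) more strongly, at the cost of flattening genuine rapid changes in the canopy.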

Fig. 17.2 Greenness (fAPAR) maps derived from Sentinel-2 satellite imagery for five sorghum fields in Anzola (from left to right: T5-grain sorghum, T4-dual purpose sorghum, T3-sweet sorghum, T2-forage sorghum, T1-biomass sorghum) for a selected number of dates in 2017, as available via WatchITGrow. T5-grain sorghum was not included in this study (refer to Sect. 2.1 for detail)

During the two years (2018 and 2019), groups of AVEBE farmers provided information about their potato crop, such as the location of their plots, planting date, and variety. The plots, with a total area of 111 ha, were geolocated and the coordinates were entered into the platform. Based on the plot location, the soil characteristics were determined from the BOFEK2012 [25] soil map. Moreover, the plot locations were used to identify the nearest official weather station, which provided a daily update of rainfall, temperature, and solar radiation. Both soil characteristics and weather data were inputs to the WOFOST model. The extraordinarily dry seasons in 2018 and 2019 strongly complicated the modeling of potential crop growth. Using the plot coordinates, cloudless Sentinel-2 images were selected, providing Weighted Difference Vegetation Index (WDVI) data, which were used to calculate the potato Leaf Area Index (LAI) (Fig. 17.3).

Fig. 17.3 Crop monitoring based on Sentinel-2 images expressing variability in LAI
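The nearest-weather-station lookup mentioned for the potato plots can be sketched with a great-circle (haversine) distance. The station names and coordinates below are hypothetical placeholders, not the stations used in the pilot, and the pilot platform may have matched plots to stations differently.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(h))


def nearest_station(plot_lat, plot_lon, stations):
    """Return the station record closest to the plot centroid."""
    return min(
        stations,
        key=lambda s: haversine_km(plot_lat, plot_lon, s["lat"], s["lon"]),
    )


# Hypothetical station list (names and coordinates are illustrative only).
stations = [
    {"name": "station_A", "lat": 52.10, "lon": 5.18},
    {"name": "station_B", "lat": 53.12, "lon": 6.58},
    {"name": "station_C", "lat": 52.75, "lon": 6.57},
]

plot = (52.95, 6.90)  # made-up plot centroid (lat, lon)
print(nearest_station(plot[0], plot[1], stations)["name"])
```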

3 Technology Used and Yield Prediction

The DataBio technological components implemented in these pilots were developed and deployed by VITO, CREA (Consiglio per la Ricerca in Agricoltura e l’Analisi dell’Economia Agraria, Rome, Italy), and NB Advies. VITO provided the platform “WatchITGrow”, while CREA and NB Advies deployed machine learning technology tailored to each crop species; together, these formed the backbone technology and end-to-end solutions of the pilots. The pilots were implemented in the form of advisory services in real-world commercial farm settings. The smart farming services were offered according to the specific cropping systems.

In biomass sorghum, services were centered around crop monitoring using proximal sensors to derive vegetation indices, and crop growth and yield modeling using fAPAR derived from satellite (Sentinel-2A and 2B) imagery and appropriate machine learning technologies.

The models used in this study were evaluated using the symmetric mean absolute percentage error (SMAPE), the mean absolute percentage error (MAPE), the mean absolute error (MAE), and the coefficient of determination (R²), as suggested in Habyarimana et al. [22]. The use of MAPE was justified as this metric allows the comparison of values predicted from variables measured on different scales. The mean absolute error, on the other hand, measures the magnitude but not the direction of the prediction errors; MAE is therefore a direct representation of the average error and is considered a better metric than the root mean square error for dimensioned assessments of average model performance. SMAPE was implemented to account for the limitations observed in MAPE. Both SMAPE and MAPE average the absolute percentage errors, but in SMAPE the errors are calculated using a denominator comprising the average of the predicted and observed values. The upper limit of SMAPE is 200%, resulting in a 0–2 range that is suitable for evaluating accuracy without the confounding effects of extreme values. In addition, SMAPE corrects for the asymmetry in the computation of the percentage error. In this work, MAE was used to assess the reliability of the models during the cross-validated (CV) training (Fig. 17.4). A repeated CV was run for each model and produced resample vectors of mean absolute errors, each with 50 elements. We observed that the dispersion of the mean absolute errors at the training stage decreased in the order simple linear model > Bayesian generalized linear model > eXtreme gradient boosting > Bayesian additive regression trees.

Over the experimental duration evaluated, the simple linear model showed mostly higher prediction errors in the validation set; the coefficient of determination was also weakest for this model (Table 17.1). Overall, the Bayesian additive regression trees method displayed relatively high values of the coefficient of determination and the lowest prediction errors. The best explanatory variables were D.150 and D.165, i.e., days of the year falling in the second half of May and the first half of June, respectively (Fig. 17.4). Days 240, 195, 210, and 120 of the year displayed minor effects, while days 135, 180, and 225 displayed no importance in terms of predictive ability [22].
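The four evaluation metrics can be written out as a short sketch. The SMAPE form follows the description above (denominator equal to the average of the observed and predicted values, upper limit 200%). The R² form used here, 1 − SS_res/SS_tot, is an assumption, since some toolkits report the squared correlation instead. The sample numbers are made up.

```python
def mae(observed, predicted):
    """Mean absolute error, in the units of the target (e.g., t/ha)."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def mape(observed, predicted):
    """Mean absolute percentage error (%); observed values must be non-zero."""
    terms = [abs((o - p) / o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(terms) / len(terms)


def smape(observed, predicted):
    """Symmetric MAPE (%), bounded between 0 and 200."""
    terms = [
        abs(p - o) / ((abs(o) + abs(p)) / 2.0)
        for o, p in zip(observed, predicted)
    ]
    return 100.0 * sum(terms) / len(terms)


def r_squared(observed, predicted):
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed form)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up biomass yields (t/ha), for illustration only.
obs = [10.0, 20.0]
pred = [12.0, 18.0]
print(mae(obs, pred), mape(obs, pred), smape(obs, pred), r_squared(obs, pred))
```

Note the asymmetry that SMAPE addresses: the same 2 t/ha error counts as 20% in MAPE when the observed value is 10 but only 10% when it is 20, whereas SMAPE scales each error by the mean of the observed and predicted values.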

Fig. 17.4 From left to right: boxplots of the cross-validation MAE (t ha−1) of the models using fAPAR data, and the relative importance of the day of year (D) for sorghum biomass yields using bartMachine. LM, bayesglm, xgbTree, and bartMachine denote, respectively, the simple linear model, Bayesian generalized linear model, eXtreme gradient boosting, and Bayesian additive regression trees. Figure adapted from Habyarimana et al. [22]

Table 17.1 Model performance metrics

In potato, the pilot’s final result is a decision support system (DSS) for potato farmers that can provide data about the overall status of the crop and the potential yield based on EO, weather, and soil parameters. Figure 17.5 represents the concept of a simple (starch) potato DSS.

Fig. 17.5 Concept of the decision support system

The DSS involves the following data collection, processing, and visualization technology.

Data Collection: To provide benchmark data for potato crops, five types of data were collected: (1) historical data about crop performance in the past (i.e., emergence date, LAI, greenness, yield development, and actual yield and date of yield); (2) historical data about the field soil (soil texture, soil moisture status, and elevation maps); (3) actual data about daily weather (temperature, solar radiation, humidity, precipitation, and wind speed); (4) reference values for indexes from the literature; and (5) real-time EO data and IoT data (soil moisture status).

Data Processing: Data processing involved three steps: (1) calibration and calculation of a crop growth model, (2) real-time collection and processing of EO data, and (3) benchmarking of the values, i.e., the indexes resulting from the growth model and from the analysis of EO data. In the first step, the soil, crop, and weather data from field measurements, satellites, weather stations, literature, and other sources were collected, pre-processed, stored in a database, and used as input to a crop growth model. To benchmark crop performance, the WOFOST crop growth model (FAO) was introduced in the pilot and calibrated using historical data (2017, 2018) and recent samples. In parallel with the calculation of the growth model, Sentinel-2 data were collected and processed in real time, providing the most recent value of the index applied (LAI). The EO data processing involved the following steps: masking the data with a cloud mask and a cloud-shadow mask, calculating the a-factor for the Weighted Difference Vegetation Index (WDVI), calculating WDVI from the spectral data, and calculating LAI for the potato fields based on WDVI-LAI correlation data. Finally, in the third step, the model establishes the benchmark for crop performance: an estimate of the best possible performance under the given set of circumstances.

Data Visualization: The DSS is provided through an online platform, i.e., as data as a service for the farmers, in the form of an early warning system that alerts farmers when their attention is needed. The online platform provides crop monitoring and benchmarking services that show the field variation. Sentinel-2 satellite images are very helpful for crop monitoring over a large area, but for use in a DSS it is more useful to show only the field information rather than the complete images.
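The EO processing chain described under Data Processing (masking, a-factor, WDVI, LAI) can be sketched as follows. The WDVI definition (near-infrared reflectance minus the a-factor times red reflectance, with the a-factor taken as the NIR/red ratio of bare soil) is the standard one. The WDVI-to-LAI step uses an inverse-exponential relation of the CLAIR type as a stand-in for the pilot's "WDVI-LAI correlation data", which the chapter does not specify. The parameters `alpha` and `wdvi_inf` and all reflectance values are hypothetical and uncalibrated.

```python
import math


def soil_line_factor(nir_soil, red_soil):
    """a-factor: ratio of bare-soil NIR to bare-soil red reflectance."""
    return nir_soil / red_soil


def wdvi(nir, red, a_factor):
    """Weighted Difference Vegetation Index: NIR corrected for soil background."""
    return nir - a_factor * red


def lai_from_wdvi(wdvi_value, alpha, wdvi_inf):
    """Assumed CLAIR-type relation: LAI = -(1/alpha) * ln(1 - WDVI / WDVI_inf)."""
    ratio = min(max(wdvi_value / wdvi_inf, 0.0), 0.999)  # keep the log defined
    return -math.log(1.0 - ratio) / alpha


def field_mean_lai(pixels, a_factor, alpha, wdvi_inf):
    """Mean LAI over pixels not flagged as cloud or cloud shadow; None if none."""
    lai_values = [
        lai_from_wdvi(wdvi(p["nir"], p["red"], a_factor), alpha, wdvi_inf)
        for p in pixels
        if not p["masked"]
    ]
    return sum(lai_values) / len(lai_values) if lai_values else None


# Made-up reflectances; 'masked' stands for the cloud / cloud-shadow flags.
pixels = [
    {"nir": 0.45, "red": 0.04, "masked": False},
    {"nir": 0.50, "red": 0.05, "masked": False},
    {"nir": 0.30, "red": 0.25, "masked": True},
]
a = soil_line_factor(nir_soil=0.20, red_soil=0.10)  # hypothetical soil values
print(field_mean_lai(pixels, a, alpha=0.5, wdvi_inf=0.6))  # uncalibrated
```

Returning `None` when every pixel is masked mirrors the situation described in Sect. 3.1, where no cloud-free image is available for a field on a given date.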

3.1 Reflection on the Availability and Quality of Data

The Sentinel data proved very useful for extracting the LAI information. However, during the growing season, there were quite extensive periods (15–20 days) in which no cloud-free images were available. In addition, cloud shadows sometimes produced misleading values. The historical yield data were collected and processed without the spatial location of the sample fields, which made them unusable for correlation with the historical EO data. Privacy issues raised by the farmers prevented the collection of this georeferencing information. The conclusion is that a lot of data is available, but not always of a quality suitable for use. When the product is based on third-party service providers, a solid agreement about availability is necessary. With more demand for service level agreements (SLAs), the price of data services may go up, making them less attractive for farmers. Reflecting on the big data technology (BDT) used in the sorghum pilots also allows us to express a word of caution to scientists in the field. IoT farm telemetry technology was used in year one for preliminary observation, but it proved ill-adapted to biomass sorghum, as the hardware, particularly the cables, was frequently damaged by rodents.

4 Business Value and Impact

The importance of sorghum as a food, feed, and biofuel crop cannot be overemphasized. Biomass sorghum has demonstrated higher yields with a better energy balance relative to major crops of agroindustrial interest. As dedicated biomass sorghum crops are steadily increasing and precision farming is driving agricultural economies worldwide, harnessing satellite technology is well poised to bring about agricultural advantages, including cutting operational farming costs. The Sentinel-2-derived fraction of absorbed photosynthetically active radiation, combined with the machine learning technology implemented in our sorghum pilots, satisfactorily modeled crop phenology and the aboveground biomass yields up to six months ahead of harvesting. In addition, we achieved promising key performance indicators, as reflected in Table 17.2.

Table 17.2 KPIs of the biomass sorghum yield monitoring trials

This study’s outcomes can serve several purposes, including enabling farmers to improve their sorghum biomass business operations through informed decision-making in planning field work, logistics, supply chains, etc. Policy-makers and extension services will also benefit from the technologies implemented in this work, which provide early in-season information on potential biomass availability; this is critical to wider energy planning and to avoiding energy-related crises.

In potato, the online platform shows the variability in Leaf Area Index (LAI). The LAI represents the leaf area intercepting the solar radiation for crop growth. The platform gave the farmers more insight by benchmarking their crop during the growth period against crops in the region, previous growing seasons, etc. It provided actionable information about in-field variability, areas for inspection, and site-specific management, based on the performance of their field relative to the surrounding fields and relative to the potential. These new insights help farmers make better decisions for timely and more efficient, location-specific crop treatment. It was this benchmark information that the farmers appreciated most. The actual added value of the service is hard to quantify because there is no real baseline. The farmers were not used to an online crop monitoring system, so the pilot was largely about raising awareness of the big data approach. The farmers particularly appreciated three things: the field-specific information instead of a general satellite image that the farmer has to interpret himself; the alerts when new data is available, which spare the farmer from searching for information even when there is nothing new to find; and the crop development benchmark. Farming is a business with many variables, not all of which can be controlled by the farmer. A well-informed farmer therefore has the advantage of being able to adapt to the circumstances. This benchmark enables farmers to spot problematic fields and areas in the field earlier and to react appropriately to save the crop and yield.
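The benchmarking idea (a field's performance relative to surrounding fields, with an alert when attention is needed) can be illustrated with a small sketch. The use of the regional median as the reference and the 0.8 alert threshold are assumptions for illustration, not the rules implemented in the pilot platform, and the LAI values are made up.

```python
from statistics import median


def relative_performance(field_value, regional_values):
    """Ratio of the field's index (e.g., LAI) to the regional median."""
    return field_value / median(regional_values)


def needs_attention(field_value, regional_values, threshold=0.8):
    """Flag the field when it falls below a share of the regional median.

    The threshold of 0.8 is a hypothetical choice.
    """
    return relative_performance(field_value, regional_values) < threshold


# Made-up LAI values for surrounding fields on the same date.
regional_lai = [3.1, 3.4, 2.9, 3.6, 3.2]
for name, lai in [("field_1", 3.3), ("field_2", 2.1)]:
    ratio = relative_performance(lai, regional_lai)
    status = "ALERT" if needs_attention(lai, regional_lai) else "ok"
    print(f"{name}: {ratio:.2f} of regional median -> {status}")
```

Using the median rather than the mean keeps a few failed or late-planted fields from dragging the reference down, which would otherwise hide underperforming fields.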

The crop growth model, calibrated with the yield data, was used for potato yield prediction. The data for 2017 and 2018 were used to train the system and the data for 2019 were used to test the accuracy of the model. The potential crop growth was calculated taking into account only the solar radiation, assuming there were no limitations due to water or fertilizer shortages at any stage, whereas the water-limited crop growth was based on the actual rainfall in the growing season as the first limiting factor (Fig. 17.6).
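The relation between potential, water-limited, and observed yield can be expressed as a simple yield-gap decomposition. The sketch below is illustrative only: the numbers are hypothetical, not pilot results, and attributing the remainder to "other factors" is a simplifying assumption.

```python
def yield_gaps(potential, water_limited, observed):
    """Split the total yield gap into a water-related part and a remainder.

    All inputs share one unit (e.g., t/ha of dry matter).
    """
    total_gap = potential - observed
    return {
        "water_gap": potential - water_limited,  # lost to water shortage
        "other_gap": water_limited - observed,   # remainder (assumed label)
        "total_gap": total_gap,
        "relative_total_gap": total_gap / potential,
    }


# Hypothetical values in t/ha, chosen only to exercise the function.
print(yield_gaps(potential=15.0, water_limited=12.0, observed=9.0))
```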

Fig. 17.6 Potential crop production

In general, compared to the samples for 2019, the model under-estimated both the water-limited yield and the potential yield. Due to limited data availability, the algorithm is not yet sufficiently trained for reliable yield predictions. The prediction of the potential yield (dry matter) based on the weather data of the last 10 years shows the relative differences between the years, but largely over-estimates the yield at harvest time. The crop growth model shows promise for yield prediction purposes, but its accuracy is still too limited.

5 How-to Guideline for Practice: When and How to Use the Technology

Using satellite imagery and supervised machine learning technologies allowed us to model biomass sorghum phenology and carry out an early prediction of biomass yields up to six months before harvesting. This pilot combines expertise from Earth observation, ICT, artificial intelligence, and agricultural farming. The Earth observation data were mined to derive the biophysical parameter fAPAR, the agricultural farms provided the information that is critical for modeling farming outcomes, and the artificial intelligence expertise integrated this information to model the solutions that would later be delivered to stakeholders in the form of advisory services. The equations produced in this pilot can easily be used in sorghum biomass farming businesses. With the data science work done, the next big step should be putting the models into production, making them useful for any business. This is the beginning of our model operations life cycle, which includes (but is not necessarily limited to) the following key focus areas of machine learning engineering: the data pipeline (the data used to build the features for model training, such as fAPAR, phenology, and biomass yield records), model training, model deployment, and model monitoring. At this level, the farmer knows early in the season how much he/she will produce, using only satellite imagery-derived fAPAR. In addition, the phenology stages can be monitored conveniently by the farmers using Web-capable devices. In the real world, the farmer and other stakeholders will benefit from this technology as an advisory (Web) service, either in-house or from a third party, depending upon the expertise at the beneficiary level.
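To make the idea of a yield prediction equation concrete, the sketch below fits the simplest of the model families evaluated in the sorghum pilot, a linear model, on fAPAR values at days 150 and 165 of the year. It is not the pilot's trained model (the best performer there was Bayesian additive regression trees), and the training data are synthetic, generated from a made-up linear rule purely to exercise the code.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_linear_model(features, targets):
    """Ordinary least squares with intercept via (X'X) beta = X'y."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefficients, feature_row):
    """Apply the fitted equation: intercept + sum(coef * feature)."""
    return coefficients[0] + sum(
        c * v for c, v in zip(coefficients[1:], feature_row)
    )


# Synthetic training set: (fAPAR at D.150, fAPAR at D.165) per field.
fapar = [(0.20, 0.30), (0.40, 0.35), (0.50, 0.60), (0.30, 0.70), (0.60, 0.50)]
# Synthetic yields (t/ha) from a made-up rule, NOT pilot measurements.
yields = [5.0 + 10.0 * d150 + 20.0 * d165 for d150, d165 in fapar]

coefficients = fit_linear_model(fapar, yields)
print("equation coefficients:", [round(c, 3) for c in coefficients])
print("predicted yield:", round(predict(coefficients, (0.45, 0.55)), 2))
```

In a production setting, the fitted coefficients (or a serialized model) would be the artifact that moves from the training step to deployment, and the error metrics would be tracked on each new season's data as part of model monitoring.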

As with any crop, potato farming is a business with many variables, not all of which can be controlled by the farmer. A well-informed farmer therefore has the advantage of being able to adapt to the circumstances. Consequently, there is a growing need for information generated over several cropping seasons and locations, which would allow for more reliable predictions. A farmer will be able to anticipate risk based on big data analytics and subsequently change the management accordingly. Through big data sources and devices, the goals around profitability, efficiency, and cost management will be achievable. The availability of historical potato yield data with location information during the pilot was too limited to give reliable results. Much more field data is needed to train the model and make the predictions more reliable; yield data per field are especially essential. For the 2,500 farm members and about 44,000 ha (2017), with an average field size of 10 ha, this would mean about 4,400 fields (44,000 ha / 10 ha) for collecting yield data every year. For farmers, the analysis provides insight that would not have been available with only data about their own fields. In this respect, it is very important that farmers share their field data with each other or with a trusted party. Privacy issues (and trade secrets) hinder the sharing of the data. A trusted party, like a cooperative, should give farmers confidence that their data will not be misused and thus facilitate the data sharing that will benefit them all.

6 Summary and Conclusions

These pilots were established as a solution to current limitations in crop monitoring in Europe. Yield forecasting is based mainly on field surveys, sampling, censuses, and the use of satellites with coarse spatial resolution (250–1000 m; e.g., MODIS, SPOT-VEGETATION), all of which are undependable and/or costly. Our pilots were therefore designed to address these shortcomings. The main challenge in these pilots was to use high-resolution satellite images to predict sorghum biomass and potato yields early in the season, and with high precision to avoid aversion among stakeholders. The results obtained were encouraging. We were able to accurately predict aboveground sorghum biomass yields six months before harvesting, with the best prediction times identified as days 150 and 165 of the year, i.e., the second half of May and the first half of June. These results show that crop monitoring can translate into global business without borders. They point to a remarkable opportunity for farmers and farming cooperatives for several business purposes. The models developed in this work can also help the extension services and other policy-makers in strategic planning, including assessing alternative means of energy supply and ways to avoid energy crises.

In the potato pilot, we gained insight into the possibility of applying big data technologies to smart farming services in order to gain a competitive advantage in terms of cost-effective services based on satellite imagery. Extensive field trials are expensive and will not predict yield in normal field conditions. The results from the DataBio project have been useful in speeding up the process of improving the growth model on the basis of big data analysis. The approach contributed to better yield prediction based on the actual growing conditions with a limited number of samples or field trials. Once the model is validated through more empirical tests and observations, the processing industry will be able to enhance its sales process based on the yield prediction. Big data sources, like EO and sensor data, provide a continuous flow of data, which will certainly support the development of solutions that support the farmer in the decision process. New business opportunities can be found by implementing the yield prediction model that was tested in the pilot with AVEBE and other potato processing cooperatives, implementing a farmer decision support system, and elaborating on the potato growth model to create new services like variable rate application and irrigation planning.