Inverse meta-modelling to estimate soil available water capacity at high spatial resolution across a farm
- Cite this article as:
- Florin, M.J., McBratney, A.B., Whelan, B.M. et al. Precision Agric (2011) 12: 421. doi:10.1007/s11119-010-9184-3
Geo-referenced information on crop production that is both spatially- and temporally-dense would be useful for management in precision agriculture (PA). Crop yield monitors provide spatially but not temporally dense information. Crop growth simulation modelling can provide temporal density, but traditionally fails on the spatial issue. The research described was motivated by the challenge of satisfying both the spatial and temporal data needs of PA. The methods presented depart from current crop modelling within PA by introducing meta-modelling in combination with inverse modelling to estimate site-specific soil properties. The soil properties are used to predict spatially- and temporally-dense crop yields. An inverse meta-model was derived from the agricultural production simulator (APSIM) using neural networks to estimate soil available water capacity (AWC) from available yield data. Maps of AWC with a resolution of 10 m were produced across a dryland grain farm in Australia. For certain years and fields, the estimates were useful for yield prediction with APSIM and multiple regression, whereas for others the results were disappointing. The estimates contain ‘implicit information’ about climate interactions with soil, crop and landscape that needs to be identified. Improvement of the meta-model with more AWC scenarios, more years of yield data, inclusion of additional variables and accounting for uncertainty are discussed. We concluded that it is worthwhile to pursue this approach as an efficient way of extracting soil physical information that exists within crop yield maps to create spatially- and temporally-dense datasets.
Keywords: Inverse modelling; Soil available water capacity; Meta-modelling; Precision agriculture; Crop growth simulation modelling
One aspect of precision agriculture (PA) is site-specific crop management (SSCM). To match inputs better with soil and crop requirements as they vary in space and time, SSCM involves management of spatial units that are smaller than fields (often using potential management classes). Spatially-dense, geo-referenced information is an integral component of SSCM. Whelan and McBratney (2003) consider crop yield, topography and apparent soil electrical conductivity to be basic information layers for determining potential management classes in Australia.
Analysis of crop yield in any one place across several years illustrates that temporal variation driven by climate is as important as spatial variation in determining the potential to undertake SSCM (McBratney and Whelan 1999). This suggests that yield maps from many years would provide important information for PA, but a library of these for a field accumulates slowly. Crop growth simulation models can provide a wealth of temporal data, but are traditionally point-based, and model complexity and data requirements limit the number of feasible simulation points across an area of interest. This produces a mismatch between the spatial resolution of outputs from crop growth simulation models and the spatial resolution required by PA. The challenge is, therefore, to develop methods that simultaneously satisfy spatial and temporal data requirements with crop growth simulation models.
Attempts to address this challenge within the PA literature consist of running crop growth simulation models at individual points on a grid spanning the study area (Link et al. 2006; Braga and Jones 2004; Timlin et al. 2001). These studies acknowledge the intensity of data required for input to their models and consequently take advantage of readily available high-resolution yield information. Inverse use of crop growth simulation models (inverse modelling) has proved to be a useful approach that capitalizes on this yield information to estimate soil properties site-specifically. This type of work is akin to model parameterization or model calibration where a range of estimation procedures can be used to search multivariate space resulting in optimal choices of model parameters.
The literature documents a variety of optimization approaches. Soil water properties are included in all of the examples. Link et al. (2006) calibrated the CERES-Maize crop growth simulation model by varying potential rooting depth and available soil water using simulated annealing. Yield data from 1998 to 2002 were used for calibration across a single field divided into 30 grid cells. It was reported that for this field the calibrated model explained 60% of yield variability in each grid cell over the same 5 years. Braga and Jones (2004) also used simulated annealing to optimize several soil water properties and a root growth factor. Timlin et al. (2001) used a genetic algorithm (an algorithm that attempts to emulate evolution) for the optimization of water holding capacity (WHC) across a single field for 2 separate years.
Irmak et al. (2001) described an innovative and exhaustive search technique to estimate five spatially variable soil properties that involved creating a database with different combinations of these five properties (74 536 model runs). Four different searching scenarios were used to find optimum estimates for each grid. This approach was reported to be computationally efficient, and the resulting yield predictions explained as much as 90% of the variation in soybean yield. Similarly, Morgan et al. (2003) created ‘look-up’ tables containing yield values as a function of plant available water for several years.
These studies demonstrate the use of high-resolution yield data in conjunction with crop growth simulation models for estimating high-resolution soil data and subsequently for predicting spatial and temporal variation in yield. Nevertheless, each of these studies focuses on a single field, and the next step for PA is to extend the spatial cover to an entire farm. This dramatically increases the number of simulation points to thousands, and at the same time raises an additional methodological question regarding the appropriate level of model complexity.
Meta-modelling is a statistical technique that approximates complex simulation models by transforming model inputs into model outputs (Kleijnen and Sargent 2000). Meta-models are used for a variety of purposes such as gaining insight into critical relationships within simulation models, validating simulation models and reducing their size (Ruben and van Ruiven 2001). In the context of this work, a clear advantage of meta-modelling is that complex models can be simplified into their most important relationships thereby reducing input parameter and computer processing requirements, and increasing their potential for use at a higher spatial resolution.
The methods presented in this paper depart from the current state of crop modelling within the PA literature by introducing meta-modelling in combination with inverse modelling to estimate site-specific soil properties that can then be used in crop growth simulation models. In combination, inverse- and meta-modelling should provide a novel method towards better matching of potential outputs from crop growth simulation modelling with the requirements for PA across whole farms. To achieve this goal this research targets three distinct aims: (1) to create an inverse meta-model from the agricultural production simulation model (APSIM) that can estimate soil available water capacity (AWC) from crop yield; (2) to generate AWC maps across a farm and (3) to validate the inversely modelled AWC maps for high-resolution yield prediction. We demonstrate this approach for a 3000 ha dryland grain farm in South Australia.
Materials and methods
The farm under study, ‘BrookPark’, is 200 km north of Adelaide near the town of Crystal Brook in South Australia. The climate is characterized by a mean daily maximum temperature of 24.4°C, a mean daily minimum temperature of 12.6°C and a mean annual rainfall of 345 mm that falls during the winter. Dominant soil types are Planosols, Chernozems and Durosols (FAO 1998). The most important crops grown are wheat and barley.
APSIM is a point-based crop growth simulation model that runs on a daily time-step. It has been used widely, particularly in Australia, and it features in a range of publications (e.g. Asseng et al. 1998; de Voil et al. 2006; Wong and Asseng 2006).
The model incorporates a number of modules that can be added and removed with relative ease. For this study three modules that describe crop growth, soil hydraulic behaviour and management decisions will be outlined further. Numerous other modules are available and explanations can be found on the APSIM website (http://www.apsim.info/wiki/).
The module describing soil hydraulic behaviour in APSIM is SOILWAT. This module describes runoff, evaporation, saturated flow, unsaturated flow and solute movement in one dimension. SOILWAT is a successor of CERES and PERFECT (Godwin and Jones 1991; Littleboy et al. 1999 respectively). Infiltration and runoff are partitioned using the curve number method (Mishra and Singh 2003). Water movement within the profile is described using a cascading water balance model. Saturated and unsaturated water flows are described with separate algorithms. For saturated flow the parameter SWCon is used, which defines the fraction of water above the drained upper limit (DUL) (water content at −0.1 bar) that will flow to the soil layer below. For unsaturated flow, two parameters describing diffusivity are used. Evaporation is treated in two stages. The first depends on atmospheric demand (assuming that the soil profile will meet demand). The second stage is limited by soil moisture and is a function of time from the end of first stage evaporation. The parameter U defines the amount of cumulative evaporation to occur until atmospheric demand exceeds soil moisture supply. The parameter Cona describes the rate of second stage evaporation as a function of the square root of time. It is also necessary to indicate initial moisture conditions. This module interacts with climate, crop growth and management modules.
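The cascading treatment of saturated flow described above can be illustrated with a minimal sketch. It is not SOILWAT's actual code: the function name, the per-layer lists and the choice of millimetres as the unit are assumptions made for illustration, and details such as capping water content at saturation are omitted. It only shows the idea that a fraction SWCon of the water held above DUL in a layer is passed to the layer below.

```python
def cascade_saturated_flow(sw, dul, swcon):
    """One daily pass of a cascading saturated-flow step (illustrative only).

    sw    : water content of each layer in mm, top layer first
    dul   : drained upper limit of each layer in mm
    swcon : fraction of the water above DUL that drains to the layer below
    Returns the updated water contents and the drainage leaving the profile.
    """
    sw = list(sw)   # work on a copy so the caller's data are not modified
    inflow = 0.0    # water arriving from the layer above
    for i in range(len(sw)):
        sw[i] += inflow
        excess = max(sw[i] - dul[i], 0.0)   # water held above DUL
        outflow = swcon[i] * excess         # the share that drains today
        sw[i] -= outflow
        inflow = outflow                    # passed to the next layer down
    return sw, inflow


# Hypothetical two-layer profile: only the top layer is above its DUL.
new_sw, drainage = cascade_saturated_flow([100.0, 80.0], [90.0, 85.0], [0.5, 0.5])
```

In this hypothetical case half of the 10 mm excess in the top layer moves down, and the second layer just reaches its DUL so nothing drains out of the profile.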
APSIM contains a generic crop module (GCROP) that describes biological processes common to many crop species (Wang et al. 2002). GCROP simulates seven processes that occur in a daily loop; these are transpiration, phenology, biomass accumulation, leaf area development, senescence, crop N and plant death. These processes interact with climate, management and soil properties. APSIM approaches root growth and water uptake as an extraction front depth, meaning that the depth of water depletion rather than actual rooting depth is modelled (Wang and Smith 2004). As a result the root water uptake factor (kl) includes root length density (l) and a diffusion constant (k). This factor and a root advancement factor (xf) are defined for each soil layer.
The manager module allows the user to make agronomic decisions relevant to growing a crop. Inputs include cultivar variety, sowing date, sowing depth, row spacing, plant density, fertilizer and irrigation amounts. These management decisions have an effect on crop development to differing degrees. The model is particularly sensitive to cultivar choice. Cultivars are distinguished in terms of photoperiod sensitivity and vernalization sensitivity, both of which affect the rate at which a crop matures.
It is assumed that water is the most important limiting factor in the dryland grain farming system examined in this study. This is supported to a substantial degree in the literature; for example, Irmak et al. (2002) showed correlations between root-zone plant available water and soybean yield across a field in Iowa, USA. A practicality of this assumption is that hydraulic properties were the only model inputs considered in detail. Consequently, other parameters that potentially affect crop yield were held constant both spatially and temporally. The hydraulic property considered was AWC, which is defined as the difference between DUL (water content at −0.1 bar) and lower limit (LL) (water content at −15 bar). Previous studies have derived linear relationships between wheat yields simulated by APSIM and AWC of the topsoil (Wong and Asseng 2006). These authors demonstrated the varying importance of AWC depending on interactions with rainfall, initial moisture content and nitrogen application. The AWC was considered to be the same as plant available water capacity (PAWC) (or extractable water). This was justified by the fact that wheat is the only crop considered in this study.
Soil, crop and landscape data
Soil and climate information were the two main types of data required to create an inverse meta-model. The envisaged domain of applicability for the meta-model is all the possible climate and AWC scenarios that might occur across the study site. This gives meaning to the requirement for a ‘representative’ collection of AWC and climate data for the derivation of a meta-model.
A soil survey across the farm undertaken prior to this research resulted in 140 bulked soil samples (0–30 and 60–90 cm from 70 locations across the farm). A random stratified soil sampling scheme was designed to ensure that the main soil types were sampled. Seven strata were delineated using a combination of elevation, apparent electrical conductivity and gamma radiometrics data (Florin et al. 2005). The particle size analysis (hydrometer method) and organic carbon (Walkley Black) measurements from these samples were used to predict AWC using pedotransfer functions (PTFs).
Bulk density (BD) was first predicted with a PTF that includes total sand%, organic carbon and depth (Tranter et al. 2007). Second, moisture content at saturation (SAT), DUL, LL and air dry moisture content were predicted from the BD, total sand%, silt%, clay% and total carbon content using a neural-network PTF trained with Australian data (Minasny and McBratney 2002). The AWC was calculated by subtracting the LL from the DUL.
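As a small worked illustration of the final step, the sketch below computes AWC as DUL minus LL and converts it to millimetres of water by multiplying by layer thickness. It assumes DUL and LL are expressed as volumetric fractions; the function name, the layer values and the 300 mm thicknesses are hypothetical and are not taken from the survey data.

```python
def awc_mm(dul, ll, thickness_mm):
    """AWC of one layer in mm: (DUL - LL), as volumetric fractions, times thickness."""
    if dul < ll:
        raise ValueError("DUL must not be smaller than LL")
    return (dul - ll) * thickness_mm


# Hypothetical values for two 300 mm layers (not measured data).
layers = [
    {"dul": 0.30, "ll": 0.15, "thickness_mm": 300},
    {"dul": 0.32, "ll": 0.20, "thickness_mm": 300},
]

# Sum over the layers to obtain a profile total in mm.
profile_awc = sum(awc_mm(**layer) for layer in layers)
```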
Next, Latin hypercube sampling (LHS) (McKay et al. 1979) was used to generate AWC and LL for 1000 hypothetical soil profiles. LHS is an efficient method for sampling a multi-dimensional statistical space and is often used for uncertainty analysis (e.g. Post et al. 2008). Again each profile is characterized at two depths, 0–30 and 60–90 cm. The minimum and maximum predicted AWC0–30 cm, AWC60–90 cm, LL0–30 cm and LL60–90 cm for the 70 sampled points and correlations between these properties were used to guide this generation of data. The outcome was 1000 equally probable combinations of these four soil properties.
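The basic idea of Latin hypercube sampling can be sketched in a few lines. This is a simplified illustration, not the procedure used in the study: it samples each property independently between a minimum and a maximum and therefore ignores the correlations between properties that guided the original data generation. The function name and the bounds are hypothetical.

```python
import random


def latin_hypercube(n, bounds, seed=0):
    """Draw n samples so that each variable has exactly one value per stratum.

    bounds is a list of (minimum, maximum) pairs, one pair per variable.
    Variables are sampled independently (no correlation structure).
    """
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One uniform draw inside each of n equal-probability strata of [0, 1).
        u = [(k + rng.random()) / n for k in range(n)]
        rng.shuffle(u)  # randomise how strata are paired across variables
        columns.append([low + x * (high - low) for x in u])
    # Transpose the columns into one tuple per sampled profile.
    return [tuple(col[i] for col in columns) for i in range(n)]


# Hypothetical bounds for AWC and LL at two depths (volumetric fractions).
bounds = [(0.05, 0.20), (0.05, 0.20), (0.05, 0.25), (0.10, 0.30)]
profiles = latin_hypercube(1000, bounds)
```

Compared with simple random sampling, stratifying each variable in this way guarantees that the full range of every property is covered even with a modest number of profiles.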
Daily rainfall and radiation data for the study site were obtained from the SILO Data Drill (Jeffrey et al. 2001). This is a database of continuous daily climate information constructed from the Australian Bureau of Meteorology ground-based observations. Twenty years of weather data were considered adequate to include a number of dry, wet, hot, cold and average years. Twenty consecutive years, from 1980 to 1999 inclusive, were selected.
Spatially referenced yield information was required to apply the inverse meta-model. Eight years of data from wheat yield monitoring with a positional accuracy of 0.1–0.2 m across ‘BrookPark’ between 1999 and 2006 were available. As a result of crop rotation, between 2 and 4 years of wheat data were available for individual fields. Four years of yield data were considered a minimum requirement for this work. A 10-m grid was generated for the farm that excluded the farmhouse, water bodies and natural vegetation. Raw yield data were inspected for distributional outliers within individual fields and erroneous values were removed. Detailed description of this ‘cleaning’ process is in Florin (2008). Next, the wheat yield data were predicted onto the grid using local variograms and block kriging (Walter et al. 2001) with the software Vesper (Minasny et al. 2002).
Creating the inverse meta-model
APSIM parameters and variables that change with depth and describe water movement, plant root function and soil chemical properties
APSIM parameters and variables that describe soil evaporation, runoff, unsaturated flow and management decisions
APSIM parameter or variable
U (to describe first stage evaporation dependent on atmospheric demand)
Cona (to describe second stage evaporation limited by soil moisture)
Diffusivity constant and slope (to describe unsaturated water flow as a function of the average water content between two layers): 88 and 35
Curve number 2 (to specify the runoff response curve for average antecedent rainfall conditions)
Curve number reduction (to define the reduction in runoff due to crop and residue cover)
Sowing window start date
Sowing window end date
Sowing density (plants m−2)
Sowing depth (mm)
Row spacing (mm)
The APSIM-predicted yield for the 20 years was plotted against total AWC (summed over the rooting depth of the profile in mm) and the relationships were observed. Linear stepwise regression was used to determine which climate variables (monthly rainfall and radiation, including both pre-season and in-season months) together with AWC best predicted the APSIM yield output.
Results from the stepwise linear regression determined which climate variables were entered into the meta-modelling process, together with APSIM-yield, to predict AWC and LL (at two depths). A neural network model was the proposed meta-model because neural networks can identify non-linear relationships (Tamari et al. 1996). Neural networks with two, three and four nodes were investigated. Two thirds of the data were randomly selected as a training subset and the remaining third was used to validate the neural network’s predictive ability. The best fitting models were chosen with reference to a cross-validated (CV) R2.
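The data partition and the goodness-of-fit measure can be sketched as follows. This is an illustration under assumptions: the helper names are hypothetical, the neural network itself is not shown, and R2 is computed here as one minus the ratio of residual to total sum of squares on the held-out third, which may differ from the exact CV procedure used in the study.

```python
import random


def split_two_thirds(rows, seed=0):
    """Randomly assign two thirds of the rows to training and the rest to validation."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical row identifiers standing in for simulated profile-year records.
train, validate = split_two_thirds(range(30))
```

A model would be fitted on `train` only, and `r_squared` would then be evaluated on its predictions for `validate`.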
Applying and validating the inverse meta-model
The neural network model was applied across the farm by replacing APSIM-yield with yield-monitor yield. As a result, estimates of AWC and LL for two depths were made at every point across the farm where yield monitor data existed. ‘Best’ estimates of AWC were calculated by averaging estimates from as many years as there were yield data available for each field.
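The averaging step is simple enough to show directly. In this sketch each grid point carries one AWC estimate per year of available wheat yield data, and years without data are marked as `None`; the function name, the data layout and the values are hypothetical.

```python
def best_estimate(estimates_by_year):
    """Average the AWC estimates over the years for which an estimate exists."""
    values = [v for v in estimates_by_year.values() if v is not None]
    if not values:
        return None  # no wheat yield data at this point in any year
    return sum(values) / len(values)


# Hypothetical estimates (mm) at one grid point; 2003 had no wheat yield data.
point = {1999: 100.0, 2001: 120.0, 2003: None}
awc_best = best_estimate(point)
```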
A three-step validation process was used. First, the ‘best’ estimates were compared to the hydraulic properties estimated by PTFs at the sampling points that coincided with the locations that were inversely modelled. The ‘best’ estimate of AWC was plotted against PTF estimated AWC.
Secondly, a year of yield data was excluded from the ‘best’ estimate of AWC. This year was used to validate the inversely modelled hydraulic properties in terms of their potential for yield prediction with APSIM. One hundred points were chosen randomly from the sample fields and estimates of LL, DUL and SAT were used as input into APSIM to predict yield for the excluded year. Modelling scenarios using the correct sowing date and cultivars were input into APSIM. To obtain an initial moisture content for the simulations, APSIM was run continuously for 3 years prior to the validation year. This validation process was undertaken for 2 years across three example fields on the farm. Plots of yield monitor yield and the APSIM predictions were compared by fitting a linear model to these data.
Third, estimates of LL, DUL and SAT (those used for validation of APSIM) were used to create simple linear yield prediction models across the same fields. Stepwise linear regression was used to predict yield using the hydraulic properties and monthly rainfall data. The R2 values for these models were recorded.
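The comparison of yield monitor yield with predicted yield in the second validation step amounts to a simple linear fit and its R2. A minimal sketch using ordinary least squares is given here; the function name and the data are hypothetical, and regressing observed yield on predicted yield is an assumption about how the comparison was set up.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x, with R2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy)  # squared correlation coefficient
    return slope, intercept, r2


# Hypothetical yields (t/ha): predicted on x, yield-monitor values on y.
predicted = [1.0, 2.0, 3.0, 4.0]
observed = [3.0, 5.0, 7.0, 9.0]
slope, intercept, r2 = fit_line(predicted, observed)
```

A slope near 1 and an intercept near 0 would indicate unbiased predictions, while R2 indicates how much of the spatial variation in observed yield the predictions capture.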
Hydraulic properties generated using LHS
Table: Range and mean values for the hydraulic properties generated for ‘BrookPark’ using LHS and PTFs; also a comparison with a similar soil characterized within the APSIM soil database. Rows: lower limit, drained upper limit and available water capacity, each at 0–30 cm and 60–90 cm; similar soil from APSIM soil database.
The inverse APSIM meta-model
The climate variables that were identified as important predictors of APSIM-yield using stepwise linear regression were rainfall observations for June, July, August and November. Consequently, rainfall for these months and APSIM-yield formed the predictor variables in the neural network model. The model with three nodes proved slightly better at approximating APSIM than that with two nodes (CV R2 values were 0.76 and 0.74, respectively). A visual comparison of Fig. 3a and b shows how well the inverse meta-model approximates the relationship between crop yield and profile AWC that is contained within APSIM.
Applying the inverse meta-model across a farm
Validating the ‘best’ estimates of AWC
Comparing ‘best’ estimates of AWC with PTF-derived AWC
Estimated AWC for yield prediction with APSIM and with linear regression models
Table: APSIM and linear yield predictions: the predictors (for the linear models) and the R2 values associated with the linear model. Columns: APSIM prediction year and R2; yield data used for hydraulic property estimation; linear prediction year and R2.
Table 4 also gives the linear model parameters that were used to predict wheat yield across the three individual fields. The differences in performance between years and fields are consistent with the APSIM predictions. Again, predictions for ‘Hill’ and ‘Quarry’ show greater promise than for ‘Randals’. On the whole, the linear regression models based on the estimated AWC values appear to out-perform APSIM. Reasons for these results are discussed in the following sections of this paper.
The meta-model for approximating APSIM
The CV R2 value obtained from the neural network analysis suggests that the meta-model identifies about 70% of the variation in yield predictions that APSIM produces. There are several possible reasons why 30% of the variation was missed by the meta-model. One possibility is that a sample of 1000 different soil profiles does not describe the whole population. More combinations of hydraulic properties could perhaps be simulated with APSIM and used to derive the meta-model. A second consideration is that the temporal resolution of the climate data was decreased from daily to monthly. This may have reduced the effectiveness of reproducing the APSIM model with a neural network.
One benefit from this type of meta-modelling is that information about important variables within the model is gained. For example, the rainfall variables selected for the meta-model confirm the importance of in-season rainfall for winter cropping across ‘BrookPark’. Furthermore, in terms of reducing computer processing requirements, the meta-model shows promise. In this respect, the neural network model is dramatically more efficient than APSIM.
The realizations of different AWC estimates for different years (Fig. 4) are consistent with some previous reports in the literature, e.g. Timlin et al. (2001). This explains why ‘best’ estimates of AWC were obtained by averaging across as many years as were available. However, this approach requires that temporal variability is considered. Some previous spatio-temporal yield analysis (Florin et al. 2009) suggested that certain fields appear temporally stable, in which case this averaging across years to estimate AWC would be useful. However, it might not be useful in the case of a temporally unstable field where no correlation in yield patterns between years can be detected. Information about different degrees of temporal stability between fields and years might also be gained from this analysis. From a PA management perspective, information about temporally stable yield-AWC relationships might be more valuable than yield alone.
The lack of a significant relationship between inversely estimated AWC and PTF-estimated AWC (Fig. 6) is of concern. This result might indicate that the inverse meta-model has reproduced yield values at the expense of valid AWC values. This phenomenon of over-fitting was discussed by Braga and Jones (2004). These authors demonstrated that prediction of soil water content using yield as the objective function variable leads to unreliable soil water predictions compared to using soil water content as the objective function variable (i.e. a variable that is closely related to the properties being estimated).
The differences in this relationship between the years also suggest that the estimated AWCs contain some information about climatic interactions with the crop, soil and landscape. Given that the only variables considered in the meta-model were AWC and rainfall, it is likely that other crop, soil and landscape variables are affecting the estimated AWC values. Some possible spatial, temporal and spatio-temporal soil, crop or landscape properties that may be interacting with climate variables are: root growth, soil depth, soil structure, soil chemical properties, pests, diseases and topography. Further work is required to understand which ‘implicit information’ has been incorporated into the inversely estimated AWCs. For example, information on the spatial variation in soil depth and plant rooting depth would be useful. The assumption that root extraction front depths (determined by the ‘kl factor’) do not vary spatially is a possible source of ‘implicit information’ in the AWC estimates. The relationship between AWC and yield will change if the extraction front depth of crop roots varies because this soil and crop property affects the rate at which crop roots can take up water. In addition, crop roots might not always reach the maximum rooting depth every year, thereby introducing an interaction with climate.
Further, APSIM yield predictions are sensitive to management variables such as sowing dates, crop varieties and crop rotations. In reality crop varieties and sowing dates are different between fields and between years. The meta-models were derived from APSIM simulations where the crop (type and variety) and sowing date were held constant across all points and years. These assumptions provide further entry points for the inclusion of ‘implicit information’ in the inversely estimated AWCs.
Finally, this method does not account explicitly for the uncertainty propagated through the process. Sources of uncertainty within the data arise from the PTFs, interpolated crop yields and climate data. Inclusion of an uncertainty factor for the AWC values would be a useful addition to this method.
AWC estimates for predicting yield
Yield prediction using APSIM and by linear regression led to contrasting results between fields and between years (Fig. 7; Table 4). On the whole the outcome is disappointing; however, a few exceptions suggest some promise for this methodology. Therefore, it is worthwhile discussing differences between fields, years and the results generally.
Results were most promising for ‘Hill’ compared with the other fields. One explanation for this may be that the most dramatic changes in elevation occur within (aptly named) ‘Hill’ and as a result water plays the most important role in the determination of yield. ‘Hill’ includes a relatively flat hilltop in the west surrounded by steeper slopes to the north, east and south which again give way to relatively level sections. The estimated AWC values for ‘Hill’ follow the slope pattern, with larger values on the flatter sections of the field. The result suggests that this method of AWC estimation and yield prediction may be most suitable for particular combinations of terrain and soil types.
It is clear that the quality of the results is specific to the year. For example, results for ‘Quarry’ (predicting yield using both APSIM and linear regression) appear reasonable for 1999 yet poor for 2002 (Table 4), which might be because 2002 was a particularly dry year. As a result, water was an equally limiting factor across the whole field, whereas in 1999 the variation in AWC resulted in spatially variable yield. These differences between years indicate that 3 years of crop yield data to estimate AWC and 1 year of data for validation are not adequate. Temporally unstable yield patterns and processes certainly demand more data through time.
It should also be considered whether APSIM is too complex a model and if AWC alone is adequate for robust simulation of spatial and temporal variation in yield. A logical point for improvement would be to parameterize APSIM more thoroughly. This raises questions surrounding model complexity versus model simplicity. One could ask: What degree of model complexity is optimum? Reynolds and Acock (1985) attempted to answer this question by separating the total error of model predictions into error due to simplification of the system and error due to uncertainty in parameter estimation. This partition of total error enables an optimal level of model complexity to be identified. Within the literature there are several examples where simple models have proved to be more reliable than more complex models. For example, Bell and Fischer (1994) found that a regression model was superior to CERES for potential wheat yield prediction in Mexico. Some further examples are outlined by Sinclair and Seligman (1996).
Potential errors related to assumptions about interactions between soil, water and crop that were raised in relation to estimates of AWC also have some relevance to the issue of model complexity. This methodology has focused explicitly on AWC with the assumption that this hydraulic property controls soil water supply to the crop, which in turn determines yield response. There is scope to challenge this in terms of other sources or limitations to water supply and other important factors limiting yield. An intuitive source or limitation of water might be lateral surface and subsurface flow. Previous studies have mentioned the inability of point-based crop growth simulation models to account for lateral movement of water (e.g. Ferreyra et al. 2006; Timlin et al. 2001; Fraisse et al. 2001).
Finally, in the discussion of model complexity, it is useful to compare the linear yield predictions with those from APSIM. The differences in performance between fields and years are consistent between approaches. With respect to predictive capabilities, the linear models were either similar to or better than the APSIM predictions. It is apparent that the value of simple linear models, as compared with APSIM, is the greater ease of prediction at a high resolution across fields and ultimately across farms.
This methodological contribution to the challenge of using crop growth simulation models to identify spatial and temporal variation in crop yield adequately shows promise and raises many avenues for further work.
Inverse modelling is an efficient way of extracting site-specific soil properties from readily available crop yield information. Furthermore, meta-modelling is useful in terms of simplifying a complex model and addressing the challenge to computer power that simulation modelling brings to PA. The combination of inverse- and meta-modelling is a useful concept to be pursued.
However, future research to address the ability of the meta-model to approximate APSIM closely is imperative. The conceptual model underpinning this research requires modification. The AWC and climate variables alone do not adequately explain all of the spatio-temporal variation in crop yield across a farm. Incorporation of additional variables into the meta-model is necessary to provide a better understanding of how climate interacts with the soil, crop and landscape. More years of yield data are required to improve model building in the light of spatio-temporal variation. Finally, inclusion of uncertainty estimates within this approach is necessary to create confidence in the validity of the model and would be useful to explore the trade-off between model complexity and simplicity.
The authors are grateful to the Grains Research and Development Corporation, Australia for funding this research and to Malcom Sargent for providing access to the study site and the necessary data. Thank you to the reviewers and editor for their careful and useful comments.
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.