Introduction

An FAO report of the Commission on Genetic Resources for Food and Agriculture (CGRFA 13/11) clearly identified “spatial analysis to identify varieties likely to have climate-adapted traits as an aid to plant breeding” as one of the eight priorities in multidisciplinary research. In addition, Earth Observation (EO) data have proven able to provide measurements of key environmental conditions for predicting the production of healthy crops and potential disease threats.

Such agricultural modelling and simulation need access to detailed geolocated genetic-trait information as well as complementary data from geospatial data providers and geospatial data hubs, e.g. soil moisture data and climate data. The BBSRC-funded project GRASPgfs, “Geospatial Resource for Agricultural Species, pests and Pathogens with workflow integrated modelling to support Global Food Security”, initiated the design and prototyping of an e-infrastructure linking together (i) a geo-germplasm database, (ii) a dynamic metadata catalog and (iii) a workflow modelling tool to enable optimal usage of the geo-genetic-trait information. Such usage is needed in various types of simulation: crop trait variation, forecasts under climate forcing scenarios and crop disease epidemics.

This initiative and the prototype of the e-infrastructure, based on open standards, are described.

Background

Access to a wide range of information, from rigorous scientific results to ‘hearsay’ farmers’ knowledge, is becoming critical for targeting efforts in food security planning at community or country level. Designing scientific and intervention strategies within changing climates and markets is also a fundamental challenge. The Plant Genetic Resources for Food and Agriculture part of the FAO’s Global Plan of Action for plant genetic resources has been established as a world-wide dynamic mechanism using WIEWS (footnote 1) to foster information exchange among members (more than 150 countries) and as an instrument for periodic assessment (footnote 2). The enhanced use of this type of resource and of other germplasm banks, with additional contextual information, is nonetheless still highly desired. A recent FAO report (footnote 3) from the Commission on Genetic Resources for Food and Agriculture clearly identified “spatial analysis to identify varieties likely to have climate-adapted traits as an aid to plant breeding” as one of the eight priorities in multidisciplinary research.

Technologies for the collection and dissemination of geolocated information, using broadband mobile communications, sensor platforms, spatial search and pervasive computing, are fundamentally changing the access to and use of location-based data in agriculture [20]. However, the cross-disciplinary research needed to transform raw data and information into useful intelligence and knowledge to improve the planet’s environmental, economic and societal well-being is still constrained by disciplinary and organizational silos and legacy concepts. Even though it was already acknowledged in the 1940s, the geo-location of genetic data in ecology and agriculture for further spatial analysis, i.e. spatial genetics, is still a recent concept [17]. Within the “from farm to fork” chain, various heterogeneous data, including genetic-trait information, are to be considered as part of the computational modelling for prediction and forecasting; most of them have a geo-location or spatial component, or would need one in order to be used by a plethora of model applications of various complexities. These models are biophysical or agro-economically based and more mechanistic or deterministic in orientation [6, 13, 14, 19, 30, 36], more stochastic [2, 5, 12, 23, 42, 48, 52], or more rule-based, including agent-based [32, 45, 49], but all contain a combination of those types. Therefore, cross-disciplinary expertise driven by geospatial science methodologies appeared to be needed to develop an integrating framework for relevant data sources, in order to allow knowledge gathering across all subjects relevant to food security.

The objective of the geospatial integration e-infrastructure framework that the GRASPgfs initiative has proposed to establish is to facilitate the use and reuse of trait data in crop, animal and microbial species of agricultural importance. The initiative relies on the position that the geospatial realm, as an entry point and end-point of this e-infrastructure, enables researchers, stakeholders and policy makers to ground their development strategies and to elaborate alternatives more easily [3, 6, 26, 30, 31, 36, 40]. This binding is not only conceptual and interdisciplinary; it also has concrete technical impacts on the e-infrastructure, which uses Open Geospatial Consortium (OGC) standards. Linked with dynamic climate records within the framework’s capabilities for scientific workflow modelling, this would allow food security issues for sustainable agriculture to be addressed by enabling predictive modelling with identification and characterization of new sources of germplasm.

The paper describes the initial overall architecture design and the first results in establishing the eGRASP platform and e-infrastructure. Section 2 concentrates on the method used to define the approach and on the initial challenges for long-term objectives; section 3 highlights the bases for designing the eGRASP solution and architecture, along with illustrative examples of initial results obtained with this approach.

Requirements

Targeting global food security issues and sustainable agriculture in relation to crop selection and climate change requires the development of models integrating a range of disciplines such as genetics, agro-ecological modelling and land-climate forecasting. Geospatial science can be the mediating component of an e-infrastructure enabling data and processing to be retrieved, integrated and made available within a geospatial scientific workflow modelling interface with uncertainty management.

The main objective of the geospatial integration framework within GRASPgfs was to facilitate the use and reuse of known (and new) sources of crop traits together with dynamic climate records, within the framework’s workflow modelling capabilities, to address food security issues concerning sustainable agriculture. Describing agricultural species germplasm by genotype characteristics, with the data ordered by geospatial origin and “agricultural trait” as the higher-level descriptor, has been put forward as enabling a new way of expressing and analysing trait variations [34, 41].

Highlighting the model complexity of the integrated assessments required for global food security, a recent review of crop models under climate forcing pointed out the need for a generic solution enabling or facilitating the combination of various models [14]. Figure 1 encapsulates the challenges of facilitating such analyses via integrated workflow modelling. This workflow, which grasps the “big picture” and illustrates the cross-disciplinary expertise required, is conceptual; each data entry or processing task may in fact involve complex data structures and sub-workflows. The framework to be developed would need to facilitate the integration of the key driving conceptual aspects of this model. Following this conceptual approach, and being able to re-use available data and models to instantiate such a model, would enable new perspectives on crop genetic diversity by (i) identifying new sources of trait variation, (ii) geolocating suitable germplasm, (iii) planning breeding objectives with the greatest likely impact using the added information of local market and farmers’ knowledge [8, 34], and (iv) evaluating the effects of climate change scenarios.

Fig. 1. Workflow design of a generic model to be used in food security and sustainability: the Genetic Agro-ecological Sustainability Proposal model (BPMN diagram)

Integrating heterogeneous datasets from various sources within a generic platform means being able to access and understand the semantics of these data and processes, so that the platform can present the data, analyse them, or instantiate a workflow model using them [9, 25]. Easily integrating various data and processing resources has considerable advantages in terms of rapid development and execution of models, but gives less control over the quality of the results, as various uncertainties coexist in the components of the workflow model. Therefore, bounding the outcomes of the models with an uncertainty assessment should also be an aim of the eGRASP platform, to allow better decision-making. Specific capacities are also needed to integrate information such as genetic-trait encoding and ontology binding with disparate germplasm data sources. Pests, pathogens and weeds are encompassed at the crop information level and in the processes themselves, as their impact often arises from interaction with the environmental conditions.

Designing, developing and implementing

The GRASPgfs project has therefore focused on designing and implementing a flexible, interoperable platform based on open source software (footnote 4), compliant with GEOSS (footnote 5) and using OGC (footnote 6) standards and services for data and processing capabilities. By delivering a flexible, integrative eGRASP web platform for sharing, based on openness, the project aims to enable researchers in crop modelling and agro-ecological modelling, whether developing new models or evaluating agricultural strategies (agro-economic modelling), to seamlessly re-use existing models and specific data such as genetic-trait information. For efficiency and control over quality, in terms of the uncertainty and variability of the outcomes, the platform design includes functionality to easily browse and visualise metadata as well as to geo-computationally evaluate the uncertainties of workflow outputs [15, 25, 28]. Spatial analysis of the spatial variations of both the predicted outcomes and their uncertainties was also included in the platform design. In this way the modelling part and the decision-making part are interlinked, allowing more flexibility and adaptability. The approach and concept of the eGRASP platform result from multidisciplinary exchanges leading to a truly transdisciplinary vision [4, 21, 38], which is highlighted in the next section.

Emergence of a transdisciplinary vision

The common vision expressed in Fig. 1 started to emerge while a core collaboration on this topic was being built at the University of Nottingham from a range of disciplines (environmental and human geography, crop science, geospatial information and computing science), through regular meetings and small-scale funding for a few summer internships in 2010. Later, thanks to 18 months of pump-prime funding from the BBSRC, the research work could start. The workflow of Fig. 1 encapsulates the vision put into the design of the eGRASP platform as much as it is a template of potential modelling scenarios, envisioning the various data and process components needed to fulfil our objectives for GRASPgfs. Although at first the geospatial sciences may have seemed simply to bring tools enabling this research within a cross-disciplinary perspective, they rapidly came to act as the medium of a more holistic integrated approach [16], which in turn challenged their specific developments within a context beyond the disciplines involved. In addition to providing more opportunities for expanding the capabilities and applications sought in the first place, this advancement also created new avenues for interdisciplinary research and practice in the use of GIS in agricultural research.

Beyond the global concept and the concepts encapsulated in it, Fig. 1 is a truly transverse vision that not only puts each specialist of a sub-model within a contextual flow but also enriches the geospatial e-infrastructure modelling framework. It resulted from translating various flow diagrams of conceptual information into a technical and standardised representation using a cross-disciplinary encoding standard, BPMN (Business Process Modelling Notation, from the OMG standards organisation). From the cross-disciplinary point of view, Fig. 1 as a BPMN representation is also a scientific geo-computational model seen from a meta-level description, which can be linked to a workflow engine enabling its computational execution once instantiated (Fig. 2).

Fig. 2. Use case model (UML) for the GRASPgfs platform

In order to instantiate such models (the entire Fig. 1 or the sub-models it encapsulates), the design of the eGRASP platform is based on the use case model in Fig. 2, which translates the requirements set out earlier. In this figure only general use cases are presented, with different colours expressing the different domains or disciplines concerned: the green use cases reflect crop genetics with genetic-trait information; the yellow use cases concern geospatial science, with visualisation and selection of environmental constraints; the blue use cases deal with geocomputational modelling and scientific workflow composition and evaluation; and the pink use cases concern crop epidemiology, with the risk factors associated with the crop modelling, including pest and disease risks derived from pathogen information.

Just as UML (Unified Modelling Language, footnote 7), particularly through class diagrams for object modelling and use case diagrams such as Fig. 2, has enabled cross-disciplinary exchanges in data modelling [22], the BPMN language establishes a bridge from conceptual integrated modelling to the effective execution of the models [44]. Facilitating the composition of such workflows using existing resources is paramount [11].

Crop modelling complexity

Well-known crop modelling approaches such as APSIM (footnote 8) [19] and AquaCrop (footnote 9) are considered here as expressing, or being a sub-model of, the “trait variation forecast integration”. The purpose of GRASPgfs is to re-use these established models directly within a flexible platform; they can be wrapped into OGC Web Processing Services (WPS) and made available to the platform as such [10, 35] or via a brokering system [7, 39]. When the models can be broken down into sub-components, and if the crop-trait variation scenario requires it, these sub-components can be made available to the processing service. Because the interaction of these models can be complex to set up and to combine, the BPMN editor is seen as a simplification, particularly when a few models are to be combined. Ultimately it brings interoperability in interfacing heterogeneous data and processing models without necessarily imposing standardisation on each of them. This does not, of course, preclude the need for a good understanding of the models used, but the goal of the eGRASP platform is to hide this complexity and to focus on the ability to re-use the resources within a more macro scenario for global food security. The models and types of models identified in the background section can potentially be re-used here, and the platform also aims to facilitate their encapsulation as WPS services (Fig. 3).
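As a minimal sketch of what calling a crop model wrapped as a WPS could look like, the Python snippet below builds a WPS 1.0.0 key-value-pair Execute request. The endpoint URL, the process identifier `crop_growth_simulation` and the input names are hypothetical placeholders chosen for illustration; they are not taken from the eGRASP prototype.

```python
from urllib.parse import urlencode


def wps_execute_url(endpoint, process_id, inputs):
    """Build a WPS 1.0.0 key-value-pair Execute request URL."""
    # WPS 1.0.0 encodes literal inputs as semicolon-separated key=value pairs
    data_inputs = ";".join(f"{key}={value}" for key, value in inputs.items())
    query = urlencode({
        "service": "WPS",
        "version": "1.0.0",
        "request": "Execute",
        "identifier": process_id,
        "DataInputs": data_inputs,
    })
    return f"{endpoint}?{query}"


# Hypothetical endpoint, process identifier and inputs (assumptions, not eGRASP names)
url = wps_execute_url(
    "https://example.org/wps",
    "crop_growth_simulation",
    {"crop": "wheat", "sowing_date": "2013-10-15"},
)
print(url)
```

A workflow engine or a desktop client would send such a request and read the outputs from the XML response; the process itself remains a black box, which is the re-use property the platform relies on.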

Fig. 3. The GeoGermplasmDB schema, an evolution from CropStoreDB

When looking at trait variation with genotypic information, the crop modelling may start by building up a selection for trait variation linked to genotype linkage and environment interaction. This corresponds to the “Trait Hypothesis Construction” process task in the generic workflow. It is described in Fig. 2 among the other capabilities of the eGRASP platform; the functionalities associated with this genetic-trait selection, performed before the crop modelling for example, form the green part of the use case model. To achieve this, the platform re-uses the CropStoreDB (footnote 10) database, called GeoGermplasmDB in the architecture design (Fig. 4). The GeoGermplasmDB has an extended schema in order to record the geometry associated with a few tables using the OGC standard (Fig. 3), and also to encode the pest and pathogen characteristics along with the model parameters associated with the crop varieties, as stipulated in the requirements. The GeoGermplasmDB allows users to record genotype and trait information with geo-location depending on the origins of the seeds and the trial sites, and implements the “Bio-genetic Knowledge” component of the platform. Geospatial variations associated with genetic variations can lead to breeding selection programs [18, 33]. An example using the underutilized crop Bambara groundnut (Vigna subterranea) is detailed in the example section (Fig. 5).

Fig. 4. The eGRASP platform architecture design as sub-module of the CropBASE system

Fig. 5. A landscape genetic analysis workflow on bambara groundnut (Vigna subterranea) landraces

The other aspects of complexity considered here come, on the one hand, from the interaction of farmers’ knowledge of the landraces with their strategies to make a living [24, 32] and, on the other hand, from the climate forcing interacting with the current land conditions. Spatial complexity, due mostly to aggregation and topological properties in these models, can now also be introduced [26, 47, 51]. Specific models for climate forcing, more often mechanistic, can be used to predict future ground conditions but are usually integrated with interactions from general land use categories [43, 50].

The eGRASP capacity

The approach pursued in GRASPgfs and in the design of the eGRASP platform has been as much top-down as bottom-up, led by disciplines such as crop genetics, geospatial information modelling and crop modelling. Alongside a strong top-down emphasis on a geolocated genetic-trait database (the GeoGermplasmDB) and on workflow modelling (based on the OGC WPS and BPMN standards), case study analyses were used to gather requirements. By mixing these two aspects, and by envisaging direct use of the top-down elements in the bottom-up approach, the UML use case diagram of the required functionalities of the eGRASP platform was obtained (Fig. 2). Following the adoption of the use case diagram, disciplinary research took place to refine the case studies, with a focus on use case matching and potential new developments, whilst the computing architecture was designed to fit these requirements.

The architecture designed for the eGRASP platform to enable global spatial data infrastructure functionalities, as well as the ones described above, is given in Fig. 4. This viewpoint gives an overview of the different components without detailing how specific analytical functionalities are implemented. The objective of this pump-prime funding was to establish the design and to demonstrate a prototype. Therefore, specific functionalities are still to be developed; further funding is required to pursue these efforts. In Fig. 4, front-end services with their clients are represented as square boxes, and back-end services, often associated with specific information (e.g. databases, repositories), are represented as cylindrical boxes. The eGRASP system appears in this design as a sub-architecture of the CropBASE (footnote 11) initiative led by CFF (Crops For the Future), a wiki knowledge-sharing platform integrating multiple CFF programs, which is also in development.

To demonstrate the architecture, a set of services and facilities has been implemented and is currently available (footnote 12), but the platform and the CropBASE portal are not yet operational. The OGC services, for example WPS and WFS, can also be used directly in other clients such as QGIS (from the OSGeo stack, footnote 13). Currently:

  • the Geovisualisation is supported from QGIS and from the WMS client provided by the GeoServer instance serving the GeoGermplasmDB.

  • the Discovery via the metadata catalogue service (OGC CSW) is supported by GeoNetwork (footnote 12); queries on GEOSS-registered catalogues can bring back re-usable resources (data or processing services) as well as local ones.

  • the GeoWorkflow is supported by a bespoke specification for OGC services using the jBPM (footnote 14) suite, with a web editor and a workflow engine [35].

  • the GeoGermplasmDB services as well as local environmental data are served using GeoServer (footnote 12); the results of the simulations or other workflows can be stored in the local environmental data storage.

  • a set of ontologies can be used to enrich the data and processes enabling refined queries via the metadata catalogue client.
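To illustrate how a client other than the platform itself might consume these services, the sketch below builds a WFS 2.0.0 GetFeature request and reads point features from a GeoJSON response. The endpoint, the layer name `geogermplasm:accession`, the property `accession_id` and the sample coordinates are all hypothetical; GeoJSON output is assumed to be enabled on the server (GeoServer supports it, but the WFS standard does not mandate it).

```python
import json
from urllib.parse import urlencode


def wfs_getfeature_url(endpoint, type_name, max_features=50):
    """Build a WFS 2.0.0 key-value-pair GetFeature request URL."""
    query = urlencode({
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": type_name,
        "count": max_features,
        # Assumption: the server can return GeoJSON for this output format
        "outputFormat": "application/json",
    })
    return f"{endpoint}?{query}"


def accession_points(geojson_text):
    """Return (accession id, longitude, latitude) for each point feature."""
    collection = json.loads(geojson_text)
    points = []
    for feature in collection["features"]:
        lon, lat = feature["geometry"]["coordinates"][:2]
        points.append((feature["properties"]["accession_id"], lon, lat))
    return points


# Hypothetical layer name and made-up response, for illustration only
request_url = wfs_getfeature_url("https://example.org/geoserver/wfs",
                                 "geogermplasm:accession")
SAMPLE_RESPONSE = """{"type": "FeatureCollection", "features": [
  {"type": "Feature",
   "geometry": {"type": "Point", "coordinates": [-1.5, 12.4]},
   "properties": {"accession_id": "ACC-001"}},
  {"type": "Feature",
   "geometry": {"type": "Point", "coordinates": [34.8, -6.2]},
   "properties": {"accession_id": "ACC-002"}}
]}"""
print(request_url)
print(accession_points(SAMPLE_RESPONSE))
```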

Quality information available for data and processes in the metadata catalogue is used for uncertainty assessment through error propagation, thereby allowing better decision-making. This is currently available as an added functionality of the web editor, which re-uses the MetaPUnT WPS (footnote 15) service [27, 28] to meta-propagate the uncertainties.
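The general idea of propagating input uncertainty through a chain of workflow steps can be sketched with a plain Monte Carlo simulation. This is not the MetaPUnT method, whose details are given in [27, 28]; the two workflow steps, their coefficients and the input quality values below are toy assumptions chosen only so that the result can be checked by hand.

```python
import random
import statistics


def propagate(workflow, input_mean, input_sd, n_samples=20000, seed=42):
    """Monte Carlo propagation of a Gaussian input through chained steps."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_samples):
        value = rng.gauss(input_mean, input_sd)
        for step in workflow:
            value = step(value)
        outputs.append(value)
    return statistics.mean(outputs), statistics.stdev(outputs)


# Toy, hypothetical workflow steps (not models from the paper)
def moisture_to_stress(moisture):
    return 1.0 - 0.8 * moisture


def stress_to_yield(stress):
    return 6.0 * (1.0 - 0.5 * stress)


# Input quality as it might be read from a metadata record: mean 0.30, sd 0.05
mean_yield, sd_yield = propagate([moisture_to_stress, stress_to_yield], 0.30, 0.05)
print(round(mean_yield, 2), round(sd_yield, 2))
```

Because this toy chain is linear (yield = 3 + 2.4 × moisture), the output standard deviation should be close to 2.4 × 0.05 = 0.12, which gives a simple check on the simulation.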

First applications

Two illustrative examples are presented here to highlight the potential of eGRASP. The first, a landscape genetic model, directly uses the GeoGermplasmDB and the associated WFS to describe spatially the genetic distances of germplasms. The second illustrates the crop disease modelling capacity of the eGRASP facility by designing an exemplar wheat eyespot disease model [1]. Both examples, the landscape genetic association analysis and the crop disease modelling, use a BPMN scientific workflow representation, thereby demonstrating the range of modelling situations that eGRASP intends to cover.

For the landscape genetic modelling, a glasshouse trial with 128 plants, from 4 repetitions each of 32 landraces, was analysed (Figs. 5 and 6). Here only the genotypic information was used to retrace geo-location associations of similar genetic profiles based on 20 microsatellite (SSR) molecular markers [37, 46]. Five genetic profiles were identified by k-means clustering on the main principal components of the SSR response data. In Fig. 6, the green and red profiles, which capture most of the genetic variability, cluster spatially to some extent, with an east-west gradient in the Sahel for the reds and a north-south gradient in East and South-East Africa for the greens. Adaptation to similar climatic environments may explain these zones, with the Sahel zone for the reds and a more humid tropical zone in East Africa for the greens. Trade routes may also be involved. Further analysis including the phenological data, with comparison to local data, will be needed to confirm these hypotheses.

Fig. 6. Bambara groundnut (Vigna subterranea) landrace origins classified by genetic distance (bottom: first two principal components and k-means classes, top: geo-locations of the sample)

Each task of the workflow in Fig. 5 was performed with R scripts based on existing packages. These R scripts are in the process of being encapsulated as WPS so that they can be used and shared through the eGRASP platform.
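The clustering step described above (k-means on the main principal components of the marker data) was run in R by the authors. The Python sketch below only illustrates the same two-step idea using the standard library; the six-by-four marker matrix is made up, two clusters are used instead of five, and the power-iteration PCA and farthest-first initialisation are choices made for this illustration, not the authors’ procedure.

```python
import math


def center(rows):
    """Subtract the column means from every row."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(rows):
    """Sample covariance matrix of already centred rows."""
    n, p = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_eigenvector(matrix, iterations=200):
    """Power iteration from a fixed, non-uniform starting vector."""
    p = len(matrix)
    vec = [float(i + 1) for i in range(p)]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec


def principal_axes(cov, n_axes):
    """First n_axes principal axes, obtained by deflating the covariance."""
    p = len(cov)
    matrix = [row[:] for row in cov]
    axes = []
    for _ in range(n_axes):
        vec = leading_eigenvector(matrix)
        eigenvalue = sum(vec[i] * matrix[i][j] * vec[j]
                         for i in range(p) for j in range(p))
        axes.append(vec)
        matrix = [[matrix[i][j] - eigenvalue * vec[i] * vec[j]
                   for j in range(p)] for i in range(p)]
    return axes


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=50):
    """Lloyd's algorithm with deterministic farthest-first initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points,
                             key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up presence/absence marker data: 6 accessions x 4 markers
markers = [[1, 1, 0, 0], [1, 1, 0, 1], [1, 0, 0, 0],
           [0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 1, 0]]
centered = center(markers)
axes = principal_axes(covariance(centered), n_axes=2)
scores = [[sum(x * a for x, a in zip(row, axis)) for axis in axes]
          for row in centered]
labels = kmeans(scores, k=2)
print(labels)
```

Joining the resulting labels to the accession coordinates would then give a map of genetic profiles by origin, which is the kind of output shown in Fig. 6.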

The second example, illustrated in Fig. 7, is a scientific workflow for crop modelling with potential occurrence of eyespot disease. The purpose was to integrate specific epidemiological disease modelling within a normal growth simulation model. Eyespot disease is modelled using a few sub-models that interfere with the normal development of the crop:

  • The inoculation potential model (IPM) determines the amount of inoculum available for infection of the host depending on land condition risks and weather data.

  • The disease development model (DDM) is based on the inoculation level and key environmental factors related to infection and disease development.

  • Finally at a key developmental growth stage the severity of the disease is determined (DSM) and is based on estimates from the previous two models.

  • The impact of the severity of disease is then evaluated iteratively (HRM) at the subsequent growth stages until the crop has been harvested.

Fig. 7. Eyespot disease workflow modelling using APSIM for wheat crop growth simulation

Each of the models (IPM, DDM, DSM and HRM) is stochastic and is estimated at the given growth stages that were identified as crucial during the development of the crop on controlled data: GS13, GS32, GS39 and GS65 [1]. The models are to be combined with a physiologically based model for crop growth, as in the BPMN representation in Fig. 7. The disease evolution models have been implemented in R (footnote 16), and APSIM was chosen as the crop growth model. Within APSIM, R scripts can be run using the script manager, making APSIM the orchestrating engine. Nonetheless, encapsulating APSIM within a WPS could be a future solution using the workflow engine within eGRASP. Details of the first results and of the variables involved in the IPM, DDM, DSM and HRM models can be found in [1], along with the full validation of the models. However, despite the capacity of APSIM to run R scripts, the variables targeted by the disease modelling could not be updated during simulations, which led to a much simpler adaptation of Fig. 7.

For eGRASP, the interest lies in the fact that such composition and conceptualisation of the models can be facilitated and controlled, e.g. by checking model adequacy. Interoperability ensures that models designed according to the BPMN standard can then be shared, both as a standard graphical representation for better communication and as an XML encoding enabling any BPMN-compliant workflow engine to run the scientific model represented as a workflow.
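To make the XML encoding concrete, the sketch below parses a minimal BPMN 2.0 document and recovers the task order by following its sequence flows. The process is a deliberately simplified, strictly linear rendering of the four eyespot sub-models; the element identifiers and target namespace are made up, and the iterative HRM loop and the APSIM growth steps of Fig. 7 are left out.

```python
import xml.etree.ElementTree as ET

# Namespace of the BPMN 2.0 model elements
BPMN_NS = {"bpmn": "http://www.omg.org/spec/BPMN/20100524/MODEL"}

# Hypothetical, simplified linear process (not the actual Fig. 7 model)
BPMN_XML = """
<definitions xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL"
             id="defs" targetNamespace="http://example.org/eyespot">
  <process id="eyespot" isExecutable="true">
    <startEvent id="start"/>
    <task id="ipm" name="IPM"/>
    <task id="ddm" name="DDM"/>
    <task id="dsm" name="DSM"/>
    <task id="hrm" name="HRM"/>
    <endEvent id="end"/>
    <sequenceFlow id="f1" sourceRef="start" targetRef="ipm"/>
    <sequenceFlow id="f2" sourceRef="ipm" targetRef="ddm"/>
    <sequenceFlow id="f3" sourceRef="ddm" targetRef="dsm"/>
    <sequenceFlow id="f4" sourceRef="dsm" targetRef="hrm"/>
    <sequenceFlow id="f5" sourceRef="hrm" targetRef="end"/>
  </process>
</definitions>
"""


def linear_task_order(bpmn_xml):
    """Follow sequence flows from the start event and list task names in order."""
    root = ET.fromstring(bpmn_xml)
    process = root.find("bpmn:process", BPMN_NS)
    names = {t.get("id"): t.get("name")
             for t in process.findall("bpmn:task", BPMN_NS)}
    next_node = {f.get("sourceRef"): f.get("targetRef")
                 for f in process.findall("bpmn:sequenceFlow", BPMN_NS)}
    node = process.find("bpmn:startEvent", BPMN_NS).get("id")
    order, seen = [], {node}
    # Only valid for a single, non-branching path; stops on a revisited node
    while node in next_node and next_node[node] not in seen:
        node = next_node[node]
        seen.add(node)
        if node in names:
            order.append(names[node])
    return order


print(linear_task_order(BPMN_XML))
```

A real workflow engine also handles gateways, loops and data associations; the point here is only that the diagram and the executable description are the same document.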

Like UML (Unified Modelling Language), used as a computing science tool to design application systems and leading to both database and object-oriented programming implementations, the BPMN meta-language can be understood very rapidly by the scientists involved [22, 29]. This transdisciplinary process made it possible to conceptualise the disease evolution and impact in a comprehensive way, which was also efficient to put into practice once each sub-model (a task in the BPMN diagram) had been established and fitted.

Future research

Interdisciplinary projects often reduce to cross-disciplinary spillover; however, through a shared initiative to advance the GRASPgfs concept, a real transdisciplinary collaboration has been initiated and experienced. The co-design of the eGRASP platform, with its embryonic capacities, has not only made it possible to envisage new potential research ideas in each of our disciplines but has also made global food security strategies and analyses concrete. The recent development of the GRASPgfs framework and the design of eGRASP were limited by the budget, and not all the disciplines initially envisaged could be adequately integrated. In Fig. 1, the agro-ecological interaction would derive mostly from re-using models in landscape genetics and landscape ecology, and the agro-economic part would benefit from the models mentioned in the background section; however, their data modelling integration, represented on the left-hand side of the model, has not yet been investigated. For a prototype design this was not crucial, as long as we could still represent its future influence when composing the models.

Although some of the services in Fig. 4 are in place, the actual data and process content is rather small, as this was a proof-of-concept exercise. Nonetheless, PhD students and recent projects are providing valuable examples that also enhance the capacity of this platform. The interoperability principle adopted by eGRASP, including its focus on open source and open standards, offers the chance for maximum dissemination of this capacity as a set of cross-platform clients and services. Geospatial risk assessments in agriculture in relation to species and pests can be greatly facilitated by sharing data and processes, which can then be reused by eGRASP.