1 Our Modern Food System

Key to developing any future system of data linkage in agriculture is an understanding of the structure of the current system. To understand how our current system of varietal registration, approval and certification has arisen, and therefore some of the pitfalls and opportunities for more efficient data linkage and improved system function, it is also important to understand which drivers led to the establishment of these systems, and how those drivers have shaped the extant systems and the flow of data between them. It is also important to note that our current set of statutory and advisory systems for the registration and marketing of crops of major agricultural importance is intrinsically linked to the development and fate of the organisations that developed and implemented them. While this may seem unimportant, it serves as an indicator of how often function (or dysfunction) follows from form, and may be of importance when considering future alterations to both national and international organisations.

The modern food system is one that for a large part of the twentieth century prioritised productivity (Benton & Bailey, 2019). The integration of modern plant breeding, agronomy and mechanisation led to rapid and sustained productivity growth and has in part allowed humanity across the globe to continue its shift from an agrarian society to an industrial society. Modern agriculture has been (along with modern medicine) a key contributor to the enormous reductions in global poverty and hunger (von der Goltz et al., 2020). Industrialisation has also moved the primary energy source of human society from photosynthesis-derived phytomass to fossil fuel, a trend that is broadly present in agriculture, as well as in most other sectors of society and the economy. For most nations, there remains a linear relationship between fossil fuel energy usage and GDP, though there is emerging evidence that decoupling fossil fuel usage from GDP is possible (Haberl et al., 2020). Industrialisation and the ability to produce cheap food have led to a significant reduction in infant mortality, which in the short term has driven global population growth and overall global prosperity. Balanced against this is the overall lack of effective integration of the myriad externality costs of industrialised societies, including in the area of agriculture. Modern agriculture and the food system that it supports are currently responsible for between 10% and 30% of primary emissions. For agricultural crops, greenhouse gas emissions are largely due to soil-associated, microbially-driven nitrous oxide (N2O) emission through excess fertiliser use (together with the carbon dioxide (CO2) released in the production of inorganic fertiliser through the Haber-Bosch process, the so-called input footprint) and to soil disturbance and carbon release in the form of CO2 through tillage.
In submerged cropping systems such as rice, the action of anaerobic methanogenic microbes in waterlogged soils leads to methane (CH4) production (Smartt et al., 2016). There are also the opportunity costs of agriculture, for example the loss of land for natural carbon sequestration through conversion to agriculture. There are also the onward uses of agricultural commodities, for example animal feed, which lead to animal-associated CH4 emissions. Beyond the narrow lens of emissions, it is also clear that through both land use change and the use of chemical controls of pests and diseases there is an overall decrease in the carrying capacity of the environment for many species, primarily due to ecosystem fragmentation and destruction (Dudley & Alexander, 2017). Agriculture is the largest contributor to biodiversity loss (Dudley & Alexander, 2017), and as such there is again a need either to find effective mitigation or simply to reduce the footprint of agriculture if we wish to reverse the ongoing decline in the viability of the ecosystem.

While all of this may seem unrelated to the use and linkage of data in agriculture, it is not. The regulatory and advisory systems that are in place in much of the world can directly shape the traits that are brought to market and at present in many places prioritise yield advantage (which is a key component of agricultural efficiency) under high input farming systems above other traits that deliver public and/or private goods. These include specific efficiency traits (i.e. nutrient use per unit of production), pest and disease resistance traits but also traits that may contribute to improved soil structure, reduction in nutrient loss to the wider environment, performance in mixed cropping systems etc. In this review we will put forward the case for how ensuring more effective data linkage can play a key role in designing, developing and delivering a sustainable farming system through both enabling access to data and through expansion of publicly available data types and enabling trait measurements for key resilience and efficiency traits alongside productivity traits. Of key importance is also the rapidity and urgency that is required to reform our farming system if we are to meet the joint goals of protecting and securing food production, reducing biodiversity loss and reaching net zero emissions.

2 A Short Historic Perspective on the Current Breeding, Protection and Registration Systems in the UK and Their Reliance on Data Linkages

The National Institute of Agricultural Botany (NIAB) was founded in 1919 as a response to the food crisis of 1917–1918, when there was a serious shortage of imported food and a lack of seed, fertiliser and equipment to crop large areas of newly ploughed pasture, required to meet quotas for food crops imposed by governmental bodies. From the outset it was a public-private partnership supported through personal donations and governmental support. It has been an Independent Charitable Trust, though for the first 75 years of its life it operated effectively as a government-funded institute until its full privatisation in 1996. Its founding Director was Sir Lawrence Weaver, who was at the time Controller of Supplies at the Food Production Department, which had been set up by the Board of Agriculture and Fisheries to deal with the national crisis during the First World War (Wellington & Silvey, 1996). NIAB is a classic example of an organisation whose form followed its function, built to deliver a specific objective.

Parallel developments to NIAB had also led to the implementation of a seed testing system, which primarily had the responsibility of ensuring that seed testing schemes were put in place for domestic and imported seed. Although such a system had long been called for and was already implemented in other European nations, it was the national food crisis that led Weaver to establish the official seed testing station for England and Wales, which began in London but later became a part of NIAB in Cambridge (Wellington & Silvey, 1996). From 1921 the Official Seed Testing Station (OSTS) became a member of the International Seed Testing Association (Wellington & Silvey, 1996), which to this day continues to ensure that common standards for data collection are developed across the world for seed testing.

Aside from NIAB’s role in seed testing, its objective was to achieve two aims: ‘to promote the improvement of existing varieties of seeds plants and crops in the UK’ and ‘to aid the introduction or distribution of new varieties’ but not to breed new varieties itself. This was at a time at which new wheat varieties, such as ‘Little Joss’ and ‘Yeoman’, developed in the light of Mendelian principles of inheritance, resistant to yellow rust (a disease caused by the fungus Puccinia striiformis f. sp. tritici) and with higher grain quality, had recently been produced by Rowland Biffen at the Plant Breeding Institute (PBI), a slightly older Cambridge-based state institute (and part of the School of Agriculture at Cambridge University) that was also under the purview of Lawrence Weaver. Of note is that neither ‘Little Joss’ nor ‘Yeoman’ was reported to outperform varieties at the time for yield parameters, but they did show improvements for disease resistance and quality in bread making respectively (Charnley, 2011).

In its early years NIAB, as well as providing a role in seed multiplication for state-bred PBI varieties, established voluntary schemes for the approval of seed crops and certification of multiplied seed for use on farm, which led to the improvement of the quality of seed for the national harvest. This led to the development of a broader voluntary seed certification scheme that evaluated both the in-field performance of seed lots at different stages of the multiplication process and the performance and characteristics of the seed as part of the Official Seed Testing function (Wellington & Silvey, 1996).

This undoubtedly contributed to the success of varieties such as ‘Little Joss’ and ‘Yeoman’, as high-quality seed was always available. Of further interest is the fact that ‘Little Joss’ was reported to be a good low-input variety that did well in light soil, enabling economic production in the face of stiff price competition due to cheap imports, while ‘Yeoman’ was suitable for intensive production with heavy fertiliser input, enabling higher yields through altered agronomic practice and again allowing profitable production, this time of high-quality bread flour (Charnley, 2011).

Through detailed measurement and increasingly the use of statistical tests and approaches to experimental design, some developed at nearby Rothamsted Research, detailed observations of plant characteristics could be measured and compared for a range of performance characteristics. This led to great success in ‘cleaning up’ the practice of duplicate naming of varieties, a dubious practice that had occurred since Victorian times. This was especially successful in the area of potato and cereal varieties where synonymous varieties were reported on an annual basis and could be shown to be statistically indistinguishable (Wellington & Silvey, 1996).

In 1923 NIAB established a system of performance trials in order to compare new varieties to existing varieties. These comprised multi-site trials that gave a ranked estimate of overall and regional varietal performance. These trials evolved over time and led to the establishment of ‘Descriptive Lists’ (DL) and ‘Recommended Lists’ (RL), designed to allow farmers to select independently-evaluated varieties. Descriptive Lists fulfilled the function of allowing variety attributes to be documented without providing a ranking. The objective of Recommended Lists was to fulfil the national requirement to maximise productivity and hence production of UK-grown agricultural and horticultural crops.

In the post-war years, once the benefits of voluntary seed certification in ensuring the quality of both domestic and imported seed had been seen, new legislation began to be developed to ensure that analytical standards for declarations of purity, germination and weed content, as well as freedom from disease, were in place. Furthermore, as by now plant breeding had developed many new varieties, the need for seed purchasers to understand the genetic quality of seeds offered for sale was also a key consideration (Wellington & Silvey, 1996). Parallel developments in the area of intellectual property rights for plant breeders ultimately led to the UK legislation, the 1964 Plant Varieties and Seeds Act, which established plant breeders’ rights, allowing plant breeders to protect their intellectual property in the same manner as other inventors, and so affording legal protection and the right to prevent unauthorised multiplication and sale of seed. Although this scheme was implemented nationally through the 1964 Act, the common standard for plant breeders’ rights had been agreed internationally in 1961, when the International Convention for the Protection of New Varieties of Plants was agreed in Paris, establishing the International Union for the Protection of New Varieties of Plants (UPOV). This specified international standards for Distinctness (that the variety is clearly distinguishable from any other variety whose existence is common knowledge at the time of filing of the application), ‘sufficient homogeneity’ (Uniformity) and Stability (that the relevant characteristics of the variety do not change over generations) across 13 initial crops over a period of 8 years; these are collectively known as DUS testing. This allowed reciprocity across member states, meaning breeders could have multi-territory protection of their variety, as defined by a common data standard (Wellington & Silvey, 1996).
Crucially, the botanical varietal descriptors that were (and still are) used for protection bore little or no resemblance to agronomic performance, meaning that other processes were required for evaluating these characteristics. Therefore, the second part of the 1964 Act established the official index of varieties and the requirement for statutory performance trials before seed marketing was allowed, for a range of crops deemed important to national food security, effectively the Recommended List system.

Of linked importance to the granting of plant breeders’ rights is the convention of the breeders’ exemption for the use of genetic material (registered varieties) in further development of plant material (Würtenberger, 2017). The protection and release of intellectual property for societal advancement is in the common interest, for example as stated in the US Constitution: “To promote the progress of science and useful arts.” (US Constitution, Art. I, s. 8). It is on this principle that, within the Plant Variety Protection system, breeders are able to utilise other breeders’ material in their own crossing and selection process following protection; this exemption is estimated to have led to tremendous economic returns since its implementation (Lüttringhaus et al., 2020).

Upon entry into the common market in the early 1970s, EEC directives stated that National Lists and Official certification of seed were needed to meet common market standards and for entry into a Common Catalogue enabling marketing throughout the EEC (Wellington & Silvey, 1996). Once again productivity was at the heart of the agricultural policy. Only seed of high quality, with approved performance and distinct identity, could be marketed under EEC policy. This meant that seed of important crops could only be marketed after a variety had been accepted for inclusion in a “National List” (NL) and the seed certified by a member state or third country operating to equivalent standards (a full list of species for which National Listing is essential is provided in Supplementary Table 1). Common grades for seeds at different levels of multiplication were also developed, which meant that UK voluntary systems were converted into statutory systems that conformed to OECD standards (Fig. 1).

Fig. 1
A schematic representation of high to low category of the seed certification scheme. Categories are breeders, pre-basic, basic, first, and second generation.

A simplified overview of a seed certification scheme, ensuring quality standards throughout the propagation chain

Of note, farm-saved seed was (and is) still permitted to be used outside the certification system. Entry into the common market effectively ceded national sovereignty in defining what could be grown and marketed in any single country for market access. Following Brexit, the UK no longer participates in the Common Catalogue and therefore breeders now have to register their varieties both in the UK through APHA (Animal and Plant Health Agency) and in Europe through the Community Plant Variety Office (CPVO).

In practice, the established system of IP protection and National Listing means that evaluation of DUS characteristics, required for Plant Variety Rights (PVR) to be granted, and Value for Cultivation and Use (VCU) trials must be carried out in parallel for National Listing to occur. At the time of entry into the common market, ‘Recommended Listing’ remained a government-funded activity for many crops, but nowadays, following widespread reform of near-market research and development in the mid-1980s, the RL is wholly industry funded through the statutory levy administered by the Agriculture and Horticulture Development Board (AHDB). The National List is now delivered by a combination of breeder-funded trials, delivered both by breeders and by NIAB, and government-funded disease resistance trials, and is operated on a cost recovery basis. Often VCU trials and RL trials are intertwined, though this varies on a crop-by-crop basis.

The UK system of a dual NL and RL leads to a two-tier system: following National Listing, which is the statutory approval for marketing and certification, a second, non-statutory advisory tier exists, the Recommended List (see Figs. 2 and 3), which highlights varieties that show a clear improvement in performance over a set of existing varieties. The overall criterion for RL candidate selection places a strong emphasis on promoting varieties which are 2% or more above a yield target for each market segment of a crop group, though exceptions are possible. Specifically, RL varieties are stated to be those “considered to have the potential to provide a consistent economic benefit to the UK cereals or oilseeds industry” (AHDB, 2020). As can be seen from the evaluated criteria for winter wheat (Table 1), other factors are considered, but entry onto the list is largely based on the primary results of NL yield trials.
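The yield emphasis described above can be illustrated with a minimal sketch. It assumes that yields are expressed relative to a market-segment target and that the 2% figure is applied as a simple threshold; the function name, the parameter names and the example values are hypothetical and are not taken from the AHDB protocol, which also weighs other criteria and allows exceptions.

```python
def clears_yield_emphasis(candidate_yield, target_yield, margin_pct=2.0):
    """Return True if a candidate is at least `margin_pct` percent above the target.

    Illustrative only: the real RL decision also considers other traits
    and permits exceptions to the yield emphasis.
    """
    advantage_pct = 100.0 * (candidate_yield - target_yield) / target_yield
    return advantage_pct >= margin_pct


# Hypothetical candidates, with yield expressed as a percentage of the segment target.
candidates = {"candidate_A": 103.0, "candidate_B": 101.0}
shortlist = [name for name, y in candidates.items()
             if clears_yield_emphasis(y, target_yield=100.0)]
```

A rule of this kind captures only the first filter; a variety below the margin could still be considered on the grounds of disease resistance or quality.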

Fig. 2
A flow chart depicts the linkage between international function, agencies, national responsibility, and national delivery with additional advisory steps.

Overview of the linkage between National and International systems and agencies involved in the registration, regulation and recommendation of crops of agricultural importance

Fig. 3
A block diagram of the national and recommended list system for RL, VCU, and DUS trials for year 1 to year 6.

The National and Recommended List system. DUS and VCU trials (Y1, Y2) define the criteria for entry on to the National list and certification schemes, while subsequent years evaluate the ‘best of the best’ for inclusion onto the Recommended List. Inset decision tree for the AHDB RL process is reproduced from (AHDB, 2020)

Table 1 The evaluation and criteria for selection of winter wheat lines on the AHDB Recommended List 2020–21

This long and complex history, which is only briefly summarised here, is presented in order to illustrate that the requirement for data standards and data linkage in complex systems is not at all new. A consequence of the system of varietal registration, testing and certification for marketing, built largely under policies promoting food production, is that many of the international data standards have been developed by international organisations with functions for seed testing, Plant Variety Rights and seed certification schemes (ISTA, UPOV, OECD), an overview of which is presented in Fig. 4. As such the integrity of data stretching back sixty or more years is in part preserved.

Fig. 4
A chart for the data linkage between seed certification scheme, OSTS data, DUS data, VCU data, RL data, national statistics, and productivity.

Current data linkage between nationally applied statutory and advisory varietal registration and trialling processes and their onward linkage to national statistics

3 Who Owns What Data?

In the current registration and evaluation system, as a result of the many changes in organisational ownership and funding of national listing systems, ownership of the data is disaggregated. For DUS purposes the data is owned by the registrant (i.e. the breeder), and national databases of performance are kept by the bodies that perform the tests (e.g. NIAB) as well as summary data held by the international body UPOV in their “PLUTO” database (https://www.upov.int/pluto/en/). Summary data is made public at the national level to allow seed certification protocols to be administered (which are evaluated in-part based on DUS characters) and while similar to DUS data, has no legal status and is not identical. For example, for certification purposes in Scotland, SASA grow national list varieties to develop character lists for distinctiveness to aid with seed certification.

Plant breeders own VCU trial data, which they give APHA (the Animal and Plant Health Agency) permission to use for National Listing. There is agreement between APHA, breeders and AHDB for the data to be used in combinable and forage crops for RL, or DL purposes. In general practice currently, NL data cannot be used for either research or commercial purposes (e.g. extra analysis) unless permission of both APHA and individual breeders is sought. APHA own the VCU ‘matrix’ of trials and the analyses (an agreement between APHA and BSPB).

Historic data from before the transfer of registration systems to the private sector are somewhat patchy. Electronic data going back to the late 1980s is held at NIAB for cereals, pulses and sugar beet, and to the late 1970s for herbage and oilseeds. Yield data for wheat and barley trials is held going back to the 1940s. Most of the statistical analysis of these trials has been disposed of, apart from a few of the historic paper analyses which were kept and archived, more for posterity than for future utilisation. However, for some crops there are paper records, e.g. for sugar beet going back to the 1920s. Post-1986 the levy body owns Recommended List data, while prior to that, as a government-funded activity delivered through NIAB, the data was in public ownership.

Less attention has been given so far in this overview to the quality of record keeping within breeding programmes, in both the public and private sectors. From personal experience, historic field and trial data are often patchy and dispersed among paper and digital records of varying quality. Simple structural problems, such as turnover of staff, the patchiness of digitisation of paper records and the lack of resources to curate and archive data, commonly lead to loss of potentially valuable data. Due to the simplicity of much of the historic data, data standards are usually not a problem. However, it is often the case that phenotypic descriptors are not well designed and can be highly subjective. Moreover, the lack of an immediately identifiable purpose for some datasets often leads (in the author’s own experience) to short-term decisions being made about investment in archiving the majority of within-programme data. This is especially true when dealing with historic data, as living material may no longer exist and therefore the immediate utility of the data is sometimes not apparent.

4 Data Linkage with Statutory Information: Examples

The revolution in affordable genome sequencing and genotyping technologies has led to the ability to measure genetic variation in plant varieties to a degree and precision that was unthinkable even 10 years ago (Pavan et al., 2020). As such many publicly funded initiatives and collaborative public-private initiatives have led to the widespread availability of DNA sequence data in the public domain. Other multi-omics data is following, but principally it is DNA sequence polymorphism data that is of immediate value. Many studies have shown the value of incorporation of molecular data into both DUS (Cockram et al., 2010, 2012, 2015; Saccomanno et al., 2020) and VCU (Wang et al., 2012) and suggested strategies for deployment (Jamali et al., 2019). For example, in crops where heritability of DUS traits is low, it is of significant benefit to utilise molecular data (Cockram et al., 2010). Within plant variety protection UPOV already have models laid down by the Biochemical and Molecular Techniques (BMT) Working group of UPOV for the use of molecular markers (Jones et al., 2013). However, in the UK use of molecular data or prediction of performance does not yet extend to VCU trial data, nor RL trials, despite some obvious advantages of doing so.

4.1 The Use of Historic Data

Recent work of Fradgley et al. has shown what a valuable resource even relatively simple pedigree information can be in the light of modern genomics (Fig. 5) (Fradgley et al., 2019). Through an analysis of global wheat data, coupled with genetic marker data, they were able to first construct a pedigree of over 2600 wheat varieties and identify signatures of artificial selection across the pedigree and demonstrate that these genomic regions could correspond to genes of known functional importance in key yield components.

Fig. 5
A pedigree chart of the global wheat.

The pedigree of global wheat, as reconstructed by Fradgley et al. (2019), drawing on Recommended List data, among other sources. (Reproduced without modification under CC-BY 4.0 licence, with permission from the author)

A subsequent genome-wide association study (GWAS) using VCU yield data found that around a third of the signatures of selection identified in the pedigree paper overlap with GWAS hits for yield (White et al., 2021).

This demonstrates the principle that high-quality data that is publicly available can have scientific value far beyond that originally envisaged through linkage of newer datasets with historic data. Further potential for much more detailed analysis of breeding and selection activity lies within the vast datasets for the National and Recommended lists.

4.2 Data Linkage Between Public and Private Sources and Use Within Breeding Programmes

So far, this review has primarily concentrated on the vast array of data that is generated as part of the varietal listing process, which is largely unknown to the majority of academic researchers working in the area of crop genetics and improvement. However, over the past twenty or so years the generation of breeder-relevant data within the academic sector has grown substantially, especially as the explosion in molecular biology techniques has led to the creation of large datasets. As a general aid, datasets can be separated into two classes. The first is data that informs about the biological function of the crop as a whole, for example a detailed timecourse of gene expression regulation in multiple tissues of a single variety of wheat, grown in a controlled environment. This data and the associated analysis are clearly relevant to breeding and crop improvement, for example in determining specific genes involved in a biological process; this is information which a breeder could use to devise a screen for genetic variation in breeding material. However, it cannot necessarily be integrated directly into a breeding programme or selection scheme. It is likely that raw data and associated metadata are deposited in an archive and that the relevant conclusions will remain available for some time.

The second type of data, that which is directly breeder-relevant, is most likely to take the form of population-level data, potentially associated with some physical genetic resource or relevant growing environment. This could take the form, as we have seen in the examples above, of genotypic data of publicly available varieties, or sequencing or resequencing of publicly available diversity panels. A good example of this is the publicly available ‘diverse MAGIC’ resource, which can be used for trait discovery in breeder-relevant material (Fig. 6). The selection of this material was based, again, upon the analysis of important founders of wheat breeding programmes (Scott et al., 2021). As both the dataset and the genetic resource are available, it is possible for the breeder to use this population as a discovery tool and then to directly cross in variation.

Fig. 6
A pedigree with 16 founders consists of a 2, 4, 8, 16-way cross, and RILs. A color scale depicts the contributions to next generation.

Pedigree showing the construction of 504 Recombinant Inbred Lines (RILs). One exemplar pedigree is highlighted to show how all 16 founders are intercrossed into each RIL. (Reproduced without modification from Scott et al. (2021) under CC-BY 4.0 licence, with permission from the senior author)

It is of course likely that some of these resources are time-limited in their utility and as such are not likely to be available as seed beyond a brief window, unless other financial support is provided for their long-term storage. A similar story may also be true for some, but not all, of the data. Taking this specific example, the resources are spread among five different entities: (1) the preprint and ultimately the published paper and its supplementary materials; (2) a laboratory website with files required to associate genetic variation with individual lines; (3) a GitHub repository with scripts for analysis; (4) the European Nucleotide Archive for sequence data; and (5) the location of the physical genetic resource, in this case NIAB in Cambridge. This is typical of modern scientific publication, and it is easy to see how breakage or removal of any component of this pipeline and archival system will lead to loss of utility of the resource. Over the medium term (5+ years) the risk of this occurring is probably high, and this is on the whole the current state of affairs across many disciplines, not just crop research.
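The fragility of such a dispersed resource can be made concrete with a small sketch. It records the five components named above in a simple manifest and treats the resource as usable only while every component can still be reached. The class, the function names and the `is_available` callback are hypothetical; how availability would actually be checked (a link test, a seed-store query) is left as an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ResourceComponent:
    role: str    # what this component contributes to the resource
    holder: str  # where it is held


# The five components listed in the text; no URLs or identifiers are assumed.
COMPONENTS: List[ResourceComponent] = [
    ResourceComponent("paper and supplementary materials", "preprint server / journal"),
    ResourceComponent("files linking genetic variation to lines", "laboratory website"),
    ResourceComponent("analysis scripts", "GitHub repository"),
    ResourceComponent("sequence data", "European Nucleotide Archive"),
    ResourceComponent("physical genetic resource (seed)", "NIAB, Cambridge"),
]


def missing_components(components: List[ResourceComponent],
                       is_available: Callable[[ResourceComponent], bool]) -> List[ResourceComponent]:
    """Return the components that the supplied check reports as unavailable."""
    return [c for c in components if not is_available(c)]


def resource_is_usable(components: List[ResourceComponent],
                       is_available: Callable[[ResourceComponent], bool]) -> bool:
    """The resource is only fully usable if no component is missing."""
    return not missing_components(components, is_available)
```

For instance, passing a check that reports the laboratory website as unreachable would return that single component from `missing_components` and make `resource_is_usable` false, even though the other four components remain intact.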

5 Linkage of Variety Performance Data – Making Better Use of RL Data in the Light of Genomics

With multi-environmental trial (MET) data such as the RL the desired outcome is to measure the performance of varieties on a regional basis as well as simply reporting the outcome of the MET. However, it is also possible to predict the performance of a variety, through the use of mixed models.

This use of mixed models has gained favour in the breeding community, where information about relatedness of individuals (an all-all pairwise relationship matrix) is combined with performance data gathered across sites. The treatment of incomplete trial block designs, replicates within sites, different trial sites, and relationships between individuals as random effects allows predictions to be made about those random effects. For example, information drawn from across multiple sites can be used to better predict variety performance for yield in a single site that may be lacking in data (Millet et al., 2019). Similarly, treating relationship data (e.g. pedigree) as a random effect allows performance information to be drawn from closely-related individuals to predict the performance of an individual variety in a site where it has not been phenotyped (Millet et al., 2019). Moving beyond pedigree data, the absolute (rather than estimated) relationships between individuals can be calculated through comparing their DNA sequences. The all-all pairwise comparison of any group is termed the genomic relationship matrix (Molenaar et al., 2018). The basic premise around the use of genomic data in combination with phenotypic data is that, within a mixed model framework, it allows a proxy for phenotype to be estimated and used as a tool. Performance predictions are referred to in this mixed model framework as BLUPs (Best Linear Unbiased Predictions).
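As a rough illustration of the approach described above, the sketch below builds a genomic relationship matrix from marker data and uses it to obtain BLUPs from the mixed model y = Xb + Zu + e, including a prediction for a variety with no phenotype. Several things here are assumptions rather than the method of any cited study: the relationship matrix uses a common allele-frequency-centred formulation (often attributed to VanRaden), the variance components `var_u` and `var_e` are taken as known rather than estimated, and all marker and yield values are invented for illustration.

```python
import numpy as np


def genomic_relationship_matrix(markers):
    """All-by-all relationship matrix from allele counts (0/1/2) per marker."""
    markers = np.asarray(markers, dtype=float)
    allele_freq = markers.mean(axis=0) / 2.0
    centred = markers - 2.0 * allele_freq          # centre each marker column
    scale = 2.0 * np.sum(allele_freq * (1.0 - allele_freq))
    return centred @ centred.T / scale


def blup(y, X, Z, G, var_u, var_e):
    """Fixed-effect estimates and BLUPs of random effects for y = Xb + Zu + e."""
    V = var_u * (Z @ G @ Z.T) + var_e * np.eye(len(y))   # covariance of y
    V_inv = np.linalg.inv(V)
    beta = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)
    u_hat = var_u * (G @ Z.T @ V_inv @ (y - X @ beta))
    return beta, u_hat


# Invented data: four varieties, five markers; variety 3 shares variety 0's genotype.
markers = np.array([
    [0, 2, 1, 0, 2],
    [2, 0, 1, 2, 0],
    [1, 1, 2, 0, 1],
    [0, 2, 1, 0, 2],
])
y = np.array([8.0, 10.0, 9.0])   # yields for varieties 0-2 only
X = np.ones((3, 1))              # overall mean as the only fixed effect
Z = np.eye(4)[:3]                # links each record to its variety; variety 3 has no record

G = genomic_relationship_matrix(markers)
beta, u_hat = blup(y, X, Z, G, var_u=1.0, var_e=1.0)
```

Because variety 3 has the same marker profile as variety 0, its row of `G` is identical and it receives the same predicted effect, which is the sense in which information is borrowed from relatives for an unphenotyped variety. In practice the variance components would be estimated, typically by REML, using dedicated mixed model software.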

Furthermore, the use of covariates, for example weather data, can lead to enhanced predictive ability where the covariates have a large effect on varietal performance (Gillberg et al., 2019). Similarly, incorporating covariates of phenotypic measures into predictions of yield, especially when segregating data into common environmental groups, can enable better predictive ability of varietal performance for key yield or resilience components (Ly et al., 2018).

In private breeding programmes there is often similar data to that outlined for public trials programmes. This may take the same form as the data above, but is likely to be held in a local database or file structure of some kind and never made publicly available. This data may have value, in combination with other proprietary data or through integration with public data, but beyond the provision (as outlined above) of data for statutory systems the ‘internal workings’ of breeding programmes are currently rarely revealed. For example, by combining recommended list data with internal genotypic data (present in most modern breeding programmes), better predictions of varietal performance in a given region can be made than are present in the RL itself, through the application of mixed models and BLUP to estimate random effects based on combined RL and private breeder data. This could be improved still further if access to all genotypic data in the trial were possible (Robertsen et al., 2019).

The use of high-throughput phenotyping in trials programmes will also add insights, especially in combination with environmental covariates and better models of plant development (Zhang et al., 2019). Recent work from Zhou and colleagues illustrated how relatively cheap internet of things devices could be deployed to quantify key developmental changes (and linked environmental changes) in crops (Zhou et al., 2017), while the same group also showed how low-altitude, low-cost Unmanned Aerial Vehicles (UAV) could be used to determine key yield-related traits in wheat (GuoHui et al., 2019). Finally, the use of cameras and machine learning algorithms for seed imaging could provide key data for the analysis of seed quality (Colmer et al., 2020).

6 Unintended Consequences of the Current System – Do Data Standards Help or Hinder?

It has been known for many years that there is immense value in the historic data captured in NL and RL datasets; however, a lack of adoption of new methods, due to a lack of national and international evolution in standards, has hindered progress (Mackay et al., 2011). This is primarily because national authorities must now follow international standards for DUS, certification and seed testing, which at their heart are systems based upon botanical characterisation, which although laborious (and sometimes inaccurate) is scalable and relatively low-tech. International bodies must ensure that the broader considerations about equity and implementation of processes around the world are put first. This has the unintended consequence of holding back the application of cutting-edge technology. However, this is not the complete story, as countries that are now signing up to OECD and UPOV schemes are implementing them differently and probably placing more emphasis on data integration at the national level, while countries with established systems find it harder to drive forward change.

As a result of multiple factors, discussed later in this review, developments in scientific research have become more distant from the process, leading to significant divergence between what is technically possible and what is carried out in practice, and many are now calling for innovation in VCU and DUS systems (Gilliland et al., 2020; Wang et al., 2016).

A recent study by Yang et al. highlights the weaknesses in the current varietal registration system in the light of new information, specifically in the area of DUS testing (Yang et al., 2020). The study revealed that low-heritability traits (i.e. those for which little of the observed variation is attributable to segregating genetic variation) are commonly used for DUS and certification purposes, meaning that confidence can be low in assigning varietal identity and in proving distinctness (which requires stable differences to be expressed between varieties). This aspect of the system could be completely avoided through the use of molecular markers. However, if mandated internationally, this would increase the cost of registration and potentially reduce the accessibility of systems to LMICs that lack reliable or affordable access to more advanced protocols.

For VCU and recommended list trials, consideration must also be given to the fact that potentially more environmentally sustainable varieties (for example those able to give reliable yields in marginal conditions) are not necessarily able to enter the system, because many trials occur under near-optimum conditions, which are unlikely to be the norm on farm, or may require extremely high levels of inputs and therefore may be less sustainable.

The use of distributed ledger technology (DLT) in certification could lead to less cumbersome processes and could in fact increase adoption of certification systems. The technology does not yet exist to deploy DLT efficiently for genomic data at scale, but it may be valuable to consider DLT as a way to maintain some privacy around genomic data while still leveraging its benefits (Lee et al., 2018; Thiebes et al., 2020).
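As a rough illustration of the underlying idea, the sketch below shows an append-only, hash-chained record of certification events for a seed lot. It is deliberately simplified and makes several assumptions: it is a single-party hash chain rather than a true distributed ledger (there is no network, consensus or access control), the event fields and function names are hypothetical, and only a fingerprint (hash) of the genotype data is recorded, so the genomic data themselves stay private.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(body):
    """Deterministic SHA-256 hash of an entry body."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_event(ledger, event):
    """Append an event, linking it to the hash of the previous entry."""
    previous_hash = ledger[-1]["hash"] if ledger else GENESIS_HASH
    body = {"index": len(ledger), "event": event, "previous_hash": previous_hash}
    ledger.append({**body, "hash": entry_hash(body)})


def verify(ledger):
    """Return True only if no entry has been altered, removed or reordered."""
    previous_hash = GENESIS_HASH
    for index, entry in enumerate(ledger):
        body = {
            "index": entry["index"],
            "event": entry["event"],
            "previous_hash": entry["previous_hash"],
        }
        if entry["index"] != index:
            return False
        if entry["previous_hash"] != previous_hash:
            return False
        if entry["hash"] != entry_hash(body):
            return False
        previous_hash = entry["hash"]
    return True


# Hypothetical example: only a fingerprint of the genotype data is stored.
genotype_fingerprint = hashlib.sha256(b"example marker profile").hexdigest()

ledger = []
append_event(ledger, {"step": "variety registered", "genotype_sha256": genotype_fingerprint})
append_event(ledger, {"step": "seed lot certified", "lot": "LOT-001"})
append_event(ledger, {"step": "seed lot sold", "lot": "LOT-001"})

print("ledger intact:", verify(ledger))
```

Because each entry embeds the hash of the one before it, changing any earlier record invalidates every later one, which is the property that makes such records attractive for tracking seed through a supply chain. Anyone holding the original genotype data can recompute the fingerprint and confirm it matches, without the data ever being published.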

7 Linking Breeding to Wider Farming Systems and On-Farm Practice

By the last quarter of the twentieth century, the impact of international quality schemes using linked data standards, common markets and highly productive agriculture, due to genetic, agronomic and statutory innovations, had led to a fairly centralised, highly regulated, but costly set of agricultural systems. By the beginning of the 1980s this increasing cost, together with rising waste due to the overproduction caused by the Common Agricultural Policy, meant that the last 20 years of the twentieth century were spent attempting to move onto industry much of the cost of the systems (PVR, NL and RL) and of the strategic and applied research that NIAB and other institutes carried out, in order to reduce government expenditure on what by this time was a system that produced sufficient (even surplus) food (Wellington & Silvey, 1996). In 1986, through the Agriculture Bill, arm's-length levy bodies were formed following industry consultation. These were tasked with collecting a levy from the industry to commission near-market research, development and knowledge exchange. A unified levy body, the AHDB, now commissions RL and DL work and produces the recommendations for industry. More generally, over the next 20 years the privatisation or closure of many strategic research organisations led to some fragmentation in the UK between the previously well-connected research institute structure and both the industry it serves and policy makers (Wellington & Silvey, 1996). It can be argued that this step to full cost recovery on near-market research and development has had the unintended consequence of disaggregating knowledge bases, especially at the strategic level. This has led to a lack of general oversight of the steps required for system innovation, as it effectively separated the science-led decision-making functions from the process-led delivery of the systems.

While past challenges were focussed squarely on productivity, it is now clear that a broader set of considerations are required for our longer-term food and environmental security and that our food system must adapt rapidly to ensure that the joint goals of biodiversity protection, economic production and net zero agriculture are met.

The use of these trials datasets to simulate forward under local and global weather and climate models is crucial for increasing our understanding of how best to adapt crops to changing weather patterns as a result of climate change, and would contribute both to better recommendations and to better data for breeders to utilise in creating more resilient and lower-input varieties.

Moving beyond the trials and listing processes and onto the farm, the availability of linked data would allow regional performance estimates to be validated (and improved) using on-farm data from farm-grown crops, so that a more dynamic recommendation system could exist. In fact, data from all stages of the registration process should be able to feed back into one another, creating more dynamic systems able to update predictions and confidence estimates of predicted performance of varieties all the way back to the breeder.

Linkage of trial data to further agronomic development, where significant differences exist between on-farm performance and trial performance, should allow better insights into the causes of these yield gaps. It is highly probable that in many farming systems the performance of varieties in trial does not match the on-farm setting. This could be due to factors such as local pest, disease and weed pressure, specific issues with soil or microenvironment, or differences in farming practice. All of these factors could potentially be decomposed at the genetic level (and therefore be subjected to improvement through breeding) through the use of robust and open trial data, if common data standards were used in data capture.

Recall the original examples of Yeoman and Little Joss, the former suited to high-input, high-yield farming, the latter to low-input situations. It is likely that the latter would not succeed in the current system, despite some potential broader environmental benefits of slightly lower-yielding, but much lower-input, varieties.

Care must be taken to select the appropriate ontologies for trial and registration systems and (just like in the historical examples) be aware of and integrate current international efforts in this area (e.g. https://www.cropontology.org/) (Fig. 7).

Fig. 7
A chart of the possible data linkage between genomic data, environmental or simulation data, phenomics data, distributed ledgers, and other data.

Possible new data linkages made possible by integration of new data to the system described in Fig. 4. Genomics data (green) would allow better integration of DUS and VCU data, with aligned benefits in seed certification schemes. Furthermore, internal linkages between private breeder data and public data in VCU, RL trials as well as in on farm production could lead to more dynamic recommendations. Integration of environmental and simulation data (orange) would again lead to huge leaps in prediction accuracy as well as paving the way for model-based predictions of crop performance under changing environments as well as greater precision of prediction for farm level growing practice. Distributed ledger systems (yellow) could have an impact on certification systems and potentially offer new methods to track seed throughout the supply chain in a more accessible way. Phenomics data (red) again impacts all aspects of statutory, advisory and on-farm systems allowing greater linkage between DUS and certification data, as well as providing more phenotypic data to include covariates in yield or other trait predictions. Finally, integration of all methods through the development of common data standards, adopted by statutory and advisory systems would lead to greater power at the national level to assess productivity, environmental service and system-level performance characteristics required to deliver the joint goals of productive, sustainable, net-zero agriculture in tandem with enhanced environmental service

8 Potential Systems-Level Solutions That Could Be Achieved Through Improved Data Linkage

In moving from a largely linear set of approval systems, consideration should be given to a more circular or ‘systems’ approach to improvement, ensuring that data at all points in the knowledge chain are utilised and that feedback of data is made possible.

The development of standards makes it possible to measure and manage better, and provides a much more coherent dataset upon which policy decisions could be made.

It is crucial to recognise that in enabling the characterisation, and then integration, of negative externality costs into the regulation of wider farming systems, changes may occur both in the way in which we farm and in the performance characteristics of our crop varieties. Careful thought must be given to how the joint considerations of productivity, net zero and biodiversity protection will likely drive new farming systems which may not rely solely on monoculture. The use of bi-cropping or poly-cropping (where extended phenotypes may emerge, e.g. enhanced biodiversity) is potentially important and trials evaluation systems will likely need to integrate this. The use of genetics may also change: variety mixtures (of varying forms), which may be genetically diverse but functionally homogeneous for key traits, are all future possibilities that registration systems must deal with.

The ability and desire to derive appropriate metrics at the national level, to have a more holistic view of on-farm performance of varieties, and then to link onward to other data will allow better quantification of life cycle emissions and environmental impacts of primary production. The onward linkage of domain-specific data to metrics about system performance would allow better estimates of the environmental impact of farming, and both identification and modelling of ‘what works’ at the policy level, allowing more dynamic implementation of future agricultural policy (Fig. 8).

Fig. 8
A schematic of the possible market opportunities. From below, the levels are inputs, modelling and integration, and outputs and outcomes.

Examples of possible market opportunities made possible through on-farm data linkages. Underpinned by linked evolution of statutory and advisory systems, these linked datasets could have multiple public good and economic uses, some of which are illustrated in this figure

9 What Structures Are Required for Future System Change and Who Benefits?

Expansion of the national component of statutory systems should be considered to drive forward innovation. This should be considered alongside renewed international engagement with equitable innovation within statutory systems at the global level. In order to maintain market attractiveness for breeders, these enhancements to the national systems should be state funded. This would change data ownership structures to ones where government co-owns data, but provide ‘win-win’ situations for all points in the supply chain that would likely drive up innovation and overall productivity and sustainability. This may have particular benefit to small breeding companies, that often lack access to genomic resources, but could benefit from the use of them within their own breeding programmes.

The initial pillar of this innovation is the need for genome sequencing to be a prerequisite for national listing. This should be funded by the government, operate under FAIR principles and be seen as an extension to the current patent system, operating under the principle of publishing and protecting for both the common good and the benefit of the rights holder (Wilkinson et al., 2016).

Analysis of both varietal data for DUS purposes and DUS data (and onward certification) should evolve in the light of genomics, ensuring that global standards are maintained, but that additional innovation is unlocked through the use of genomes. The use of molecular markers in DUS, VCU and onward RL trials should be made a priority.

However, effort should also be given to driving forward innovation in gathering supporting data. For example, the collection of phenomic and environmental trial data that is interoperable between public and private functions should be co-funded (much in the same way that pathology data for VCU is funded for the public good). Again, co-benefits could be recognised throughout the food system. An example of success comes from the life science sector, where the Pistoia Alliance (www.pistoiaalliance.org) brings together member companies (numbering over 100) from life science companies, technology and service providers, publishers, and academic groups to transform R&D through pre-competitive collaboration. In a fragmented landscape such as agriculture, a new form of collaborative national or international network of this kind might advance linked innovation and statutory innovation more rapidly than the separated systems currently in operation.

Data standards for image data, and for its automated capture, need to be developed and applied both in registration and advisory systems and in on-farm improvement measures; the adoption of common data standards for statutory systems would likely ‘cement’ a data standard across the industry, even though it could be used on a voluntary basis.

In exchange the use of proprietary data to derive co-benefits that benefit the public good should be requested, enabling DUS and NL data that is currently privately owned to be released into the public domain in some way. Appropriate consideration of the need to retain commercial advantage should be at the forefront of this discussion, but methods are available to ensure that some benefits can be derived without necessarily requiring full data release. The use of trusted intermediaries could be one simple mechanism.

Historically, meeting challenges such as these has been done by centralising the challenge and creating ‘form follows function’ vehicles (e.g. NIAB) which leverage both public and private investment. Other models are possible (for an example, see the Pistoia Alliance in the biomedical sciences), but it is likely that in this case some ‘re-grouping’ of functions is needed to drive change at the required pace. This should include the same three pillars that were there at the inception of our current statutory systems (government and public funders, private enterprise, and scientists/academics) and be treated as a shared challenge, with the power to both design and implement science-based programmes of varietal evaluation and monitoring.

10 Overall Conclusions

There are ample opportunities for improved data linkage to transform our understanding of the changes needed for improved system design. Building depth into reformed long-lived statutory and descriptive systems is likely a good idea as this provides longevity to data and ensures security of data.

Establishment of data standards within statutory and advisory systems often preserves key data linkages but can also ossify and stifle progress, and so flexibility is required in any future design and additions. Any enhancements to, or divergence from, current international standards by individual nations need to be supported through public funds to ensure that it remains commercially viable to operate and register varieties in the national market. This will lead to a period of duplication, but is required for any broader transformation to occur.

Making any additional data added to statutory and advisory systems open is crucial and should be viewed as a public good. The current system does not allow for FAIR data at this point in time. Making the data standards open and accessible is critical to drive wider adoption of that data. Further standardisation of trials between the public and the private sector is likely a good idea and the use of common ontologies will ensure interoperability.

Extending linkage of data from statutory and advisory systems both into academic research and onto the farm provides great scope for new approaches to measuring and managing system-level properties and assessing the performance and impact of a broader array of crop genetic innovations on farm, with more dynamic feedback into crop breeding programmes.

Function is currently following form: the disaggregation of what was a centralised strategic response has led to ‘masters of none’ and an inability to drive reform at pace. This can be seen at both the national and international levels. This is not necessarily the fault of any single actor, but a property of the diffuse structure without coordinated oversight.

Current systems emerged out of a need to drive up productivity and were highly centralised efforts, both at the national and then the EU level. The challenge of responding to the joint challenges of economic, resilient and sustainable production, low emissions agriculture and reversing biodiversity decline is too great and swift action is required, likely through a form follows function approach and a re-design of current national systems and organisations but ensuring close cooperation between public and private stakeholders. Plus ça change!

Author Contributions

RH devised the overall review with input from MC. RH wrote the first draft of the manuscript. MC provided editorial support and constructive discussion throughout.