1 Introduction

Proposals that point the way forward are nowadays routinely called “roadmaps.” But on Richard Harrison and Mario Caccamo’s showing elsewhere in this volume, the data-world of the future for agricultural plants in Britain is more handily visualized with an image akin to the map of the London underground (Harrison & Caccamo, 2022, Fig. 7). Instead of tube stations we see different kinds of data – genomics data, environmental/simulation data, phenomics data, plant breeding/trial data, Recommended Lists data, Distinctness Uniformity Stability data, Value for Cultivation and Use data, Official Seed Testing Station data – plus a range of activities and systems where those data may be integrated and acted upon: the seed certification scheme; the growing and evaluating of certified seeds on farms; the collecting of national statistics bearing on productivity, performance, and environmental impact; and the tracking of seeds, and the profits accruing from innovations in their development, through a distributed ledger system. Looping between these are brightly coloured one-way arrows, mostly solid, occasionally dotted, with the caption below the image spelling out the envisaged benefits. Genomics data, for example, will feed into the determination of how distinct, uniform and stable a variety is (these being the standard criteria for the award of intellectual property rights to the breeder) as well as how valuable it is for cultivation and use, in a way that helps integrate the data generated from these exercises and so increases their value for seed certification.

Here is an ideal of frictionless movement between various kinds of plant data across time – an ideal also encapsulated in phrases such as “historical data mining.” Plant data on such a vision is like oil: a valuable resource that only needs tapping to become potentially useful. Between ideal and reality there are, of course, impediments. But nothing here suggests that these are other than infrastructural, as when data are locked away in filing cabinets, or in journals that no one has yet digitized, or on floppy disks written in an outmoded computer language, readable on machines that no one runs anymore or – almost as bad – that are run by firms charging exorbitant fees for the service. These are, in principle, soluble problems, some of them technical, others social, still others as much technical as social. Solve these problems, open up access to the data, and the data will start to flow along the mapped-out channels, to the good of future food security and the knowledge that will underpin it.

I want to suggest in what follows that there may be another class of impediments worth being reflective about: intellectual ones. I will dwell in particular on what, for historians and philosophers of science, is an especially conspicuous candidate: the problem that data are, in the canonical phrase, “theory-laden.” The basic thought here is that, in important ways, the categories used in classifying observations, and the choices made about which observations to file under which categories, can reflect background theory (see, e.g., Hanson, 1958). By way of making this abstract issue concrete, I’m going to offer two stories, both involving that exemplary Mendelian plant, the garden pea, Pisum sativum. Because I intend to draw morals from these stories, I’m calling them “parables.” The moral from my first story will be pessimistic: the problem of data theory-ladenness needs to be taken seriously. But the moral from my second story is optimistic: one way to overcome the problem of data theory-ladenness is to retain access to the seeds of the plants featuring in historic data. At this point my chapter will intersect with other chapters in the volume, notably those by Helen Curry, Courtney Fullilove and Richard Ostler on the seed banks that are sometimes labelled – in splendidly theory-laden manner – “germplasm collections.” As we shall see, the fact that seeds are more than just containers for genomes can be consequential.

2 The Pessimistic Parable

“The rediscovery of Mendelian genetics ushered in an agricultural revolution. For the first time, varieties that combined performance characteristics were systematically developed, based upon the principles of heredity and the genetic control of characters.” So begins the abstract originally circulated with the chapter by Harrison and Caccamo. They are based at the National Institute of Agricultural Botany (NIAB) in Cambridge, and, in the role they assign to revolutionary Mendelism in the making of modern agricultural success, keep faith with NIAB’s foundational vision. In the Memoranda on the Establishment of a National Institute of Agricultural Botany published in November 1918, A. B. Bruce, superintending inspector for the Board of Agriculture and Fisheries, wrote:

The undoubted success of plant breeding work at Cambridge is due primarily to the fact that in recent years an entirely new science has been built up as the result of the discoveries made by the monk Mendel in the early sixties. At the time these discoveries were overlooked, and it is only in the last ten years or so that they have received proper recognition. Without going into scientific details, the effect of Mendel’s and subsequent work can be summed up by saying that it is now possible to make a new plant possessing valuable economic qualities. Just as an architect in building a house has at his disposal different kinds of building materials, so the modern plant breeder can make a new plant out of, as it were, the fragments of another. It will readily be recognised what a powerful weapon this new discovery has placed in the hands of the agricultural botanist… . Now we no longer require to wait for nature to act; we can deliberately set about manufacturing what we require.

Bruce proceeded to illustrate with examples from the work of the leading exponent of Mendelian breeding, Rowland Biffen, recently installed as Director of the new Plant Breeding Institute, also in Cambridge. Biffen’s first great success was “Little Joss,” a high-yield, rust-resistant variety of wheat created by crossing a high-yield but rust-susceptible English variety with a rust-resistant but low-yield Russian variety. Little Joss, Bruce reported, “has now been on the market for nearly ten years, and so far has shown no tendency to revert either to the low yielding character of one of its parents, or to the liability to Rust of the other.” More recently, Bruce went on, Biffen had introduced other new varieties of wheat, among them “Yeoman,” a Mendelian synthesis of English high-yield with the superior baking quality associated until then with Canadian varieties (Bruce, 1918: 12–13).

When Bruce sang the praises of Mendelian breeding, it had been 18 years since Mendel’s “Experiments on Plant Hybrids” (Mendel, 1866) had become an unexpected sensation among botanical hybridizers. By the time of Little Joss’s release in 1908, a new science of heredity elaborated around Mendel’s paper – the science later known as “genetics,” but at this time mainly known as “Mendelism” – had taken off internationally, thanks above all to the efforts of William Bateson and his students at Cambridge, Biffen not least. From then until now, Mendelian principles have been fundamental to the organization of knowledge of heredity. Around the world, at every level of education, the standard point of entry into a scientific understanding of heredity is Mendel and his peas, in a form that Bateson first made teachable (Radick, in press). One key to Mendel’s success, students learn, was his focusing in on traits that come in distinct either/or versions: seed colour in the pea as either yellow or green; seed shape as either round or wrinkled; and so on. Another was his assiduity in ensuring that his parental stocks were pure and so “true-breeding,” i.e., the yellow-seeded stocks only ever produced yellow seeds, and the green-seeded stocks only ever produced green seeds. Yet another was the care he took in ensuring uniformity in the treatment of the large number of plants he dealt with. Thus did Mendel get the data which enabled him to discover what had eluded his predecessors: dominance and recessiveness; the 3:1 ratio of dominant to recessive plants in the second generation of hybrids; and, crucially, the production by hybrid plants of gametes that were not themselves hybrid but were pure for one or the other of the trait-versions (see, e.g., Campbell, 1993: 258–67).

All of that is familiar, indeed foundational. Much less familiar is a scorching critique of all of that from W. F. R. Weldon (1860–1906), who at the time of the Mendelian rediscovery was Linacre Professor of Zoology at Oxford. On a Weldonian perspective, what you gain in control via Mendelian breeding experiments you lose in generality. Yes, if you assiduously expunge from your parental stocks all the variability except for the single either/or difference that interests you, and you then carry out your experimental breeding under uniform environmental conditions, you may well get, at least roughly, the patterns that led Mendel to infer what he did. But take different decisions about what to focus on, what to exclude, and which environment to impose, and you could well find yourself examining different patterns, which could in turn lead you to different, even opposite, conclusions (Weldon, 1904–1905).

In the garden pea, for example, is it really the case that yellowness is dominant to greenness, across the board? And what about the Mendelian corollary that if a seed is green, it cannot harbour any yellow-making factor? Yes, in the particular purified varieties that Mendel worked with, those conclusions seemed to hold. But when Weldon surveyed the world of commercial pea breeding, he found enormous variability – a continuous spectrum of colours stretching between yellow and green, a smooth gradation from extreme roundedness to extreme wrinkledness – as well as longer-range inheritance patterns that, under Mendelian theory, were impossible and so invisible (Weldon, 1902). For Weldon, all of this heterogeneity in traits and their inheritance was not just intelligible but, in a modest way, predictable, given what experimental embryologists had learned in recent years about the role of context in conditioning development. In Weldon’s view, the twentieth century deserved a science of heredity that took this heterogeneity, and the multiple, interacting causes that brought it about, as its subject matter – whereas Mendelism was set to treat it as a nuisance, and Mendelians, in line with their training, to categorize actual variability within the capacious categories that elementary Mendelism favoured (Radick, 2016).

For the most part Weldon’s perspective on the theory-ladenness of Mendelian observations remains locked away in unpublished letters and manuscripts. The exception is the well-known suspicion that Mendel’s pea data are “too good to be true”: that is, the numbers he reported are improbably close to the ratios predicted by his theory, given the number of trials he did. Nowadays this suspicion is associated with the mathematical geneticist Ronald Fisher, who published a classic paper about it in the 1930s. But the discovery was Weldon’s, made in 1901 and published in the same 1902 paper where he also published photographic plates showing the variability he had found in pea-seed colours and shapes. The suspicion became an object of public hand-wringing and finger-wagging over the possibility that Mendel was guilty of fraud only from the mid-1960s (Radick, 2022). Since then there has emerged a small industry devoted to examining the case (Franklin et al., 2008). Amidst the tremendous ingenuity and technicality, the larger lesson that Weldon drew has mostly been lost: the problem stemmed not from Mendel’s character but from his categories – binary categories upon which Mendel erected a theory of heredity which ignored context as a source of variability, and which in turn directed him to classify traits according to his either/or scheme (Radick, 2015). When confronted with a trait not unambiguously belonging to one side or the other of an either/or classification, he probably judged it to belong on whichever side made for tidier ratios. (It’s been shown in a classroom experiment with students that if you give them three instead of two categories to work with in classifying pea seeds – say, “yellow,” “green,” “ambiguous” – they will use all three categories; Root-Bernstein, 1983.)
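To make the statistical shape of the “too good to be true” suspicion concrete, here is a minimal sketch in Python. It is not a reconstruction of Weldon’s or Fisher’s own calculations: the function names are mine, the counts are invented for illustration rather than taken from Mendel’s paper, and the sketch treats only a single experiment with one expected 3:1 ratio. It measures how far an observed count departs from the 3:1 expectation, and then estimates the probability that chance alone would yield a fit at least that close.

```python
import math


def chi_square_3_to_1(dominant: int, recessive: int) -> float:
    """Pearson chi-square statistic for observed counts against a 3:1 expectation."""
    total = dominant + recessive
    expected_dominant = 0.75 * total
    expected_recessive = 0.25 * total
    return ((dominant - expected_dominant) ** 2 / expected_dominant
            + (recessive - expected_recessive) ** 2 / expected_recessive)


def prob_fit_at_least_this_close(chi_square: float) -> float:
    """Lower-tail probability for chi-square with one degree of freedom.

    With one degree of freedom the statistic is a squared standard normal
    deviate, so P(X <= x) = erf(sqrt(x / 2)).
    """
    return math.erf(math.sqrt(chi_square / 2.0))


# Hypothetical counts, invented for illustration (not Mendel's data).
statistic = chi_square_3_to_1(dominant=751, recessive=249)
closeness = prob_fit_at_least_this_close(statistic)
print(f"chi-square = {statistic:.4f}; chance of so close a fit = {closeness:.3f}")
```

With these invented counts, a fit this close or closer would turn up by chance roughly six times in a hundred. One such result is unremarkable in isolation; the suspicion gains its force only when fits of that order recur across a long series of experiments.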

So: whenever we are dealing with historic plant data from the post-1900 period, we need at least to consider whether what is reported is not just what any competent observer would have reported, but is – sometimes subtly and sometimes unsubtly – inflected with Mendelian expectations, and/or with Mendelism’s legacy for intellectual property rights: the insistence on distinctness, uniformity and stability (see Berry, forthcoming; Kochupillai & Köninger, forthcoming).

3 The Optimistic Parable

Given all the natural heterogeneity actively controlled for in a Mendelian experimental garden or laboratory, one might predict a “return of the repressed” once the products of Mendelian experimental breeding enter the wider world. A related prediction is that, when the repressed does not return, it’s thanks to some combination or other of two sorts of remedy. One is selection. By selecting lineages in which Mendelian traits of interest get expressed most fully and reliably, across the broadest range of environments, the skilled breeder gradually builds up, and builds in, whatever internal context best buffers trait expression in the new breed against the slings and arrows of outrageous fortune. The other remedy is, in some form or other, to extend the controls beyond the limits of the experimental space.

For all Biffen’s promotion and, indeed, self-promotion as the Mendelian breeder, he relied heavily on selection, as Berris Charnley has noted (Charnley, 2011: 144–5). With Little Joss, it worked a treat. But with Yeoman, selection proved insufficient to ensure the stability of the released variety. Farmers who grew the seed eventually found a noticeable proportion of “rogue” plants – that is, plants departing from the advertised type, in the direction of older, lesser wheat stocks. By the early 1920s, Yeoman’s rogue problem had become so bad that Biffen was being quoted in Nature as saying “the sooner Yeoman is off the market the better.” Biffen placed the blame on an external source: the threshing machines that travelled from farm to farm, contaminating Yeoman-planted fields with seed from older stocks. By the time a successor breed, Yeoman II, was released in 1924, a new, NIAB-run distribution system was in place, with the seed sold in sealed sacks bearing NIAB’s emblem. As an anti-contamination effort, it was a modest step. But we do well to see in it the start of the larger-scale control efforts to come, in the form of the fertilizers, pesticides, and herbicides sold along with post-Biffenian seeds and required in order to make them flourish as advertised (Charnley & Radick, 2013).

None of this would have surprised Weldon. He had a lively sense of the commercial value of selection in creating breeds that gave farmers what they’d paid for whatever the vicissitudes of environment (Radick, in press). He also knew how badly breeders often struggled with rogues when attempting to establish varieties sufficiently differentiated from starting stocks to count as new. In the 1902 paper discussed above, he even documented persistent controversies among pea breeders due to rogue troubles (Weldon, 1902: 246–50; Charnley, 2013; Charnley & Radick, 2013: 229).

Problematic for breeders, rogue pea plants are nevertheless the stars of my optimistic parable. Perhaps Weldon’s most attentive reader was Bateson, whose Mendel’s Principles of Heredity: A Defence (1902) is at heart an extended take-down of Weldon’s paper (Bateson, 1902). Addressing breeders at a New York hybridization conference a few months after the book had come out, Bateson trumpeted Mendelism’s solution of the rogue problem as one of its greatest attractions for his audience. According to Bateson, once it was understood that a plant showing a dominant trait could be either homozygous or heterozygous, and care was taken to ensure breeding from homozygotes only, the tragedy of rogues would disappear. But Bateson well knew that the kinds of roguish returns that fascinated the likes of Weldon were not the absent-for-a-generation recessive traits featuring in Mendelian analyses but the absent-for-many-generations atavisms which Mendelian analyses, with their indifference to ancestors beyond the true-breeding parents, did not even register, let alone explain (Bateson, 1904; Radick, 2013).

In the heat of battle with Weldon, Bateson declared Mendelism victorious over rogues. When that victory was secure, however, Bateson allowed that maybe there was indeed more to be learned about rogues. During the 1910s, when he directed the newly founded John Innes Institute, the study of rogue peas became a major research project, conducted in collaboration with Caroline Pellew. Bateson and Pellew became convinced that though some rogues could be explained away as due to contamination or heterozygosity, not all could. As they put it in a 1915 paper:

The term “rogue” is applied by English seed growers to any plants in a crop which do not come true to the variety sown.… When peas are grown for seed on a commercial scale it will be readily understood that untrue plants are introduced in various ways, mixture, crossing by insects, and the persistent recurrence of a recessive form being the most obvious sources of such plants… . but the facts preclude the supposition that the special rogues with which we are here concerned are introduced either by mixture or crossing, nor can they be regarded as recessives coming from a heterozygote in the ordinary sense.

When Bateson and Pellew crossed these “special rogues” with normal peas, the hybrids all showed the rogue phenotype (indicating that rogueishness was dominant). On Mendelian expectations, the self-fertilizing of these hybrids should have produced offspring showing a 3-to-1, rogue-to-normal ratio. Instead, however, all of the offspring showed the rogue phenotype (Bateson & Pellew, 1915, quotation on 13–4; Charnley, 2011: 114–20).
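The Mendelian expectation that the special rogues confounded can be spelled out in a few lines of Python. This is a sketch of the textbook reasoning only, not of anything Bateson and Pellew themselves wrote down: the function name and the allele symbols “R” and “r” are hypothetical labels of mine, and the code assumes that the hybrid makes its two kinds of gamete in equal numbers and that one copy of the dominant factor suffices for the rogue character.

```python
from collections import Counter
from itertools import product


def selfed_hybrid_phenotypes(dominant_allele: str = "R",
                             recessive_allele: str = "r") -> Counter:
    """Count offspring phenotypes from self-fertilizing a hybrid.

    Textbook Mendelian assumptions: two kinds of gamete in equal numbers,
    and a single dominant allele is enough to produce the dominant
    phenotype. The allele symbols are hypothetical labels.
    """
    gametes = (dominant_allele, recessive_allele)
    phenotypes = Counter()
    # Every pairing of egg and pollen is taken to be equally likely.
    for egg, pollen in product(gametes, repeat=2):
        if dominant_allele in (egg, pollen):
            phenotypes["rogue"] += 1   # rogueishness treated as dominant
        else:
            phenotypes["normal"] += 1
    return phenotypes


expected = selfed_hybrid_phenotypes()
print(dict(expected))
```

Of the four equally likely pairings, three carry the dominant factor and one does not, which is the 3-to-1 expectation. What Bateson and Pellew observed was the equivalent of the “normal” category coming up empty.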

Bateson and Pellew never got to the bottom of what was behind the rogueish characters and inheritance patterns of the pea plants they collected. But their research was well regarded, so much so that in 1922–3 Bateson served as an expert witness in a court case on whether a pea breeder was liable for the extreme proportion of rogues in some seed bought from him (Radick, 2013).

The rogue pea data Bateson and Pellew reported have remained accessible from their day to ours. What has kept their data tantalizing is not just the gradual emergence of a body of theorizing and technique suitable for investigating such cases (Le Goff et al., 2021: 38), but the prospect of getting beneath the data by working with similar-looking and similar-behaving pea plants as these have come to the attention of breeders. That was true in the 1960s and 1970s, when two John Innes researchers in succession, Kenneth Dodds and Peter Matthews, had a go – but with little to show for it (Matthews, 1973). And it was true in the 2010s, when the agronomist-geneticist José Leitão, based at the Laboratory of Genomics and Genetic Improvement in Faro, Portugal, became intrigued (Anon, 2021). What piqued Leitão’s interest was the resemblance he noticed to similar inheritance patterns in other plants known to be due not to genetic differences but to epigenetic ones – that is, to differences in the immediate biochemical environment of the DNA sequence. He homed in on pea seeds held at the GermPlasm Resource Unit of the John Innes Centre (as it is now called) from two lines: a non-rogue variety, called Onward; and a rogue variety derived from Onward and showing the same off-type characters which Bateson and Pellew had studied (known as “rabbit ears,” because the narrow, pointed leaflets and stipules give the plants a rabbity aspect). Analysis of DNA in the two lines revealed them to be highly similar genetically. Epigenetically, however, they were different, with Leitão’s team identifying a number of methyl groups present in the epigenome of the rogue line but absent from the non-rogue line (Santo et al., 2017).

Are the epigenetic differences responsible for the differences in character? The answer remains elusive. Leitão’s team managed to carry out expression studies on fourteen pairs of genome segments, methylated (from the rogue line) and unmethylated (from the non-rogue line) – but no significant differences in gene expression were found. In their paper Leitão and his colleagues suggested that perhaps resolution lies with analysis of larger segments of genome/epigenome:

additional studies are needed to unveil the biological consequences of the identified differential methylation. For the moment, we can only speculate that the observed alterations in DNA methylation, and eventual modifications in chromatin conformation, probably spread over larger genomic regions encompassing the identified sequences, and eventually affect the expression of other, surrounding, genes. (Santo et al., 2017: 6)

It is early days for the study of the molecular epigenetics of rogue peas (see too Pereira & Leitão, 2021). But they already look not just “non-Mendelian” but potentially Weldonian, in that the key to understanding them may turn out to lie in differences in internal context of the sort routinely stripped out in the course of Mendelian standardization. And that key will have been found because investigators had access not merely to historic data but to plausible surrogates for historic plant material.

4 Conclusion

So, seeds matter, not just for all the familiar reasons, but for what access to them can do for anyone wishing to make new uses of old plant data. To say that is not, of course, to say that only seeds matter, as though contexts for DNA are interesting up to the seed-coat barrier but not beyond it. Undoubtedly, my second parable would be more fully illustrative of the moral that I wish to draw from it had the John Innes Centre looked after its seeds in situ rather than ex situ; had the soil and climatic conditions under which the rogue pea seeds studied by Bateson and Pellew were grown proved somehow indispensable for the expression of the rogue phenotype; and had, over the decades, the seeds and the conditions alike been subjects of rigorous monitoring programs, enabling detection, and remedy, of any deviations. Even so, the rogue pea phenotype’s depending not on genes but on extra-genetic context suffices to underscore the point that, when it comes to dealing with the theory-ladenness of old data, the greater our access to the original materials that generated those data, or to plausible surrogates, in all their contextualized complexity, the better, because the less beholden we are to old conceptual choices that we might now want to question.

How much better? On the one hand, as we have seen, ours is a scientific agriculture that grants to the systematic study of phenotypic plasticity not just a name (“phenomics”) but a place on the data-linkage map of the future. Context looks well catered for already, thank you. On the other hand, that map is one where all the data generated and integrated so frictionlessly support the development of plant varieties which are distinct, uniform and stable. As another contributor to this volume, Mrinalini Kochupillai, has emphasized, the commercial promotion of varieties meeting these criteria has been a disaster for global crop biodiversity, with knock-on effects for human health and for the environment, not least because chemical “inputs” are typically part of the package that farmers buy when they abandon local landraces for commercial varieties (Kochupillai & Köninger, forthcoming). There is room, then, even in the age of phenomics, for taking a much more expansive view of what our duties are when it comes to the contexts in which the genes in our seeds have their effects: duties of conservation, curiosity and care.

Let me end with a story that I learned from Kochupillai’s brilliant 2016 book Promoting Sustainable Innovations in Plant Varieties. There she wrote about Albert Howard, a Cambridge-trained agricultural botanist from just before the Bateson-Biffen era (Kochupillai, 2016: 84, 90). Howard went on to become a scientific student of traditional agriculture in India. In the counterfactual history that no one has written in which world agriculture in the twentieth century went organic rather than chemical, Howard is the Norman Borlaug figure. According to Kochupillai, Howard reported that Indian farmers in the 1930s were getting sugarcane yields that, she says, have not been surpassed even today. That is an interesting datum. But what would be more interesting still would be an attempt to recreate that feat, using seeds descended from those varieties in use in the 1930s as well as the “green manure” that Howard wrote about, with the seeds planted and the manure applied in the soil types and climatic conditions where the sugarcane that he observed was grown. The fields growing those seeds under those conditions would be true grounds for optimism.