Tree Genetics & Genomes, Volume 3, Issue 2, pp 141–152

Implications of natural propagule flow for containment of genetically modified forest trees


  • Peter E. Smouse
    • Department of Ecology, Evolution and Natural Resources, Cook College, Rutgers University
  • Juan J. Robledo-Arnuncio
    • Department of Ecology, Evolution and Natural Resources, Cook College, Rutgers University
    • Laboratoire Génétique et Environnement, Université de Montpellier II, Institut des Sciences de l’Evolution
  • Santiago C. González-Martínez
    • Departamento de Sistemas y Recursos Forestales, CIFOR-INIA
Original Paper

DOI: 10.1007/s11295-006-0075-8

Cite this article as:
Smouse, P.E., Robledo-Arnuncio, J.J. & González-Martínez, S.C. Tree Genetics & Genomes (2007) 3: 141. doi:10.1007/s11295-006-0075-8


Propagule flow in populations of virtually all organisms is important both for the genetic cohesion of the species and for its interaction with natural selection. Its relevance for the deployment of genetically modified organisms (GMOs) is that propagules can be expected to move, under a wide range of circumstances, and will carry transgenic elements with them. Any consideration of the potential risks of deploying GMOs in the wild must include an assessment of how far and how fast introduced elements are transferred to surrounding conspecific (and sometimes congeneric) populations. In practice, we need estimates of the rates/distances of both pollen and seed movement. There are analytical methods to characterize seed (maternity), pollen (paternity), and established offspring (parent-pair) data, but spatial limitations restrict the area that one can study, and these approaches require modification for application to propagule flow in GMOs. We can apply indirect methods to estimate male gamete dispersal based on pollen pool analysis for single mothers, when some degree of precision can be sacrificed in return for compensating gains in the spatial coverage, but the loss of precision is problematic for GMO tracking. Special methods have been developed for GMO tracking, and we shall show how to assess spatial movement of both transgene-carrying seeds and pollen and will illustrate with an example from Brassica napus, a well-studied crop species.


Keywords: Forest trees, Gene flow, GMO escapes, Monitoring, Transgenic risks


Any thorough evaluation of the risks associated with deployment of a genetically modified organism (henceforth, GMO) into field situations and particularly into large-scale forest tree plantations must consider: (1) propagule dispersal rates and distances, (2) the mating rate of GMOs with native individuals and the level of inter-fertility with congeners, (3) the probability of successful establishment in wild populations of either the GMO genotypes or derivative progeny carrying the GMO element, (4) the long-term viability of GMO-derived germplasm under natural conditions, and (5) the potential detrimental effects of an altered genetic element on native populations. Evolutionary biologists can inform the discussion by deploying available theory and by contributing estimation methods for transgenic flow distances and introgression rates that are readily usable in GMO context (e.g., Rieger et al. 2002; Klein et al. 2006), as well as by elaborating the fitness consequences of GMO escape (e.g., Meagher et al. 2003; Williams and Davis 2005).

This special issue is designed to deal with many questions surrounding the risks of GMO release, but our particular and proximal charge for this paper is to deal with measuring the flow of the altered genetic element into neighboring conspecific and congeneric populations. Early assessment of GMOs is typically done in a fashion that does not permit substantial dispersion of transgenic elements into the environment, so we still have limited experience with GMO dispersion, especially in forest trees, where deployment of GMOs is relatively recent. Some theory now exists (Slavov et al. 2002; Meagher et al. 2003; Messeguer 2003; Potts et al. 2003; Linacre and Ades 2004; DiFazio et al. 2004), and empirical experience is starting to accumulate (Saeglitz et al. 2000; Eastham and Sweet 2002; Rieger et al. 2002; Arnaud et al. 2003; Stewart et al. 2003; Watrud et al. 2004; Williams and Davis 2005; Andow and Zwahlen 2006), but we are still, for the most part, extrapolating from studies of propagule flow in natural populations. Traditional methods of gene flow estimation provide us with a backdrop of what to expect, but they have their limitations in GMO context, and new approaches are being developed.

The literature on the estimation of pollen movement across the landscape is extensive, and both direct (paternity-based) and indirect (pollen cloud structure-based) methods are in current use (reviewed by Adams and Burczyk 2000; Smouse and Sork 2004; Sork and Smouse 2006, see also Smouse and Robledo-Arnuncio 2005; Robledo-Arnuncio et al. 2006). A parallel literature is developing on seed movement (Cain et al. 2000; Nathan and Muller-Landau 2000; Nathan et al. 2003; Grace et al. 2004; Williams et al. 2006) including studies based on genetic assay of maternally-inherited seed/seedling markers (Godoy and Jordano 2001; Ziegenhagen et al. 2003; Grivet et al. 2005; Jones et al. 2005). Several studies have been based on movement and successful establishment of offspring, but they have generally failed to distinguish between the pollen and seed flow components of the dispersal process (e.g., Meagher and Thompson 1987; Dow and Ashley 1996; Isagi et al. 2000; González-Martínez et al. 2002; Valbuena-Carabaña et al. 2005; Sato et al. 2006). This methodology has been extended to jointly estimating seed and pollen flow parameters from established offspring (Burczyk et al. 2006; González-Martínez et al. 2006). In general, we know that forest tree propagules move long distances but that successful establishment is greatly affected by post-dispersal processes related to microhabitat, competition, and a variety of other factors.

Additional considerations are particularly germane in GMO context. For long-lived perennials such as forest trees, a single reproductive bout might or might not capture what can happen over the course of a reproductive life span (e.g., Irwin et al. 2003; Smouse et al. 2005; Nakanishi et al. 2005, but see Schnabel et al. 1998). Indeed, inter-annual variance in gene flow is of considerable importance within a forest tree GMO context because recurrent gene flow in long-lived species can accelerate transgenic assimilation processes (Haygood et al. 2004, see Table 1 in Andow and Zwahlen 2006).

In general, whereas short-distance dispersal (SDD) determines the immediate local consequences of GMO release, the long-term impact of release is dominated by long distance dispersal (LDD) because rare events (in time and space) dominate long-term rates and geographic scales of spread, determining the ultimate evolutionary impact of genetic elements (LeCorre et al. 1997; Cain et al. 2000; Austerlitz and Garnier-Géré 2003; Bohrer et al. 2005; Bialozyt et al. 2006; Williams et al. 2006). In GMO context, it will be important to optimize genetic and statistical resolution for both SDD and LDD parameter estimates. Estimating SDD is relatively easy, given adequate data and statistical approaches; LDD is generally much harder to characterize.


Our charge in this paper is to provide a brief overview of the available statistical-genetic methods for estimation of pollen and seed flow. Specifically, (1) we will contrast the strengths of direct (parentage-based) and indirect (structure-based) approaches. (2) We will describe what they can tell us about GMO propagule movements. In specifically GMO context: (3) we will elucidate some special features of GMOs that require/allow us to modify existing analytical methods. (4) We will comment on our ability to establish the LDD parameters for a particular GMO from the SDD data we can easily obtain. Finally, (5) we will illustrate some of these approaches with a series of studies from Brassica napus, a well-studied agricultural crop.

General methods for estimating propagule flow

Pollen pool analysis

The classic challenge for pollen flow analysis is to identify the pollen donor so that we can assess the features that determine who mates with whom, typically a strongly declining function of inter-mate distance, sometimes also involving considerations such as phenological overlap, genetic incompatibility factors, and other features of interest. Traditionally, it has been difficult to determine the pollen donor with high confidence due to low genetic resolution and incomplete sampling of parents. It is still possible to assess pollen pool structure with a TwoGener analysis, based on the premise that seed-parents that are spaced far enough apart on the landscape will share very few pollen donors in common (Smouse et al. 2001; Sork et al. 2002). Eschewing the precise designation of any particular pollen donor, we concentrate instead on contrasting the male gametic arrays received by different mothers. The essence of the analysis is to extract information on the multi-locus male gametic genotypes of single seeds, using standard parentage analyses of diploid embryo and maternal genotypes, and then to subject the male gametes to a standard analysis of molecular variance (Excoffier et al. 1992), extracting the intra-class (intra-mother) correlation coefficient of pollen genotypes (Φft), a measure of ‘pollen structure’ that is analogous to the standard measure of population divergence (Fst). From Φft, it is possible to obtain estimates of the effective number of pollen donors contributing to the average seed parent (Nep) and to obtain indirect estimates of the average distance of pollen flow (δ), using one of several different pollen dispersal distributions (Austerlitz and Smouse 2001a, 2002; Austerlitz et al. 2004). Both accuracy and precision are dependent on the validity of a series of assumptions about the randomness of mating, equability of male reproductive contributions, and lack of adult spatial genetic structure. As long as the assumptions are approximately met, the estimates are reasonable (Austerlitz and Smouse 2001b; Burczyk and Koralewski 2005; Robledo-Arnuncio et al. 2006).
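The step from Φft to Nep can be sketched in a few lines. The relation used here, Nep = 1/(2Φft), is not stated in the text above; it is the standard TwoGener approximation from the cited literature (Smouse et al. 2001; Austerlitz and Smouse 2001a) and holds only under the assumptions just listed. The function name and the example value of Φft are hypothetical.

```python
def effective_pollen_donors(phi_ft):
    """Approximate effective number of pollen donors per seed parent (Nep).

    Assumes the TwoGener relation Nep = 1 / (2 * phi_ft), which is taken from
    the cited literature rather than from the text above.
    """
    if phi_ft <= 0.0:
        raise ValueError("phi_ft must be positive for a finite Nep")
    return 1.0 / (2.0 * phi_ft)


# Hypothetical example: a pollen structure estimate of 0.1
print(effective_pollen_donors(0.1))
```

Stronger pollen structure (larger Φft) implies fewer effective donors per mother, which is why the statistic is useful for comparing stands or years.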

The real utility of TwoGener is that it permits comparison of pollen structure under different ecological and managerial (Dyer and Sork 2001; Robledo-Arnuncio et al. 2004; Sork et al. 2005) conditions. The method can also be used to explore temporal consistency of pollen donor pools (Irwin et al. 2003; Austerlitz et al. 2004; Nakanishi et al. 2005). With some auxiliary information on adult genetic structure, it is possible to assess its impact on pollen pool genetic structure (Austerlitz and Smouse 2001b; Dyer et al. 2004). By forsaking fine detail, we can paint the picture of pollen flow across the landscape ‘with a broad brush’ (Smouse and Sork 2004; Sork and Smouse 2006), developing some sense of the ‘shape of the dispersal tail’ (Austerlitz et al. 2004; Robledo-Arnuncio et al. 2006). We can provide some sense of the scale of concern for pollen movements in GMO tree crops. We note that ‘pollen structure’ analysis has recently been extended to ‘seed structure’ analysis of established seedlings (Grivet et al. 2006). Further extensions seem possible, but the larger issue is that any ‘propagule structure’ analysis sacrifices important nuances that are better elucidated with classical neighborhood-based paternity analysis.

Neighborhood-based paternity analysis

In pollen flow context, the intent of paternity analysis is to use genetic markers to determine the sire for each of a large number of seeds collected from each of several mothers, and then tally the array of inter-mate physical distances. Apart from unambiguous identification of the male parent, achievable only with an extensive battery of DNA markers and large-scale sampling, the central idea is to fit the results to a chosen dispersal kernel, determining the mean dispersal distance, and the tail probability (i.e., the probability of paternity from outside the local neighborhood, treated as LDD). There is an extensive literature, starting with the description of the complete ‘neighborhood’ mating model by Adams and Birkes (1989, 1991; see also Sork et al. 1999; Jones and Ardren 2003; Smouse and Sork 2004; Burczyk and Koralewski 2005). Given a mother (Mi) of known location and her offspring (Oij), both with known multi-locus genotypes and an array of comparable genotypes from an exhaustive collection of local paternal candidates (Fk, k = 1,..., K) within a circumscribed local “neighborhood”, the probability of the genotype of Oij is:
$$ \Pr\left( \text{O}_{ij} \right) = s \cdot X_{iji} + m \cdot X_{ij\bullet} + \left( 1 - s - m \right) \cdot \sum_{k=1}^{K} \lambda_{ik} \cdot X_{ijk}, \tag{1} $$
where s is the selfing rate (for those organisms that can self), m is the immigration rate (from outside the local “neighborhood”), Xiji is the Mendelian probability of the recovered genotype for Oij, given selfing of its mother, Xij• is the Mendelian probability of some outside (but unknown) father producing that offspring with mother Mi, averaged (with allele frequency weighting) over all possible outside fathers, Xijk is the Mendelian probability of producing that seed from mother Mi and the particular (inside) paternal candidate Fk, and λik is the proportional paternal contribution of male Fk to female Mi, a combination of his intrinsic male contribution (λk) and his distance (zik) from Mi. The third term is summed over all possible internal fathers, each weighted by his relative reproductive contribution and inversely by his distance from the mother. The probabilities are multiplied across progeny, and the product (retrospectively, a likelihood) is optimized for all parameters simultaneously.
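As a minimal sketch of how Eq. 1 is evaluated for the progeny of one mother, the code below computes the three-term mixture and the resulting log-likelihood. The Mendelian transition probabilities are assumed to have been computed elsewhere; all function and argument names are hypothetical, and the paternal weights are normalized inside the function so that they behave as proportional contributions.

```python
import math


def offspring_probability(s, m, x_self, x_outside, x_local, weights):
    """Eq. 1 for one offspring: selfing + immigrant pollen + local fathers.

    x_self    -- Mendelian probability of the offspring under selfing (X_iji)
    x_outside -- probability under an unknown outside father (X_ij.)
    x_local   -- probabilities for each local candidate father (X_ijk)
    weights   -- unnormalized paternal contributions (proportional to lambda_ik)
    """
    total = sum(weights)
    local = sum(w / total * x for w, x in zip(weights, x_local))
    return s * x_self + m * x_outside + (1.0 - s - m) * local


def log_likelihood(progeny, s, m, weights):
    """Sum of log Pr(O_ij) over progeny; each item is (x_self, x_outside, x_local)."""
    return sum(
        math.log(offspring_probability(s, m, xs, xo, xl, weights))
        for xs, xo, xl in progeny
    )


# Hypothetical values: two local candidate fathers, the first weighted 3:1
p = offspring_probability(0.1, 0.3, 0.5, 0.2, [0.5, 0.0], [3.0, 1.0])
print(p)
```

In a real analysis this log-likelihood would be maximized numerically over s, m, and the parameters that generate the weights; that optimization step is omitted here.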
The real targets of interest in these studies are the distance equation contributing to the λik-values and the immigration rate, m; all the rest are nuisance parameters that must be jointly estimated. Traditional work has employed the exponential distribution to describe the decay of λik with distance, but accumulating experience suggests that a two-parameter (α, β) exponential power curve is better (Clark et al. 1999; Austerlitz et al. 2004),
$$ \lambda_{ik} \propto \lambda_{k} \cdot \exp\left\{ -\left( z_{ik} / \alpha \right)^{\beta} \right\}, \tag{2a} $$
also conveniently written as
$$ \log_{\text{e}}\left( \lambda_{ik} \right) \propto \log_{\text{e}}\left( \lambda_{k} \right) - \left( z_{ik} / \alpha \right)^{\beta}, \tag{2b} $$
where zik is (once again) the distance between the father (Fk) and the mother (Mi), providing a ‘fat tail’ when β < 1 (Oddou-Muratorio et al. 2005). We illustrate the tail probability impact of thin (β > 2) vs fat (β < 1) tails in Fig. 1. Figure 1a shows increasing steepness and ‘fatness’ of the tail as the β-value falls. Figure 1b shows a semi-log plot of the same distributions (Y-axis in logarithmic form), highlighting the different probabilities for large distances. Accumulating experience shows that β < 1 for most forest trees and environments.
Fig. 1

The probability distribution of pollen dispersal, modeled as bivariate exponential power distributions, as a declining function of increasing physical distance, for the β = 2 (bivariate normal), β = 1 (bivariate negative exponential), and β = 0.75 (bivariate exponential power) cases; (a) probability plotted against inter-mate distance, and (b) the logarithm of probability plotted against inter-mate distance
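The effect of β on the tail can also be checked numerically. The sketch below assumes a radially symmetric bivariate exponential power kernel with the usual normalization, β / (2πα²Γ(2/β)); under that assumption, the probability of dispersal beyond a radius r is the regularized upper incomplete gamma function Q(2/β, (r/α)^β). The function names are hypothetical, and the comparison holds α fixed, so the three kernels do not share the same mean distance.

```python
import math


def regularized_upper_gamma(a, x):
    """Q(a, x) = Gamma(a, x) / Gamma(a), from the power series for the lower part."""
    if x <= 0.0:
        return 1.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-16 and n < 10_000:
        term *= x / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(x) - x - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - lower))


def kernel_density(z, alpha, beta):
    """Exponential power density per unit area at distance z (assumed normalization)."""
    norm = beta / (2.0 * math.pi * alpha ** 2 * math.gamma(2.0 / beta))
    return norm * math.exp(-((z / alpha) ** beta))


def tail_probability(radius, alpha, beta):
    """Probability that a dispersal event travels farther than `radius`."""
    return regularized_upper_gamma(2.0 / beta, (radius / alpha) ** beta)


# Tail mass beyond five scale units for the three cases shown in Fig. 1
for beta in (2.0, 1.0, 0.75):
    print(beta, tail_probability(5.0, 1.0, beta))
```

If the fitted kernel were valid beyond the sampled area, `tail_probability` evaluated at the neighborhood radius would correspond to the immigration rate m of Eq. 1. Because the series subtracts from one, very small tail values lose relative precision, so the sketch is meant for illustration rather than for extreme-tail estimation.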

More recently, several other families of pollen dispersal curves have been examined, among them the gamma, Weibull, geometric, two-dimensional t, and generalized logistic distributions (see Klein et al. 2006). In principle, any of these dispersal families can be extended to allow for an asymmetric directional component (Burczyk et al. 1996), thus providing a more complete model of spatial effects on pollen dispersal. Meagher et al. (2003) provided an example of such a model in the context of transgene dispersal risk assessment for Agrostis stolonifera L. (creeping bentgrass). Using an asymmetric exponential function, they found high levels of expected gene flow and consequently a high risk of GMO escape, as well as a strong directional component, probably related to prevailing wind direction, consistent over the 2 years of their study. Watrud et al. (2004) have subsequently used a two-parameter Gamma distribution for the same species and purpose, finding even more asymmetric long-distance pollen flow.

The difficulty of fitting any of these models, however, is evident in Fig. 1a. As a practical matter, sampling of potential pollen donors seldom extends beyond 300–500 m from the seed parent, and outside that limited neighborhood, there are no data. We cannot extrapolate the curves beyond the data with much statistical credibility, yet the tails of these distributions are of paramount importance for LDD. Figure 1b shows that the curves diverge radically beyond distances we can easily measure for natural populations. Although we can assess the successful fertilization of a seed drawn directly from a mother, it is more difficult to measure seed dispersal, germination, and survival. We obtain an accurate but partial picture.

Provided that we can identify the father, this method permits direct measurement of pollination distance. The irony is that in practice, the circle defining the local neighborhood, within which parental sampling is exhaustive (and estimation of α and β is effective), is conditioned strongly by sampling constraints and commonly accounts for no more than 30–50% of male parentage. The parameter m decreases as the circle of paternal inclusion is pushed progressively outward. Once the distribution begins to flatten out (see Fig. 1a), that decrease is small, but it is not small for changing neighborhood sizes within the zone where the curve is falling sharply. As a general rule, the zone of inclusion has increased as our sampling capabilities have improved, but the shorter the spatial extent over which we estimate the dispersal curve, the greater the uncertainty in extrapolating from measurable SDD (inside a neighborhood) to non-observable LDD (outside a neighborhood).

Seed flow analysis

Traditionally, the physical movement of seeds has been gauged directly (Clark et al. 1999; Levey and Sargent 2000; Nathan and Muller-Landau 2000; Gómez 2003). The genetic analysis of seed flow has lagged behind that of pollen flow, given the difficulties of establishing maternal genotype and location when both parents are unknown, the classic “two parent problem” (Meagher and Thompson 1987; Jones and Ardren 2003; Grace et al. 2004), and categorical maternal designation has been elusive. Petit et al. (2002a,b) have used maternally inherited chloroplast DNA to track LDD on a biogeographic scale, but there is seldom enough local cpDNA variation in angiosperms to determine maternity precisely. The analysis of nuclear DNA from maternally-inherited seed coat tissues has expanded the range of maternity analysis for local seed dispersal, and these techniques have been useful in Prunus (Godoy and Jordano 2001), Quercus (Grivet et al. 2005), and Jacaranda (Jones et al. 2005).

Given multilocus nuclear genotypes of maternal tissue from a collection of seeds, we can assign those seeds to one of several local mothers. Assuming that we have a genetic battery of sufficient resolving power and that the seeds have not been dispersed very far by the relevant seed vectors, we can designate precise maternal locations. All we have to do is establish the dispersal curve, via Eqs. 2a or 2b. With several mothers, we can isolate the effects of the maternal fecundity parameters (λi-values) from the distance function itself, the real target of interest. This is a new and still evolving arena with much still to be done, but it is very promising.

Established propagule analysis

Starting with the early work on Chamaelirium luteum by Meagher and Thompson (1987), several studies, including a few studies in oak (Dow and Ashley 1996; Streiff et al. 1999; Valbuena-Carabaña et al. 2005), honey locust (Schnabel et al. 1998), magnolia (Isagi et al. 2000), and maritime pine (González-Martínez et al. 2002), have used parentage analysis for successfully established offspring to obtain insight into effective dispersal processes, dealing with the combined effects of dispersal and pre-recruitment selection in the offspring. Generally, parentage analyses of established offspring have not been able to distinguish between the relative contributions of pollen and seed flow to effective dispersal, with the exception of a few studies dealing with dioecious species such as C. luteum (Meagher and Thompson 1987), Gleditsia triacanthos (Schnabel et al. 1998), and Cercidiphyllum japonicum (Sato et al. 2006).

Seedling neighborhood model

The seedling neighborhood model is an extension of the neighborhood-based paternity analysis (Eqs. 1, 2a, and 2b above) to the more complicated case of established offspring. This new model provides joint maximum likelihood estimates of effective pollen and seed flow and has been successfully applied to oak (Burczyk et al. 2006) and pine (González-Martínez et al. 2006), despite generally high levels of pollen immigration in these species. In this model, the probability of observing a multilocus diploid genotype Gi among the offspring is:
$$ P\left( \text{G}_{i} \right) = m_{s} \cdot P\left( \text{G}_{i} \mid \text{B}_{s} \right) + \left( 1 - m_{s} \right) \cdot \sum_{j} \psi_{ij} \cdot \left[ s \cdot P\left( \text{G}_{i} \mid \text{M}_{ij}, \text{M}_{ij} \right) + m_{p} \cdot P\left( \text{G}_{i} \mid \text{M}_{ij}, \text{B}_{p} \right) + \left( 1 - s - m_{p} \right) \cdot \sum_{k} \varphi_{ijk} \cdot P\left( \text{G}_{i} \mid \text{M}_{ij}, \text{F}_{ijk} \right) \right], \tag{3} $$
where ms and mp are the immigration rates (from outside the local neighborhood) for seed and pollen, respectively, and s is the selfing rate. P(Gi|Bs) is the probability that an offspring immigrating from mother trees located outside of an offspring’s neighborhood (background females) has genotype Gi, and P(Gi|Mij, Mij), P(Gi|Mij, Bp), P(Gi|Mij, Fijk) are probabilities that an offspring has diploid genotype Gi when a mother plant of genotype Mij is self-pollinated, pollinated by a distant and unsampled male, or pollinated by a neighboring plant having genotype Fijk, respectively. The parameter ψij is the relative reproductive success of the j-th female in the neighborhood of the i-th offspring, and ϕijk is the relative reproductive success of the k-th male within the neighborhood of the ij-th female. Reproductive success is regressed against distance, using a log-linear model similar to Eqs. 2a and 2b but using an exponential function.
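The nested structure of this equation (seed immigration on the outside, then a sum over candidate mothers, each with its own selfing, pollen immigration, and local-father terms) may be easier to follow in code. The sketch below is illustrative only: the genotype probabilities are assumed to be precomputed, the data layout and names are hypothetical, and the ψ and ϕ weights are assumed to already sum to one within each neighborhood.

```python
def seedling_probability(m_s, m_p, s, p_seed_background, mothers):
    """Probability of one established offspring genotype under the seedling model.

    mothers -- list of (psi, p_self, p_pollen_background, fathers), where
               fathers is a list of (phi, p_cross) pairs for that mother.
    """
    local = 0.0
    for psi, p_self, p_pollen_background, fathers in mothers:
        # Outcrossing with sampled males in this mother's neighborhood
        outcross = sum(phi * p_cross for phi, p_cross in fathers)
        local += psi * (
            s * p_self
            + m_p * p_pollen_background
            + (1.0 - s - m_p) * outcross
        )
    return m_s * p_seed_background + (1.0 - m_s) * local


# Hypothetical example: one candidate mother with one candidate father
mothers = [(1.0, 0.0, 0.1, [(1.0, 0.5)])]
print(seedling_probability(0.2, 0.4, 0.1, 0.05, mothers))
```

As with the paternity model, the quantity of interest in practice is the product of these probabilities over all sampled offspring, maximized jointly over ms, mp, s, and the distance parameters behind ψ and ϕ.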

Burczyk et al. (2006) have shown, using extensive simulation, that this model is robust to different levels of pollen and seed immigration, given sufficient sample sizes (>500 seedlings) and high exclusion power of the marker array (typically obtained with as few as six nSSRs). Sample size requirements are severe in this study (perhaps beyond the scope of some studies), but the approach has the virtue of accounting for post-dispersal processes affecting offspring recruitment. It still does not solve the uncertainty in estimating unobservable LDD (from outside the neighborhood) from observable SDD (from inside the neighborhood). As usual with the neighborhood model, the estimates of ms and mp will depend critically on the radius of inclusion used to define the neighborhoods.

What we have learned, in general

Traditional methods have some limitations for the precise study of GMO propagule flow, but numerous studies of forest tree species yield general lessons that can inform any effort to establish plantations of GMOs. The exquisite detail will obviously depend on the species and situation under examination, but the ‘big picture’ can be summarized as follows. It is appropriate to begin with the realization that propagules (both pollen and seed) have evolved over hundreds of millions of years as the organism’s vehicle for movement, and they are superbly effective at accomplishing that. (a) Most propagule movement is localized, but the dispersal distribution has a long tail. (b) Particularly in forest tree species, at least some wind- and animal-dispersed propagules can and do move substantial distances. (c) Studying SDD is now a tractable exercise, but studying LDD is substantially more challenging. (d) More often than not, we are using the observable SDD to predict unobservable LDD, employing mathematical/statistical models of some assumed pollen or seed dispersal distribution (dispersal kernel). (e) Different dispersal kernels fit the observed SDD data almost equally well but predict quite different tail probabilities and distance characteristics for the unobservable LDD of greater long-term interest. (f) Effective dispersal kernels are very different, depending on when in the life cycle we assess them (pollen, seed, seedlings, and saplings), due to non-random survival from one stage to the next. For example, the so-called Janzen–Connell model (Janzen 1970; Connell 1971, see also Nathan and Casagrandi 2004) predicts greater average distance between mothers and successfully established offspring than expected by seed dispersal alone for species with high density-dependent mortality at early life history stages. Sheer dispersal distance is not the whole story.

Some additional challenges for GMOs

For wild populations, we are typically dealing with the scale over which pollen or seed dispersal occurs under natural circumstances, but the issue for GMOs is the detection of transgenic introgression into wild populations beyond the mere movement of propagules (González-Martínez et al. 2005). Traditional estimation methods for propagule flow provide useful information for general modeling purposes, but for empirical GMO analysis, we face some additional challenges.

Both technical and economic constraints will limit the development of GMOs to a few species of commercial importance. About 85% of the applications for field testing of transgenic tree plantations involve Populus, Pinus, Liquidambar and Eucalyptus, for which transformation, regeneration, and propagation protocols are feasible on a commercial scale (van Frankenhuyzen and Beardmore 2004). Several of these commercially important species hybridize with congeners under natural conditions (e.g., Pinus taeda with other southern pines, such as P. palustris or P. echinata, Schmidtling 2001). Transgenes can be expected to cross taxonomic boundaries with non-trivial probability, and we may well have to extend our tracking system to these congeners, which complicates matters.

The size of the source (GMO) and recipient (wild) populations, as well as their spatial separation, should obviously have an impact on the risk of transgenic escape. Theoretical models, based on the probability density function of dispersal (kernel function) from individual plants, indicate a steep decline of cross-pollination rates with increasing distance between fields for oilseed rape (Colbach et al. 2001) and maize (Angevin et al. 2003), as well as increasing pollen flow with increasing source population size (Klein et al. 2006). Theoretical considerations suggest that these purely phenomenological models will underestimate LDD pollination events from experimental data (Klein et al. 2006), but the relevant empirical evidence remains inconclusive. Some studies show a strong isolation by distance effect for crop species (Eastham and Sweet 2002; Beckie et al. 2003), but others report uniform introgression rates with increasing distance from the source, over the scale of the analysis (Rieger et al. 2002). To determine the impact of population size and isolation for GMO escape rates, we will need landscape-scale experimentation that contrasts demographic conditions, especially for tree species.

Moreover, for long-lived iteroparous forest tree species, analysis of a single reproductive bout is not sufficient to characterize the course of a reproductive life span (Irwin et al. 2003; Smouse et al. 2005; Nakanishi et al. 2005). Accumulating experience suggests that the effective number of pollen donors per female for a single year has to be multiplied by a factor ranging between 1.4 (Irwin et al. 2003) and 2.5 (Smouse et al. 2005) to obtain a credible extrapolation over a decade. Mating pattern tends to be relatively coherent from 1 year to the next for the major pollen donors contributing to a single female, but the minor contributors are unpredictable from year to year. A forest plantation will be producing both pollen and seed propagules for at least a decade before harvest, and a 1-year study will generally not be sufficient for risk analysis.

We have mentioned the fact that long-term spread of the introduced genetic elements depends more on LDD than on SDD. Quite apart from the obvious difficulties of estimating LDD, the effects are exacerbated over time. For example, given a tail probability of 0.10 for successful GMO dispersal in a given year, the decadal rate becomes \( 1 - (1 - 0.10)^{10} = 0.65 \), although that could be viewed as a ‘worst case’ scenario. However, even if the single-year rate is only 0.01, that translates into a decadal risk of \( 1 - (1 - 0.01)^{10} = 0.096 \). Viewing LDD as ‘escape’, the long-term prospects for escape are sobering. Inasmuch as LDD dominates the evolutionary fate of any particular gene over any extended geographic scale (Petit et al. 2002a,b; Nathan et al. 2003; Austerlitz and Garnier-Géré 2003; Williams et al. 2006), it becomes clear that we have to ‘think longer-term and larger scale’ than is traditional in gene flow and dispersal studies.
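The decadal figures quoted above follow from compounding a constant annual probability, which implicitly treats years as independent. A minimal sketch of that arithmetic, with a hypothetical function name, is:

```python
def cumulative_escape_probability(annual_probability, years):
    """Probability of at least one escape event over `years` reproductive seasons.

    Assumes a constant annual probability and independence between years.
    """
    if not 0.0 <= annual_probability <= 1.0:
        raise ValueError("annual_probability must lie in [0, 1]")
    return 1.0 - (1.0 - annual_probability) ** years


# The two cases discussed in the text
print(round(cumulative_escape_probability(0.10, 10), 2))   # 0.65
print(round(cumulative_escape_probability(0.01, 10), 3))   # 0.096
```

The same function makes it easy to see how the risk grows with longer rotation periods: for small annual rates the cumulative value is close to the annual rate multiplied by the number of years.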

Specific methodology to assess GMO escape

Classical methods are primarily designed to allow us to describe the pattern of local propagule flow (SDD), but GMO escape probability will be also greatly determined by LDD, for which classical methods yield inadequate resolution. Long distance movement and subsequent introgression events may well be rare, but they are important, and their detection will require intensive sampling over broad spatial scales. As is the case for agricultural crops, plantations of GMO trees will generally be surrounded by wild populations of interfertile conspecifics and/or congenerics. GMO risk analysis should focus on the effective detection of transgenic propagule flow among patches and/or populations spread across an extended landscape, without regard to which individual parents are the sources of the propagules. We need analytical tools that permit real-time tracking of particular gene sequences on a landscape scale and across generations. The issue is less one of parental designation than it is one of detection of the GMO element.

Following the transgenic element directly

The most straightforward method is to follow the transgenic construct itself, which often includes DNA from organisms other than the target species, basically treating the transgene as its own ‘barcode’ for diagnostic purposes. Testing will require laboratory screening using standard molecular methods. High-throughput genotyping of DNA polymorphisms (single nucleotide polymorphisms) now permits processing of thousands of samples in very short periods of time (Kwok 2001; see also Table 2 in Hirschhorn and Daly 2005), facilitating routine barcode monitoring in natural forests. One of the more impressive transgenic trees deployed to date is a hybrid poplar that incorporates a cytosolic pine glutamine synthetase (GS) gene, which results in increases of 41% in height and 36% in stem diameter growth (Jing et al. 2004). PCR primers to amplify the pine GS gene are available and can be applied to monitor gene flow from transgenic poplar plantations.

Tagging methods

Unlike traditional parentage-based estimation procedures for gene flow, detecting GMO immigration into wild populations does not necessarily require the identification of individual donor parents. Our goal reduces to determining whether a sample of propagules from the natural population derives from the GMO population (as a whole), which we can characterize via any diagnostic marker. Given that GMOs are DNA constructs, developing molecular tagging methods will often be straightforward. A range of recently developed bio- and nano-technologies can be useful for tracking GMO movement (review in Stewart 2005). Ready-for-implementation technologies include green fluorescent protein expression (GFP tagging) and DNA barcodes (see Table 1 in Stewart 2005, and references therein, for a comparison of direct transgene monitoring systems). Expressed GFP can be visualized in vivo under field conditions by using a portable source of bright ultraviolet light on leaves. GFP tagging allows the monitoring of large areas and is amenable to the use of remote sensing but requires an initial investment to attach fluorescent proteins to the transgenic construct, as well as testing for health and safety. Such methods should be especially useful for assaying LDD.

Diagnostic or ‘signature’ phenotypes

Some transgenic trees will also exhibit diagnostic phenotypic characters, most notably resistance to herbicides and/or to different diseases and pests. Thus, one can monitor GMO flow by testing progenies sampled from wild adults for increased herbicide or disease/pest tolerance. This technique is already much in use for agricultural crops (Saeglitz et al. 2000; Rieger et al. 2002) because herbicide tolerance is a major target for both conventional breeding and transgenic deployment in these species, but it might be of little use in forest trees engineered for improved wood quality through modification of lignin biosynthesis or other phenotypic traits that are not easily observable in progeny. Where we can deploy a ‘quick and easy’ phenotypic assay, it will allow intensive, low-cost monitoring, as illustrated by Rieger et al. (2002) in commercial herbicide-resistant canola obtained by mutagenesis.

Illustration: pollen-mediated GMO escape in Brassica napus

Much of the accumulated experience on GMO ‘escape’ has come from agricultural crops, and, whereas those studies will progressively be replaced with comparable studies from forest tree crops of more immediate interest in this study, they have much to teach us in the interim. We will eventually have to measure LDD directly rather than continuing to model it, and Devaux et al. (2005) have done that for B. napus (oilseed rape), establishing several points: (1) By using microsatellite markers in non-GMO cultivars, the investigators were able to characterize the genetic variation of commercial oilseed rape cultivars over a 10 × 10 km area in France and were able to describe the pollen clouds contributing to harvested seed for many different fields. (2) Most pollination was local, but the dispersal tail was quite fat, fitting exponential power and geometric distributions substantially better than normal or exponential curves, with some seeds pollinated from as much as 3 km away. The net result was an effective number of pollen donors for a single field on the order of 70% of the total varietal diversity in the study area. That result confirms the extrapolation (from natural population propagule flow studies), providing hard data from the extended tail of the distribution. Propagules do indeed flow a long way, and the tail is quite fat.
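To see what a ‘fat’ tail implies in practice, the sketch below compares long-distance tail probabilities for three members of the two-dimensional exponential power family, whose density is proportional to exp[−(r/a)^b] (b = 2 gives the normal kernel, b = 1 the exponential, and b < 1 a fat-tailed kernel). The mean distance of 100 m and the 1 km threshold are illustrative assumptions, not estimates from Devaux et al. (2005), and the code is restricted to shapes with integer 2/b so that the tail has a simple closed form.

```python
import math


def scale_for_mean(mean_dist: float, b: float) -> float:
    """Scale parameter a that yields the requested mean dispersal distance,
    using mean = a * Gamma(3/b) / Gamma(2/b) for the 2-D kernel."""
    return mean_dist * math.gamma(2.0 / b) / math.gamma(3.0 / b)


def tail_probability(d: float, mean_dist: float, k: int) -> float:
    """P(dispersal distance > d) for shape b = 2/k, with k a positive integer.

    With u = (r/a)**b, u follows a Gamma law with integer shape k, so the
    tail equals exp(-x) * sum_{m<k} x**m / m!, where x = (d/a)**b.
    """
    b = 2.0 / k
    a = scale_for_mean(mean_dist, b)
    x = (d / a) ** b
    return math.exp(-x) * sum(x ** m / math.factorial(m) for m in range(k))


if __name__ == "__main__":
    # Hypothetical values: 100 m mean dispersal distance, 1 km threshold.
    shapes = ((1, "normal, b = 2"), (2, "exponential, b = 1"), (4, "fat-tailed, b = 0.5"))
    for k, label in shapes:
        print(label, tail_probability(1000.0, 100.0, k))
```

With the mean distance held equal, the fat-tailed kernel places many orders of magnitude more probability beyond 1 km than the normal kernel does. This is why the choice of kernel family matters far more for escape prediction than it does for describing local pollination.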

Rieger et al. (2002) quantified the level and spatial pattern of gene flow from fields of mutagenically produced, herbicide-resistant canola to nearby conventional (non-resistant) crops, providing an illustrative example of the great potential of diagnostic-phenotype approaches for large-scale transgene escape assessment. The study involved 63 conventional canola fields, 25–100 ha each, representing over half of the canola growing area in Australia, with intensive sampling effort (about 300,000 seeds per field). They collected samples of seeds within the conventional (non-resistant) plantings at different distances from the herbicide-resistant plantings and assessed gene flow events by screening with a lethal discriminating dose of herbicide. They found within-field frequencies of resistant plants ranging from 0 to 0.07% in conventional crops, showing a uniform dispersal distribution up to 3,000 m but with no introgression events observed beyond this distance from the herbicide-resistant plantings. Natural herbicide resistance of native plants was not a confounding factor because its frequency had been measured as 0.0001% in previous experiments.
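Sampling intensity matters because escape events are rare. As a rough illustration, the sketch below computes the chance that a seed sample contains at least one resistant seed, and the smallest sample that reaches a target detection probability. It assumes that seeds are independent draws at a fixed frequency, a simple binomial model that is our assumption here and not the analysis performed by Rieger et al. (2002); the function names are hypothetical.

```python
import math


def detection_probability(freq: float, n_seeds: int) -> float:
    """P(at least one resistant seed among n_seeds) under binomial sampling."""
    return 1.0 - (1.0 - freq) ** n_seeds


def seeds_needed(freq: float, target: float = 0.95) -> int:
    """Smallest sample size whose detection probability reaches `target`."""
    if not (0.0 < freq < 1.0 and 0.0 < target < 1.0):
        raise ValueError("freq and target must lie strictly between 0 and 1")
    n = math.ceil(math.log(1.0 - target) / math.log(1.0 - freq))
    # Guard against floating-point rounding at the boundary.
    while detection_probability(freq, n) < target:
        n += 1
    while n > 1 and detection_probability(freq, n - 1) >= target:
        n -= 1
    return n


if __name__ == "__main__":
    # 0.07% is the upper end of the within-field range quoted above.
    print(seeds_needed(0.0007))
    # Detection probability for a sample of 300,000 seeds at that frequency.
    print(detection_probability(0.0007, 300_000))
```

Because the required sample size grows roughly in inverse proportion to the frequency being sought, screening for events an order of magnitude rarer calls for about ten times as many seeds, which is what makes cheap phenotypic assays attractive for this kind of survey.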

Working in Canadian fields of the same species, Beckie et al. (2003) also used lethal discriminating doses of herbicide to detect transgenic escape events and employed two additional kinds of markers: (1) commercial protein test strips (detecting a protein produced by the transgene) and (2) DNA–PCR-based screening for the transgene itself. They applied all three of these procedures to assess transgene introgression, assaying seeds along several transects, perpendicular to the border between the fields, in two consecutive years. In contrast to the Australian work, they demonstrated a strong distance effect, with the frequency of transgenic introgression declining with increasing distance from the source population (between 0 and 800 m). Interestingly, they also detected successfully established transgenic plants in the second year, which originated from seed losses during harvesting operations of the first year, demonstrating multi-generational possibilities for transgenic dispersal in subsequent years and into nearby fields. They concluded that the protein test strips provided the most consistent results, whereas the PCR-based procedure failed to amplify the transgene in several cases where the other two methods indicated a positive reaction.

Klein et al. (2006) modeled a decline in the frequency of GMO introgression with increasing distance from a source population, fitted the model to field data, and used it to predict transgene flow rates among fields of different sizes and shapes. They defined an individual pollen dispersal function, γ(x, y), for the probability of movement from a source individual at (0, 0) to a location at (x, y). Given the function γ(x, y), they computed an expected proportion of transgenic pollen gametes in the pollen pool at (x, y), looking backward, as
$$ \mu(x,y) = \frac{\int\limits_{(x',y') \in A} \gamma(x - x', y - y')\,dx'\,dy'}{\int\limits_{(x',y') \in A} \gamma(x - x', y - y')\,dx'\,dy' + \kappa \int\limits_{(x',y') \in B} \gamma(x - x', y - y')\,dx'\,dy'}, $$
where A and B are the sets of transgenic and conventional plants, respectively, and κ is the pollen production per unit area of conventional plants, relative to that of transgenic plants. In their experiment, B was a 90 × 90 m field and A was a 10 × 10 m plot in the middle of B, sown with transgenic individuals. They sampled seeds from non-resistant plants across B, tested their resistance with a discriminating dose of herbicide, and used the observed frequencies of resistant seeds at different locations across the field to fit the individual function γ(x, y), using maximum-likelihood estimation. This approach revealed a strongly leptokurtic pattern of dispersal, which was best described by one of several alternative power law functions.
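The ratio μ(x, y) can be approximated numerically by replacing each integral with a sum over small source cells. The sketch below does this for the 90 × 90 m field with a central 10 × 10 m transgenic plot described above. The power-law kernel, its parameters `a` and `b`, the 1 m cell size, and the treatment of B as the field minus the plot are all assumptions made for illustration; they are not the fitted function or procedure of Klein et al. (2006).

```python
import math


def power_law_kernel(dx: float, dy: float, a: float = 1.0, b: float = 3.0) -> float:
    """Hypothetical fat-tailed kernel, left unnormalised because the
    normalising constant cancels in the ratio mu(x, y)."""
    r = math.hypot(dx, dy)
    return (1.0 + r / a) ** (-b)


def transgenic_pollen_fraction(x: float, y: float, kappa: float = 1.0,
                               field: float = 90.0, plot: float = 10.0,
                               cell: float = 1.0, kernel=power_law_kernel) -> float:
    """Approximate mu(x, y) by summing the kernel over source cells.

    A is the central plot; B is taken as the rest of the field (assumption).
    """
    lo = (field - plot) / 2.0          # lower edge of the transgenic plot
    hi = lo + plot                     # upper edge of the transgenic plot
    n = int(round(field / cell))
    from_a = 0.0                       # pollen arriving from transgenic cells
    from_b = 0.0                       # pollen arriving from conventional cells
    for i in range(n):
        xs = (i + 0.5) * cell          # x-coordinate of the source cell centre
        for j in range(n):
            ys = (j + 0.5) * cell      # y-coordinate of the source cell centre
            weight = kernel(x - xs, y - ys) * cell * cell
            if lo <= xs < hi and lo <= ys < hi:
                from_a += weight
            else:
                from_b += weight
    return from_a / (from_a + kappa * from_b)


if __name__ == "__main__":
    # Receptor points 2 m and 35 m beyond the plot edge, on the field midline.
    for y in (52.0, 85.0):
        print(y, transgenic_pollen_fraction(45.0, y))
```

Swapping in a different `kernel` function is enough to explore how the predicted transgenic fraction at a given distance responds to the weight of the tail, which is the comparison that the maximum-likelihood fit formalizes.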

Assuming different individual dispersal functions, Klein et al. (2006) then used Eq. (4) to simulate forward pollen dispersal and to predict the frequencies of transgene flow between fields of different shapes and sizes. They reported that the shape of the tail of the dispersal curve was, along with separation distance between fields, a crucial factor determining the frequency of transgenic spread via pollen. Interestingly, they also found that their model underestimated LDD for B. napus, although less so than had previous treatments, demonstrating the continuing difficulty of modeling rare LDD events. On balance, it seems very clear that we need to devote more effort to obtaining more accurate estimates of LDD.

Other aspects of GMO risk analysis

It is appropriate to close with some thoughts on how information on propagule flow fits into the larger risk assessment issue. Risk analysis rests on estimates of: (1) propagule dispersal rates and distances, (2) the mating rate of GMOs with (unaltered) native individuals, (3) the probability of successful establishment of either the GMO genotypes themselves or derivative progeny that carry the genetically modified elements in question, (4) the long-term viability of the GMO-derived germplasm under natural conditions, and (5) the potential detrimental effects of an altered genetic element that has escaped into the wild. In this latter context, the introduction of GMO trees also raises concerns about larger effects on the ecosystem, as trees are the dominant species of many terrestrial ecosystems. Our particular charge was to deal with (1) and (2) above, but there are deeper questions about the extent to which the rate and distance of propagule flow are a sufficient characterization of the risk of GMO introgression into the surrounding wild populations.

DiFazio et al. (2004) used a spatially explicit simulation model to integrate factors such as demography, selection coefficients, and landscape structure (elevation and habitat quality) into the evaluation of the rates of landscape-scale transgenic escape from poplar plantations. Transgenes are typically not neutral in plantations, so the authors parameterized this model with abundant experimental measures of seed and pollen dispersal, seedling establishment, and population dynamics, and an array of selective pressures. A central result was the importance of the shape of the dispersal kernel; whereas SDD seemed to have a very limited effect on long-term escape rates, elevated rates of LDD (a fat tail for the dispersal kernel) greatly favored the multi-generation spread of transgenes across the landscape.

Kelly et al. (2005) modeled the impact of variation in the level of insect herbivory on the survival and spread of an insect-resistance transgene in a wild recipient population, as a function of varying degrees of selective disadvantage for the transgene (in the absence of herbivory). Several different scenarios, with changing details of when in the life cycle the herbivory occurs, led to quantitatively different results, but the general patterns were the following: (a) If the general level of herbivory is high, the transgene will persist and even increase in frequency. (b) A large selective disadvantage for the transgene in the absence of herbivory, coupled with frequent episodes of low herbivory, will compromise its survival in the wild. (c) Intermediate herbivory pressures and/or low selective disadvantage will allow the transgene to persist indefinitely in the recipient wild population.

Using a simple deterministic population genetic model, Williams and Davis (2005) investigated the evolution of transgene frequencies in P. taeda colonies founded by long-distance seed dispersal from transgenic plantations. The goal was to determine the selective and demographic conditions that would favor the spread of transgenic elements into the wild populations subsequent to LDD events. The results suggest that if a transgene confers a selective advantage in the recipient population, it has an elevated probability of persisting, but with a selective disadvantage, it is expected to progressively disappear from the wild population. If it is neutral in the recipient population, of course, standard theory suggests that it will remain in the population for as much as \( 4N_{e} \) generations, via genetic drift. It is worth remembering that transgenes are often deployed precisely because they are not neutral. Whether they are adaptive or maladaptive in wild germplasm is largely a matter of conjecture, but we will need some hard evidence on this point. In any event, as we deploy vast plantations of transgene-bearing forest trees, we can expect the transgenes to escape into the wild population and to persist there for a long time.
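The qualitative contrast between advantageous, neutral, and disadvantageous transgenes can be illustrated with the textbook one-locus selection recursion. The sketch below is a deliberately simplified, deterministic stand-in and not the model of Williams and Davis (2005): it assumes genic selection with carrier fitness 1 + s, ignores drift, demography, and further immigration, and uses a hypothetical starting frequency of 1% and selection coefficients of ±0.05.

```python
from typing import List


def next_frequency(p: float, s: float) -> float:
    """One generation of genic selection: transgene carriers have relative
    fitness 1 + s, non-carriers have fitness 1."""
    return p * (1.0 + s) / (1.0 + s * p)


def trajectory(p0: float, s: float, generations: int) -> List[float]:
    """Transgene frequency over time, starting from frequency p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], s))
    return freqs


if __name__ == "__main__":
    # Hypothetical values: 1% starting frequency, 200 generations.
    for s in (0.05, 0.0, -0.05):
        print(s, trajectory(0.01, s, 200)[-1])
```

In this deterministic setting, a neutral transgene simply stays at its starting frequency. Its eventual loss or fixation by drift, on the timescale mentioned above, would require a stochastic model with a finite effective population size.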

In conclusion, we can probably take the view that ‘propagules will travel’. The most pressing residual questions relate to the impact of GMO ‘escape’, specifically: (3) What is the probability of successful establishment in wild populations of either the GMO genotypes or derivative progeny carrying the GMO element? (4) What is the long-term viability of GMO-derived germplasm under natural conditions? (5) What are the potential detrimental effects of an altered genetic element on native populations? Dispersal plays a role in any credible risk assessment, but it is certainly not the whole story. We also have to determine how interactions with other organisms (and particularly any transgene target organisms) will affect the persistence and spread of transferred genetic elements (Rieseberg and Burke 2001). We close with the thought that theory and simulation are valuable and certainly a fine start on a difficult and worrisome problem, but we cannot ultimately avoid the necessity of careful field trials if we want to assess the propagule flow and establishment aspects of the situation, providing a realistic risk analysis of GMO escape.


The authors wish to thank Fred Austerlitz, Eva Gonzales, Victoria Sork, B Wang, and a trio of anonymous reviewers for many helpful comments on the manuscript. PES was supported by USDA/NJAES-17111 and by NSF-DEB-0211430 and NSF-DEB-0514956; JJR-A was supported by a postdoctoral fellowship from the Spanish Secretaría de Estado de Educación y Universidades, financed in part by the European Social Fund; SCG-M was supported by the ‘Ramón y Cajal’ fellowship RC02-2941 and by AGL2005-07440-C02-01 grant (Ministerio de Educación y Ciencia, Spain).

Copyright information

© Springer-Verlag 2007