1 Introduction

In addition to statistical and mathematical modeling, understanding geospatial patterns of genetic variation advances knowledge of population genetics (Epperson 2003). Landscape genetics is an effective approach for examining the influence that landscape and environmental features have on population genetic structure. Although landscape genetics has deep roots in landscape ecology, population genetics, biogeography and phylogeography, it has only recently emerged as a field, owing to the increasing application of microsatellites (short, repetitive segments of DNA; not to be confused with micro satellites in remote sensing, a class of small satellites). Two fundamental aspects of landscape genetics are the detection of genetic discontinuities (barriers) and the correlation and explanation of these discontinuities with landscape features. Applied to genetic data from microsatellite markers, GIS and statistical methods have proven effective at detecting barriers (Guillot et al. 2005; Manel et al. 2003; Manni et al. 2004; Osakabe et al. 2005; Radke 1998). Barrier detection methods include isolated distance barriers (IDB), maximum difference barriers (MDB) and statistical methods.

Detecting barriers, or establishing bounded point sets, is a critical step in decomposing observations or data points into meaningful objects that support spatial characterization and pattern recognition. With barriers identified and mapped, patterns of density, distance, direction and shape can be classified, assisting in hypothesis generation and testing and, eventually, the explanation of form.

Genetic distance is an important measure underlying many indices in landscape genetics, and it serves as an important baseline in barrier detection. The relationship between genetic distance and geographic space helps answer questions about gene flow, population structure and the forces that shape species distributions. The geographic distribution of species is largely determined by historical accidents. Barriers between species can be diverse geographic features and change over time (Slatkin 1987). In some instances they result from species invasion, while in others barrier patterns may simply reflect species succession. In population genetic models such as island models, it is believed that, because genetic distance reflects how populations are organized spatially, geographic distance and genetic distance approximate each other across a simple landscape (Bowser 1996; Slatkin 1993; Weir 1990).

Genetic distance is also the key link between geospatial approaches and landscape genetics research. There are many methodological discussions of genetic distance, diversity and differentiation (Hamrick and Godt 1990; Hedrick 2005; Weir and Cockerham 1984; Culley et al. 2002; Nei 1973). Monmonier's (1973) maximum difference algorithm (MDB) has been widely adopted in landscape genetics to detect boundaries (Manni et al. 2002, 2004). The maximum difference is calculated from genetic distance. MDB connects sample locations with a triangulated irregular network (TIN, i.e., a Delaunay triangulation) and assigns genetic distances as the values of the TIN edges. MDB starts its barriers from the edge with the largest genetic distance. Barriers computed by MDB always bisect TIN edges and align with the boundaries of the ordinary planar Voronoi diagram (the mathematical dual of the TIN).
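The MDB initiation step can be sketched in Python. This is a minimal illustration under stated assumptions, not Monmonier's complete algorithm: the TIN is built by brute force (an O(n^4) empty-circumcircle test that is only sensible for a handful of points in general position), and only the starting edge is returned, not the subsequent extension of the barrier. The coordinates, the genetic distance matrix `gdist` and all function names are hypothetical.

```python
from itertools import combinations


def orientation(a, b, c):
    """Twice the signed area of triangle abc (> 0 if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, d):
    """True if point d lies strictly inside the circumcircle of triangle abc."""
    ax, ay = a[0] - d[0], a[1] - d[1]
    bx, by = b[0] - d[0], b[1] - d[1]
    cx, cy = c[0] - d[0], c[1] - d[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    # The determinant is positive for "inside" only when abc is counter-clockwise.
    return det * orientation(a, b, c) > 0


def delaunay_edges(points):
    """TIN (Delaunay) edges by brute force: O(n^4), general position assumed."""
    edges = set()
    n = len(points)
    for i, j, k in combinations(range(n), 3):
        a, b, c = points[i], points[j], points[k]
        if orientation(a, b, c) == 0:
            continue  # collinear triple: no triangle
        if any(in_circumcircle(a, b, c, points[m])
               for m in range(n) if m not in (i, j, k)):
            continue  # another point inside the circumcircle: not Delaunay
        edges.update({(i, j), (i, k), (j, k)})
    return edges


def mdb_start_edge(points, gdist):
    """TIN edge with the largest genetic distance, where MDB starts its barrier.

    gdist is a symmetric matrix of genetic distances between sample locations.
    Pairs that are not TIN neighbours are never examined.
    """
    return max(delaunay_edges(points), key=lambda e: gdist[e[0]][e[1]])


points = [(0, 0), (4, 0), (0, 3), (5, 4)]
gdist = [[0.0, 0.2, 0.3, 0.9],
         [0.2, 0.0, 0.5, 0.4],
         [0.3, 0.5, 0.0, 0.1],
         [0.9, 0.4, 0.1, 0.0]]
print(sorted(delaunay_edges(points)))  # [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
print(mdb_start_edge(points, gdist))   # (1, 2)
```

In this toy example the largest distance overall (0.9, between points 0 and 3) is never examined because those points are not TIN neighbors, which is the limitation of MDB discussed in Section 2.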

MDB has been applied, combining geographic and genetic information, to identify genetic zones of plant species such as Manchurian ash across north-east China (Hu et al. 2008); of animal species such as land snails in the Western Mediterranean (Guiller et al. 2006) and the common vole in north-east Poland (Ratkiewicz and Borkowska 2006); and of aquatic and coastal species such as yellow perch in Québec, Canada (Leclerc et al. 2008), scallops in the USA and Canada, and wild sea beet along the European Atlantic coast (Fievet et al. 2007). MDB has also been used in human biology, for example in a study of surnames in Spain (Boattini et al. 2007).

Successful adaptations of MDB have been used to geographically demonstrate genetic structure in combination with statistical methods such as spatial analysis of molecular variance (Santos et al. 2008; Guiller et al. 2006). In addition, since geographic distance and barriers are crucial considerations in many genetic barrier analyses, GIS, spatial algorithms and models are sought after as the demand for more effective integration of geographic and genetic information increases (Michels et al. 2001).

Although MDB's principle follows Tobler's first law of geography (Tobler 1970), which is an appropriate first-order ingredient for constructing indices in landscape genetics, we argue that genetic barrier delineation should also consider attribute weights (e.g., measures of genetic attributes). We present a weighted difference barrier (WDB) method to improve the identification of discontinuities in landscape genetics.

2 Weighted difference barrier (WDB) method

Barrier delineation from discrete point data is a problem of spatial tessellation. Existing barrier detection methods, MDB in particular, have some limitations that need attention. For example, although MDB is effective at finding predefined genetic barriers, it can also divide populations that are not genetically differentiated (Dupanloup et al. 2002). In addition, MDB only considers genetic distances between TIN neighbors (two points connected by a TIN edge). If two sampling locations are not TIN neighbors, MDB cannot detect a barrier between them even if one exists. Furthermore, placing the barrier at the bisector of a pair of sampled locations ignores the genetic differences between the two samples: no matter how different they are, the barrier between them is always the bisector, a placement that has no genetic or geographic justification. We propose a weighted difference barrier (WDB) method to mitigate these limitations.

The MDB method uses the ordinary Voronoi diagram to delineate barriers between sample locations. Our WDB method instead uses a weighted Voronoi diagram, with a genetic characteristic such as gene diversity as the weight. The weighting scheme is based on findings of a positive correlation between gene diversity and patch size (Banks et al. 2005; Osakabe et al. 2005) and of no significant relationship between gene diversity and fragmentation (Banks et al. 2005). At the species level, although total species diversity is not significantly correlated with any landscape pattern variables, large forest reserves tend to harbor relatively infrequent species; large patches of natural forest are therefore regarded as an important factor in preserving infrequent species (Fukamachi et al. 1996).

Genetic distance captures between-group variation, whereas gene diversity captures within-group variation. Because MDB relies on distance alone, it is likely to overlook within-group variation. In contrast, WDB incorporates both between-group and within-group variation. A weighted Voronoi diagram overcomes the major shortcoming of the ordinary Voronoi diagram by taking both location and attribute information into consideration when generating the final spatial tessellation of a point set. Although detailed definitions of weighted Voronoi diagrams exist in the computational geometry literature (Aurenhammer and Edelsbrunner 1984; Okabe et al. 2000), we present the one used here.

Let S be a finite set of points in the Euclidean plane, let p and q denote two points of S with weights w(p) and w(q), and let x be any point in the plane. The Euclidean distance between x and p is \( d_{e}(x, p) \), and the weighted distance between x and p is \( d_{\rm mw}(x, p) \). Let region(p) denote the dominant region of p, that is, the part of the plane dominated by p relative to the other points of S. The following can then be defined.

The planar ordinary Voronoi diagram:

$$ \text{region}(p) = \left\{ x \mid d_{e}(x,p) \le d_{e}(x,q) \ \text{for all}\ q \in S \right\} $$

The multiplicatively weighted Voronoi diagram (MW-Voronoi):

$$ \text{region}(p) = \left\{ x \mid d_{\rm mw}(x,p) \le d_{\rm mw}(x,q) \ \text{for all}\ q \in S \right\} $$

where \( d_{\rm mw} \left( {x,p} \right) = d_{e} \left( {x,p} \right) /w\left( p \right). \)

Although there are various ways to define a weighted Voronoi diagram, such as the additively weighted Voronoi, the compoundly weighted Voronoi or the power-weighted Voronoi (Okabe et al. 2000), we employ the MW-Voronoi defined above to construct the WDB. The essential idea of these generalized (non-ordinary) Voronoi diagrams is to show how incorporating weights, dimensions and other considerations into their construction changes the resulting spatial patterns, as well as the spatial and attribute relationships they express. The MW-Voronoi serves as a vehicle to demonstrate this idea of spatial change; other weighted models are left for future research.
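The MW-Voronoi definition translates directly into a membership test: a location x belongs to the generator with the smallest weighted distance \( d_{e}(x,p)/w(p) \). The sketch below, with hypothetical generator coordinates and weights, illustrates the definition only; it is not the Apollonius-circle construction we use to draw the diagrams, and the function names are our own.

```python
import math


def mw_distance(x, p, w):
    """Multiplicatively weighted distance d_mw(x, p) = d_e(x, p) / w(p)."""
    return math.dist(x, p) / w


def mw_owner(x, generators, weights):
    """Index of the generator whose MW-Voronoi region contains x.

    Ties on a boundary go to the lowest index (an arbitrary choice).
    """
    return min(range(len(generators)),
               key=lambda i: mw_distance(x, generators[i], weights[i]))


gens = [(0.0, 0.0), (10.0, 0.0)]
print(mw_owner((4.0, 0.0), gens, [1.0, 1.0]))    # 0: ordinary Voronoi, bisector at x = 5
print(mw_owner((4.0, 0.0), gens, [1.0, 3.0]))    # 1: heavier generator claims more space
print(mw_owner((-20.0, 0.0), gens, [1.0, 3.0]))  # 1: it even surrounds the lighter one
```

With equal weights the test reduces to the ordinary Voronoi diagram. With unequal weights the region of the lighter generator becomes a bounded disk and the heavier generator dominates even points lying beyond the lighter one, which is the geometric source of the enclosed regions discussed in Sections 3 and 4.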

Genetic information can be obtained from alleles at different loci on a chromosome, and a single locus can carry several alleles. The gene diversity (heterozygosity) at one locus is defined as

$$ h = 1 - \sum_{i=1}^{m} x_{i}^{2}, \tag{1} $$

where \( x_{i} \) is the population frequency of the ith allele at the locus and \( m \) is the number of alleles. In population genetics, the average gene diversity, or average heterozygosity (H), is simply the mean of h over all loci.
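Eq. (1) and the averaging step can be written as two short functions. This is a minimal sketch assuming allele frequencies are supplied as lists that sum to 1 for each locus; the function names are ours.

```python
def gene_diversity(freqs):
    """Gene diversity (expected heterozygosity) at one locus, Eq. (1): h = 1 - sum(x_i^2)."""
    total = sum(freqs)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"allele frequencies must sum to 1, got {total}")
    return 1.0 - sum(x * x for x in freqs)


def average_heterozygosity(loci):
    """Average gene diversity H: the mean of h over all loci."""
    return sum(gene_diversity(f) for f in loci) / len(loci)


print(round(gene_diversity([0.5, 0.3, 0.2]), 4))                   # 0.62
print(round(average_heterozygosity([[0.5, 0.3, 0.2], [1.0]]), 4))  # 0.31
```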

There are many measures of genetic distance, such as Nei's genetic distance, the Cavalli-Sforza chord measure, and Reynolds, Weir and Cockerham's genetic distance (Nei 1987; Fearnhead 2007; Michels et al. 2001; Nei 1973). We adopt one of the most frequently used, the standard genetic distance D defined by Nei (1973, 1987). D is given by

$$ D = -\log_{e} I, \tag{2} $$

where

$$ I = \frac{\sum_{i=1}^{m} x_{i} y_{i}}{\left( \sum_{i=1}^{m} x_{i}^{2} \sum_{i=1}^{m} y_{i}^{2} \right)^{1/2}}, \tag{3} $$

and \( x_{i} \) and \( y_{i} \) are the population frequencies of the ith allele at a locus in populations X and Y, respectively.
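For a single locus, as in the simulation described next, Eqs. (2) and (3) can be sketched as follows. The frequencies are hypothetical. For several loci, Nei's formulation averages the numerator and denominator sums over loci before taking the ratio, which this single-locus sketch does not do.

```python
import math


def nei_identity(x, y):
    """Nei's normalized genetic identity I at one locus, Eq. (3)."""
    if len(x) != len(y):
        raise ValueError("both populations must list frequencies for the same alleles")
    jxy = sum(a * b for a, b in zip(x, y))
    jx = sum(a * a for a in x)
    jy = sum(b * b for b in y)
    return jxy / math.sqrt(jx * jy)


def nei_distance(x, y):
    """Nei's standard genetic distance D = -ln I, Eq. (2).

    D is infinite when the populations share no alleles (I = 0).
    """
    identity = nei_identity(x, y)
    return math.inf if identity == 0 else -math.log(identity)


print(round(nei_distance([1.0, 0.0], [0.5, 0.5]), 4))  # 0.3466, i.e. ln(2) / 2
print(nei_distance([1.0, 0.0], [0.0, 1.0]))            # inf
```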

To develop our method, we simulate a set of sample populations with randomly assigned allele frequencies. For simplicity, the simulation includes only one locus with three alleles. Gene diversity for each population and genetic distance between populations are calculated with Eqs. 1–3, and the results are presented in Tables 1 and 2.
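A simulation of this kind can be sketched as follows. The number of populations (14, which gives the 14 × 13 / 2 = 91 pairs referred to in step (ii) below), the random seed and the sampling scheme (uniform draws rescaled to sum to 1) are our assumptions; the sketch does not reproduce the values in Tables 1 and 2.

```python
import math
import random


def random_frequencies(n_alleles, rng):
    """Random allele frequencies at one locus: uniform draws rescaled to sum to 1."""
    raw = [rng.random() for _ in range(n_alleles)]
    total = sum(raw)
    return [r / total for r in raw]


def simulate(n_pops=14, n_alleles=3, seed=1):
    """Simulated populations with one locus, plus Table 1 and Table 2 analogues."""
    rng = random.Random(seed)
    freqs = [random_frequencies(n_alleles, rng) for _ in range(n_pops)]
    # Table 1 analogue: gene diversity h = 1 - sum(x_i^2), Eq. (1)
    h = [1.0 - sum(x * x for x in f) for f in freqs]
    # Table 2 analogue: Nei's D = -ln I, Eqs. (2) and (3), for every pair
    dist = {}
    for a in range(n_pops):
        for b in range(a + 1, n_pops):
            x, y = freqs[a], freqs[b]
            identity = sum(p * q for p, q in zip(x, y)) / math.sqrt(
                sum(p * p for p in x) * sum(q * q for q in y))
            # Cauchy-Schwarz gives I <= 1; clamp tiny rounding overshoot
            dist[(a, b)] = -math.log(min(identity, 1.0))
    return freqs, h, dist


freqs, h, dist = simulate()
print(len(dist))  # 91 pairs for 14 populations
```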

Table 1 Simulated allele frequency and gene diversity
Table 2 Genetic distance of the simulated data

Taking the randomly simulated data as an example, the steps to delineate the WDB are as follows:

  (i) Construct weighted Voronoi polygons. For our sample data, we used the MW-Voronoi construction based on pairwise Apollonius circles (Aurenhammer and Edelsbrunner 1984; Mu 2004; Okabe et al. 2000). Because each boundary, a line or arc segment, separates two points, a one-to-many relationship can be built between the genetic distance of two sampled populations and the weighted Voronoi boundaries between them. The "many" arises from a geometric property of weighted Voronoi polygons: the boundary between two points may consist of multiple disconnected parts (Fig. 1).

    Fig. 1 The MW-Voronoi of a set of points

  (ii) Calculate the genetic distance for every pair of points that share a weighted Voronoi boundary. In Table 2, 34 of the 91 pairwise genetic distances in the sample data satisfy this criterion; they are highlighted in bold. Join these genetic distance values to the corresponding weighted Voronoi boundaries.

  (iii) Initiate the weighted barrier from the weighted Voronoi boundary between the two points with the largest genetic distance. In the sample data, the largest genetic distance, 1.813 (Table 2), is between points 4 and 8 (Fig. 2).

    Fig. 2 The initialization of WDB from the weighted Voronoi boundary formed by two points (4 and 8) with the largest genetic distance (1.813)

  (iv) Extend the weighted barrier in both directions, at each junction following the weighted Voronoi boundary associated with the larger genetic distance. In Fig. 2, the barrier follows 0.972 rather than 0.171, and then 0.404 rather than 0.225. Continue until the barrier either closes a region around a population (e.g., point 8) or reaches the outer limit of the study area. The result is the first-level barrier, which divides the study area into two regions: the enclosed region surrounding point 8 and the rest of the space.

  (v) Depending on the data set, the weighted barriers may form multiple levels. Each higher-level barrier divides the region it belongs to into two, and within each resulting region the next lower level of weighted barriers is formed following the same criteria. Figure 3a shows three levels of WDB barriers for the sample data set; for comparison, Fig. 3b shows three levels of MDB barriers for the same data set.

    Fig. 3 WDB and MDB for the same set of points
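Steps (iii) and (iv) amount to a greedy walk on the graph of weighted Voronoi boundary segments. The sketch below assumes that graph has already been computed, with each segment given as its two end vertices (junctions or points on the study-area edge) plus the genetic distance of the two populations it separates; that input format and the function name are hypothetical, and only the first-level barrier is produced. Lower levels would repeat the same walk inside each resulting region. The toy graph reuses the distances quoted for Fig. 2 (1.813, 0.972, 0.171, 0.404, 0.225), but its topology and the remaining values (0.350, 0.100) are invented.

```python
from collections import defaultdict


def first_level_barrier(segments, border):
    """Greedy first-level WDB barrier on a precomputed boundary graph (hypothetical input).

    segments: list of (u, v, gdist) tuples, one per weighted Voronoi boundary
              segment joining vertices u and v, labelled with the genetic
              distance between the two populations it separates.
    border:   set of vertices that lie on the outer limit of the study area.
    Returns the indices of the segments that form the barrier, in walk order.
    """
    incident = defaultdict(list)
    for idx, (u, v, _) in enumerate(segments):
        incident[u].append(idx)
        incident[v].append(idx)

    # Step (iii): start from the segment with the largest genetic distance.
    start = max(range(len(segments)), key=lambda i: segments[i][2])
    barrier, used = [start], {start}
    visited = {segments[start][0], segments[start][1]}

    # Step (iv): extend from both ends, always taking the larger distance.
    for vertex in (segments[start][0], segments[start][1]):
        while vertex not in border:
            options = [i for i in incident[vertex] if i not in used]
            if not options:
                break  # dead end; should not occur in a complete tessellation
            nxt = max(options, key=lambda i: segments[i][2])
            used.add(nxt)
            barrier.append(nxt)
            u, v, _ = segments[nxt]
            vertex = v if u == vertex else u
            if vertex in visited:
                return barrier  # closed a region around a population
            visited.add(vertex)
    return barrier


# Toy graph: E, F and G lie on the study-area edge.
segments = [("A", "B", 1.813), ("A", "C", 0.972), ("A", "E", 0.171),
            ("C", "D", 0.404), ("C", "F", 0.225), ("D", "B", 0.350),
            ("B", "G", 0.100)]
print(first_level_barrier(segments, {"E", "F", "G"}))  # [0, 1, 3, 5]: a closed loop
```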

3 Testing the WDB method with empirical data

To test the WDB method, we use published sea scallop data collected from 12 locations in coastal areas ranging from Newfoundland, Canada, to New Jersey, USA (Kenchington et al. 2006). The genetic data for each location are based on six microsatellite loci. This empirical data set includes the longitude and latitude of the locations, pairwise genetic differentiation and gene diversity (heterozygosity) (Kenchington et al. 2006: Table 1, p. 1784; Table 3, p. 1787; Appendix, p. 1796). The adopted data are summarized in Table 3.

Table 3 Population heterozygosity and pairwise genetic differentiation of sample data (compiled from Kenchington et al. 2006)

We construct weighted Voronoi polygons for the 12 locations using average heterozygosity as the weight. As discussed earlier, heterozygosity is only one example of the many considerations that could be incorporated when constructing weighted instead of ordinary Voronoi polygons. Although this dataset provides no direct genetic distance measure, we take pairwise genetic differentiation (measured by θ values) as an indicator of genetic distance. This approach is supported by previous research: genetic distance can be estimated as pairwise \( F_{\rm ST} \) distances over all alleles (Dupanloup et al. 2002), which is equivalent to \( G_{\rm ST} \) among populations (Nei 1973) and therefore quite similar to Weir and Cockerham's θ (1984), except that θ can take negative values (Culley et al. 2002). Following the WDB method, a total of 30 pairwise differentiation values need to be considered after constructing the weighted Voronoi polygons. We link these values to the corresponding weighted Voronoi boundaries and begin the barrier detection process from the boundary with the largest value. Figure 4 shows the first three levels of WDB. For comparison, the first three levels of MDB are also constructed using Monmonier's algorithm. We have the following major findings.
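Because the argument rests on the similarity of \( F_{\rm ST} \), \( G_{\rm ST} \) and θ, a small sketch of Nei's \( G_{\rm ST} = (H_{T} - H_{S})/H_{T} \) for one pair of populations at one locus may help. Here \( H_{S} \) is the mean within-population gene diversity (Eq. 1) and \( H_{T} \) the gene diversity of the pooled frequencies, with both populations weighted equally, which is an assumption. This is not Weir and Cockerham's θ estimator, which corrects for sample size and can be negative, and the frequencies in the example are hypothetical.

```python
def gene_diversity(freqs):
    """h = 1 - sum(x_i^2), Eq. (1)."""
    return 1.0 - sum(x * x for x in freqs)


def pairwise_gst(x, y):
    """Nei's G_ST = (H_T - H_S) / H_T for two equally weighted populations at one locus."""
    if len(x) != len(y):
        raise ValueError("both populations must list frequencies for the same alleles")
    h_s = (gene_diversity(x) + gene_diversity(y)) / 2   # mean within-population diversity
    pooled = [(a + b) / 2 for a, b in zip(x, y)]        # pooled allele frequencies
    h_t = gene_diversity(pooled)                        # total diversity
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


print(round(pairwise_gst([0.8, 0.2], [0.2, 0.8]), 4))  # 0.36
print(pairwise_gst([0.5, 0.5], [0.5, 0.5]))            # 0.0
```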

Fig. 4 MDB and WDB delineation of the sample data

First, WDB boundaries show more shape variation than MDB boundaries. Whereas MDB boundaries are always straight lines that bisect the segment between two population locations, WDB boundaries can be curved and can run anywhere between two locations, depending on genetic information such as heterozygosity. For example, the heterozygosity of Georges (Can), 0.701, is smaller than that of Georges (US), 0.793, so the WDB between them bends toward the Canadian site and encloses a relatively small region around it. This spatial pattern corresponds to the relationship that populations with larger gene diversity tend to occupy larger patches (Banks et al. 2005; Osakabe et al. 2005).
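The curvature follows from the geometry of the MW-Voronoi: the points x with \( d_{e}(x,p)/w(p) = d_{e}(x,q)/w(q) \) form an Apollonius circle around the generator with the smaller weight. The sketch below computes that circle using the two heterozygosities quoted above as weights; the site coordinates, placed one unit apart on the x-axis, are hypothetical and chosen only to show the effect.

```python
import math


def apollonius_circle(p, q, wp, wq):
    """MW-Voronoi bisector of generators p and q when wp < wq.

    The points x with d(x, p) / wp == d(x, q) / wq form a circle (an
    Apollonius circle) that encloses p, the generator with the smaller weight.
    Returns (centre, radius).
    """
    if not 0 < wp < wq:
        raise ValueError("expects 0 < wp < wq; swap p and q otherwise")
    k = wp / wq
    k2 = k * k
    centre = ((p[0] - k2 * q[0]) / (1 - k2), (p[1] - k2 * q[1]) / (1 - k2))
    radius = k * math.dist(p, q) / (1 - k2)
    return centre, radius


can, us = (0.0, 0.0), (1.0, 0.0)          # hypothetical positions, 1 unit apart
centre, r = apollonius_circle(can, us, 0.701, 0.793)
print(round(centre[0], 3), round(r, 3))   # -3.575 4.044
print(round(centre[0] + r, 4))            # 0.4692: crossing point on the segment
```

The circle crosses the segment between the two sites at \( w(p)/(w(p)+w(q)) \approx 0.469 \) units from the Canadian site, slightly closer to it than the midpoint an MDB bisector would use, and it wraps around that site rather than running as a straight line.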

Second, the first MDB barrier isolates the Georges (US) site from the others, whereas the first WDB barrier isolates not only that site but also the Gaspé in the far north. This arises because a weighted Voronoi polygon can consist of multiple disconnected parts, as described earlier. The pairwise differentiation between the Gaspé and Georges (US) is 0.006, whereas the average pairwise differentiation between the Gaspé and all other sites except Georges (US) is 0.022 (Table 3). The scallop population from the Gaspé is therefore more similar to that from Georges (US) than to those from the other ten sites, and the WDB method captures this relationship. This WDB delineation matches one of the observations of Kenchington et al. (2006), that “Georges (US) and the Gaspé are significantly differentiated from each other and all other Populations”.

Third, the barrier hierarchy changes more rapidly in the WDB method. The eastern [Nfld (TB), Nfld (SP) and PEI] and western [Digby, Annapolis, Wester, Lurcher, Brownis and Georges (Can)] clusters are separated at the second level of the WDB, and Georges (Can) and Nfld (TB) are distinguished at the third level. In contrast, the MDB does not divide the eastern and western clusters until the third level.

4 Discussion

The WDB method calculates genetic distance for weighted Voronoi neighbors (two points separated by a weighted Voronoi polygon boundary), whereas MDB calculates it for Voronoi neighbors (two points separated by an ordinary Voronoi polygon boundary). Because the weights encode gene diversity, WDB thus considers both within-group genetic information (gene diversity) and between-group genetic information (genetic distance). The number of weighted Voronoi neighbors is usually larger than the number of Voronoi neighbors, meaning that more relationships are considered. In our sample data, 34 pairs of genetic distances are calculated for WDB and 31 for MDB.

WDB boundaries are often curved and enclosed, whereas MDB boundaries are always straight and often open (Fig. 3a, b). An MDB boundary between two points is a single part, whereas a WDB boundary between two points can consist of multiple disconnected parts owing to the geometric properties of weighted Voronoi diagrams (Mu 2004). For instance, in Fig. 5, the two disconnected solid line segments are potential WDB boundaries between points 5 and 12.

Fig. 5 Multi-parts of a WDB boundary

Because the Voronoi diagram and the TIN are mathematical duals, an MDB boundary can only run between points connected by a TIN edge. WDB is not constrained in this way. In Fig. 6, WDB boundary a is formed by points 4 and 12; such a boundary is impossible for MDB because no TIN edge connects the two points. In our sample data, there are 42 WDB boundary segments: 33 lie between TIN-connected points and the remaining 9 between non-TIN-connected points.
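Non-TIN neighbors are easy to reproduce in a toy setting. The sketch below approximates weighted Voronoi neighbor pairs by rasterizing the MW-Voronoi on a grid and recording which regions touch. The three collinear sites, their weights, the grid extent and the function names are hypothetical, and a raster can miss very short boundaries, so the output is approximate. With equal weights (the ordinary Voronoi), only consecutive sites are neighbors, matching the TIN; giving the middle site a small weight shrinks its region to a small lens, so the outer sites become neighbors even though no TIN edge connects them.

```python
import math


def mw_owner(x, sites, weights):
    """Index of the site with the smallest weighted distance d_e(x, p) / w(p)."""
    return min(range(len(sites)), key=lambda i: math.dist(x, sites[i]) / weights[i])


def mw_neighbours(sites, weights, bbox, step):
    """Approximate MW-Voronoi neighbour pairs from a raster of the diagram.

    Two sites are neighbours if grid cells they own touch horizontally or
    vertically; very short boundaries can be missed, so this is approximate.
    """
    xmin, xmax, ymin, ymax = bbox
    nx = int(round((xmax - xmin) / step))
    ny = int(round((ymax - ymin) / step))
    grid = [[mw_owner((xmin + (c + 0.5) * step, ymin + (r + 0.5) * step), sites, weights)
             for c in range(nx)]
            for r in range(ny)]
    pairs = set()
    for r in range(ny):
        for c in range(nx):
            for rr, cc in ((r + 1, c), (r, c + 1)):
                if rr < ny and cc < nx and grid[rr][cc] != grid[r][c]:
                    pairs.add(tuple(sorted((grid[r][c], grid[rr][cc]))))
    return pairs


sites = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)]   # hypothetical collinear sites
box = (-5.0, 15.0, -10.0, 10.0)
print(sorted(mw_neighbours(sites, [1.0, 1.0, 1.0], box, 0.25)))  # [(0, 1), (1, 2)]
print(sorted(mw_neighbours(sites, [1.0, 0.2, 1.0], box, 0.25)))  # [(0, 1), (0, 2), (1, 2)]
```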

Fig. 6 WDB boundary formed by non-TIN-connected points

Our WDB method tessellates sample points with a weighted Voronoi diagram. The genetic attribute values of each sample point determine its weight, so the genetic discontinuity between two points is not always their bisector. WDB boundaries are constructed from the weighted Voronoi diagram and can form hierarchical levels. WDB boundary segments vary more than MDB segments: they can consist of multiple disconnected parts and are not limited to the straight lines of MDB, often tracing circular arcs.

5 Summary and conclusion

Identifying barriers to species and characterizing their effects on spatial distribution provides essential background information for research in landscape ecology, population genetics, biogeography, historical biogeography and phylogeography. Overall, WDB offers quick and straightforward remedies for the drawbacks of MDB. WDB integrates more relationships between sample locations into barrier construction and reveals potential barriers that would otherwise go undetected. WDB incorporates both within-group and between-group genetic information and delineates barriers with more complex patterns.

Besides WDB, other boundary delineation techniques are being explored that use a simulated annealing algorithm (Dupanloup et al. 2002), Bayesian criteria or specific distance-decay behaviors (Guillot et al. 2005; Santos et al. 2008; Culley et al. 2002; Guiller et al. 2006; Hull et al. 2008; Manel et al. 2003; Sambridge 1998). We argue that the method introduced here is an alternative approach and a first step toward integrating more spatial modeling and methods into the problem-solving process. It also raises the question of whether gene diversity alone should be used to assign weights to each population site. Further research will explore other weighting attributes and test the method on data with genetic distances derived from microsatellite markers. Furthermore, within a GIS environment, we will explore the correlation between genetic discontinuities detected by our weighted method and landscape features.

New spatial algorithms that decompose observations or data points into meaningful objects present us with a variety of ideas for delineating barriers. WDB defines a more appropriate model and should logically map more realistic barriers. These new benchmarks can prove quite useful in characterizing spatial patterns and can lead to more enlightened hypotheses or, at the very least, help us ask more intelligent questions. Building on this additional understanding of geospatial genetic variation, future research should extend beyond static forms to the dynamic processes of landscape genetics.