Background

Microsatellites or short sequence repeats (SSRs) are relatively small 1–6 base-pair (bp) tandem repeats that are found in the genomic DNA of pro- and eukaryotes. Although the majority of microsatellites are located in non-coding sequences [1, 2] and considered to be selectively neutral, some microsatellite loci are located in functional regions and involved in chromatin organisation, regulation of gene activity and metabolic processes such as DNA replication and recombination [3]. Microsatellites exhibit high mutation rates of 10⁻² to 10⁻⁶ per locus per generation [4]. The main mutational processes responsible for the variability in microsatellites are considered to be replication-slippage and recombination [3]. Both processes change the length of the microsatellite by altering the number of repeats of the microsatellite.

Microsatellite variability has been associated with a number of microsatellite characteristics. Mutation rates of microsatellites have been found to be taxon-specific [4–6]. Microsatellite variability covaries with allozyme diversity in a taxon [7]. The number of repeats can predict the variability and stability of a microsatellite motif, with longer loci found to be more variable, but also more unstable than shorter ones [4, 5, 7–9]. The type of microsatellite motif may affect abundance and variability. For example, it was found that microsatellites with tri- or hexanucleotide motifs are more frequent in coding than in non-coding regions, possibly because mutations of these microsatellites in coding regions are less likely to result in deleterious frameshift mutations [1, 2].

The high variability of microsatellite markers and their straightforward analysis using the polymerase chain reaction (PCR) have led to their frequent application in studies on natural populations. However, one obstacle to the wider application of microsatellites is the difficulty of developing a sufficient number of suitable markers for any given species. Although microsatellites are ubiquitous in eukaryotic organisms, their abundance varies across different groups [2]. Microsatellites are less common, for example, in birds than in other vertebrates [7, 10].

There are two principal strategies to obtain microsatellite markers. First, microsatellite markers can be developed by screening genomic libraries [11]. Success rates differ according to the protocol and taxon, although usually a medium to high number of polymorphic markers can be isolated using this approach. Since microsatellites are present at a relatively low frequency in avian genomes, in this case their isolation is most efficient using enrichment protocols. This involves many stages, is a skilled and time-consuming process and requires significant funding and a well-equipped molecular laboratory, which are not always available in ecological and conservation research.

The second method makes use of existing microsatellite markers isolated in different species to the species of interest (target species). For cross-species amplification tests ("transferability" [9, 12]) existing primers developed in related species are tested for amplification and polymorphism in the target species. One drawback of cross-species amplification is that success rates decline with evolutionary distance between the target species and source species [9, 13–15]. In birds, most microsatellite markers have been developed for the orders Passeriformes and Galliformes. Passeriformes is a species-rich, relatively recent clade [16] in which more than 550 microsatellite markers have been characterised [17]. Several studies have successfully identified additional polymorphic loci by cross-species testing in birds. The development of complete primer sets from cross-species amplification tests has been successful in Falconiformes [18], Galliformes [19–21] and Passeriformes [15, 22–24]. However, in many other avian orders fewer microsatellite loci have been isolated and, therefore, the opportunity to develop microsatellite markers by testing loci from other species is limited.

Cross-species amplification success varies not only between taxonomic groups, but also among microsatellite loci. Although many markers fail to amplify even in closely related species, some markers have higher utility than others [9, 13–15, 25, 26; DA Dawson and T Burke, unpublished data; BIRDMARKER webpage http://www.shef.ac.uk/misc/groups/molecol/deborah-dawson-birdmarkers.html]. A few loci, such as HrU2 [26], LEI160 [27], LOX1 [28] and Man13 [29], can be almost universally amplified across the avian taxa (HrU2 and LOX1: [9, 13]; LEI160: DA Dawson, unpublished data; Man13: DA Dawson and G Hinten, unpublished data; see also BIRDMARKER webpage). This suggests that some loci are more conserved than others. Since the degree of microsatellite conservation is usually not known at the time of their isolation, identifying conserved primers usually involves extensive primer testing and only a few conserved markers have been identified to date.

The Charadriiformes order (sandpipers, plovers, gulls and auks) is an ancient monophyletic avian order of 365 species [30] that probably evolved around 79–102 million years ago [31]. Recently, the Charadriiformes have become the focus of a number of studies in evolutionary biology because they harbour many species with an unusual diversity in mating and parental care strategies, flight metabolism, migratory behaviour and sexual size dimorphism [32–36]. Appropriate genetic markers would help to increase the understanding of, for example, the evolution of breeding systems and the connectivity between populations, but markers are available for fewer than 15 Charadriiformes species. Additionally, many shorebird populations are declining and genetic markers are needed to monitor and manage their conservation effectively.

In this study, we examine the potential of utilising the available published Charadriiformes microsatellite sequences and the sequenced chicken genome to identify conserved Charadriiformes-chicken microsatellite loci. Initially, we mapped conserved Charadriiformes microsatellites in the chicken genome. Second, we explored their cross-species utility across members of the order Charadriiformes. One concern is that conserved microsatellite loci are located in functional genomic regions and exhibit low or no polymorphism. Therefore, we compared polymorphism and heterozygosity levels across different charadriiform species. Third, we examined correlates of cross-species amplification success and polymorphism to predict the utility of other conserved microsatellite loci.

Results

Mapping

Sixty-eight Charadriiformes microsatellite sequences were assigned to a location on the chicken genome based on sequence homology (with E-values ranging from E-6 to E-121). Two further sequences (BmaCCAT443 and BmaGATA464) showed homology to an unknown chicken homologue that had not yet been assigned to a chromosomal region. Sixty-four sequences were assigned to fourteen autosomal chicken chromosomes and four to the Z chromosome (Calex-26, BmaTATC353, Apy09 and Mopl3; Additional file 1, Figure 1). The mapping of loci assigned to the Z chromosome in chicken was validated in Charadriiformes by analysing the genotypes of birds of known sex (including males and females). A location on the Z was supported if all females were homozygous whilst at least some males exhibited heterozygous genotypes. This was confirmed for all four loci assigned to the chicken Z chromosome: Calex-26 (based on 42 Kentish plovers [37]), BmaTATC353 (based on genotyping of 15 marbled murrelets, Z Peery, personal communication), Mopl3 (126 genotyped mountain plovers, SJ Oyler-McCance and J St. John, personal communication) and Apy09 (24 genotyped whiskered auklets, DA Dawson and FM Hunter, unpublished data).
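The validation rule for Z-linked loci described above can be written as a short check. The following is only a minimal sketch: the function name, the genotype tuple layout and the example allele sizes are illustrative assumptions and are not taken from the study's data.

```python
def supports_z_linkage(genotypes):
    """Apply the rule: all females homozygous, at least one male heterozygous.

    genotypes: iterable of (sex, allele_a, allele_b) with sex 'F' or 'M'.
    The tuple layout is an assumption made for this sketch.
    """
    females = [(a, b) for sex, a, b in genotypes if sex == "F"]
    males = [(a, b) for sex, a, b in genotypes if sex == "M"]
    if not females or not males:
        return False  # the rule needs birds of both sexes
    all_females_homozygous = all(a == b for a, b in females)
    some_male_heterozygous = any(a != b for a, b in males)
    return all_females_homozygous and some_male_heterozygous


# Hypothetical allele sizes (bp), not real genotypes.
example = [("F", 180, 180), ("F", 184, 184), ("M", 180, 184), ("M", 180, 180)]
print(supports_z_linkage(example))
```

Apparent female homozygosity is also what null alleles at an autosomal locus could produce in a small sample, so a check of this kind supports rather than proves a Z location.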

Figure 1

Chromosome map of the chicken displaying the genomic locations of 68 conserved microsatellite homologues that were isolated in different Charadriiformes species. If the microsatellite motif was found to be retained in the chicken homologue the locus name is underlined. Microsatellite loci examined for polymorphism are marked by a star. Shaded loci represent the microsatellite loci that could be amplified in Lari, Charadri and Scolopaci with either standard or consensus primers. For loci shown in italics only one of the flanks (forward or reverse) was assigned to the map. Centromere locations that could be deduced by high GC content in the chicken [17] are highlighted in blue. The locations for the four loci assigned to the Z chromosome were all confirmed by hemizygous segregation of genotypes in females.

The microsatellite motif of the charadriiform sequences was not always retained in the homologous microsatellite loci identified in the chicken genome (N = 68). A comparison with the chicken genomic sequences revealed that the same microsatellite repeat motif was present in 32 sequences (47%), a different microsatellite motif was found in 10 sequences (15%) whilst no microsatellite repeat motif was found in 26 (38%) of the sequences (Additional file 1).
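Checking whether a repeat motif is present in a homologous sequence requires some way of locating tandem repeats. The sketch below uses a regular expression to find runs of 1–6 bp motifs. The threshold of five repeats, the function name and the example sequence are assumptions made for illustration; the study's own motif-detection criteria are not stated in this section.

```python
import re


def find_microsatellites(seq, min_repeats=5):
    """Return (start, motif, repeat_count) for tandem repeats of 1-6 bp motifs.

    min_repeats is an illustrative threshold, not a value from the study.
    """
    # Lazy motif match so that the shortest repeating unit is reported
    # (for example 'AC' rather than 'ACAC').
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


# Hypothetical sequence containing six AC repeats.
print(find_microsatellites("TTGACACACACACACGGT"))
```

Comparing motifs between two species would additionally need to treat rotations (AC versus CA) and reverse complements (AC versus GT) as equivalent, which this sketch does not attempt.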

Cross-species amplification

In total, we tested 55 'standard primers' (see Methods) from different conserved microsatellite sequences and 10 primers from anonymous microsatellite sequences. In both groups a similar proportion of microsatellite loci was isolated in each of the three test species representing the suborders Charadri, Lari and Scolopaci (chicken-Charadriiformes conserved loci: 13 of 55, anonymous loci: 3 of 10, χ² test: χ² = 0.016, df = 1, P = 0.90). In 17 of 55 conserved sequences we obtained a specific product for all three species, whilst we did not obtain specific products for any of the anonymous sequences in all test species. When we compared the proportion of species in which a primer set amplified a product, primers designed from conserved sequences significantly outperformed primers from anonymous sequences (Figure 2a, amplification success: median for conserved loci = 0.667, median for anonymous loci = 0.167, Wilcoxon rank sum test: N = 65 (55/10), df = 1, W = 469, P < 0.001).

Figure 2

Amplification success for conserved microsatellite loci and primer sets across the major Charadriiformes lineages. Conserved microsatellite loci are those loci for which both flanking regions could be located to a homologue in the chicken genome. Anonymous sequences lacked matching flanks. Each lineage of Charadriiformes was represented by one species: Kentish plover for Charadri, whiskered auklet for Lari and ruff for Scolopaci. (a) Amplification success for standard primers compared between conserved microsatellite loci and anonymous microsatellite loci. (b) Amplification success of consensus versus standard primers for conserved loci for which both types of primer were designed. Consensus primers were designed after alignment of chicken and charadriiform sequence and placed into highly preserved flanking regions between chicken and shorebird. Standard primers were designed using the shorebird sequence only, without comparison to the chicken sequence homologue. Numbers at the bottom refer to (a) the number of microsatellite loci and (b) the number of primers that were tested in each group.

For 24 conserved sequences we designed a second, consensus, primer set with primer binding sites in the conserved regions of the flanks. Cross-species amplification rates were higher for consensus primers than standard primers for the same microsatellite sequence (Figure 2b, Wilcoxon matched pair test: N = 24, df = 1, V = 96, P = 0.006). Amplification success increases when the annealing temperature is reduced [38]. Hence, the reason for the improvement of amplification could have been that consensus primers were designed and tested at lower annealing temperatures (consensus primers: 50–62°C, standard primers: 54–66°C). However, 19 out of 24 (79%) consensus primers amplified best at annealing temperatures of 54°C (Additional file 1) and the difference in amplification success between consensus and standard primers remained significant (Wilcoxon matched pair test: N = 19, df = 1, V = 41.5, P = 0.023) when only these 19 loci were analysed.

Twenty-three of 24 consensus primer pairs exhibited between one and three base-pair mismatches between the chicken and Charadriiformes primer binding sites. Each mismatching primer base was replaced by a suitable degenerate base which included both of the possible bases. The use of degenerate bases will dilute the effective concentration of the primer with the highest affinity to the target, which could potentially reduce amplification efficiency. However, amplification success was not related to the number of degenerate bases per primer pair (Kruskal Wallis test: N = 23 (6/12/5), df = 2, χ² = 1.20, P = 0.55).
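Replacing a mismatching base with a degenerate base can be illustrated with the standard IUPAC two-base ambiguity codes (R = A/G, Y = C/T, S = C/G, W = A/T, K = G/T, M = A/C). The sketch below assumes two pre-aligned binding sites of equal length; the function name and the example sequences are hypothetical and do not correspond to any primer from the study.

```python
# IUPAC codes for positions where exactly two bases are possible.
IUPAC_TWO_BASE = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def consensus_primer(site_a, site_b):
    """Merge two aligned binding sites into one degenerate primer.

    Returns (primer, number_of_degenerate_positions).
    """
    if len(site_a) != len(site_b):
        raise ValueError("binding sites must be aligned and of equal length")
    bases = []
    degenerate = 0
    for a, b in zip(site_a.upper(), site_b.upper()):
        if a == b:
            bases.append(a)
        else:
            degenerate += 1
            bases.append(IUPAC_TWO_BASE[frozenset((a, b))])
    return "".join(bases), degenerate


# Hypothetical chicken and shorebird binding sites differing at one position.
print(consensus_primer("ACGTTGCA", "ACGTCGCA"))
```

Each degenerate position doubles the number of distinct oligonucleotides in the synthesised primer mix, which is why the effective concentration of the best-matching primer is diluted as described above.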

Among conserved sequences, cross-species amplification success was only significantly associated with the E-value of a given sequence (Table 1, Figure 3). E-values for loci for which primers amplified in all tested species ranged from E-110 (Mopl18, Additional file 1) to E-21 (BmaTATC371). Standard primers from sequences with lower E-values amplified in a higher proportion of charadriiform species than those with higher E-values (Figure 3, Generalised Linear Model (GLM) with binomial error structure: df = 53, B = -0.02, t = -2.58, P = 0.013).

Table 1 Generalised linear models for a) amplification success and b) polymorphism of conserved microsatellite loci
Figure 3

Amplification success of conserved microsatellite loci in three species of Charadriiformes (Kentish plover, whiskered auklet and ruff) in relation to the E-value of the chicken-Charadriiformes hit. Loci with both flanks matching the chicken sequence at the same chromosomal region of the chicken are considered. Smaller E-values indicate higher probability of identity. Open circles each represent a single microsatellite locus. The line represents predicted values derived from the statistical model (see text).

Polymorphism and observed heterozygosity

Twenty-three of 24 conserved microsatellite loci exhibited two or more alleles in an average of 3 of the 13 species tested (Table 2, range 1 to 8 species per locus). There was considerable variation among species in the number of polymorphic markers. Excluding the markers that had been isolated in the target species, we found on average that 7 of 24 markers per species (range 0–11 polymorphic loci/species) were polymorphic when tested in four unrelated individuals from a single population (Figure 4). None of the 23 markers included in polymorphism tests were sex-linked based on the genotypes of known male and female birds (individuals were sexed using P2/P8 primers [39]) and on their chromosome location predicted from the assembled chicken genome (Figure 1).
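The polymorphism criterion used here (two or more alleles among the individuals genotyped) can be expressed as a short tally across species. This is a minimal sketch: the data layout, the species labels and the allele sizes are hypothetical and are not the study's genotypes.

```python
def is_polymorphic(genotypes):
    """A locus counts as polymorphic if two or more distinct alleles are seen."""
    alleles = {allele for genotype in genotypes for allele in genotype}
    return len(alleles) >= 2


def polymorphic_species_count(genotypes_by_species):
    """Number of species in which one locus shows two or more alleles."""
    return sum(is_polymorphic(g) for g in genotypes_by_species.values())


# Hypothetical genotypes (allele sizes in bp) for one locus, four birds each.
locus = {
    "species_A": [(200, 204), (200, 200), (204, 208), (200, 204)],
    "species_B": [(196, 196), (196, 196), (196, 196), (196, 196)],
    "species_C": [(210, 210), (210, 214), (210, 210), (210, 210)],
}
print(polymorphic_species_count(locus))
```

With only four individuals per species, a locus carrying a rare second allele can easily be scored as monomorphic, so counts obtained this way are conservative.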

Table 2 Expected and observed allele sizes of conserved chicken-Charadriiformes microsatellite markers
Figure 4

Number of newly identified polymorphic microsatellite markers for 12 species of Charadriiformes when tested in four unrelated individuals. In total, 24 conserved charadriiform microsatellite markers were tested. Data are only included if test and source species of the microsatellite marker were different.

The proportion of species in which a microsatellite was polymorphic was significantly associated with three factors: i) microsatellite motif, ii) repeat length and iii) whether the microsatellite was interrupted or not (Table 1). Microsatellite loci with dinucleotide motifs were polymorphic in more species than those consisting of tetranucleotides (GLM with quasibinomial errors: df = 19, B = -1.00, t = -2.15, P = 0.045). Microsatellites with longer repeat regions were polymorphic in more species than those with shorter repeat regions (GLM with quasibinomial errors: df = 19, B = 0.02, t = 2.22, P = 0.039). Interruption of the microsatellite repeat regions reduced the proportion of species in which a locus was polymorphic (GLM with quasibinomial errors: df = 19, B = -1.02, t = -2.09, P = 0.051).

When we examined the variability of loci in the larger Kentish plover, whiskered auklet and ruff samples (Table 3) we found that the mean observed heterozygosities across the three test species were lower than heterozygosities in the species in which a microsatellite had been originally isolated and characterised (Wilcoxon matched pair test: N = 23, V = 226, P < 0.001). Heterozygosity in all three test species declined with increasing genetic distance to the microsatellite source species, and the decline was highly significant in all three species combined (Figure 5, Generalised Linear Mixed Model [GLMM]: df = 33, B = -0.04, t = -7.59, P < 0.001). An alternative model excluding those loci that had been developed in the target species gave the same qualitative results (model not shown).
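For reference, observed heterozygosity is the proportion of genotyped individuals that carry two different alleles, and expected heterozygosity under Hardy-Weinberg equilibrium is one minus the sum of squared allele frequencies. The sketch below shows both calculations without any small-sample correction; the function names and the example genotypes are assumptions made for illustration.

```python
from collections import Counter


def observed_heterozygosity(genotypes):
    """Proportion of individuals with two different alleles."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


def expected_heterozygosity(genotypes):
    """1 - sum(p_i^2) from allele frequencies (no small-sample correction)."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


# Hypothetical genotypes (allele sizes in bp) for four individuals.
birds = [(180, 184), (180, 180), (184, 184), (180, 184)]
print(observed_heterozygosity(birds), expected_heterozygosity(birds))
```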

Table 3 Observed allele sizes, heterozygosities and estimated frequency of null alleles of conserved microsatellite loci
Figure 5

Observed heterozygosity in relation to ΔTmH DNA-DNA hybridisation distance between source and test species for 23 Charadriiformes microsatellite loci. Size of circles is proportional to the number of data points at a given location. The trend line was drawn using predicted values from generalised linear models for each of the three species separately and from predicted values from a Generalised Linear Mixed Model for all species combined, including species and locus as hierarchical random factors.

Discussion

We have shown that sequence information from annotated genomes can be used to identify and map conserved microsatellite loci. Our study has two major findings. First, primers designed from conserved microsatellite loci amplify across a wider taxonomic range than those derived from anonymous microsatellite loci. Second, when highly conserved regions of the flanks of a microsatellite are used as the primer binding sites amplification success can be further improved.

Correlates of cross-species amplification

Amplification success was not associated with genetic distance between microsatellite source species and test species in this intraorder analysis of charadriiform microsatellite markers. The E-values obtained from blast searches served as a better predictor for the width of taxonomic range in which a microsatellite could be amplified.

Using 1147 primers derived from conserved regions of the human genome, Housley et al. [41] identified the number of primer mismatches and primer GC content as factors that predicted amplification success in mammals. In contrast to our study, Housley et al. investigated amplification success using generally conserved sequences in mammals and did not specifically use variable loci such as microsatellite loci. Second, they aligned genomic sequences from human with genomic sequences from dog, rat or mouse to perform intragroup-comparisons of amplification success, whilst we used an outgroup taxon (chicken) for sequence alignments and comparisons. Third, our sample size was much smaller than the one used for the intra-mammalian comparison due to the small number of microsatellite loci available from the Charadriiformes. The results of both studies are very similar, despite the large differences in study design. Sequence conservation (represented by the E-value) was the main predictor for amplification success of a genetic marker. Most loci with an E-value lower than E-20 in chicken amplified in all the tested species, suggesting that this could be a critical value that indicates the utility of a marker for cross-species amplification within the order Charadriiformes. Consensus primer sets that had been designed to include the smallest number of mismatches between sequences amplified better than standard primers, in which no action to counteract the presence of mismatches had been taken. Amplification success of consensus primers was generally very high across the three suborders of Charadriiformes (Figure 2b).
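The suggested E-20 cut-off lends itself to a simple screening step when ranking candidate loci. The sketch below converts the shorthand notation used in this paper (for example E-21) to a number and keeps loci at or below a threshold. The function names are assumptions, and the locus labelled as hypothetical is invented for contrast; the two named loci and their E-values are those quoted in the Results.

```python
def parse_e_value(text):
    """Convert shorthand such as 'E-21' to a float (1e-21)."""
    text = text.strip().upper()
    if text.startswith("E"):
        text = "1" + text  # 'E-21' is shorthand for '1E-21'
    return float(text)


def likely_transferable(e_values, threshold=1e-20):
    """Names of loci whose E-value is at or below the threshold, sorted."""
    return sorted(
        name for name, value in e_values.items()
        if parse_e_value(value) <= threshold
    )


candidates = {
    "Mopl18": "E-110",
    "BmaTATC371": "E-21",
    "hypothetical_locus": "E-6",  # invented entry, not from the study
}
print(likely_transferable(candidates))
```

The threshold is a parameter because the value was derived within Charadriiformes and may not carry over unchanged to other orders.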

The number of mismatches per primer pair did not affect amplification success among the consensus primers for a number of possible reasons. First, we restricted the sequence mismatches between chicken and Charadriiformes for consensus primer pairs to a maximum number of three. Second, we introduced degenerate bases to account for those mismatches. Under our primer design rules we attempted to minimise the number of mismatches and positioned mismatches away from the 3' end for a given primer, although the position of the mismatch/degenerate base did not affect amplification success significantly (data not shown). Third, inclusion of degenerate bases might have led to amplification failure due to the reduced concentration of suitable primer. However, we did not observe such failures, probably because the total primer concentration in our tests was relatively high (1 μM, see Methods). Thus the amplification success of our consensus primers should not be taken to suggest that primers need not be a good (or perfect) match to the target sequence.

Primmer et al. [13] proposed that the proportion of microsatellite loci that can be amplified declines with increasing genetic distance between source and target species. Our results suggest that the slope of the amplification decline is predominantly locus-specific and will largely depend on the conservation of the sequences flanking the microsatellite repeat. Microsatellite loci with highly conserved flanks can be amplified in more distantly related species whilst those with non-conserved flanks may be useful only in a very narrow taxonomic group.

Polymorphism

Polymorphism of conserved loci varied greatly between species. The proportion of species in which a microsatellite exhibited more than one allele was associated with microsatellite repeat length, motif and whether it was interrupted or not. Our results are consistent with previous theoretical and empirical studies that examined the effect of these microsatellite properties on mutation rate [4–6, 8] and polymorphism [7, 9]. A positive association between repeat length and polymorphism was found empirically in other vertebrates, arthropods and plants [4, 7–9]. Dinucleotide microsatellite loci exhibited higher mutation rates than tetranucleotide microsatellites in mice and yeast [5]. In humans and chimpanzees, microsatellite loci with interrupted repeat regions had a two-fold decrease in the mutation rate, which was interpreted as being due to interruptions reducing the opportunities for replication slippage [6].

Although we showed that the amplification success of existing microsatellite markers can be improved by redesigning primer sets, heterozygosity was generally lower in test than source species. Polymorphism declines faster with evolutionary distance than amplification success [14]. A possible explanation is that polymorphism of many loci evolved in the recent evolutionary past and therefore is confined to a phylogenetically narrow range of taxa. This argument is supported by the findings of other studies in amphibians, birds and mammals [9, 14] that show that the probability of polymorphism drops rapidly with increasing evolutionary distance between source and target species. However, the steepness of the decline appears to be locus-specific. There was large variation in the taxonomic range over which a given microsatellite was polymorphic. For instance, the locus BmaTGAA523 was polymorphic in only two of twelve species when tested in four unrelated individuals, whilst another locus, BmaTATC371 isolated in the same species and with a similar repeat length, was polymorphic in seven of twelve species tested. Of the 24 charadriiform microsatellite markers we tested, a median of 7 markers was polymorphic per species (12 charadriiform species tested). Five or more polymorphic markers were found in five of the six test species where previously no microsatellite markers had been identified. For another six species where markers had already been characterised, we found between three and 11 new polymorphic markers. The number of species in which these 23 markers are useful is likely to increase because we tested them only in 12 of the 365 species of Charadriiformes. Finally, we assessed the variability of markers only within a single population of each species to make the detected level of polymorphism comparable to the polymorphism in the source population. Some markers that we found to be monomorphic may exhibit population-specific polymorphism or have different alleles fixed in different populations and turn out to be useful to investigate population differentiation.

The observed decline of polymorphism in relation to the genetic distance to the source species could be partly explained by the selection process during the isolation of microsatellite sequences ('ascertainment bias hypothesis', [7, 42]). During the construction of microsatellite libraries, typically long microsatellite sequences with 10–30 repeat units are selected to maximise the probability that a locus is polymorphic in the species in which it is developed. Moreover, only the sequences of the polymorphic loci are normally submitted to sequence databases, which means that monomorphic loci with fewer repeat units in the source species are lost because they are not reported. However, repeat expansion and microsatellite polymorphism are likely to reflect the recent evolutionary history. Therefore the submission of sequences of monomorphic loci to genomic databases might enable the identification of further conserved markers and the development of useful markers through cross-species amplification.

Differences in the degree of microsatellite polymorphism among species are not exclusively attributable to recent divergence in microsatellite evolution. Genetic diversity, which is often reflected by microsatellite polymorphism, varies among populations and species. Low microsatellite polymorphism can indicate depleted genetic variability due to bottlenecks, genetic drift or inbreeding. If genetic diversity for a given population is low, a combination of screening of known microsatellite loci and the development of microsatellite markers using the conventional library approach may be helpful in finding a suitable set of polymorphic markers.

Only a handful of shorebird populations have been investigated for genetic diversity. Low genetic variability of both allozymes and of the mitochondrial control region has been found in several species of sandpipers that breed in the high Arctic and it has been hypothesized that historical population fluctuations that occurred during and after glaciations are responsible for this low genetic diversity [43]. In our study, the greater sheathbill Chionis alba showed the least genetic diversity, being monomorphic at all 23 microsatellite loci that we examined (Table 2). Greater sheathbills breed exclusively in the Antarctic, where they live as scavengers close to other bird colonies. Current population estimates give a stable total number of approximately 20,000 sheathbills [44], but past climatic fluctuations may have led to a small effective population size similar to those of Arctic breeders. Thus the low observed microsatellite diversity might reflect a recent population recovery. Alternatively, the evolutionary distance between sheathbills and the source species from which we derived the tested microsatellites is too large, with the microsatellite being lost or all polymorphism being depleted. Different genetic markers, such as markers from the mitochondrial control region, other microsatellite markers or highly variable nuclear genes, such as genes of the major histocompatibility complex, need to be examined to determine whether the low microsatellite variability truly reflects a general low genetic diversity in sheathbills.

Contrary to sheathbills, whiskered auklets and Kentish plovers showed the highest genetic diversity in our analysis. In the Kentish plover and the whiskered auklet, twelve of the 23 microsatellite loci tested were found to be polymorphic when tested in four individuals. Excluding the markers that had been isolated in both species leads to ten (Kentish plover) and eleven (whiskered auklet) newly described polymorphic markers. Both species live in very different habitats and geographical locations. Whiskered auklets are pelagic feeders that inhabit a number of small islands in the northern Pacific, whilst Kentish plovers are cosmopolitans and found at beaches and saline lakes in temperate and subtropical regions [45]. The high genetic diversity in both species is reflected in the observed heterozygosities at microsatellite loci that had both been identified in these species by cross-species amplification (this study) and isolated from enriched genomic libraries [14, 37]. The high variability of many microsatellite loci in these species suggests that depletion of genetic variation is not a general characteristic of the Charadriiformes order, but rather an attribute of certain species or populations due to their historical demography and phylogeography.

The possibilities for the application of conserved markers go beyond examining genetic diversity. Polymorphic conserved markers can be used, for example, to investigate chromosomal organisation by constructing linkage maps [46, 47]. A major advantage of conserved over conventional markers is that the same loci can be used to investigate and so compare chromosomal structure and genomic organisation among several different species [17, 48].

The sequence conservation of flanking regions can be the result of a direct functional role or linkage disequilibrium with functional genomic regions (e.g. fitness relevant genes, [49]). Selection pressures may affect the variability of a locus by either restricting polymorphism [1, 2] or promoting polymorphism if variability is adaptive [50]. This can be problematic for applications of genetic markers that assume their neutrality. However, for common applications such as parentage assignment or estimating relatedness such markers will nonetheless be useful. Furthermore, if a marker is found to be associated with a locus that is under selection, its function can be explored and changes or retention of functionality can be compared under different environmental conditions, and across different populations and/or taxa.

The conserved markers we designed and characterised are very convenient to use. All consensus primer sets for the polymorphic loci amplify under similar PCR conditions (Ta = 54–55°C, 2.0 μM MgCl2 concentration), which facilitates i) quick and economical screening for amplification and polymorphism in new target species and ii) efficient processing, since several loci can be run together in a single multiplex PCR.

Dealing with null alleles

Five of the 24 primer sets that we tested for heterozygosity had high estimated null allele frequencies (≥ 0.1, CERVUS 2.0) in one of the three test species (Kentish plover, whiskered auklet and ruff, Table 3). There was no obvious relationship between the departure from Hardy-Weinberg equilibrium and the number or position of degenerate bases in each primer pair. Null alleles arise when the primer sequence does not match the target sequence of a given allele and the allele therefore fails to amplify. If not corrected for, the presence of null alleles may interfere with algorithms to estimate relatedness [51]. Sequencing the locus in the study species and redesigning the primers can be used to prevent the occurrence of null alleles. Alternatively, if the proportion of null alleles is low, their impact on relatedness estimates can be reduced by using maximum-likelihood correction methods when computing relatedness relationships [52].
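A heterozygote deficit is the signal that null-allele estimators work from. As a rough illustration only, the sketch below uses a simple closed-form estimator, r = (He - Ho) / (1 + He), often attributed to Brookfield. This is an assumption chosen for brevity and is not the iterative procedure implemented in CERVUS 2.0; the function name and the example heterozygosities are hypothetical, and only the 0.1 flagging threshold comes from the text.

```python
def null_allele_frequency(expected_het, observed_het):
    """Simple estimate r = (He - Ho) / (1 + He); not the CERVUS algorithm."""
    return (expected_het - observed_het) / (1.0 + expected_het)


def flag_null_alleles(expected_het, observed_het, threshold=0.1):
    """Flag a locus whose estimated null allele frequency reaches the threshold."""
    return null_allele_frequency(expected_het, observed_het) >= threshold


# Hypothetical heterozygosities for one locus in one species.
print(null_allele_frequency(0.6, 0.4), flag_null_alleles(0.6, 0.4))
```

A negative estimate simply indicates an excess of heterozygotes relative to expectation and should be read as no evidence for null alleles.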

Development of conserved markers in other avian groups

Charadriiformes, chicken and most other modern birds belong to the Neognathae. Recent molecular data suggest that the Galliformes, together with the Anseriformes, form a sister taxon to the other neognath birds [53] and therefore have the same phylogenetic distance to all Neognathae. Flanking regions of about one in seven charadriiform microsatellite loci were found to be conserved in chicken. Since the proportion of microsatellite homologues is likely to be associated with the phylogenetic distance between genomic resource species and source species, we expect a similar proportion of conserved microsatellite loci to be found between chicken and other neognath groups to that observed here between chicken and the Charadriiformes. In fact, for taxa that are more closely related to chicken (e.g. Anseriformes) we predict an even higher success rate in identifying suitable microsatellite markers through data mining.

Genomic sequencing of further organisms will facilitate the use of already-characterised microsatellite loci for designing consensus primer sets. In birds, the sequenced genome of another neognath bird, the zebra finch (Taeniopygia guttata), is now available (http://www.ncbi.nlm.nih.gov/projects/genome/guide/finch). The zebra finch is phylogenetically closer to Charadriiformes and other neognath birds than is the chicken; hence more microsatellite homologues and conserved markers might now be obtained using zebra finch sequences as a reference.

Conclusion

We have shown that sequence information available from genomic databases can be used to enhance the utility of microsatellite markers in studies of evolution and conservation, even for taxonomic groups where few sequence data are yet available. Sequence information from translated and untranslated parts of the genome is useful for comparing and designing consensus primers, even when it involves genetically distantly related taxa such as Charadriiformes and Galliformes. Cross-species amplification tests can be carried out more efficiently by identifying and utilising conserved microsatellite loci that will amplify across a broader taxonomic range. By selecting highly conserved regions of the microsatellite flanking sequence for primer design, the number of species in which a locus will amplify can be increased even further. We found that markers derived from conserved loci with an E-value of E-20 or lower amplified across the entire charadriiform order. Our findings will facilitate the use of markers in species where no markers have yet been identified and in species where more markers are needed. To date, 24 vertebrate and 22 invertebrate genomes have been sequenced and fully assembled (source: http://www.genome.ucsc.edu, September 2008). This number is expected to increase rapidly as sequencing costs decrease. The methodology we have outlined will make it possible to extend population genetic and evolutionary studies to further non-model species that have been previously neglected because of a lack of sufficient genetic markers.

Methods

Blast search

We searched for available nuclear microsatellite sequences isolated in species of Charadriiformes that were deposited before 15 July 2006 in the nucleotide sequence databases of GenBank, DNA Data Bank of Japan, and the European Molecular Biology Laboratory (EMBL) through the EMBL web portal (http://www.ebi.ac.uk/ebisearch/) using the key words "Charadriiformes microsatellite" and "Charadri* microsat*". Additionally, for one species (oystercatcher, Haematopus ostralegus), eight primer sets for polymorphic microsatellite loci had been published [54] but the microsatellite sequences were not found in the EMBL database. In this case the authors (R. van Treuren et al.) generously provided the sequences of the eight polymorphic and 29 further unpublished monomorphic oystercatcher microsatellite loci, which were then submitted to EMBL in agreement with the authors (accession numbers: AM600643-AM600679; see additional file 2).

Only microsatellite sequences that were polymorphic in the source species and had sufficient flanking sequence for primer design were considered (i.e. a minimum of 30 bp of flanking sequence on either side of the repeat motif). In total, we found 163 suitable microsatellite sequences. All sequences were checked for duplicates using the MegaBLAST program available from the National Center for Biotechnology Information (NCBI) website (http://www.ncbi.nlm.nih.gov/BLAST/ [55]). Four pairs of homologues were found (K32/LarsZAP26, K56/LarsZAP19, LarsNX24/Rbg27 and LarsZAP11/Rbg29). LarsZAP26 and K32 were identical duplicates and the primer set was designed from K32. For the remaining duplicates the shorter sequence of each pair was dropped from the analysis (LarsZAP19, LarsNX24, LarsZAP11).
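The flanking-sequence criterion can be illustrated with a minimal Python sketch. It is not the procedure used in the study; the repeat definition (a 1–6 bp motif repeated at least five times in tandem) and the function names are assumptions made for illustration only.

```python
import re

# Assumption: a microsatellite is a 1-6 bp motif repeated at least
# `min_repeats` times in tandem. The text gives no repeat threshold.
def flank_lengths(seq, min_repeats=5):
    """Return (left_bp, right_bp) around the longest tandem repeat, or None."""
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_repeats - 1))
    best = None
    for match in pattern.finditer(seq.upper()):
        if best is None or len(match.group(0)) > len(best.group(0)):
            best = match
    if best is None:
        return None
    return best.start(), len(seq) - best.end()


def has_sufficient_flanks(seq, min_flank=30, min_repeats=5):
    """True if both flanks reach the 30 bp minimum stated in the text."""
    flanks = flank_lengths(seq, min_repeats)
    return flanks is not None and min(flanks) >= min_flank
```

A sequence with 30 bp on each side of a (CA)10 repeat would pass this check, whereas the same sequence with one base trimmed from either end would not.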

We identified homologous charadriiform microsatellite loci in the chicken Gallus gallus as follows. Unique microsatellite sequences from the Charadriiformes were compared against the chicken genome database v2.1 (WASHUC 1, Version e! 41, available at http://www.ensembl.org/Gallus_gallus/) using WU-BLAST (Gish W. 1996-2004; http://blast.wustl.edu) implemented in the Ensembl browser with the "genomic sequence (masked)" and "distant homologies" settings. The E-value was used as a measure of Charadriiformes-chicken homology. The E-value is mainly influenced by the sequence length of the query sequence and its similarity to the homologue in the database. In the absence of duplications and gene orthologues, lower E-values represent a higher probability of sequence homology. All sequences for which both flanking regions matched the chicken genome with E-values lower than E-10 were classified as conserved sequence homologues. Microsatellite sequences with only one flank producing a good "hit" were not considered. In this way, we identified 55 charadriiform microsatellite sequences for which homologues were present in the chicken genome (Additional file 1).
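The classification rule described above (both flanks matching with an E-value lower than E-10) can be expressed as a short Python sketch. The locus names and E-values below are hypothetical, and the treatment of a missing hit as `None` is an assumption for illustration.

```python
def is_conserved(e_left, e_right, threshold=1e-10):
    """Both flanks must hit the chicken genome with E-values below threshold.

    A flank without any hit is passed as None (an assumption of this sketch).
    """
    if e_left is None or e_right is None:
        return False
    return e_left < threshold and e_right < threshold


# Hypothetical loci: (E-value of left flank, E-value of right flank)
example_hits = {
    "locus_A": (1e-25, 1e-12),   # both flanks conserved
    "locus_B": (1e-25, 1e-3),    # only one flank gives a good hit
    "locus_C": (None, 1e-30),    # one flank has no hit at all
}

conserved = sorted(name for name, (left, right) in example_hits.items()
                   if is_conserved(left, right))
```

Under these hypothetical values only `locus_A` would be classified as a conserved sequence homologue.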

Mapping of charadriiform microsatellite sequences in the chicken genome

We adapted the BLAST methods from [48] to map the charadriiform microsatellite sequences to the chicken genome. The following sequences were mapped:

i) sequences hitting at one location with both flanks at an E-value ≤ E-5,

ii) sequences that hit at one location with one flank at an E-value ≤ E-10 (cf. [48]: ≤ E-5),

iii) sequences that hit at different locations in the genome were mapped only if the E-value of the best hit (lowest E-value) was lower than that of the next-best BLAST hit by a factor of ≥ E+5 (cf. [48]: ≥ E+10).
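The three mapping rules listed above can be combined into one decision function, sketched here in Python. How the rules interact is an interpretation, not something the text specifies: the sketch first applies rule iii to choose a unique best location and then checks rule i or ii at that location. The function name, the input format and the location labels are hypothetical.

```python
def map_locus(hits, both_flank_e=1e-5, one_flank_e=1e-10, min_ratio=1e5):
    """Return the location a locus is mapped to, or None.

    hits: list of (flank, location, evalue), flank is "left" or "right".
    """
    if not hits:
        return None
    # Best (lowest) E-value for every (location, flank) combination.
    best = {}
    for flank, location, evalue in hits:
        key = (location, flank)
        best[key] = min(evalue, best.get(key, float("inf")))
    # Best E-value per location, regardless of flank.
    loc_best = {}
    for (location, _flank), evalue in best.items():
        loc_best[location] = min(evalue, loc_best.get(location, float("inf")))
    ranked = sorted(loc_best, key=loc_best.get)
    top = ranked[0]
    # Rule iii: the best location must beat the runner-up by >= min_ratio.
    if len(ranked) > 1:
        top_e, next_e = loc_best[top], loc_best[ranked[1]]
        if not (next_e > top_e and next_e >= top_e * min_ratio):
            return None
    left = best.get((top, "left"))
    right = best.get((top, "right"))
    # Rule i: both flanks hit the same location at E <= 1e-5.
    if left is not None and right is not None and max(left, right) <= both_flank_e:
        return top
    # Rule ii: one flank hits at E <= 1e-10.
    if any(e is not None and e <= one_flank_e for e in (left, right)):
        return top
    return None
```

For example, a locus whose flanks both hit a hypothetical location "chr1" with E-values of 1e-8 and 1e-6 would be mapped under rule i, whereas a single-flank hit of 1e-7 would not satisfy rule ii.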

In total, 68 charadriiform microsatellite sequences were mapped in the chicken genome and displayed using the program MAPCHART [56] (Figure 1). The recorded locations of centromeres are based on the regions of highest GC content on the chromosome (following [17]; data obtained from the NCBI's Gallus gallus Build 2.1: http://www.ncbi.nlm.nih.gov/genome/guide/chicken/index.html).

Cross-species amplification rates of conserved and anonymous microsatellites

To examine whether cross-species amplification was affected by the presence or absence of a chicken homologue for a given sequence we designed new primer sets ('standard primers') for a total of 65 loci using PRIMER3 [57]. We did not use already published primers developed in different laboratories because primer design methods can be very heterogeneous between laboratories and this may have compromised our results [9]. We randomly selected ten microsatellite loci that had hits with an E-value of E-10 or better for only one flank (anonymous sequences, Additional file 1) and compared their amplification success with the success of conserved chicken-Charadriiformes loci in which both flanks hit at the same location in the chicken genome with an E-value of E-5 or lower. For the design of standard primers, we used default options of PRIMER3 with the following adaptations:

i) melting temperature (Tm) between 50°C and 65°C, with 62°C as the preferred Tm,

ii) Tm difference between forward and reverse primer < 0.5°C,

iii) we checked for an even distribution of all four nucleotide bases (ascertained by eye),

iv) a primer GC content of 20–60%,

v) a product size between 70 and 450 bp.
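The numerical criteria above can be checked programmatically, as in the following Python sketch. It does not call PRIMER3. The melting temperatures are taken as inputs (for instance as reported by the design software), criterion iii is omitted because it was assessed by eye, and the function names are hypothetical.

```python
def gc_percent(primer):
    """Percentage of G and C bases in a primer sequence."""
    primer = primer.upper()
    return 100.0 * sum(primer.count(base) for base in "GC") / len(primer)


def passes_criteria(fwd, rev, tm_fwd, tm_rev, product_size):
    """Check a primer pair against criteria i, ii, iv and v."""
    checks = [
        50.0 <= tm_fwd <= 65.0 and 50.0 <= tm_rev <= 65.0,        # i) Tm range
        abs(tm_fwd - tm_rev) < 0.5,                               # ii) Tm difference
        all(20.0 <= gc_percent(p) <= 60.0 for p in (fwd, rev)),   # iv) GC content
        70 <= product_size <= 450,                                # v) product size
    ]
    return all(checks)
```

The preferred Tm of 62°C is a design target rather than a pass or fail threshold, so it is not tested here.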

The reverse primers of seven of the eleven Kentish plover Charadrius alexandrinus and two of the five whiskered auklet Aethia pygmaea loci were ordered with "GTTTCTT" 'pigtails' to reduce variation in stutter bands [58]. The forward primer of each pair was labelled with a fluorescent label, either FAM or HEX.

Following [40, 59] we recognise three major lineages of Charadriiformes: Lari, Scolopaci and Charadrii. All primers were tested for amplification success in one candidate species from each charadriiform lineage: whiskered auklet (for suborder Lari), ruff Philomachus pugnax (suborder Scolopaci) and Kentish plover (suborder Charadrii). The suborders are separated by ΔTmH (DNA-DNA hybridisation value [40]) of 15.6 for Charadrii/Lari-Scolopaci and 12.8 for Charadrii-Scolopaci. All primer sequences are provided in a supplementary table (Additional file 1).

DNA was extracted from blood samples that were stored either in Queen's lysis buffer [60] or absolute ethanol. One of three extraction methods was used: an ammonium acetate method [61], a sodium acetate method [62] or an adapted phenol-chloroform method [63]. All samples were visualised on a 0.8% agarose gel stained with SYBRsafe (Invitrogen) to check for DNA quality. DNA concentration was estimated by measuring the optical density of a sample at 260 nm using a fluorometer. Prior to tests, each sample was checked for amplification using the LEI160 primer set [27], a locus that amplifies in all of the approximately 100 bird species tested to date (DA Dawson, unpublished data).

Each 10-μl PCR contained approximately 10 ng of DNA and 0.25 units of Taq DNA polymerase (Bioline) in the manufacturer's buffer with a concentration of 1.0 μM of each primer, 2.0 μM MgCl2 and 0.20 mM of each dNTP. Loci were amplified by PCR using a thermal cycler (MJ Research model PTC DNA engine) and the following program: one cycle of 3 min at 94°C followed by 35 cycles at 94°C for 30 s, annealing temperature (temperature gradient from 54–66°C) for 30 s, 72°C for 30 s and a final extension step of 10 min at 72°C. PCR products were visualised on a 2% agarose gel stained with SYBRsafe (Invitrogen) to check for amplification success. Amplification success was a binary variable, which we defined as 'successful' if a single clean band could be visualised on the gel; multiple band patterns or no products were recorded as 'failed'.

Cross-species amplification rates of consensus and standard primers

In addition to the standard primers, we designed a second pair of consensus primers with a minimal number of mismatches between chicken and shorebird sequences. Within microsatellite flanks, the degree of sequence similarity varied. Some regions had fewer mismatches between chicken and shorebirds than others. To identify conserved flanking regions we aligned shorebird and chicken microsatellite sequences for 33 sequences with an E-value of E-19 or lower using the CLUSTAL W algorithm [64] with the default options implemented in MEGA 3.1 software [65]. For 24 charadriiform microsatellite loci we were able to design consensus primers with a maximum of three base mismatches per primer pair (sequences are provided in Additional file 1). Only one of the 24 consensus primer sets had a perfect match between the Charadriiformes and chicken sequence. Therefore, we introduced binary degenerate bases into the primer sequence at mismatch positions that provided a consensus for both sequences. If degenerate bases were introduced and several suggested primer candidates had the minimal number of three or fewer base mismatches, we chose the candidate that had base mismatches closer to the primer's 5' end. If a 'pigtail' had been added to the reverse primer of the standard primer set for a locus, the same 'pigtail' was also added to the corresponding reverse primer of the consensus primer pair.
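The introduction of binary degenerate bases at mismatch positions can be sketched in Python as follows. The sketch assumes two ungapped, equal-length primer-site sequences taken from the alignment, and uses the standard IUPAC two-base ambiguity codes. The function name and example sequences are hypothetical.

```python
# Standard IUPAC codes for two-base ambiguities.
IUPAC_BINARY = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def consensus_primer(shorebird, chicken):
    """Return (consensus sequence, number of mismatch positions).

    Assumes both inputs are aligned, ungapped and of equal length.
    """
    if len(shorebird) != len(chicken):
        raise ValueError("sequences must be aligned and of equal length")
    consensus = []
    mismatches = 0
    for a, b in zip(shorebird.upper(), chicken.upper()):
        if a == b:
            consensus.append(a)
        else:
            consensus.append(IUPAC_BINARY[frozenset((a, b))])
            mismatches += 1
    return "".join(consensus), mismatches
```

With the hypothetical sites "ACGTAC" and "ACATAT" the function returns ("ACRTAY", 2). The mismatch count for a primer pair is then the sum over the forward and reverse primers, which in the study was limited to three.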

To obtain consensus primers we had to relax the conditions used for primer design (see above). Tm's for consensus primer sets were usually lower than those for the standard primer sets. Therefore we tested all 24 consensus primers using a lower annealing temperature gradient (50–62°C). All other PCR conditions were kept the same as used in standard primer PCR amplifications. Consensus primers derived from a Charadriiformes-chicken alignment are labelled with the prefix "Gga" (for Gallus gallus).

Polymorphism and observed heterozygosities

Twenty-three of 27 loci that amplified successfully in all three species were assessed for heterozygosity and polymorphism (Tables 3 & 4). Primer sets for four loci were dropped. Primers for BmaTATC353 and BmaGACA456 had yielded single amplified products when examined on an agarose gel. However, when we examined polymorphism on the ABI3730 DNA Analyzer, genotypes contained multiple peaks and the loci could not be reliably scored. Loci K16 and Calex-08 were found to be expressed sequence tag (EST) loci. EST loci were not included in the present study. Microsatellite markers have been previously obtained from EST databases [66, 67] and their cross-species utility is described elsewhere ([68]; DA Dawson, in preparation).

To characterise correlates of microsatellite variability we investigated two different measures. First we examined the proportion of 12 test species in which we found two or more alleles for a given microsatellite locus using four unrelated individuals. Polymorphism tests were carried out only with a single primer pair (consensus or standard) for any given locus. If both consensus and standard primers had amplified across all three test species we chose the primer set that produced the cleanest product. PCRs were performed using the same conditions as described for amplification, with the difference that the annealing temperature was a common temperature at which the primer set had amplified in all three species. A fraction of the PCR product was loaded onto an ABI 3730 Analyzer using dye set DS-30, filter set D and ROX size standard for allele size determination, and the resulting genotypes were scored using GENEMAPPER 3.7 software (Applied Biosystems). The twelve test species were chosen from different branches of the Charadriiformes to ensure phylogenetic independence (Kentish plover, whiskered auklet, ruff, collared pratincole (Glareola pratincola), brown skua (Catharacta lonnbergi), gull-billed tern (Gelochelidon nilotica), red-necked phalarope (Phalaropus lobatus), great snipe (Gallinago media), dunlin (Calidris alpina), oystercatcher (Haematopus ostralegus), avocet (Recurvirostra avosetta) and greater sheathbill).

The second variable for polymorphism, observed heterozygosity, was determined in whiskered auklet, ruff and Kentish plover. Here we tested primers in a total of 16 individuals per species. In addition to observed heterozygosity (Ho), we calculated expected heterozygosity (He) and estimated the null allele frequency using the program CERVUS v2.0 [69]. We performed tests for linkage equilibrium and compliance to Hardy-Weinberg equilibrium using the program GENEPOP v3.3 [70].
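For orientation, the quantities named above can be computed from diploid genotypes as in the following Python sketch. It is not a reimplementation of CERVUS. The small-sample correction for He and the use of Brookfield's estimator, r = (He − Ho)/(1 + He), for the null allele frequency are assumptions of this sketch, since CERVUS uses its own estimation procedure. Allele sizes in the example are hypothetical.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (Ho, He) for one locus.

    genotypes: list of (allele1, allele2) tuples, one per individual.
    He uses the small-sample correction 2n / (2n - 1); this is an
    assumption and may differ from the software used in the study.
    """
    n = len(genotypes)
    ho = sum(a != b for a, b in genotypes) / n
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = 2 * n
    he = 1.0 - sum((c / total) ** 2 for c in counts.values())
    return ho, he * total / (total - 1)


def brookfield_null_frequency(ho, he):
    """Brookfield's estimator 1 of null allele frequency, floored at zero."""
    return max(0.0, (he - ho) / (1.0 + he))


# Hypothetical allele sizes (bp) for four individuals at one locus.
example = [(100, 102), (100, 100), (102, 102), (100, 102)]
ho, he = heterozygosity(example)
```

A heterozygote deficit (Ho lower than He) yields a positive estimate, which is the pattern that flags a locus as a candidate for null alleles.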

Statistical analysis

Non-parametric tests were used to test whether locus conservation and primer design affected amplification success, polymorphism and observed heterozygosity.

To examine the correlates of amplification success and polymorphism we used several statistical models. Amplification success was a proportional response variable which could take the value 0/3 (no amplification in any species), 1/3 (amplification successful in one species), 2/3 (amplification successful in two species) or 3/3 (amplification successful in all three species). The variables associated with amplification success were examined statistically by incorporating explanatory variables of the following three categories into the maximal model: i) characteristics of the microsatellite locus (repeat length; whether a microsatellite was interrupted or not; the type of the microsatellite motif, i.e. whether the repeated base unit was a di- or tetranucleotide; observed heterozygosity in the species of isolation; and ΔTmH DNA-DNA hybridisation value between source species and target species as a measure of genetic distance [40]), ii) characteristics of the homologous sequence in chicken (single hit or hitting at multiple locations, microsatellite retained or absent) and, iii) properties of the standard primers (number of mismatches between chicken and charadriiform sequence). For each locus only the amplification results for the standard primers went into the analysis.

The two response variables for variability, polymorphism and observed heterozygosity, were tested with the same explanatory variables as amplification success, with the following deviation: the explanatory variable 'ΔTmH DNA-DNA hybridisation value' was dropped from the analysis of polymorphism since we tested for polymorphism over a range of species.

To identify correlates of amplification success and polymorphism we constructed two generalised linear models (GLMs) with appropriate error structure, including all explanatory variables and two-way interactions. GLMs were then simplified based on the Akaike information criterion (AIC, [71, 72]). Model simplification was performed in rounds, removing the least significant parameter (the one with the highest P-value) at the beginning of each round until the minimal AIC value was reached. The final models contained only explanatory variables with P-values smaller than 0.1. Each microsatellite locus was considered as a unit of analysis.
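The AIC comparison underlying such a simplification step can be illustrated with a deliberately small Python sketch. The analyses in the study were run in R with several explanatory variables. The sketch instead compares only an intercept-only binomial model with a model containing one categorical predictor, for which the maximum-likelihood estimates have a closed form. The data and function names are hypothetical.

```python
import math


def binom_loglik(successes, trials, p):
    """Binomial log-likelihood of the data for a common success probability p."""
    loglik = 0.0
    for k, n in zip(successes, trials):
        loglik += math.log(math.comb(n, k))
        if k:
            loglik += k * math.log(p)
        if n - k:
            loglik += (n - k) * math.log(1.0 - p)
    return loglik


def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * loglik


def aic_null(successes, trials):
    """AIC of the intercept-only model (one parameter)."""
    p = sum(successes) / sum(trials)
    return aic(binom_loglik(successes, trials, p), 1)


def aic_grouped(successes, trials, groups):
    """AIC of a model with one success probability per group."""
    loglik = 0.0
    labels = sorted(set(groups))
    for label in labels:
        s = [k for k, g in zip(successes, groups) if g == label]
        t = [n for n, g in zip(trials, groups) if g == label]
        loglik += binom_loglik(s, t, sum(s) / sum(t))
    return aic(loglik, len(labels))


# Hypothetical data: species (out of 3) in which each locus amplified.
successes = [3, 3, 2, 3, 0, 1, 0, 1]
trials = [3] * 8
groups = ["conserved"] * 4 + ["anonymous"] * 4
keep_predictor = aic_grouped(successes, trials, groups) < aic_null(successes, trials)
```

In a backward simplification, a term would be retained when dropping it raises the AIC, as is the case for the grouping factor in these hypothetical data.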

For observed heterozygosity we used a generalised linear mixed model (GLMM) with the same explanatory variables as for amplification success (see above) acting as fixed effects. Target species and microsatellite locus were included in the model as nested random effects (target species | locus (target species)). The GLMM was simplified by removing non-significant parameters hierarchically, starting with high-order terms to minimise model deviance. Model simplification was continued until the current and preceding model deviated significantly from each other as examined by an F-test. The final model contained only explanatory variables with P-values smaller than 0.1.

Statistical analyses were carried out using R software version 2.4.1 [73]. All tests presented are two-tailed.