Genetic mapping and BAC assignment of EST-derived SSR markers shows non-uniform distribution of genes in the barley genome
- Cite this article as:
- Varshney, R.K., Grosse, I., Hähnel, U. et al. Theor Appl Genet (2006) 113: 239. doi:10.1007/s00122-006-0289-z
A set of 111,090 barley expressed sequence tags (ESTs) was searched for the presence of microsatellite motifs [simple sequence repeats (SSRs)] and yielded 2,823 non-redundant SSR-containing ESTs (SSR–ESTs). From this set, 754 primer pairs were designed, of which 525 yielded an amplicon; as a result, 185 EST-derived microsatellite loci (EST–SSRs) were placed onto a genetic map of barley. The markers show a uniform distribution along all seven linkage groups, ranging from 21 (7H) to 35 (3H) markers per group. Polymorphism information content (PIC) values ranged from 0.24 to 0.78 (average 0.48). To further investigate the physical distribution of the EST–SSRs in the barley genome, a bacterial artificial chromosome (BAC) library was screened. Out of 129 markers tested, BAC addresses were obtained for 127 EST–SSR markers. Twenty-seven BACs, forming eight contigs, were hit by two or three EST–SSRs each. This unexpectedly high incidence of EST–SSRs physically linked at the sub-megabase level provides additional evidence of an uneven distribution of genes and the segmentation of the barley genome into gene-rich and gene-poor regions.
The cereal species assigned to the Triticeae tribe comprise important staple crops including wheat, barley and rye. They are characterized by large genomes, ranging in size from 5.6 × 10^9 bp for barley up to 1.5 × 10^10 bp for wheat (Bennett and Leitch 2003). More than 80% of their genomes consist of repetitive DNA, which in turn mainly consists of transposable elements (Schulman et al. 2004). The large content of repetitive DNA forms the major obstacle for sequencing the Triticeae genomes, resulting in only 6.1 Mb of genomic sequence available in the public domain as of July 2005 (http://www.ncbi.nlm.nih.gov/). Notwithstanding these limitations, the available data may shed some light on the organization of cereal genomes at the sequence level in general and on the distribution of genes along the chromosomes in particular. In several instances, gene islands have been identified, which are characterized by a relatively high density of genes spaced between 5 and 10 kb (for references see Keller and Feuillet 2000). Gene islands are contrasted by gene-free regions, which may extend over several hundred kilobases, and which are mainly composed of repetitive DNA and frequently show reduced recombination or no recombination at all (Wicker et al. 2001, 2005). Similar findings have been obtained at a larger resolution provided by genetic and cytogenetic maps. Here, about 50% of the single and low-copy markers from a genome-wide map of barley could be assigned to only 5% of the physical genome complement, indicating the presence of a distinct gene space (Künzel et al. 2000). Similar observations have been reported from physical mapping studies in wheat using deletion lines (for reference see Gill 2004; Erayman et al. 2004).
Due to the abundance of repetitive DNA, sequencing of expressed sequence tags (ESTs) has been the key approach for systematic gene identification in Triticeae species. In the case of barley 419,146 ESTs (Hordeum vulgare ssp. vulgare and H. vulgare ssp. spontaneum) are deposited at present in dbEST (http://www.ncbi.nlm.nih.gov/dbEST/dbEST_summary.html, dbEST release 071505). These are expected to cover a significant portion of the gene repertoire of barley and provide the groundwork to understand the organization of the barley transcriptome (Zhang et al. 2004). For structural genomics ESTs provide a valuable resource for the development of functional molecular markers to be deployed in comparative mapping studies (Anderson and Lübberstedt 2003; Perovic et al. 2004). Among the most important and popular molecular markers that can be developed from ESTs are simple sequence repeats (SSRs) or microsatellite markers (see Varshney et al. 2005a).
Microsatellite markers have been deployed in a variety of applications in plant genetics and breeding. In the cereals, dense microsatellite maps comprising more than 2,000 loci are available in maize and rice and, similarly, about 2,000 microsatellites have been mapped in wheat (reviewed by Varshney et al. 2004). In barley, about 400 SSR loci have been mapped (Ramsay et al. 2000; Pillen et al. 2000; Li et al. 2003). While most of the available SSR markers were developed from genomic DNA libraries based on experimental approaches, the availability of ESTs facilitated the systematic identification of SSRs and corresponding marker development based on computer-assisted analytical approaches (Varshney et al. 2002; Thiel et al. 2003).
In this study we have searched a set of 111,090 barley ESTs for the presence of SSRs. The corresponding information was used for the development and genetic mapping of a non-redundant set of 185 genic microsatellite markers. The distribution of a randomly chosen set of 129 out of the 185 SSR markers in a large-insert genomic [bacterial artificial chromosome (BAC)] library provided evidence of the non-random distribution of genes within the barley genome.
Materials and methods
Simple sequence repeat polymorphisms were screened in a set of five barley (H. vulgare L.) cultivars comprising Barke, Igri, Franka, Steptoe and Morex, and two genetic stocks of Oregon Wolfe barley (OWB), OWBDom and OWBRec. Barke was included as a standard because this cultivar was used for the construction of most EST libraries at the Institute of Plant Genetics and Crop Plant Research (IPK), whereas the other six genotypes represent the parents of three doubled haploid (DH) mapping populations (Igri × Franka, Steptoe × Morex, OWBDom × OWBRec). Genomic DNA isolation was carried out as given in Thiel et al. (2003).
Marker development and map construction
In order to minimise redundancy, a cluster analysis was performed on ESTs containing SSRs (SSR–ESTs) using the StackPACK2.1 software (Miller et al. 1999). Primer pairs for non-redundant microsatellites were designed using the PRIMER3 software as described earlier (Varshney et al. 2002; Thiel et al. 2003).
Polymerase chain reaction amplification of microsatellite loci, gel electrophoresis, visualization and linkage mapping were performed as described earlier (Thiel et al. 2003). Mapped markers are coded as Gatersleben Barley Microsatellite (GBM) followed by a four-digit numerical code as locus identifier. Generally, each marker was mapped in one of the three mapping populations listed above. However, 14 markers were mapped in more than one mapping population. In addition to these common markers, other common RFLP markers (data not shown) and anchor or BIN markers (Kleinhofs and Graner 2001) available on the genetic maps of these three populations were used to prepare a consensus map using the JoinMap v 2.0 software (Stam 1993).
Polymorphism information content
PCR-based screening of the barley BAC library
A four-step PCR-based screening protocol was established for identifying gene-containing clones in an ordered BAC library of barley with more than 300,000 clones (Yu et al. 2000), utilizing the same primer pairs that were used for genetic mapping of SSR markers. Amplification was achieved in a total volume of 20 μl [buffer: 10 mM Tris–HCl (pH 9.0), 50 mM KCl, 0.1% Triton X-100, 2 mM MgCl2; 0.2 mM each of dATP, dCTP, dGTP and dTTP; 100 pmol of each primer; and approximately 1 U of Taq polymerase] using a touchdown PCR protocol (95°C for 3 min; 10 cycles of 95°C for 1 min, annealing for 1 min starting at 65°C and decreasing by 0.5°C per cycle, and 72°C for 1 min; 25 cycles of 95°C for 1 min, 60°C for 1 min and 72°C for 1 min; 72°C for 3 min; hold at 15°C). The first round of screening was performed on 90 so-called super pools of BAC DNA, each comprising 3,456 clones from nine consecutive 384-well microtiter plates of the library. Thus a total of 311,040 of the 313,344 clones of the Morex BAC library were used for screening. Amplicons with a typical size of 100–500 bp were analysed on 3% agarose gels. For those super pools that yielded a fragment of the same length as genomic DNA from H. vulgare cv. Morex, all nine individual plate pools of BAC DNA were examined during the second round of screening. The third round of screening was performed on the 16 row and 24 column pools running through a positive plate. These row and column pools were derived from rows and columns running through a rectangular arrangement of 24 × 34 microtiter plates of the BAC library to minimize the number of DNA preparations. From the glycerol stock of a clone present at the intersection of positive row and column pools within a positive plate, a frozen bit of bacterial culture was obtained and grown for 20 h at 37°C in 200 μl LB medium (1 l: 5 g yeast extract, 10 g NaCl, 10 g tryptone, pH 7.0) containing chloramphenicol (30 μg/ml). Five microlitres of such an overnight culture were used for PCR verification of the results from the pool screenings.
On average, this strategy should require 390 PCRs (without control reactions) for the identification of all BAC clones in the library containing a single copy sequence, assuming a six-fold genome coverage of the library.
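The figure of 390 reactions can be reproduced with a simple count. The sketch below is an illustration only: the per-step breakdown (every super pool tested once, then nine plate pools, 16 + 24 row and column pools and one confirmation PCR per expected hit) is an assumption that happens to match the stated total, and it further assumes that the six expected hits fall into six different super pools and plates.

```python
# Hypothetical breakdown of the four-step screening; the constants come from
# the text, the way they are combined is an assumption.
SUPER_POOLS = 90            # step 1: all super pools are tested
PLATES_PER_SUPER_POOL = 9   # step 2: plate pools per positive super pool
ROW_POOLS = 16              # step 3: row pools through a positive plate
COLUMN_POOLS = 24           # step 3: column pools through a positive plate
EXPECTED_HITS = 6           # six-fold genome coverage, single-copy sequence


def expected_pcrs(hits=EXPECTED_HITS):
    """Number of PCRs needed to resolve `hits` positive clones."""
    step1 = SUPER_POOLS
    step2 = hits * PLATES_PER_SUPER_POOL
    step3 = hits * (ROW_POOLS + COLUMN_POOLS)
    step4 = hits  # one verification PCR per individual clone
    return step1 + step2 + step3 + step4


print(expected_pcrs())  # 390
```

Under the same assumptions, following up only three positive super pools per marker would need `expected_pcrs(3)` reactions.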
Preparation of pools of BAC DNA
A total of 1,930 pools of BAC DNA were prepared for the PCR-based screening strategy: 810 plate pools, each from the 384 BAC cultures of a single microtiter plate, and 576 column and 544 row pools. Super pools were assembled after the preparation of DNA from nine plate pools. Using a liquid handling robot equipped with a plate storage device, all bacterial cultures belonging to a pool were collected into a 96-well microtiter plate by placing 4–6 cultures (10 μl each) in a single well. After growth for 20 h at 37°C in a total volume of 200 μl LB medium per well, the whole contents of this microtiter plate were transferred to 460 ml of liquid medium (two portions of 230 ml in a 1 l Erlenmeyer flask each) and grown for another 16 h at 37°C. DNA was prepared from these cultures using Qiagen Maxi Prep kits (Qiagen, Hilden, Germany) as recommended by the manufacturer. On average, 110 μg DNA were obtained. Approximately 25 ng of pool DNA was used as template for PCR screening.
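The pool numbers follow from the plate format and the plate arrangement. The sketch below is a consistency check, not the authors' procedure; it assumes a 384-well plate of 16 rows × 24 columns and reads the 24 × 34 arrangement as 24 plates across and 34 plates down, which is an interpretation of the text.

```python
# Assumed layout: 16 x 24 wells per plate, plates arranged 24 across x 34 down.
ROWS_PER_PLATE = 16
COLS_PER_PLATE = 24
PLATES_SCREENED = 810
GRID_PLATES_ACROSS = 24
GRID_PLATES_DOWN = 34
PLATES_PER_SUPER_POOL = 9


def pool_counts():
    """Return the number of pools of each type implied by the layout."""
    return {
        "plate": PLATES_SCREENED,
        "column": GRID_PLATES_ACROSS * COLS_PER_PLATE,  # 24 * 24
        "row": GRID_PLATES_DOWN * ROWS_PER_PLATE,       # 34 * 16
        "super": PLATES_SCREENED // PLATES_PER_SUPER_POOL,
    }


counts = pool_counts()
dna_preparations = counts["plate"] + counts["column"] + counts["row"]
clones_screened = PLATES_SCREENED * ROWS_PER_PLATE * COLS_PER_PLATE
print(counts, dna_preparations, clones_screened)
```

With these assumptions the three pool types add up to the 1,930 DNA preparations and the 311,040 screened clones given in the text.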
Testing the null hypothesis about uniform distribution of genes
The BAC library comprising 311,040 BAC clones was screened with a total of 129 markers. Two markers did not hit any BAC clone, and a total of 311 BAC clones were hit 318 times by 127 markers, meaning that some BAC clones were hit by more than one marker. The total number of collisions, C, was defined as follows: for each BAC i = 1, 2, ..., 311,040, let N_i denote the number of hits on BAC i and let C_i = N_i − 1 denote the number of collisions on BAC i; the total number of collisions is then C = Σ_i C_i, where the sum runs over all indices i for which C_i is positive. The null hypothesis that the 318 BAC hits were distributed uniformly among the 311,040 BACs of the Morex library used for screening was tested by using C as the test statistic, and the probability of finding C or more collisions under the null hypothesis was estimated by the following simulation.
Simulation of the screening process leading to gene-containing BAC clones
A hypothetical genome was divided into 311,040 overlapping segments of identical size to represent the 311,040 clones of the BAC library. Neighbouring segments were made to overlap by 5/6 of their length to simulate the sixfold genome coverage of the barley BAC library.
In each simulation run 129 markers (or genes) were placed at random positions in the hypothetical genome. All six segments overlapping at the position of a gene were labelled. Each of the labelled segments was selected at random with a probability of 2.465/6 to account for the fact that on average 2.465 = 318/129 BACs per gene were identified in the screening process. From these selected segments the number of collisions was calculated.
Each simulation run was repeated 10^8 times to obtain reliable estimates of the probability of the occurrence of seven or more collisions down to a P value of 10^−7. In order to test the reliability of the P value estimates and to obtain their 95% confidence intervals, the batch of 10^8 simulation runs was repeated 200 times.
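The simulation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' program: the genome is treated as circular, marker positions are drawn on a grid of one sixth of a segment length (so each position is covered by exactly six segments), and far fewer runs are used than in the study. All function names are hypothetical.

```python
import random
from collections import Counter

N_BACS = 311_040                     # segments (BAC clones) in the model genome
N_MARKERS = 129                      # markers placed per simulation run
COVERAGE = 6                         # segments overlapping any position
P_SELECT = (318 / 129) / COVERAGE    # chance that an overlapping segment is found


def count_collisions(hits):
    """Total collisions: sum of (hits - 1) over segments hit more than once."""
    return sum(n - 1 for n in hits.values() if n > 1)


def simulate_run(rng):
    """Place the markers at random and return the number of collisions."""
    hits = Counter()
    for _ in range(N_MARKERS):
        pos = rng.randrange(N_BACS)          # position in units of 1/6 segment
        for k in range(COVERAGE):            # the six segments covering pos
            if rng.random() < P_SELECT:
                hits[(pos - k) % N_BACS] += 1  # circular genome (assumption)
    return count_collisions(hits)


def estimate_p_value(observed, runs, seed=0):
    """Fraction of runs with at least `observed` collisions."""
    rng = random.Random(seed)
    exceed = sum(simulate_run(rng) >= observed for _ in range(runs))
    return exceed / runs


if __name__ == "__main__":
    print(estimate_p_value(observed=7, runs=10_000))
```

Resolving probabilities as small as those targeted in the text needs a very large number of runs, which is why a pure Python loop like this one is only suitable for illustrating the logic.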
Development of microsatellite markers
A dataset of 111,090 IPK-barley ESTs was screened using the MIcroSAtellite (MISA) software (Thiel et al. 2003). This identified 9,564 (8.6%) redundant SSRs in 8,766 (7.9%) ESTs. Cluster analysis of these SSR-ESTs yielded a final number of 3,122 (2.8%) non-redundant SSRs present in 2,823 ESTs. As expected, trimeric SSRs constitute the major portion at 52.6 and 63.4% of the total SSRs identified in non-redundant and redundant SSR–EST sets, respectively. Pentameric and hexameric microsatellites were present at less than 1% of all the SSRs searched. The SSR motif AG/CT among dimeric SSRs and the motif CCG/CGG among trimeric SSRs were the most abundant. The most frequent tetrameric microsatellite motif was ACGT/ATGC. No specific trends were observed for pentameric and hexameric SSR motifs.
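To illustrate what an SSR search of this kind involves, the sketch below finds perfect di- to hexameric repeats with regular expressions. It is not the MISA implementation; the minimum repeat numbers in `MIN_REPEATS` are hypothetical placeholders, since the search criteria are not stated in this text.

```python
import re

# Hypothetical thresholds: motif length -> minimum number of repeat units.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def is_primitive(unit):
    """True if the motif is not itself a repetition of a shorter motif."""
    return unit not in (unit + unit)[1:-1]


def find_ssrs(seq):
    """Return (motif, repeat_count, start) for each perfect SSR in seq."""
    seq = seq.upper()
    found = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One motif of unit_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            if not is_primitive(unit):
                continue  # e.g. AGAG is reported as the dimer AG instead
            found.append((unit, len(match.group(0)) // unit_len, match.start()))
    return found


print(find_ssrs("TTT" + "AG" * 8 + "CCC"))  # [('AG', 8, 3)]
```

Collapsing motifs such as AG/CT or CCG/CGG into one class additionally requires comparing each motif with its rotations and reverse complement, which is left out here.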
For the development of EST–SSR markers, SSR–ESTs were selected by using two strategies (Fig. 1). In the first approach, non-redundant SSR-ESTs were selected from the set of 111,090 IPK-ESTs for designing primer pairs. A total of 665 primer pairs (including 311 primer pairs reported earlier in Thiel et al. 2003) were employed to amplify the corresponding SSR loci in the set of seven genotypes. Of the 464 primer pairs (69.8%) which yielded amplicons in the analysed genotypes, 156 primer pairs (33.6%) displayed polymorphisms between the parents of at least one mapping population.
To enhance the level of polymorphism in the genotypes of interest, a second strategy, based on additional non-IPK barley ESTs from the public domain, was adopted. While the IPK-ESTs were derived from the cultivar Barke, non-IPK ESTs from the public domain were developed from a diverse series of cultivars (Kota et al. 2003). Hence, a comparison of Barke to non-Barke ESTs would allow a pre-selection of polymorphic SSRs. The 207,449 non-IPK barley ESTs were screened for the presence of SSRs. This resulted in the identification of 18,041 SSR–ESTs containing 24,623 SSRs. In the combined set of 318,539 ESTs, a total of 26,807 ESTs were found to contain 34,187 redundant SSRs and 7,438 non-redundant SSRs. This corresponds to one SSR every 4.78 kb in the transcriptome of barley based on 163.5 Mb of EST sequence analysed.
The combined, total set of redundant SSR–ESTs was subjected to cluster analysis. As a result, three types of clusters containing SSR–ESTs were observed: (a) clusters of only IPK ESTs, (b) clusters of only non-IPK ESTs, and (c) mixed clusters containing IPK and non-IPK ESTs (Fig. 1). Mixed clusters were further analysed after preparing the consensus sequences of IPK ESTs and non-IPK ESTs separately. These two consensus sequences of the mixed clusters were compared to detect variation in SSR length. Altogether, 197 mixed clusters containing SSR–ESTs that showed variation in SSR length between IPK and non-IPK ESTs were identified. Of this set, 89 IPK SSR–ESTs were selected for amplification of the corresponding microsatellite loci in the set of seven accessions, and amplicons were obtained with 61 (68.5%) of the primer pairs. Of these, 38 (62.3%) SSR–ESTs showed polymorphisms between two or more of the seven genotypes used in the present study, and 29 detected polymorphisms that could be mapped in the corresponding mapping populations. Thus the level of polymorphism detected in parental genotypes of at least one mapping population (29/61, 47.5%) was increased by 13.9 percentage points, and this higher level of polymorphism, compared to the first strategy, was statistically significant (χ2 test, P < 0.01).
Using both strategies a total of 754 primer pairs were analysed on the set of seven genotypes. A total of 525 (69.6%) primer pairs yielded amplicons of which 185 (35.2%) primer pairs detected polymorphism between parents of at least one mapping population (Fig. 1).
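The success rates quoted for the two strategies and for the combined set can be recomputed from the counts given above. The short sketch below only restates that arithmetic; the dictionary keys are descriptive labels chosen here, not terms from the study.

```python
# (primer pairs tested, pairs yielding amplicons, pairs mapped as polymorphic)
STRATEGIES = {
    "non-redundant IPK SSR-ESTs": (665, 464, 156),
    "in silico pre-selected SSR-ESTs": (89, 61, 29),
}


def rates(tested, amplified, polymorphic):
    """Return (amplification %, polymorphism % among functional pairs)."""
    return (round(100 * amplified / tested, 1),
            round(100 * polymorphic / amplified, 1))


combined = tuple(sum(values) for values in zip(*STRATEGIES.values()))

for name, counts in STRATEGIES.items():
    print(name, counts, rates(*counts))
print("combined", combined, rates(*combined))
```

The combined tuple evaluates to (754, 525, 185), matching the totals reported for both strategies together.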
Genetic mapping of microsatellite loci
Summary of EST–SSR markers (number of markers, BAC addresses obtained, range of PIC values): PIC range 0.24–0.78 (mean 0.48)
Polymorphism information content
For all mapped markers, the PIC value was calculated on the basis of observed alleles in six (with 76 markers GBM1001–GBM1076) or seven genotypes (with remaining markers). The mapped markers detected 2–5 alleles with an average of 2.7 alleles per locus. Their corresponding PIC values ranged from 0.24 to 0.78 with an average of 0.48. About half of the markers (95) displayed PIC values greater than 0.50 (Table ESM 1). Markers derived from 3′ESTs showed higher PIC values than those derived from 5′ESTs. For instance, 33.3% of the markers from 3′ESTs and 25.2% of the markers from 5′ESTs had a PIC value greater than 0.60 (Table ESM 1). Regarding their SSR motifs, 36.9% of the dimeric, 19.5% of the trimeric and 25% of the tetrameric microsatellites had PIC values greater than 0.60.
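As an illustration of how such values arise, the sketch below computes PIC from the alleles scored in a small genotype panel. It assumes the widely used definition PIC = 1 − Σ p_i², where p_i is the frequency of allele i; the formula applied in the study is not given in this text, so this choice is an assumption. Under it, two alleles split 6:1 across seven genotypes give about 0.24 and five alleles split 2:2:1:1:1 give about 0.78, which agrees with the reported range.

```python
from collections import Counter


def pic(alleles):
    """PIC = 1 - sum of squared allele frequencies (assumed definition)."""
    counts = Counter(alleles)
    total = len(alleles)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Hypothetical allele calls for seven genotypes, one letter per allele.
two_alleles = list("aaaaaab")    # 6:1 split
five_alleles = list("aabbcde")   # 2:2:1:1:1 split

print(round(pic(two_alleles), 2), round(pic(five_alleles), 2))  # 0.24 0.78
```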
A putative gene function could be assigned to 103 (55.7%) mapped SSR–EST markers based on a comparison to the NR-PEP protein sequence database. Among these markers, 69 showed homology with known proteins, 21 with putative proteins, six with hypothetical proteins and 7 with (presently) unknown proteins (Table ESM 1). The remaining 82 SSR–EST markers did not show any significant homology to a known protein.
Assignment of SSR markers to BAC clones
Gene-containing BAC clones
A genomic BAC library of barley was screened using 129 EST-based SSR markers. These were randomly selected from the set of 185 mapped SSRs to create anchor points between the genetic map and a ‘future’ physical map of barley. A four-step PCR-based screening strategy was developed employing DNA of BAC pools and a confirmation step at the level of individual clones. Using that strategy the BAC clone addresses were obtained for 127 (98%) of the SSR markers assayed (Table ESM 1).
Almost the complete (> 99%; 311,040 clones of the available 313,344 clones) BAC library was screened during the initial screening step performed on 90 super-pools each containing the DNA of 3,456 BAC clones. In order to save time and costs typically three super-pools were then chosen to continue the screening process. 311 individual BAC clones could be identified from a total of 384 positive super pools selected in this way. Their plate addresses are given in Table ESM 1. For 15 primer pairs, 2 to 5 BAC clones were identified on a single microtiter plate of the ordered BAC library. This frequency is almost 40 times greater than the expected value. Although these BAC clones do not occupy neighbouring wells in all cases, we regarded them as cross-contaminations, which may have occurred during construction, transport or copying of the library and counted them as one BAC clone in the statistical analysis outlined below. Previous testing of the used copy of the library via hybridization of radiolabelled gene probes to colonies spotted onto nylon membranes had not revealed this problem, which may have been uncovered only due to the higher sensitivity of PCR. At present, we observed cross-contaminations in 15 microtiter plates out of 254 from which we obtained individual clones during this study. Therefore, users of the BAC clone information should be aware that testing of individual colonies from an address of the ordered library is required to confirm the given information.
Identification of gene-rich BAC clones
Three positive super pools were selected at random for each marker to be subsequently resolved down to the level of individual BAC clones. Thus only about 50% of the positive pools were analysed (because in a 6× library six hits would be expected on average). While in the majority of cases BACs were hit by a single marker only, six BAC clones harbouring at least two markers were identified. EST cluster information (see project ‘g03’ at http://pgrc.ipk-gatersleben.de/cr-est/ on 135,031 barley ESTs that includes 111,090 barley ESTs, searched for SSRs in the present study) and the BLASTX analysis showed that these markers were derived from independent genes.
BAC contigs identified by EST–SSR markers
Position on genetic map
064A24, 312D09, 526J23, 579B12
064A24, 312D09, 526J23, 579B12
064A24, 169D02, 312D09, 526J23, 579B12
41C24, 93E12, 305H14
41C24, 274P02, 559M03
194B18, 486M04, 509D02, 537M01, 582L01, 679O07
486M04, 537M01, 679O07
194B18, 486M04, 509D02, 537M01, 582L01, 679O07
301H19, 516M05, 676O18
334A06, 499N10, 516M05, 676O18
334A06, 499N10, 516M05, 676O18
214C05, 256M21, 256M22, 317A19, 676C09
214C05, 230C05, 256M21, 579A07
113D01, 113F01, 306D14, 351F23, 810K09
290K01, 310J18, 474G04, 804L09
290K01, 310J18, 474G04, 804L09
111C05, 274E14, 536G12
111C05, 274E14, 536G12
In addition, re-screening of the BAC clones with neighbouring markers (linked at less than 2 cM) provided the BAC addresses for two additional SSR markers. As a result, the BAC addresses are now available for 129 markers (Table 1).
Computational analysis on gene-distribution
Screening of the BAC library yielded eight BAC clones hit by at least two markers. Two BACs (312D09 and 526J23) were hit by GBM1073 and GBM1159, which represent the identical gene. However, the markers for the remaining six BAC clones [194B18 and 582L01 (hit by GBM1056 and GBM1059), 486M04 (hit by GBM1050 and GBM1056), 499N10 (hit by GBM1049 and GBM1057), 516M05 (hit by GBM1021, GBM1049 and GBM1057) and 804L09 (hit by GBM1058 and GBM1174)] represent different genes. Thus 310,729 BAC clones were not hit by any marker, 305 BAC clones were hit by exactly one marker, 5 BAC clones were hit by exactly two markers, 1 BAC clone was hit by exactly three markers, and no BAC was hit by more than three markers. This yielded a total of 305 × 1 + 5 × 2 + 1 × 3 = 318 BAC hits, and a total number of C = 5 × 1 + 1 × 2 = 7 collisions (because there are 5 BACs with C_i = 1, and there is one BAC with C_i = 2).
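The bookkeeping in this paragraph can be checked mechanically. The sketch below recomputes the number of BACs hit, the number of hits and the collision statistic C from the observed distribution; variable names are chosen here for illustration.

```python
LIBRARY_SIZE = 311_040

# markers per BAC -> number of BAC clones with that many markers
HIT_DISTRIBUTION = {1: 305, 2: 5, 3: 1}

bacs_hit = sum(HIT_DISTRIBUTION.values())
total_hits = sum(k * n for k, n in HIT_DISTRIBUTION.items())
# Each BAC with k markers contributes k - 1 collisions.
collisions = sum((k - 1) * n for k, n in HIT_DISTRIBUTION.items())
bacs_not_hit = LIBRARY_SIZE - bacs_hit

print(bacs_hit, total_hits, collisions, bacs_not_hit)  # 311 318 7 310729
```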
In the present study, 109 novel EST-derived SSR markers were developed further increasing the number of EST–SSRs on our genetic consensus map to 185. The assignment of 129 SSRs to the corresponding clones of the Morex BAC library will provide a first resource of connecting points between existing genetic maps and upcoming local physical maps (http://www.phymap.ucdavis.edu:8080/barley/index.jsp) of the barley genome.
Frequency of microsatellites in the barley transcriptome
Over the past 5 years, large-scale genome and EST sequencing projects were initiated in several plant species including cereals. The data generated from these projects were utilized for studying the frequency, distribution and organization of microsatellites in the expressed portion of the genome, and in some cases also in the whole genome (reviewed by Varshney et al. 2005a). In the present study, a total of 9,564 (8.6%) redundant and 3,122 (2.8%) non-redundant SSRs were identified in a dataset of 111,090 ESTs. The overall trend in the frequency and distribution of different classes of SSRs agreed with results obtained in earlier studies (Varshney et al. 2002, 2005a, 2005b; Thiel et al. 2003).
Level of polymorphism of EST–SSRs
For the development of microsatellite markers, two strategies were adopted. Using the first strategy, primers were developed flanking SSR motifs in a set of randomly selected ESTs. Here 33.6% of the functional primer pairs displayed a polymorphism between the parents of the mapping populations. In the second strategy, which was based on the in silico pre-selection of SSR-containing ESTs that were polymorphic between the cultivar Barke and other varieties, 47.5% of the functional primer pairs could be mapped in at least one of the populations studied. The application of the second strategy resulted in a significant increase in the detection rate of polymorphisms. In 38 out of 41 SSR–ESTs, in silico polymorphism was confirmed at the experimental level. In three cases, however, a 2 bp polymorphism predicted in silico between Barke and Morex SSR–ESTs could not be confirmed. This may be due to the limited resolving power of the polyacrylamide gel system used. The remaining 20 SSR–ESTs showed polymorphisms between Barke and Japanese genotypes (Haruna Nijo, Akashinriki and H602). While these SSRs were monomorphic in the genotypes analysed in this study, 11 (55%) out of these 20 SSR–ESTs showed polymorphisms in the parental genotypes of two other mapping populations and were successfully integrated into genetic maps (R. Niks, Wageningen, personal communication). Hence, the computational pre-selection of polymorphic SSRs presents an efficient strategy to increase the success in the development of informative SSR markers. Similar results were obtained from database mining for SNPs, where the computational selection of polymorphic SNPs in the EST database increased the likelihood of detecting polymorphism in a given set of germplasm (Kota et al. 2003).
Approximately 70% of the analysed primer pairs were functional, of which 35.2% were polymorphic in the parents of the mapping populations. Failure of amplification in the remaining 30% of primer pairs may have been due to primer mismatch, the extension of primers across a splice site or the presence of large introns in the genomic DNA fragment to be amplified. Moreover, only one standard amplification protocol was applied to all markers, and no efforts were undertaken to optimize amplification conditions for unsuccessful primer pairs (Thiel et al. 2003). As reflected by an average PIC value of 0.48, the polymorphism of EST-derived microsatellites is lower than that of genomic DNA-derived microsatellites (Ramsay et al. 2000; Li et al. 2003).
Development of functional microsatellite markers
Genetic mapping of 185 microsatellite markers showed no obvious clustering around the centromere as was observed for microsatellites derived from genomic DNA (Ramsay et al. 2000; Li et al. 2003). The observation of a higher number of markers on chromosome 3H (18.8%) and chromosome 2H (17.2%) suggests the presence of more genes on these two chromosomes as has also been observed in case of a transcript map containing more than 1,000 genes (unpublished results). Physical mapping of ESTs using deletion lines of wheat also revealed the highest number of EST loci on homologous group 3 (16.3%) followed by group 2 (15.9%) (http://wheat.pw.usda.gov/cgi-bin/westsql/map_locus.cgi, Qi et al. 2004).
Although it seems that further markers are required to cover the distal portions of chromosome arms 1HS, 5HS, 6HS and 6HL, some of these gaps correspond to gaps on other published maps (Kleinhofs et al. 1993; Ramsay et al. 2000). Synteny between rice and barley may facilitate filling up these gaps, if required, by using bioinformatics analyses in combination with available barley ESTs and the completed rice genome sequence, as it has been shown recently (Perovic et al. 2004; Varshney et al. 2005b).
Physical anchoring of mapped SSR–ESTs and identification of gene-rich BAC clones
The availability of genome-wide BAC contigs has been an invaluable resource for sequencing the genomes of Arabidopsis and rice (Sasaki and Burr 2000). However, progress in contig-based physical mapping of the barley genome is slow because of the large genome size and the limited efforts that have been initiated to date. To further circumvent the complexity of large genomes, novel approaches such as methylation and Cot filtration are being investigated to reduce the representation of heterochromatic regions in BAC libraries and to confine the construction of physical maps to the gene space (Rabinowicz et al. 2003; Peterson et al. 2002). Another approach to enrich for coding regions may be the pre-selection of gene-containing BACs.
BAC screening and genetic mapping revealed eight groups of markers which selected one or more BAC clones with at least two genes. A comparison of the presented data with the US Barley Physical Mapping Database (BPMD; http://phymap.ucdavis.edu:8080/barley/index.jsp) confirms the existence of seven contigs. BAC clones for the remaining contig (IPK-ctg06-6H-95) were not present in the BPMD at the time of this analysis.
The fact that markers are spaced at 2.7 cM and yet co-locate on a single BAC could be indicative of a recombination hotspot. On the other hand the observed map distance may have been overestimated as a result of merging the segregation data of the three different populations employed in this study into a consensus map. Inaccurate estimates of small map intervals are almost unavoidable if neighboring markers were mapped in different populations (Somers et al. 2004), as is the case with markers GBM1423 and GBM1212 in contig IPK-ctg05-6H-50, that were mapped in the OWB and the Steptoe × Morex populations, respectively.
Non-random distribution of genes in the genome
The statistical analysis of the distribution of the SSR markers across the BAC library provided evidence that the distribution of genes across the BACs is not uniform on a genome-wide scale. A non-uniform gene distribution in large genomes of cereal crops including barley was suggested earlier by buoyant density gradient methods (Carels et al. 1995; Barakat et al. 1997) or by a comparison of genetic with physical maps based on cytogenetic stocks (Künzel et al. 2000; Erayman et al. 2004; Gill 2004). The results of the present study provide evidence that this non-uniform gene distribution on the physical level extends to a resolution of about 100 kb (the average size of a BAC clone). Sequence analysis of DNA contigs in wheat and barley has already pointed at the presence of clusters of closely linked genes forming gene islands that are separated by large stretches of repetitive DNA (e.g. Rostoks et al. 2002; Gill 2004; Wicker et al. 2001, 2005). The majority of the corresponding data are biased towards disease resistance genes, raising the possibility that their pattern of distribution is not representative of the whole genome. In the present study, functional annotation of genes mapping to a single BAC does not show any bias towards a specific functional category. However, two contigs comprised markers representing different members of gene families (GBM1040, GBM1073, GBM1074; GBM1065, GBM1456). Since there is evidence that about 50% of the barley genes represent members of gene families, this result is expected (Zhang et al. 2004). Although unlikely, we cannot completely rule out that the findings of the present study pertain only to genes containing SSRs. BAC selection by EST-derived probes devoid of SSRs will help to address this issue.
We are grateful to Timothy J. Close (University of California, Riverside, USA) for his valuable suggestions on the physical mapping data. We thank Uwe Scholz and Christian Künne (IPK) for performing cluster analysis of SSR–ESTs of IPK and non-IPK ESTs, and Paul Krapivsky (Boston University, Boston, USA), Stefan Posch (Martin Luther University Halle-Wittenberg, Halle, Germany) and Roland Schnee (IPK) for helpful discussions. We also thank Christine Künzel, Anita Czech, Brigitte Schmidt and Ingelore Dommers for technical assistance. The present work was funded by grants from the Grain Research and Development Corporation, Australia (GRDC, UA476), the Federal Ministry of Education and Research (BMBF, GABI-PLANT 312271A,B,C) and the BMBF Bioinformatics Centre, Gatersleben/Halle (0312706A).