Background

The potential for the molecular analysis of human genetic material has increased enormously with the availability of the human genome sequence, single nucleotide polymorphism (SNP) identification efforts and the development of high-throughput genotyping platforms [1]. The expanding demand for SNP genotyping is a consequence of the recognition that many SNPs will need to be analyzed to characterize the effects of genes on complex disorders [2], especially when performing whole genome association studies [3]. With notable exceptions [4], total DNA requirements for genotyping will increase as the number of loci investigated expands, despite increased efficiency of individual genotyping assays. Whole genome amplification (WGA) is an in vitro procedure to amplify a genomic DNA (gDNA) sample to generate amplified DNA (wgaDNA) for further molecular genetic analyses, and has been considered by some as a potential solution to the problem of limiting gDNA availability. While PCR-based methods of WGA have been under continuous development for over a decade [5, 6], recent application of a highly processive φ29 DNA polymerase [7] has enabled multiple displacement amplification (MDA) WGA, an isothermal, hyperbranching amplification method with a low level of locus or allelic bias [8]. Dean [8] and Lovmar [9] have evaluated the genotyping performance of MDA WGA using a range of genomic DNA inputs (0.3, 3, 30 and 300 ng, and 0.003, 0.03, 0.3 and 3 ng, respectively). Both authors focused their evaluation of genotyping performance on wgaDNA derived from 3 ng of genomic DNA template. Lasken and Egholm [10] have recommended 10–100 ng of undegraded gDNA template in the MDA WGA reaction to avoid stochastic amplification. The present study has characterized the yield, composition and genotyping performance of wgaDNA produced from lymphoblast gDNA templates of 1, 10, 25, 50, 100 and 200 ng. Three DNA quantification methods, two genotyping methods, and adequate numbers of genotyping assays and DNA samples were used to detect significant differences in the yield, composition and genotyping performance of the wgaDNA produced from this range of gDNA inputs and to provide additional recommendations on the amounts of gDNA template to be used in the MDA WGA reaction.

Results

WGA reaction yield

The yield of H. sapiens PCR-amplifiable (hereafter "RT-PCR") DNA, ssDNA, dsDNA and total DNA in wgaDNA by gDNA input mass is presented in Figure 1. RT-PCR DNA yield increased significantly as gDNA input increased at each level (all p values ≤ 0.02), where the proportion of the total wgaDNA represented by the RT-PCR DNA increased from 20% to 46%, at 1 to 200 ng gDNA input into the WGA reaction, respectively. The yield of ssDNA decreased, and that of dsDNA increased, as the gDNA input into the WGA reaction was increased. The variability in wgaDNA yield by wgaDNA component was least for total DNA and dsDNA yield, greatest for RT-PCR yield, and intermediate for ssDNA yield.

Figure 1

Yield of DNA components of wgaDNA by gDNA input into WGA. Mean ("+"), Median (middle bar), lower and upper quartile (lower and upper end of box), and minimum and maximum of BRCA1 locus equivalents, ssDNA, dsDNA and total DNA.

Genetic profiling with the AmpFlSTR® Identifiler® assay (N = 15 STR and AMEL)

gDNA exhibited STR genotype completion and concordance rates of 100%, which were significantly greater than the completion rate of wgaDNA produced from 1 ng gDNA input and the concordance rates of wgaDNA produced from all gDNA inputs, respectively (Table 1). wgaDNA produced from 1 ng gDNA input exhibited significantly lower STR genotype completion and concordance rates than did wgaDNA produced from other gDNA inputs, while wgaDNA produced from 200 ng gDNA input exhibited STR genotyping completion and concordance rates similar, but not identical, to those of gDNA. Of the wgaDNA STR genotypes discordant with gDNA genotypes, 98% were homozygote genotypes, reflecting loss of heterozygosity. There was a trend for preferential loss of shorter alleles (129 short alleles/232 total alleles, p = 0.088), but this was significant only for wgaDNA produced from 1 ng of gDNA (90 short alleles/145 total alleles, p = 0.0037). Peak heights were significantly and negatively correlated with discordance for all gDNA inputs combined, and for 1, 10 and 100 ng gDNA inputs separately (Spearman r = -0.58, -0.64, -0.55 and -0.62, with p values <0.0001, = 0.008, = 0.025 and = 0.001, respectively, data not shown), and peak height ratios of concordant heterozygote wgaDNA genotypes (from wgaDNA produced from 1 and 50 ng gDNA inputs) were significantly higher than those from gDNA genotypes (Wilcoxon's p values ≤ 0.03, data not shown).
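The short-allele comparison above can be illustrated with a test of an observed proportion against 0.5 (no preference for losing either allele). The text does not state which test was used; the sketch below assumes a two-sided normal-approximation test of proportion, and the function name is hypothetical. Under that assumption, the counts given above yield p values close to those reported.

```python
import math

def two_sided_proportion_test(successes, total, p0=0.5):
    """Normal-approximation z-test of an observed proportion against p0.

    Assumption: the study's exact test is not specified; this is one
    common choice for a test of proportion.
    """
    expected = total * p0
    sd = math.sqrt(total * p0 * (1.0 - p0))
    z = (successes - expected) / sd
    # Two-sided tail probability of a standard normal variate
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Counts taken from the text: lost short alleles / all lost alleles
for label, short, total in [("all gDNA inputs", 129, 232),
                            ("1 ng gDNA input", 90, 145)]:
    z, p = two_sided_proportion_test(short, total)
    print(f"{label}: {short}/{total} short alleles, z = {z:.2f}, p = {p:.4f}")
```

For small counts, an exact binomial test would be the more conservative choice.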

Table 1 STR genotyping performance

The rate of no amplification and discordant genotypes per STR locus was 0.8% and 4%, respectively. Five STR loci (TPOX, FGA, D7S820, D13S317 and D18S51) accounted for the majority of STR no amplification failures (82%) and discordant (56%) genotypes following WGA (Table 2). The discordance rate for AMEL genotypes for all wgaDNA strata was 0.15%, but was 0.73% for wgaDNA produced from 1 ng gDNA input (Table 2). Composite genotype quality (GQ) scores for gDNA heterozygote and homozygote concordant genotypes were significantly better (fewer genotypes in the poorer quality categories) than for concordant wgaDNA heterozygote genotypes at all gDNA input levels and for concordant wgaDNA homozygote genotypes produced from 1, 100 and 200 ng gDNA input levels, respectively (Table 3). wgaDNA heterozygote and homozygote concordant genotypes produced from 1 ng gDNA input exhibited significantly reduced GQ scores compared to wgaDNA heterozygote and homozygote concordant genotypes produced from all other gDNA input levels, except for wgaDNA homozygote concordant genotypes produced from 200 ng gDNA (Table 3). GQ scores of discordant homozygote wgaDNA genotypes were significantly worse than those for concordant homozygote wgaDNA genotypes at all gDNA input levels except 50 ng (p = 0.02 for 25 ng gDNA input, all other p < 0.0001, data not shown).

Table 2 STR genotyping failures by locus
Table 3 Genotype Quality (GQ) scores of concordant STR genotypes

SNP genotyping with the TaqMan® assay (N = 49 SNPs)

Results of genotyping using N = 49 TaqMan® SNP genotyping assays with 1, 4 and 20 ng of gDNA and wgaDNA using 1, 10, 25, 50, 100 and 200 ng of gDNA input into the WGA reaction are summarized in Table 4. We observed a TaqMan® SNP genotype completion rate of >99.55%, an undetermined rate of <0.45%, zero discordant genotypes and zero "no amplification failures" in 7938 attempted TaqMan® SNP genotypes using gDNA template in the TaqMan® SNP assay. No significant differences in genotyping performance between gDNA template inputs into the TaqMan® SNP assay were observed (Table 4). In pairwise tests, gDNA exhibited a significantly higher TaqMan® SNP genotype completion rate due to significantly decreased undetermined TaqMan® SNP genotypes, compared to wgaDNA produced from 1 ng of gDNA input for all wgaDNA template inputs into TaqMan® SNP genotyping, and when compared to wgaDNA produced from 50 and 100 ng of gDNA input when using 1 or 4 ng of wgaDNA template input into the TaqMan® SNP assay. Over all gDNA and wgaDNA strata, we observed significantly reduced SNP genotyping performance when using 1 ng of gDNA or wgaDNA in TaqMan® SNP genotyping assays with respect to completion rate, due to a significant increase in the undetermined genotype rate (Table 4). However, genotype concordance rates were not significantly different among the three DNA (gDNA or wgaDNA) input levels into the TaqMan® SNP assay, although there was a significant decrease in the concordance rate of 1 ng wgaDNA produced from 1 ng gDNA into the TaqMan® SNP assay, when compared to 1 ng wgaDNA produced from 10, 50 and 100 ng of gDNA input (Table 4).

Table 4 SNP genotyping performance

Predictors of wgaDNA SNP genotyping performance

We sought to identify parameters from the Core Genotyping Facility's standard DNA sample handling protocol that might be predictive of the SNP genotyping performance of wgaDNA. We performed exploratory correlation analysis among measures of wgaDNA yield (RT-PCR, ssDNA, dsDNA, total DNA, ratio of RT-PCR to dsDNA) and genotyping performance (concordance and completion rates for AmpFlSTR® Identifiler™ and TaqMan® SNP assay genotyping) within gDNA input strata. Measures of wgaDNA yield (especially the ratio of RT-PCR to dsDNA and total DNA) and genotyping performance were highly correlated with one another (92%, 3% and 5% of 162 pairwise correlations were statistically significant, trending and non-significant, respectively). We then performed linear regression analysis with the dependent variables "SNP completion rate" and "SNP concordance rate", in order to identify WGA reaction, wgaDNA yield and STR genotyping performance factors that are significantly associated with wgaDNA SNP genotyping performance. Independent variables included: gDNA input, STR completion rate, STR concordance rate, GQ score, peak height, and RT-PCR wgaDNA yield. "STR completion rate" was a highly significant factor in both SNP rate models (p < 0.0001), and "STR concordance rate" and "GQ score" were significant factors in the SNP concordance rate model (p = 0.0008 and 0.045, respectively). The variable "gDNA input" into the WGA reaction was significant only in those models incorporating the 1 ng gDNA input stratum.
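As an illustration of the kind of rank correlation used in the exploratory analysis, the following is a minimal sketch of a Spearman coefficient computed as the Pearson correlation of average ranks. The analyses in this study were run in SAS; the function names and the numeric values below are hypothetical and are not study data.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2.0 + 1.0
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical per-sample values (illustration only, not study data)
yield_ratio = [0.11, 0.19, 0.27, 0.35, 0.41, 0.52]
snp_completion = [0.900, 0.970, 0.990, 0.980, 0.995, 1.000]
print(f"Spearman r = {spearman(yield_ratio, snp_completion):.3f}")
```

Working on ranks makes the coefficient insensitive to the skewed distributions typical of yield and rate data.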

WGA yield and genotyping performance with no template control (NTC) samples

No template control (NTC) input samples, i.e., where no gDNA was used in the WGA reaction, yielded substantial amounts of wgaDNA, similar in quantity to the total wgaDNA obtained with gDNA inputs, but with a substantially higher proportion of ssDNA than with gDNA inputs (Table 5). The wgaDNA produced from the NTC samples in the 1, 50 and 100 ng gDNA input strata exhibited mean RT-PCR results that were greater than zero (Table 5). We observed N = 35 STR peaks with a signal strength > 50 RFUs that fell within the expected base-pair range of an AmpFlSTR® Identifiler™ locus allele from the wgaDNA produced from the NTC gDNA and wgaDNA samples, for an overall false positive STR genotyping rate of 4.2% (Table 5). While these false positive STR peaks fulfill the criteria for valid AmpFlSTR® Identifiler™ STR alleles, they are characterized by low heterozygosity (2 observed versus 26 expected heterozygote genotypes), moderate signal strength (median amplitude = 357 RFUs), and representation of 12 out of 15 AmpFlSTR® Identifiler™ STR loci. The 50 and 100 ng gDNA input strata (Table 5) and three STR loci (D2S1338, D8S1179 and FGA) account for the majority (66% and 51%, respectively) of the wgaDNA false positive STR genotypes produced from the NTC samples.

Table 5 Yield and STR genotypes from NTC samples

In N = 7056 TaqMan SNP genotype attempts with wgaDNA produced from the NTC samples, 80%, 14.5% and 5.5% of the resulting datapoints were incorporated into the no amplification (NTC) cluster, into a genotype cluster ("false positive SNP genotypes"), and into the undetermined genotype space of the two color TaqMan® SNP genotyping assay plot, respectively (Table 6). The number of false positive and undetermined SNP genotypes from the wgaDNA produced from the NTC samples increased significantly with increasing amounts of wgaDNA input into the TaqMan® SNP assay (Table 6). The majority (96.4%) of these false positive SNP genotypes from NTC samples were homozygotes (Table 6), significantly more allele 2 alleles were observed than allele 1 alleles (Table 6), and all N = 49 TaqMan® SNP assays exhibited false positive SNP genotypes (data not shown). wgaDNA NTC samples from the gDNA input strata of 1, 50 and 100 ng exhibited significantly greater numbers of false positive and undetermined SNP genotypes than did the wgaDNA NTC samples from the gDNA input strata of 10, 25 and 200 ng (Table 6).

Table 6 SNP genotypes from NTC samples

Discussion

wgaDNA may not be suitable for STR genotyping

wgaDNA STR genotyping completion rates reach that of gDNA at the 10 ng gDNA input level into WGA. However, the wgaDNA STR concordance rate is significantly worse than that of gDNA, even with 200 ng of gDNA input into the WGA reaction (Table 1). Thus, the use of MDA wgaDNA for accurate STR genotyping will require larger amounts of input gDNA into the WGA than have been recommended in the past [8, 10]. In the absence of sufficient gDNA template for MDA WGA, investigators face the tradeoff of no data, or data with increased loss of heterozygosity, such as that observed with MDA wgaDNA produced from low mass gDNA templates [11, 12]. Development of laboratory and data analysis protocols optimized for STR genotyping of MDA wgaDNA may be required before MDA wgaDNA can be routinely used for STR genotyping. Thus, it might be prudent to adjust genotype analysis algorithms before application of the AmpFlSTR® Identifiler™ panel to wgaDNA for forensic purposes, as has been recommended for the analysis of STR profiles from highly limited unamplified gDNA template [13], or to utilize analysis methods that incorporate STR genotyping error, as has been recommended for the analysis of STR linkage scan data [14].

wgaDNA is a suitable template for SNP genotyping

wgaDNA produced from ≥ 10 ng of gDNA input into the WGA reaction exhibits robust wgaDNA TaqMan® SNP assay genotyping performance rates, similar to that of gDNA TaqMan® SNP assay genotyping performance rates. 1 ng of wgaDNA template into the TaqMan® SNP assay exhibits significantly reduced TaqMan® SNP assay genotyping performance compared to both 4 and 20 ng wgaDNA templates into the TaqMan® SNP assay. 4 ng wgaDNA template into the TaqMan® SNP assay exhibits a significantly increased no amplification rate over both 1 and 20 ng wgaDNA templates, although no amplification rates are very low (all <0.01%) for all three wgaDNA template inputs into the TaqMan® SNP assay. These results suggest that optimal TaqMan® SNP assay genotyping performance, i.e., minimal wgaDNA TaqMan® no amplification and undetermined genotyping rates, should be expected for wgaDNA inputs greater than 4 ng.

False positive NTC sample SNP genotypes

A non-zero RT-PCR yield and significantly increased numbers of observed false positive genotypes in wgaDNA from NTC samples in the 1, 50 and 100 ng gDNA input strata are consistent with human gDNA contamination of these gDNA input strata. However, we also observed significantly more false positive and undetermined SNP assay genotypes in each of the 10, 25 and 200 ng gDNA input strata (the apparently uncontaminated strata) than in the gDNA strata (all p < 0.0001), consistent with the hypothesis that a portion of the NTC TaqMan® genotypes may be due to degradation of TaqMan® SNP assay reagents. Thus, contamination of NTC samples with gDNA and TaqMan® SNP assay probe oligonucleotide degradation during the genotyping of wgaDNA are both associated with false-positive TaqMan® SNP assay genotypes.

Limitations

This study is distinguished by the use of multiple assays to estimate wgaDNA yield and composition, the use of STR and SNP genotyping assays that have been validated by sequencing the same DNA samples used in this study, and the use of an adequate number of samples and assays to provide statistical power to detect small differences in the genotyping performance of wgaDNA and gDNA, when using 1–200 ng of gDNA as template in the WGA reaction. Nevertheless, there are limitations, with respect to generalizing to all gDNA templates, MDA protocols and genotyping methods, respectively.

The gDNA used in this study was extracted from lymphoblasts and samples from most studies are unlikely to be of such high quality. Using a model system to evaluate the effect of significant gDNA degradation on the WGA reaction, it has been shown that MDA wgaDNA produced from irradiated gDNA exhibits significantly reduced yield and genotyping performance compared to MDA wgaDNA produced from unirradiated gDNA [15]. The yield and genotyping differences observed in wgaDNA produced from high quality (this study) and low quality [15] gDNA samples suggest that those gDNA samples with DNA extraction, storage and usage histories that have reduced concentrations of high molecular weight DNA in the sample are likely to exhibit less than optimal MDA wgaDNA yield and genotyping performance.

While only one commercially available MDA WGA protocol was used in this study, we have evaluated two MDA WGA protocols on gDNA extracted from multiple tissue types, and no systematic significant differences in genotyping performance between the two MDA WGA protocols were observed [16]. STR and SNP genotyping performance of MDA wgaDNA derived from 4 ng of gDNA input in that study is intermediate between the genotyping performance of MDA wgaDNA produced from 1 ng and 10 ng in this study. Alternative WGA technologies that can prepare wgaDNA of acceptable quality from gDNA with reduced complexity or concentration may be required for some degraded gDNA samples. For example, PCR-based methods that reduce genome complexity before amplification are one approach [6, 17], and methods that combine genome circularization with φ29 DNA polymerase are another [18].

Finally, we applied two commonly used genotyping methods to evaluate the genotyping performance of wgaDNA in this study. Different genotyping technologies may be better suited to produce optimal genotyping performance with wgaDNA than the two we evaluated. For STRs, genotyping panels designed for linkage scanning usually employ lower levels of multiplexing and use larger amounts of DNA template than do STR panels designed for forensic analysis, such as the AmpFlSTR® Identifiler™. For example, reported MDA wgaDNA STR genotype discordance rates using linkage scan STR panels [19, 20] and forensic STR panels [15, 16, 21] range from ~0% to ~6%, and the average rate of the five studies cited (2.0%) is similar to the rates observed in this study. For SNPs, those genotyping technologies with redundant data sampling for SNP genotype determination, such as minisequencing [22], the Golden Gate™ assay [23] or the GeneChip® variant detection array [24], may be more resistant to SNP genotype failure when genotyping wgaDNA [9, 25, 26] than those SNP genotyping technologies with single data point genotype determination [27]. However, in a recent direct comparison of Golden Gate™, TaqMan® and Invader™ SNP assays, with gDNA extracted from lymphoblasts using an organic extraction method and MDA wgaDNA produced from 20 ng of this gDNA, the Golden Gate™ assay exhibited a higher exclusion rate of DNA samples, and a higher completion rate and lower concordance rate on the remaining samples, than exhibited by the TaqMan® and Invader™ SNP assays [28]. For all three SNP genotyping technologies evaluated, the genotyping performance of gDNA was observed to be significantly better than that of MDA wgaDNA [28].

Conclusion

We have evaluated the yield, composition and genotyping performance of wgaDNA based on a range of high-quality lymphoblastoid gDNA templates between 1 and 200 ng in order to provide empirical data on the performance of MDA WGA technology. A detailed analysis of the observed genotyping failures has been performed to facilitate an understanding of the reduction in genotyping performance likely to be observed when genotyping wgaDNA produced from a range of gDNA inputs. Increasing gDNA input from 1 to 200 ng in the MDA WGA reaction improves the yield of H. sapiens PCR-amplifiable DNA and improves the genotyping performance of the AmpFlSTR® Identifiler® assay. More than 100 ng of high quality gDNA template into the MDA WGA reaction is required in order to observe MDA wgaDNA AmpFlSTR® Identifiler® STR genotyping performance similar to that observed with gDNA. At least 10 ng of high quality lymphoblastoid gDNA template into the WGA reaction is required to observe optimal TaqMan® SNP genotyping performance from MDA wgaDNA.

Methods

gDNA samples

N = 22 lymphoblast genomic DNA (gDNA) samples were obtained directly from the Coriell Cell Repository (Camden, NJ); these samples were from individuals within the SNP500 Cancer dataset [29]. Each gDNA was quantified by UV spectroscopy, the PicoGreen® assay (Molecular Probes, Eugene, OR), and a Real-Time (RT) TaqMan® assay specific to human DNA [30]. Five of twenty-two Coriell Cell Repository lymphoblast gDNA samples were replicated for a total of N = 27 lymphoblast gDNA samples subjected to WGA and post-WGA analysis in order to increase statistical power to detect genotyping error.

Whole genome amplification

Aliquots of 1, 10, 25, 50, 100 and 200 ng of each gDNA sample were used as template and amplified according to the GenomiPhi™ WGA protocol (1X). The 200 ng gDNA template sample was amplified separately after evaluation of the genotyping performance of the wgaDNA produced from 1 to 100 ng gDNA. Each gDNA sample was subjected to the WGA protocol once; four no-template control (NTC) reactions, containing no gDNA, were included at each gDNA input level. wgaDNA was quantified with OD260, PicoGreen® and RT-PCR, as was performed for gDNA. The concentrations of ssDNA, dsDNA, total DNA and human-specific PCR amplifiable (RT-PCR) DNA in the wgaDNA samples were estimated as described [16].

AmpFlSTR® Identifiler® assay

A total of 300 pg of dsDNA (both gDNA and wgaDNA, as determined by PicoGreen®) was used as template DNA for the AmpFlSTR® Identifiler® assay (Applied Biosystems Inc., Foster City, CA), and scoring of alleles, assignment of Genotype Quality scores and calculation of genotype failure rates were performed as described [16]. Peak height ratio distributions at a signal strength threshold of 50 RFUs were evaluated for normality, and differences between assigned and observed GQ score category distributions were evaluated using Wilcoxon's rank sum test and contingency table analyses.

TaqMan® SNP genotype assays

N = 49 TaqMan® (Applied Biosystems Inc., Foster City, CA) genotyping assays from the publicly available SNP500 Cancer Database portfolio [29] were chosen as described [16]. Inputs of 1.0, 4.0 and 20.0 ng of dsDNA (both gDNA and wgaDNA, as determined by PicoGreen®) were used as template for genotyping using the N = 49 TaqMan® assays. Reaction and cycling conditions, control samples, fluorescence detection and genotype cluster assignment were performed as described [16]. SNP genotype completion, undetermined genotype, no amplification, and discordance rates were calculated, with wgaDNA discordance counted as the number of instances in which a wgaDNA SNP genotype differed from the scored gDNA SNP genotype. Differences in rates were evaluated using contingency table analyses.
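The rate definitions above can be sketched as follows. This is an illustrative sketch, not the pipeline used in the study: the call codes, the function name and the toy genotypes are hypothetical, and it assumes that discordance is expressed relative to genotype pairs in which both the wgaDNA and the gDNA genotype were scored, a denominator the text does not specify.

```python
from collections import Counter

# Hypothetical codes for the two non-genotype outcomes
NO_AMP = "NOAMP"
UNDETERMINED = "UND"

def snp_genotyping_rates(calls, reference):
    """Summarize genotype calls keyed by (sample, snp).

    calls: wgaDNA calls; reference: scored gDNA calls for the same keys.
    Assumption: discordance is computed over pairs scored in both.
    """
    attempted = len(calls)
    counts = Counter()
    compared = discordant = 0
    for key, call in calls.items():
        if call == NO_AMP:
            counts["no_amplification"] += 1
        elif call == UNDETERMINED:
            counts["undetermined"] += 1
        else:
            counts["completed"] += 1
            ref = reference.get(key)
            if ref not in (None, NO_AMP, UNDETERMINED):
                compared += 1
                if call != ref:
                    discordant += 1
    return {
        "completion": counts["completed"] / attempted,
        "undetermined": counts["undetermined"] / attempted,
        "no_amplification": counts["no_amplification"] / attempted,
        "discordance": discordant / compared if compared else float("nan"),
    }

# Toy genotypes (illustration only, not study data)
gdna = {("S1", "snp1"): "AA", ("S1", "snp2"): "AG",
        ("S2", "snp1"): "AG", ("S2", "snp2"): "GG"}
wga = {("S1", "snp1"): "AA", ("S1", "snp2"): "AA",
       ("S2", "snp1"): UNDETERMINED, ("S2", "snp2"): "GG"}
rates = snp_genotyping_rates(wga, gdna)
print(rates)
```

In the toy data, the heterozygote AG scored as AA in wgaDNA is the kind of loss-of-heterozygosity discordance described in the Results.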

Data management and analysis

Data were managed using a Sapphire Laboratory Information Management System (LabVantage, New Brunswick, NJ) and exported to Microsoft Excel (Redmond, WA); statistical analyses (descriptive statistics and tests of normality, distribution and correlation) were performed using SAS (Cary, NC) software. Tests of proportion, correlation, etc., were considered significant at a Type I error level of 0.05, with additional information on p values provided where appropriate.
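As an illustration of a contingency table analysis at the 0.05 significance level, the sketch below computes a Pearson chi-square test for a 2 x 2 table of genotype outcomes. The study's analyses were performed in SAS and the specific procedures are not given; the function name and the counts are hypothetical, and no continuity correction is applied.

```python
import math

ALPHA = 0.05  # Type I error level stated in the text

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 df, no continuity correction)
    for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts (not study data): rows are DNA types,
# columns are completed versus undetermined genotypes
chi2, p = chi_square_2x2(30, 10, 10, 30)
verdict = "significant" if p < ALPHA else "not significant"
print(f"chi-square = {chi2:.2f}, p = {p:.2g} ({verdict})")
```

When expected cell counts are small, as with rare discordant genotypes, an exact test such as Fisher's would be more appropriate than the chi-square approximation.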