Background

Studies of genetic influence in complex disorders usually require extensive genome explorations of large cohorts. A major bottleneck, however, is access to DNA from well-characterised patients and healthy controls. This may be circumvented by use of archived residual blood samples from newborn screening programs, which in several countries engage the entire population. The blood is usually collected by heel-prick and applied to special filter paper, a proven robust and convenient medium for transport and storage [1]. Storage policies on residual neonatal dried blood spot samples (DBSS) vary internationally, but several countries store residuals in repositories for later research purposes [2–8]. Stored DBSS combined with relevant clinical information from medical registries thus constitute an ideal resource for large studies. This set-up has the advantage of representing the entire population under a certain age and of avoiding practically any kind of selection. In addition, substantial costs may be saved.

The Danish Neonatal Screening Biobank (DNSB) contains nearly 2 million DBSS from virtually every Dane born after 1982. It has recently been updated to meet the new general guidelines for the establishment and operation of biobanks [9]. Access to samples for research requires approval from the Scientific Ethical Committee System, the Data Protection Agency and the DNSB Steering Committee. In Denmark, all citizens have a unique person-identifying number that is used across all public registration systems, including the DNSB. Denmark also operates a well-established public health care system offering treatment to all citizens. Together this makes it possible to study the "entire country as a cohort" and makes the DNSB an ideal resource for studying common and complex genetic diseases in Caucasians [10]. The major challenge in using the DBSS for such studies, however, is the small amount of blood available. In theory, the amount of genomic DNA (gDNA) that can be extracted from a 3.2 mm punch of a DBSS is about 60 ng [11]. In general, only one or two 3.2 mm punches per DBSS can be reserved for each project, which is scarcely enough to genotype multiple single nucleotide polymorphisms (SNPs). This problem may be overcome by whole-genome amplification (WGA) of the gDNA. Previous studies have used whole-genome amplified DNA (wgaDNA) for genotyping with fair success, but in most cases the number of polymorphisms that can be tested has been limited [11–18].

In this study we investigate whether a proper combination of DNA-extraction and WGA procedures can produce wgaDNA samples suitable for a 610,000-SNP genome-wide scan using the Illumina Infinium HD Human610-Quad BeadChip. Neonatal DBSS stored for 15 to 25 years in the DNSB are employed, and high-quality gDNA samples recently obtained from the same individuals are used as reference. Two different WGA methods are tested: the multi-displacement amplification (MDA) method (the REPLI-g kit), which produces relatively long wgaDNA fragments of > 10 kb [19], and the OmniPlex method (the GPlex2 and the GPlex4 kits), which produces fragments approximately 500 bp long [20]. We also test the effect of using either one or three 3.2 mm disks and of extracting proteins from the disks before the DNA-extraction. Finally, the robustness of the selected approaches is evaluated.

Methods

Subjects

The investigation comprised 24 subjects born between 1982 and 1992, who all had their residual neonatal DBSS stored at -24°C in the DNSB. Four subjects were informed volunteers and 20 were from a genetic study on schizophrenia (ethical approval number: 20020020; data protection agency number: 2002-41-2059).

DNA-extraction and WGA methods

Reference gDNA was purified from venous blood samples from the four volunteers and from four subjects from the schizophrenia study using the Maxwell 16 automatic system and the Maxwell® 16 Blood DNA Purification Kit (Promega). Neonatal DBSS from the eight participants were retrieved from the DNSB, and DNA was extracted from one or three DBSS disks, 3.2 mm in diameter, using Extract-N-Amp Blood PCR Kit (ENA)(extraction volume: 200 μL) (Sigma-Aldrich) or QIAamp DNA Blood Micro Kit (QIA)(extraction volume: 75 μL) (Qiagen). The DNA extracts were amplified using the REPLI-g kit (Qiagen), GenomePlex® Complete WGA Kit (GPlex2, Sigma-Aldrich), or GenomePlex® Single Cell Whole Genome Amplification Kit (GPlex4, Sigma-Aldrich). All procedures were performed according to the manufacturer's instructions. Prior to DNA-extraction, a subset of disks was extracted for proteins as described by Skogstrand et al. 2005 [21]. Please consult Additional file 1 for set up. Furthermore, two DBSS disks from 16 other subjects were extracted for proteins before DNA-extraction using the ENA kit, and the DNA extracts were amplified using the REPLI-g and the GPlex4 kits. DNA was quantified using Quant-iT™ PicoGreen® dsDNA Reagent (Molecular Probes, Invitrogen).

Genotyping

The gDNA and wgaDNA samples were genotyped on the Illumina Infinium HD Human610-Quad BeadChip (Illumina) according to the manufacturer's instructions, with the exception that 240 ng of wgaDNA starting material was used instead of the prescribed 200 ng. The BeadChips were scanned using the BeadStation 500GX (Illumina) with a high-density upgrade and an AutoLoader (Illumina). The BeadStudio v.3 software (Illumina Corp.) was used for calculating call- and conflict-rates. In the first part of the study all calls were made using the reference Human610-Quadv1B cluster file from Illumina, which is constructed from gDNA. In the second part of the study two cluster files, each constructed from 16 wgaDNA preparations made by the REPLI-g and GPlex4 kits (tailored cluster files specific for WGA method), were also used to analyse the respective samples. Conflict-rates were estimated by comparing the wgaDNA samples with their respective reference gDNA samples. The percentage of conflicts introduced by allelic dropout (e.g. AB to AA) was estimated by re-coding the Illumina data into variables that allowed comparison in STATA v.9.0.
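The quantities defined above can be illustrated with a minimal Python sketch. It is not the BeadStudio or STATA procedure used in the study. The function names, the genotype coding ("AA", "AB", "BB", `None` for a no-call) and the choice to compute the conflict-rate only over loci called in both samples are assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple

Genotype = Optional[str]  # "AA", "AB", "BB", or None for a no-call (assumed coding)


def call_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of loci that received a genotype call."""
    called = sum(1 for g in calls if g is not None)
    return called / len(calls)


def conflict_stats(wga: Sequence[Genotype], ref: Sequence[Genotype]) -> Tuple[float, float]:
    """Compare wgaDNA calls with reference gDNA calls, locus by locus.

    Returns (conflict_rate, dropout_fraction):
    - conflict_rate: conflicts / loci called in both samples (assumed denominator)
    - dropout_fraction: share of conflicts where a reference heterozygote (AB)
      was called homozygous (AA or BB) in the wgaDNA sample
    """
    compared = conflicts = dropouts = 0
    for w, r in zip(wga, ref):
        if w is None or r is None:
            continue  # no-calls cannot be compared
        compared += 1
        if w != r:
            conflicts += 1
            if r == "AB" and w in ("AA", "BB"):
                dropouts += 1
    conflict_rate = conflicts / compared if compared else 0.0
    dropout_fraction = dropouts / conflicts if conflicts else 0.0
    return conflict_rate, dropout_fraction


# Made-up toy genotypes, not study data.
ref = ["AA", "AB", "BB", "AB", "AA"]
wga = ["AA", "AA", "BB", None, "AA"]
print(call_rate(wga))            # 0.8
print(conflict_stats(wga, ref))  # (0.25, 1.0)
```

In the toy example one of four comparable loci conflicts, and that conflict is an AB to AA dropout.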

Results

The genotyping performance of the different wgaDNA preparations is seen in Additional file 1. The ENA DNA-extraction combined with the REPLI-g kit WGA featured the highest call-rates (99.30–99.51%) and the lowest conflict-rates (0.02–0.03%).

Combining REPLI-g WGA with QIA DNA-extraction was less successful and the results were highly variable. The genotyping performance using wgaDNA made by the two OmniPlex method kits, GPlex2 and GPlex4, was independent of the DNA-extraction method, with GPlex4 showing consistently higher call rates than GPlex2 (Wilcoxon's paired test, p < 0.001). The reference gDNA call-rates were 99.8–99.9%. Almost all conflicts between results from the wgaDNA preparations and the reference gDNA were due to an allelic dropout (data not shown). Notably, extraction and amplification procedures that produced high call-rates displayed low conflict-rates with the reference gDNA and vice versa, which indicates that genome-wide scans on wgaDNA are reliable when the call-rates are high [Additional file 1]. This was partially confirmed by calculating the correlation coefficients between the call- and conflict-rates of the three WGA kits [Additional file 1]. It made no significant difference whether one or three DBSS disks were used for extraction. No systematic differences in genotyping performance were related to sample age.

Based on the results displayed in Additional file 1, the combinations of DNA extraction by the ENA kit and WGA by the REPLI-g and GPlex4 kits were selected for further evaluation. For this, 16 new subjects were employed. After DNA-extraction, WGA and subsequent genome-wide scans (GWS), the results were analysed using both a standard Human610-Quadv1B Cluster, provided by Illumina, and WGA kit specific tailored cluster files. The rationale for the tailored cluster files is demonstrated in Figure 1. Generally, the wgaDNA samples cluster nicely, but not always in the area defined by the Illumina Human610-Quadv1B cluster file, which is based on gDNA samples. By creating tailored WGA-specific cluster files and using these for analysis, the genotype call-rates of both set-ups (REPLI-g and GPlex4) improved significantly (Wilcoxon paired test, p < 0.001) as seen in Table 1. Comparison of the call-rates indicated that the REPLI-g samples had a significantly higher call-rate than the GPlex4 samples (Wilcoxon's paired test, p < 0.001). Comparison of the amount of wgaDNA amplified by each kit revealed no significant difference (Wilcoxon's paired test, p > 0.050).
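For readers unfamiliar with the paired Wilcoxon test reported above, the following is a minimal pure-Python sketch of an exact two-sided Wilcoxon signed-rank test. It is not the statistical software used in the study. The function names are hypothetical, and the exact enumeration over all sign patterns is only practical for small sample sizes such as the 16 pairs used here.

```python
from itertools import product
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions start..end
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def wilcoxon_signed_rank(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Returns (W, p), where W is the smaller of the positive and negative
    rank sums. Zero differences are dropped. All 2**n sign patterns are
    enumerated, so this is only suitable for small n.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    if not diffs:
        return 0.0, 1.0
    ranks = average_ranks([abs(d) for d in diffs])
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w = min(w_plus, total - w_plus)
    extreme = 0
    for signs in product((False, True), repeat=len(ranks)):
        s = sum(r for r, positive in zip(ranks, signs) if positive)
        if min(s, total - s) <= w:
            extreme += 1
    return w, extreme / 2 ** len(ranks)
```

As a point of orientation, if all 16 paired differences pointed in the same direction, the statistic would be W = 0 and the exact two-sided p-value would be 2/2^16, roughly 0.00003, so p-values below 0.001 are attainable with 16 pairs.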

Figure 1

Plot of the normalized values measured for the A allele and B allele. The same 16 samples were amplified using the GPlex4 and the REPLI-g WGA kits. The "Illumina cluster" plot shows how the GPlex4 (blue dots) and the REPLI-g (green dots) wgaDNA genotypes compare to the Illumina cluster file. The "GPlex4 cluster" plot shows how a custom-made cluster file based on GPlex4 samples (blue dots) improves both fit and call-rate of the loci. The "REPLI-g cluster" plot shows how a custom-made cluster file based on REPLI-g samples (green dots) improves both fit and call-rate of the loci.

Table 1 Robustness of the GPlex4 and REPLI-g WGA kits.

Discussion

We demonstrate that wgaDNA made from 3.2 mm disks of DBSS that have been stored at -24°C for more than 20 years is well suited for reliable genotyping of 610,000 SNPs, with call-rates comparable to those obtained using gDNA. The accuracy of genotype calls using wgaDNA from stored DBSS has been of some concern. The issue has been addressed several times, using both low and medium throughput genotyping platforms, and overall with good success [11, 12, 14–18]. In this study we took the usage of DBSS one step further by conducting GWS. Moreover, the accuracy of genotype calls from wgaDNA was assessed by comparing the results with those from high-quality reference gDNA from the same individuals. Initially, we tested two commercial DNA-extraction procedures, three WGA procedures, the effect of the number of 3.2 mm disks used, and the effect of protein extraction prior to the gDNA extraction. The efficiency and reliability of the GWS were highly dependent on the employed DNA-extraction and WGA method. Interestingly, call- and conflict-rates were inversely related, indicating that a genome scan of wgaDNA is highly reliable when the call-rates are close to 100%. However, because only a few samples were available to calculate the correlation coefficient, we cannot clearly define a cutoff for the call-rate that would ensure reliable genotyping. In general, the OmniPlex method performed more consistently than the MDA method, producing much the same call- and conflict-rates independently of the other variables tested. Of the two OmniPlex-based kits, the GPlex4 kit performed best, showing high call-rates and low error-rates. The MDA method performed excellently with the ENA extraction kit and poorly with the QIA extraction kit. In general, it appeared unimportant whether one or three DBSS disks were used for extraction. This was surprising since the amount of input gDNA for the WGA reactions is supposed to be critical, and in our set-up it was often below the lower limit of 10 ng that is recommended by the manufacturer. Moreover, the preceding protein extraction of the disks did not impair the genotyping of the produced wgaDNA, which is in accordance with similar observations from our laboratory [17].

Because the investigation focuses on different combinations of wgaDNA preparation, it suffers from the weakness that the number of samples in each group is limited. In addition, only samples from the DNSB were used.

The combinations of the ENA DNA-extraction with either the REPLI-g or the GPlex4 WGA kit were selected to see whether the procedures were robust enough for GWA studies employing numerous samples. Both set-ups produced wgaDNA from 16 DBSS stored for 15 to 25 years that performed well, with consistently high call-rates. Corresponding reference gDNA samples were not available. Notably, when calling genotypes of wgaDNA preparations with the BeadStudio software, the clusters for some loci were tight and well separated but did not fit well into the standard cluster positions. This is because the BeadStudio software calls the genotypes of a given locus by comparing the observed values with the expected values, defined by the Human610-Quadv1B cluster file, which is based on gDNA samples [22, 23]. In such cases, data fit and call-rates can be improved by adjusting the cluster positions to match the observed data [23]. Cluster files tailored for the OmniPlex and MDA methods were hence created from the samples available, and the call-rates were significantly improved for both wgaDNA preparations. They were in fact comparable to call-rates obtained using high-quality gDNA, indicating that the approach is robust.
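The effect of adjusting cluster positions can be illustrated with a deliberately simplified, one-dimensional Python sketch. It is not the BeadStudio algorithm, which works on two-dimensional intensity data and uses its own scoring. The function names, the tolerance threshold and the numeric values below are hypothetical and chosen only to show the principle that tight but shifted clusters become callable once the expected positions are refitted to the observed data.

```python
from statistics import mean
from typing import Dict, List, Optional, Sequence


def call_genotype(value: float, centers: Dict[str, float], tolerance: float = 0.1) -> Optional[str]:
    """Return the label of the nearest cluster center, or None (no-call)
    if the value lies further than `tolerance` from every center."""
    label = min(centers, key=lambda k: abs(value - centers[k]))
    return label if abs(value - centers[label]) <= tolerance else None


def refit_centers(values: Sequence[float], centers: Dict[str, float]) -> Dict[str, float]:
    """Move each center to the mean of the observed values closest to it.
    Centers with no assigned values are left unchanged."""
    groups: Dict[str, List[float]] = {label: [] for label in centers}
    for v in values:
        nearest = min(centers, key=lambda k: abs(v - centers[k]))
        groups[nearest].append(v)
    return {label: (mean(vs) if vs else centers[label]) for label, vs in groups.items()}


def call_rate(values: Sequence[float], centers: Dict[str, float]) -> float:
    """Fraction of values that receive a genotype call."""
    calls = [call_genotype(v, centers) for v in values]
    return sum(c is not None for c in calls) / len(calls)


# Hypothetical standard cluster positions (as if derived from gDNA)
standard = {"AA": 0.0, "AB": 0.5, "BB": 1.0}
# Hypothetical wgaDNA signals: tight clusters, but shifted for AA and AB
observed = [0.12, 0.14, 0.13, 0.62, 0.64, 0.63, 0.95, 0.97]

tailored = refit_centers(observed, standard)
print(call_rate(observed, standard))  # 0.25
print(call_rate(observed, tailored))  # 1.0
```

In this toy case only the two unshifted values are called with the standard positions, whereas all eight are called after refitting.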

Eighteen WGA reactions, each producing ~5 μg of wgaDNA, can be made per ENA DNA extraction. As the Illumina Infinium HD Human610-Quad BeadChip uses 240 ng of wgaDNA, one WGA reaction is enough to run 20 chips. Thus one to three 3.2 mm disks from a DBSS are sufficient to perform repeated GWS as well as fine-mapping genotyping, if required. We have briefly tested the performance of the two wgaDNA preparations on the Affymetrix platform and found that wgaDNA produced by the OmniPlex method was unsuitable, whereas wgaDNA produced by the MDA method gave results comparable to those obtained by the Illumina platform. In addition to being used for GWS, DBSS can also be used for multiplex protein measurements [21], quantitative RNA microarrays detecting up to 3000 genes [24], and quantitative DNA methylation analysis [25].
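The yield arithmetic above can be checked with a few lines of Python. The variable names are illustrative. The calculation treats the approximate yield of 5 μg as exact and ignores losses from quantification and pipetting, so the results are upper bounds rather than guaranteed numbers.

```python
# Figures taken from the text; the yield is approximate (~5 ug per WGA reaction).
WGA_YIELD_NG = 5_000               # wgaDNA produced per WGA reaction, in ng
INPUT_PER_CHIP_NG = 240            # wgaDNA used per BeadChip, in ng
WGA_REACTIONS_PER_EXTRACTION = 18  # WGA reactions possible per ENA extraction

# Whole chips that one WGA reaction can supply (integer division drops the remainder).
chips_per_reaction = WGA_YIELD_NG // INPUT_PER_CHIP_NG

# Upper bound on chips per DNA extraction if every WGA reaction were used for chips.
chips_per_extraction = chips_per_reaction * WGA_REACTIONS_PER_EXTRACTION

print(chips_per_reaction)    # 20
print(chips_per_extraction)  # 360
```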

Conclusion

The results demonstrate that residual DBSS from neonatal screening that have been stored for several years in biobanks can be used for GWS and hence for large genome-wide association studies. Using DBSS instead of collecting new samples may, in a cost-efficient way, reveal important correlations between genotypes, environment and human diseases. Both the OmniPlex and the MDA method performed excellently in combination with the ENA extraction, and we recommend testing which of the two WGA procedures is most suitable for a given task.