Characterization of degradation and heterozygote balance by simulation of the forensic DNA analysis process
Simulation experiments were used to show the impact of varying extraction efficiency, aliquot proportion, and PCR efficiency on the heterozygote balance of a range of diploid and haploid cells. Reducing any of these parameters introduces variance. It is well known that the variance in heterozygote balance increases as the amount of DNA is reduced. Surprisingly, the distribution is in fact diamond shaped: the variance starts to decrease at very low amounts of DNA. Simulations suggest that pristine diluted DNA is an acceptable approximation in validations to infer heterozygote balance. However, the difference in distribution of the variance between diploid and haploid cell types may, under some circumstances, need to be considered in statistical models. Finally, we exemplify how simulations can be used to predict the outcome of PCR for degraded samples. Visualizing the predicted DNA profile as an electropherogram can help to identify the best approach for sample processing.
Keywords: PCR, Heterozygote balance, Simulation, Degradation, Validation
The typical forensic DNA analysis process consists of sample recovery, extraction, quantification, amplification, and capillary electrophoresis. Depending on the laboratory’s instrumentation and workflow, there can be additional purification of the DNA extract or PCR product. Forensic laboratories seek to optimize each step of the process to maximize the chance of retrieving DNA evidence. Interpretation of DNA evidence within the likelihood ratio framework has made significant progress in recent years [1, 2] and several software solutions are available [3, 4, 5]. Different models, e.g. qualitative, semi-continuous (without peak height), and continuous (with peak height, stutter, etc.), have been implemented. The only assumptions of the qualitative model are the estimated allele frequencies. Usually, unrelatedness between the contributors to the DNA evidence is also assumed, although recently, it has been possible to specify relatedness in some software (e.g. LRmix Studio). Semi-continuous models include drop-in and drop-out parameters, estimated either from laboratory experiments or from the observed DNA evidence and simulations. Continuous models (with peak height) may require additional parameters to be estimated experimentally, although this is largely circumvented by gamma models, which automatically determine parameters from the crime stain profile itself. Typically, a number of known single-source samples are analysed to estimate key characteristics like stutter ratios and heterozygote balance. Each extraction method, STR kit, PCR protocol, and capillary electrophoresis (CE) instrument may result in different estimates of the parameters. To ensure validity, the experiments may need to be repeated if there are significant changes to protocols or equipment (e.g. major service of the CE instrument).
In the laboratory, there is always a loss of DNA during the DNA extraction process [7, 8, 9]. Although there is limited published information on the absolute efficiency, NIST studies indicate an absolute DNA extraction efficiency of 1–37 %.1
The PCR process is not 100 % efficient: the PCR efficiency has been estimated to be approximately 82–97 % using real-time PCR. Gill et al. created a graphical simulation model of the entire DNA process using a binomial selection of molecules, as previously suggested, in order to simulate each step of the process. The model was used to predict the behaviour of heterozygote balance and the probability of allelic drop-out for low-template samples.
This paper explores the effect of different parameters (such as PCR efficiency and aliquot) on the heterozygote balance for diploid and haploid cells. Diploid cells contain two copies of each chromosome, while haploid cells (i.e. sperm cells) contain only one copy. Simulation was used to predict the stochastic behaviour of sub-cell (pg) levels of DNA, which was compared to observed experimental data. Finally, implications for casework relative to assumptions used in continuous models are discussed.
Material, methods, and models
Parameters and definitions
Extraction efficiency (exe): The probability that a given DNA molecule survives the DNA extraction process.
PCR aliquot (pcra): A proportion of the DNA extract is transferred to the PCR tube. Therefore, there is a probability that a given DNA molecule will be selected for PCR amplification.
PCR cycles (pcrc): Number of PCR cycles.
PCR efficiency (pcre): During each PCR cycle, there is a probability that a given DNA molecule will be amplified.
Stutter probability (stutterp): During each PCR cycle, there is a probability that a given DNA molecule will be amplified as a stutter one repeat shorter than the allele.
Aliquot to capillary electrophoresis (cea): A proportion of the PCR product is transferred to the capillary electrophoresis injection plate.
Capillary electrophoresis peak height threshold (ceT): The number of fluorescently labelled DNA molecules required to trigger a signal, described by the intercept (ceTi), slope (ceTs), and the residual standard error (ceTσ).
Capillary electrophoresis peak height scaling (ceS): Conversion of the number of DNA molecules into relative fluorescence units (RFU), described by the intercept (ceSi) and slope (ceSs).
Limit of detection threshold (LDT): Signals above this peak height threshold (RFU) are considered to be reliably caused by actual alleles rather than instrument noise.
Degradation parameter (P(deg)): The probability of degradation per base pair. If a DNA fragment is degraded at one or more bases, the amplification of that fragment fails.
Degradation index (DI): The ratio of the low molecular weight target to the high molecular weight target provides a qualitative measure of the degradation.
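The binomial selection steps defined above can be illustrated with a minimal Python sketch. It is not the pcrsim implementation: the function names, the normal approximation used for large molecule counts, the omission of stutter, degradation and capillary electrophoresis, and the definition of Hb as a simple ratio of the two allele counts are all assumptions made for illustration.

```python
import math
import random


def binomial(n, p, rng):
    """Draw from Binomial(n, p): exact for small n, normal approximation otherwise."""
    if n <= 0 or p <= 0.0:
        return 0
    if p >= 1.0:
        return n
    if n <= 1000:
        return sum(1 for _ in range(n) if rng.random() < p)
    # Assumption: for large n a rounded normal draw is an adequate stand-in.
    draw = round(rng.gauss(n * p, math.sqrt(n * p * (1.0 - p))))
    return min(n, max(0, draw))


def simulate_allele(copies, exe, pcra, pcre, pcrc, rng):
    """Return the number of amplicons of one allele after pcrc PCR cycles."""
    extracted = binomial(copies, exe, rng)     # molecules surviving extraction
    molecules = binomial(extracted, pcra, rng)  # molecules selected into the PCR aliquot
    for _ in range(pcrc):
        # Each molecule present is copied with probability pcre in this cycle.
        molecules += binomial(molecules, pcre, rng)
    return molecules


def heterozygote_balance(cells, exe, pcra, pcre, pcrc, rng):
    """Hb for diploid cells, where every cell carries one copy of each allele.

    Returns None when either allele has dropped out.
    """
    a = simulate_allele(cells, exe, pcra, pcre, pcrc, rng)
    b = simulate_allele(cells, exe, pcra, pcre, pcrc, rng)
    if a == 0 or b == 0:
        return None
    return a / b  # assumed ratio definition; the paper's Eq. 1 is not reproduced here


rng = random.Random(2016)
results = [heterozygote_balance(20, 0.30, 0.35, 0.90, 28, rng) for _ in range(200)]
observed = [hb for hb in results if hb is not None]
print("replicates with drop-out:", len(results) - len(observed))
```

Setting exe, pcra and pcre to 1.00 removes every random step, so the two alleles of diploid cells come out in perfect balance, as expected for ideal direct PCR.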
Simulation was performed using the R package pcrsim2 version 1.0. The package was developed based on the simulation functions in forensim. Both packages are implementations of ‘A graphical simulation model of the entire DNA process’. In pcrsim, the PCR efficiency is assumed to be constant across cycle number, which has previously been demonstrated to be true for the first 10 to 15 cycles [12, 14]. In reality, PCR efficiency declines towards the plateau phase, mainly because of product inhibition of the DNA polymerase enzyme. However, for STR analysis of low-template samples, the plateau phase is in practice never reached. Hedell et al. showed that for each increase in number of PCR cycles from 30 to 35, the allele peak height increase was approximately constant, coinciding with ideal amplification. Hence, the application of a constant PCR efficiency per cycle is a realistic approximation. Some published values of the PCR efficiency are 0.82, 0.85, and 0.82–0.97. We will use a PCR efficiency pcre=0.90 to simulate crime stains. Specific simulation parameters are given under the respective simulation experiment. If direct PCR is used, then the extraction efficiency exe=1.00 and the PCR aliquot pcra=1.00, since none of the DNA is lost using this method.
In order to maximize the data collection for simulations with sub-cellular amounts of DNA, the capillary electrophoresis peak height threshold was set to ceT=0 with a peak height scaling of ceS=1. Consequently, there is no drop-out dependent on low peak height, only drop-out due to complete absence of template molecules in the PCR reaction.
Serial dilutions vs. crime stains
The extraction efficiency was set to exe=0.30 (the higher end of previously reported values).1
The PCR aliquot was pcra=0.35 (the higher end of commonly used proportions).3
A PCR efficiency of pcre=0.90, approximately in the middle of a previously reported range of 0.82−0.97, was used. The number of cycles was pcrc=28.
CE aliquot cea=1.0.
CE detection threshold was set to ceTi=14.03744, ceTs=0.82254, ceTσ=0.1319579 based on a previous 3500xL calibration.4
CE peak height scaling was set to: ceSi=−14.38233, ceSs=1.173163 based on a previous 3500xL calibration.4
Limit of detection threshold LDT=200 RFU.
exe=0.30 and pcra=0.35, emulating a realistic process where a relatively efficient extraction method1 is combined with a relatively high aliquot proportion.3
exe=0.30 and pcra=1.00, emulating a relatively efficient extraction method1 combined with PCR of the entire DNA extract.
exe=1.00 and pcra=0.35, emulating single tube extraction combined with a relatively high aliquot proportion.3
exe=1.00 and pcra=1.00, emulating direct PCR.
Heterozygote balance and the ‘diamond’ effect
The extraction efficiency was set to exe=1.00 to mimic one tube Chelex extraction.
The aliquot forwarded to PCR was set to pcra=0.05.
For each simulated sample, pcrc=30−35 PCR cycles with PCR efficiency pcre=0.90, approximately in the middle of a previously reported range of 0.82−0.97, were used.
Stutters were simulated by multinomial selection with stutterp=0.005.
CE detection threshold was set to ceTi=15.4653, ceTs=0.9044, and ceTσ=0.364 based on a previous 3130xL calibration.4
CE peak height scaling was set to ceSi=−13.66131, ceSs=1.0047, and ceSσ=0.3836 based on a previous 3130xL calibration.4
The limit of detection threshold was set to LDT=50 RFU.
Possible stutter-allele pairs, when the actual partner allele has dropped out, were excluded from the calculations by removing alleles separated by one repeat unit. Loci with mean peak heights >10,000 RFU were removed to mimic the saturation threshold of the 3130xL instrument.
See on-line supplement Section C for details of simulation parameters for degraded samples.
Empirical data used by  was kindly provided by the authors. Their experimental set-up was as follows. A dilution series was prepared by mixing 5 μl whole blood together with 1245 μl of 0.9 % NaCl (commonly referred to as physiological saline). The volume of diluted blood transferred in each subsequent step was at least 400 μl to avoid potential stochastic effects. The use of physiological saline prevented cell lysis; hence, the integrity of complete genomes was conserved. Quantification was performed in triplicate using the Quantifiler® Human DNA Quantification Kit (Life Technologies). Only three out of twelve samples produced results within the range of the standard curve. Two were negative and the remaining were extrapolated from the standard curve. Therefore, the authors estimated the values for the three lower concentrations based on the sample with the highest concentration. See reference  for further details on the experimental set-up. The actual quantification results are reproduced in Table 4 (online supplement Section A).
Anonymous DNA extracts from nine presumably degraded tissue samples were used. The extraction method was the BioRobot EZ1 (Qiagen) using the EZ1 DNA Tissue Kit (Qiagen) according to the manufacturer's recommendations. The extracts had been stored for about two years in a freezer prior to quantification and analysis. The Quantifiler® Trio DNA Quantification Kit (Applied Biosystems), with an 80 bp small autosomal target and 214 bp large autosomal target, and PowerQuant™ System (Promega), with an 84 bp small autosomal target and 294 bp large autosomal target, were used for quantification. Both kits confirmed that the tissue samples were degraded to different degrees (degradation index in Table 2). The PowerPlex® ESX 17 Fast System (Promega) was used for STR amplification.
Analysis of data
Results and discussion
The effect of PCR efficiency
Theoretical simulations to explore the effect of PCR efficiency on the heterozygote balance were performed (Fig. 1). Direct PCR was simulated at three efficiencies: pcre=0.20, pcre=0.80, and pcre=1.00. Low PCR efficiency may be caused by inhibition (the mechanisms of inhibition and solutions to overcome it have been reviewed exhaustively elsewhere).
Simulations show increased heterozygote imbalance as the template DNA is reduced from optimal amounts (usually 0.5 to 1 ng). This has been shown in numerous publications. Conversely, increased template decreases the heterozygote imbalance until a minimum is reached. Adding more template beyond this point will not improve the balance further. Increased PCR efficiency also reduces the imbalance. Both alleles for diploid cells are perfectly balanced when pcre=1.00 (Fig. 1). However, this is not true for haploid cells.
If allelic copies are randomly drawn from a pool of haploid alleles that comprises an equal number of (a, b) alleles at a heterozygous locus, this leads to a discrete distribution of possible ratios. For example, consider a DNA extract with four haploid genome copies with alleles a and b. There are only three possible copy number ratios that can be randomly drawn for a heterozygous (ab) locus: 1/3 (i.e. one a and three bs), 2/2, and 3/1, with probabilities of 0.25, 0.375, and 0.25, respectively. Ratios of 0/4 and 4/0, each with a probability of 0.0625, are also possible but give rise to ±infinity when log10 is taken (for these combinations, alleles a and b, respectively, have dropped out, giving the appearance of a homozygote). This is further elaborated in the online supplement, Section B. A mathematical model, simulations, and risk assessment of false homozygotes for diploid cells have been published elsewhere. The discrete or multi-modal nature of haploid peak height ratios is clearly visible at pcre=1.00 with a small number of cells. As the PCR efficiency is reduced, the multi-modality is smoothed, as previously shown. As the number of haploid cells increases, the imbalance reaches a maximum at approximately 8–16 haploid cells. We call this the ‘diamond’ effect. The diamond effect is clearly visible at pcre=1.00 and, to a lesser extent, at pcre=0.80 (Fig. 1). The distribution for diploid cells has a funnel shape. This is also true for haploid cells at pcre=0.20 (Fig. 1 left facet). With >16 cells and pcre=1.00, the number of discrete possibilities becomes so tightly packed that the distribution can be considered continuous. Furthermore, the distribution for haploid cells converges towards the diploid distribution. A general threshold that is used by laboratories to denote a balanced locus is 0.6<Hb<1.67 (using Eq. 1).
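The probabilities in the four-copy example follow from a binomial distribution in which each haploid genome copy carries allele a or b with probability 0.5. The short Python sketch below reproduces them; the function name is hypothetical and the code only illustrates the arithmetic, not the simulation used in the paper.

```python
from math import comb


def copy_ratio_probabilities(n_copies):
    """Probability of drawing a copies of allele a and n_copies - a of allele b.

    Assumes each haploid copy is a or b with probability 0.5, independently.
    """
    total = 2 ** n_copies
    return {(a, n_copies - a): comb(n_copies, a) / total
            for a in range(n_copies + 1)}


probs = copy_ratio_probabilities(4)
for (a, b), p in probs.items():
    print(f"{a}/{b}: {p}")
# 0/4: 0.0625
# 1/3: 0.25
# 2/2: 0.375
# 3/1: 0.25
# 4/0: 0.0625
```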
Theoretical probabilities of heterozygote balance within the accepted range (0.60≤Hb≤1.67) and in perfect balance of different numbers of haploid cells as modelled by the Poisson distribution.
The effect of low PCR efficiency is most noticeable for diploid cells, as the distribution of peak height ratios approaches the distribution for haploid cells. However, the distributions of heterozygote balance for diploid and haploid cells never converge completely, not even when the PCR efficiency is reduced to pcre=0.20 (Fig. 1).
The effect of PCR aliquot
The effect of extraction efficiency
The DNA extraction process contributes to the pre-PCR random sampling of alleles. Loss of DNA during the extraction process is unavoidable. The loss can be caused by transfer steps, incomplete cell lysis, incomplete cell elution, or other reasons. For one-tube extraction methods like Chelex, there is no loss of DNA due to transfer steps. For simplicity, in this paper we assumed that pure samples with a small number of cells have an extraction efficiency of 100 %. Hence, the PCR aliquot proportion will be the only source of pre-PCR allele sampling. To explore the effect of extraction efficiency on Hb, simulations were performed using exe=0.30, exe=0.60, and exe=1.00 (Fig. 3). At exe=1.00 (i.e. direct PCR) all alleles from diploid cells are in perfect balance, while alleles from haploid cells form discrete distributions. At exe=0.30, the diploid and haploid Hb distributions are roughly equal. However, at exe=0.60, the difference between diploid and haploid cells is quite large, implying that cell type has an effect on Hb at high extraction efficiencies. As with changes in PCR efficiency (Fig. 1), it is observed that as the extraction efficiency decreases, the diamond shape widens at the lower end to become more funnel shaped (Fig. 3).
Very low amounts of DNA lead to reduced heterozygote imbalance
Previous authors have determined that the variance for heterozygote imbalance increases as the amount of DNA decreases [18, 23, 32, 33, 34, 35]. We have shown that this is only partially true. In fact, the reverse happens when the DNA concentration reaches a lower threshold. The theoretical reasoning and independent simulations to verify this are elaborated in the online supplement, Section B. The reason that it has not been previously noted is that experiments at very low levels of DNA are very difficult to design. This is where simulation methods not only complement experiments, but can be used to inform experimental design by providing information about predicted behaviour.
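The rise and fall of the spread can be reproduced with a small exact calculation. The Python sketch below is an illustration under simplifying assumptions that are not taken from the online supplement: haploid copies carry allele a or b with probability 0.5, PCR is perfectly efficient and direct (so peak ratios equal copy ratios), and only outcomes where both alleles are present are counted. The function name is hypothetical.

```python
from math import comb, log10


def log_hb_variance(n_copies):
    """Variance of log10(a/b) given that both alleles are present.

    a ~ Binomial(n_copies, 0.5) and b = n_copies - a; outcomes with a drop-out
    (a == 0 or b == 0) are excluded. Returns None if no such outcome exists.
    """
    weights = {a: comb(n_copies, a) for a in range(1, n_copies)}
    total = sum(weights.values())
    if total == 0:
        return None
    mean = sum(w * log10(a / (n_copies - a)) for a, w in weights.items()) / total
    return sum(w * (log10(a / (n_copies - a)) - mean) ** 2
               for a, w in weights.items()) / total


for n in (2, 4, 8, 16, 32, 64):
    print(n, round(log_hb_variance(n), 4))
```

With two copies the only heterozygous outcome is 1/1, so the variance is zero; it then rises for a handful of copies and falls again as the number of copies grows. The copy number at which the spread peaks depends on the spread measure and on the process parameters, so it need not coincide with the values read from the simulated figures.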
Serial dilutions vs. crime stains
For convenience, many laboratory experiments and validations are carried out using highly concentrated stock solutions of extracted DNA, which are diluted in several steps to the desired target concentrations. Then, the laboratory applies the measured characteristics (Hb, stutter, etc.) to crime stains that are run routinely. However, dilution experiments do not strictly emulate the conditions in crime stains. The purpose of the following simulations was to determine whether dilution experiments could be used instead of a much more complex experimental design that carries out assessments according to cell type while varying the number of cells per stain. We simulated serial dilutions as previously described and compared them to simulated diploid and haploid crime stain samples (see ‘Serial dilutions vs. crime stains’).
The difference between the Hb distributions inferred from serial dilutions and from crime stain samples amplified using non-direct PCR methods is relatively small. This suggests that the use of serial dilutions is a reasonable approximation, which has also been concluded previously. The exceptions are methods where both the extraction efficiency and the aliquot proportion are high, e.g. direct PCR, and the cell type is diploid. The ‘diamond’ effect is observed in the simulated data and suggests that the Hb variance starts to decrease below two diploid, or four haploid, cell equivalents of DNA.
Compromised crime stains
Degraded DNA is a common complication with forensic samples. Environmental factors such as humidity, bacteria, and ultraviolet light break down the DNA. Longer DNA fragments are affected more than shorter DNA fragments, causing increased imbalance (see ‘The effect of degradation’). The degradation can be modelled by an exponential curve. Given two measurements of the DNA concentration in a single sample, using qPCR targets of different lengths, the probability of degradation per base pair P(deg) can be calculated (Eq. 8). Inhibition of the Taq polymerase reduces PCR efficiency and increases the imbalance; the effect is greatest on high molecular weight fragments [40, 41]. Therefore, the effect of inhibition is the same as that of degradation. Consequently, both inhibition and degradation can be modelled using the PCR efficiency and degradation parameters together. There are also other modes of inhibition, e.g. DNA sequence-specific inhibition, which are currently not modelled in pcrsim.
The effect of degradation
Concentrations (ng/μl), degradation index (DI), and estimated degradation parameter (P(deg)) for nine degraded tissue samples based on quantification by Quantifiler® Trio DNA Quantification Kit (QT) and PowerQuant™ System (PQ). Target amplicon size in base pairs is 80 and 84 for the small targets, and 214 and 294 for the large targets. A degradation index DI is calculated by dividing the small autosomal target DNA concentration by the large autosomal target DNA concentration (Eq. 2).
Degradation is a consequence of random DNA cleavage (the mechanisms and consequences of degradation have been reviewed exhaustively elsewhere). Consider a fragment of DNA that is x bases long. It makes no difference whether one or more cleavages occur within the fragment of interest, as the fragment will fail to amplify no matter where the DNA was cleaved. As a result, the fragment will not be visualized.
Usually, there are multiple copies of DNA. Degradation can be related to drop-out. Allele drop-out occurs either because no copies are amplified (i.e. no molecules present in the PCR reaction), or because the fluorescence signal fails to reach the threshold value of the CCD detector of the capillary electrophoresis machine (i.e. insufficient number of molecules present in the PCR reaction).
Modern human real-time DNA quantification kits (e.g. Quantifiler® Trio DNA Quantification Kit and PowerQuant™ System) often come with the ability to measure the degree of degradation for each sample. This is accomplished by adding a second longer target to measure the total human DNA. Usually a 200–300 base pair fragment (x2) is generated from the longer target, while the shorter generates a 70–150 base pair fragment (x1).
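The relationship between the two qPCR measurements and P(deg) can be sketched as follows. The derivation assumes that every base is degraded independently, so a target of length x stays intact with probability (1 − P(deg))^x; the degradation index then equals (1 − P(deg))^(x1 − x2), which can be solved for P(deg). The paper's Eq. 8 is not reproduced in this text, so the expression below may differ from it in form, and the function names and example values are hypothetical.

```python
def intact_probability(p_deg, length):
    """Probability that a fragment of `length` bases has no degraded base."""
    return (1.0 - p_deg) ** length


def degradation_probability(conc_small, conc_large, len_small, len_large):
    """Estimate P(deg) per base from two qPCR concentrations.

    Assumes concentration measured with a target of length x is proportional
    to (1 - P(deg)) ** x, so DI = conc_small / conc_large
    = (1 - P(deg)) ** (len_small - len_large).
    """
    if conc_small <= 0 or conc_large <= 0:
        raise ValueError("both concentrations must be positive")
    if len_large <= len_small:
        raise ValueError("the large target must be longer than the small target")
    di = conc_small / conc_large
    return 1.0 - di ** (-1.0 / (len_large - len_small))


# Hypothetical example with the 80 bp and 214 bp target lengths and DI = 10.
p = degradation_probability(conc_small=1.0, conc_large=0.1,
                            len_small=80, len_large=214)
print(p, intact_probability(p, 300))
```

Because the estimate depends only on the ratio of concentrations and the difference in target lengths, it can be compared across kits with different target sizes, which is the motivation for preferring P(deg) over the degradation index.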
This results in a population of intact fragments that can be amplified, but in this particular example, where P(!drop)=0.05, there are between 1 and 19 undegraded copies derived from 1 ng. A DNA fragment will only be visualized if there are sufficient molecules present to trigger the capillary electrophoresis machine’s CCD camera. For 28 cycles, approximately 30 haploid copies (ca 90 pg) are required before sufficient PCR product is available to trigger a signal, whereas for 34 cycles, just one molecule (ca 3 pg in a haploid cell) is needed to produce sufficient signal.
Therefore, optimization of systems for the analysis of degraded DNA cannot be considered without concurrently considering the effect of PCR cycle number.
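The interplay between template copies and cycle number can be illustrated with the expected amount of PCR product under a constant efficiency, n0 × (1 + pcre)^pcrc. The sketch below is only an expectation-level illustration with a hypothetical function name; it uses pcre=0.90, the value adopted for crime stains in this paper, whereas the copy number thresholds quoted above come from other studies, so any agreement is indicative only.

```python
def expected_amplicons(start_copies, pcre, pcrc):
    """Expected number of molecules after pcrc cycles at constant efficiency pcre."""
    return start_copies * (1.0 + pcre) ** pcrc


low_cycle = expected_amplicons(30, 0.90, 28)   # ca 30 haploid copies, 28 cycles
high_cycle = expected_amplicons(1, 0.90, 34)   # a single molecule, 34 cycles
print(high_cycle / low_cycle)
```

Under these assumptions, one intact molecule amplified for 34 cycles is expected to yield roughly 1.6 times as much product as 30 copies amplified for 28 cycles, which is why a few surviving copies of a degraded sample may still be detected at a higher cycle number.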
Heterozygote balance and the ‘diamond effect’
Implications for casework
Maximizing the chance to obtain a complete profile
Theoretical number of cells (rounded up) required in the DNA extract, before an aliquot (of 5, 35, and 100 %) is taken, to obtain a certain average amount (pg) in the PCR reaction.
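The calculation behind such a table can be sketched as a ceiling division. The sketch assumes ca 6 pg of DNA per diploid cell (twice the ca 3 pg per haploid cell quoted in the text) and that all DNA from the cells is present in the extract; the function name and the example target amount are hypothetical and are not values taken from the table.

```python
def cells_required(target_pg, aliquot_percent, pg_per_cell=6):
    """Smallest whole number of cells in the extract giving, on average,
    at least target_pg in the PCR reaction after taking the aliquot.

    Integer arithmetic avoids floating-point rounding in the ceiling.
    Assumption: pg_per_cell = 6 for a diploid cell (use 3 for a haploid cell).
    """
    numerator = target_pg * 100
    denominator = pg_per_cell * aliquot_percent
    return -(-numerator // denominator)  # ceiling division


for aliquot in (5, 35, 100):
    print(aliquot, cells_required(6, aliquot))
# 5 20
# 35 3
# 100 1
```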
Inhibiting substances increase heterozygote imbalance
A previous study used LCM to collect 15 to 150 FISH-labelled diploid cells for direct PCR, using 28 cycles and the Identifiler® PCR Amplification Kit, and investigated the heterozygote balance. Although one-tube extraction and direct PCR (i.e. exe=pcra=1.00) should minimise stochastic effects, it was concluded that there was no improvement9 in peak height balance compared to single-source crime scene samples analysed in another study. Further comparison with ‘Christmas Tree’ stained cells indicated that the FISH process has a negative impact on peak balance.
Haploid versus diploid cells
It has long been established that the variance in heterozygote balance increases as the amount of DNA is reduced. Using simulations, we have shown that the distribution is in fact diamond shaped. As the amount of DNA decreases, the variance increases until a maximum is reached. The variance starts to decrease at very low amounts of DNA (50 pg or less, depending on PCR efficiency, aliquot proportion, and extraction efficiency) and the distributions become multi-modal rather than continuous. This was also confirmed by experimental data. In theory, under optimal conditions, the alleles in amplified diploid cells will be in perfect balance. However, the extraction process, aliquot proportion, and amplification efficiency introduce variance. Direct PCR is preferred for optimal allele balance and sensitivity and has been successfully implemented for certain casework samples [51, 52]. Simulations show that for direct PCR, haploid and diploid cells have different heterozygote balance distributions. This may need to be accounted for in some statistical models that are used to evaluate DNA evidence. However, direct PCR is not widely implemented (and may not always be suitable). With realistic extraction efficiencies and aliquot proportions, the difference between Hb variances is negligible. Consequently, diploid cells can be used in validation to determine characteristics of Hb also for haploid cells. Simulations also suggest that diluted DNA extracts, which are commonly used in validation exercises, are an acceptable approximation to crime stain samples (provided that care is taken to use large volumes), except for direct PCR methods or very low levels of DNA. Our results suggest that simulations of crime stains are preferred over dilutions when the average amount of DNA in the PCR reaction approaches sub-cellular amounts.
We have exemplified that the number of PCR cycles is a key factor to consider when degraded DNA is analysed. If the probability of degradation per base pair is used as a metric, rather than degradation indexes, the measure becomes kit independent. With knowledge of the degradation parameter the resulting characteristics of the DNA profiles can be predicted by simulation.
“Evaluation of DNA Extraction Efficiency” presented by Erica L.R. Butts, 65th American Academy of Forensic Sciences (February 2013)
http://cran.r-project.org/web/packages/pcrsim/index.html, accessed 08.04.2016
Personal communication with 5 European laboratories.
The metric used was the average peak height where the central 0.95 quantile crosses the 60 % threshold.
The work leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under Grant Agreement No. 285487 (EUROFORGEN-NoE). The authors thank two anonymous referees whose comments greatly improved this manuscript. We thank Ronny Hedell and Charlotte Dufva at the Biology Section (Swedish National Forensic Centre, NFC) for access to the raw data analysed in their paper. We are also grateful to our colleagues Ingebjørg Heitmann for performing all the quantification work in the laboratory, and Thore Egeland and Øyvind Bleka for deriving the formulae to calculate the degradation parameter. Special thanks to Johannes Hedman (NFC and Applied Microbiology, Lund University) for valuable discussions and input to this paper.