Exploring statistical weight estimates for mitochondrial DNA matches involving heteroplasmy


Massively parallel sequencing (MPS) of mitochondrial (mt) DNA allows forensic laboratories to report heteroplasmy on a routine basis. Statistical approaches will be needed to determine the relative frequency of observing an mtDNA haplotype when a heteroplasmic site is included. Here, we examined 1301 control region (CR) sequences, collected from individuals in four major population groups (European, African, Asian, and Latino) and covering 24 geographically distributed haplogroups, to assess the rates of point heteroplasmy (PHP) on an individual and nucleotide position (np) basis. With a minor allele frequency (MAF) threshold of 2%, the data were similar across population groups, with an overall PHP rate of 37.7% and the majority of heteroplasmic individuals (77.3%) having only one site of heteroplasmy. The majority (75.2%) of identified PHPs had an MAF of 2–10%, and PHPs were observed at 12.6% of the nps across the CR. Both the broad and phylogenetic testing suggested that, in many cases, the low number of observations of heteroplasmy at any one np results in a lack of statistical association. The posterior frequency estimates, which skew conservative to a degree that depends on the sample size in a given haplogroup, had a mean of 0.152 (SD 0.134) and ranged from 0.031 to 0.83. As expected, posterior frequency estimates decreased in accordance with 1/n as the sample size (n) increased. This provides a proposed conservative statistical framework for assessing haplotype/heteroplasmy matches when applying an MPS technique in forensic cases, and it will allow for continual refinement as more data are generated, both within the CR and across the mitochondrial genome.
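The 1/n behaviour of a posterior frequency estimate can be illustrated with a minimal sketch. The abstract does not state which prior or estimator was used, so the uniform Beta(1, 1) prior, the posterior-mean estimator, and the function name `posterior_mean_frequency` below are assumptions chosen for illustration only, not the authors' method.

```python
def posterior_mean_frequency(x: int, n: int,
                             alpha: float = 1.0, beta: float = 1.0) -> float:
    """Posterior mean of a binomial proportion under a Beta(alpha, beta) prior.

    x: number of individuals in the haplogroup showing heteroplasmy at a position
    n: number of individuals sampled in that haplogroup
    alpha, beta: prior parameters (ASSUMED uniform prior; not from the source)
    """
    if n < 0 or not 0 <= x <= n:
        raise ValueError("require 0 <= x <= n")
    return (x + alpha) / (n + alpha + beta)


if __name__ == "__main__":
    # With no observations (x = 0) the estimate is 1 / (n + 2), so it never
    # reaches zero and shrinks roughly in proportion to 1/n as n grows.
    for n in (5, 10, 50, 100, 500):
        print(n, round(posterior_mean_frequency(0, n), 4))
```

Under these assumptions, small haplogroup samples give larger (more conservative) estimates than large ones for the same observed count, which is the qualitative pattern described above.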

Data availability

A portion (n = 731) of the MPS data files associated with this study are available on EMPOP (accession number EMP00747). The remaining MPS data files cannot be made available as consent to do so was not obtained.

This study was approved by The Pennsylvania State University institutional review board (IRB) protocol STUDY00000970 and #HRB-588.




We gratefully acknowledge Mark Shriver and Corey Liebowitz for collection of saliva samples (The Pennsylvania State University ADAPT2 study; IRB#45727). The authors would also like to thank Molly Rathbun and Emmy Demchak for assisting in Illumina processing, Troy Adams for logistical support, Nicole Huber for assisting in quality control and EMPOP, Arne Dür for haplogroup estimation using a refined set of PhyloTree motifs and alignment algorithm, and Charla Marshall and Kim Sturk-Andreaggi for discussion and guidance on NUMT evaluation of our dataset.


This study was funded by the National Institute of Justice (NIJ Grant Numbers 2014-DN-BX-K022 and 2016-DN-BX-0171).

