Introduction

Isolated populations are present in nearly every plant or animal species. For example, among humans, the fossil genetic record provides well-documented evidence of a longstanding history of isolation, subpopulation formation, and interbreeding [1]. While the general population of a particular area may identify with a certain nationality, the true ancestry of the population and its historical origins can often differ significantly. The formation of a genetic isolate is a gradual process influenced by various factors. This process often involves geographical separation, commonly referred to as isolation by distance, where the distance between locations plays a significant role in determining the genetic differences among populations [2, 3]. Certain geographical barriers have been identified as influential obstacles affecting the movement of populations [4, 5], and different types of genetic markers, such as the Y chromosome and mitochondrial DNA (mtDNA), may be affected in distinct ways by these barriers [6, 7]. In addition to geographic barriers, researchers have observed that language barriers can result in genetic isolates [8,9,10], although some linguists remain cautious in interpreting the correlation between the two (see review in [11]). Social and cultural factors, including ethnicity and religion [12], also often form the basis of population subdivisions, contributing to their establishment and reinforcement.

Isolated populations often exhibit reduced genetic diversity due to key evolutionary processes and factors, including the founder effect, where the new population carries only the genetic variation present in the founding individuals; genetic drift, where some alleles become more common while others disappear entirely, not because of selective advantages or disadvantages but purely by chance; the bottleneck effect, where a population undergoes a significant reduction in size, resulting in limited genetic diversity; consanguinity, which increases homozygosity; a lack of gene flow, which leads to a decline in genetic diversity over time; and selective pressures favouring certain alleles [13,14,15]. It is likely that genetic drift is the most significant evolutionary mechanism influencing the genetic diversity of isolated populations and, hence, the variability of forensic genetic markers [16,17,18]. The potential presence of genetic isolates must be considered in forensic casework, as overlooking it carries inherent risks [16, 19]. Failing to account for and correct these risks can introduce significant bias into the results of forensic analysis, ultimately reducing the effectiveness of the tests. For example, forensic DNA analysis often depends on estimating the probability that a randomly selected person's DNA would match the evidence sample. These probabilities are calculated from allele frequencies in the general population. If an individual belongs to a genetic isolate, the allele frequencies in their population may differ from those of the broader population, and ignoring this can lead to overestimating or underestimating the significance of a DNA match. Ignoring the presence of a genetic isolate can also result in an increased rate of false positives or negatives. This is especially true if the two samples belong to the same genetic isolate, but the forensic analysis uses allele frequencies from the general population.

Genetic isolates often have a higher degree of relatedness among members. Failing to account for this can result in misinterpreting DNA evidence, especially when it comes to determining familial relationships. Additionally, if the unique genetic characteristics of certain populations are not properly considered, it can raise ethical concerns about fairness and justice. There might be allegations of systematic bias against specific groups, especially if those groups are already marginalised or face discrimination. To mitigate these risks, forensic analysts should be aware of the presence of genetic isolates and ensure that reference databases used for analysis are representative of the population from which a suspect is drawn. Moreover, it is important to continuously update databases and methods in response to new research and findings about genetic isolates and their impact on DNA analysis.
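
The effect of using the wrong allele frequencies on a random match probability can be illustrated with a minimal sketch. It assumes a two-locus autosomal profile, Hardy-Weinberg proportions, and independence between loci; the locus names, allele labels, and all frequencies below are hypothetical and are not taken from any cited dataset.

```python
def random_match_probability(profile, freqs):
    """Multiply single-locus genotype frequencies under Hardy-Weinberg proportions.

    profile: {locus: (allele_a, allele_b)}
    freqs:   {locus: {allele: frequency}}
    """
    rmp = 1.0
    for locus, (a, b) in profile.items():
        pa = freqs[locus][a]
        pb = freqs[locus][b]
        # Homozygote: p^2; heterozygote: 2pq.
        rmp *= pa * pa if a == b else 2.0 * pa * pb
    return rmp


# Hypothetical evidence profile: heterozygous at locus L1, homozygous at L2.
profile = {"L1": ("A", "B"), "L2": ("C", "C")}

# Hypothetical allele frequencies in the general population and in an isolate
# where drift has made the same alleles much more common.
general = {"L1": {"A": 0.05, "B": 0.10}, "L2": {"C": 0.10}}
isolate = {"L1": {"A": 0.30, "B": 0.25}, "L2": {"C": 0.40}}

rmp_general = random_match_probability(profile, general)
rmp_isolate = random_match_probability(profile, isolate)

print(f"RMP with general-population frequencies: {rmp_general:.6f}")
print(f"RMP with isolate frequencies:            {rmp_isolate:.6f}")
print(f"Ratio (isolate / general):               {rmp_isolate / rmp_general:.0f}")
```

With these illustrative numbers the profile is 240 times more common in the isolate than the general-population frequencies suggest, so a match reported with the general figures would overstate the strength of the evidence.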

Mitochondrial DNA for Human Identification

The mitochondrial genome (mtGenome) is a small double-stranded haploid genome consisting of approximately 16,569 base pairs [20]. The two strands of the molecule have different densities depending on their guanine content, and the entire genome is split into two sections: a large coding region and a smaller control region. The coding region encodes 37 genes [20], and only 91 bases across the coding region do not form part of a gene; these represent short non-coding intergenic spacers and are thought to be the most variable part of the mtGenome outside of the control region [21]. The control region does not code for genes; however, it does contain the origin of replication for the heavy strand, where a D-loop is found [22]. Unlike nuclear DNA, of which each cell carries at most two copies of a given gene, mtDNA is present in many copies per cell, so original and mutated sequences can coexist [23]. This coexistence of multiple mtDNA variants within a single cell or organism is known as heteroplasmy [24, 25]. For human identification, particularly in forensic contexts, heteroplasmy introduces an element of ambiguity to data analysis and interpretation. While a person may predominantly exhibit one mtDNA sequence, the presence of a minor, alternate sequence due to heteroplasmy can make it harder to match samples precisely. Advanced techniques and meticulous data interpretation are therefore needed to ensure accurate human identification.

Sequences from the control region are highly variable and have thus become an essential source of information regarding the genetic structure of Homo sapiens [26]. The control region contains the polymorphism-rich hypervariable regions HVI, HVII and HVIII [27, 28], where the majority of the sequence variation between individuals is located. Traditionally, these regions have been sequenced by Sanger sequencing. Although accurate for specific genes and/or regions, this technology is laborious and costly, which meant that sequencing beyond the control region was not feasible [29]. As a consequence, the coding region of the mtGenome has been less studied than the control region [29, 30]. The introduction of next-generation sequencing (NGS) technologies revolutionised mtGenome research, offering far greater depth and precision in examining mtDNA and enabling researchers to rapidly sequence the entire mtGenome. In research settings, NGS facilitates a comprehensive understanding of human evolution, population dynamics, and migration patterns by capturing subtle mtDNA variations [31]. Clinically, it aids in diagnosing mitochondrial disorders [32], tracking disease-associated mutations [33], and offering insights into potential therapeutic avenues. Forensically, NGS enhances the reliability of mtDNA evidence, enabling the identification of individuals with greater accuracy by allowing the detection of low-level heteroplasmy [34], even when working with minute or degraded samples [30, 35, 36]. With its ability to rapidly sequence vast amounts of DNA, this technology allows the full potential of mtDNA to be harnessed across diverse applications.

Lineage markers such as mtDNA may be used in forensic casework to increase the likelihood ratio provided by short tandem repeats (STRs), or they may be the only informative markers for casework involving degraded biological samples, multigenerational comparisons, or casework where there are no paternal family reference samples for Y-STR analysis [37, 38]. Due to the absence of mtDNA repair mechanisms and the low fidelity of mtDNA polymerase, the mtGenome has a significantly higher mutation rate compared to the nuclear genome. For instance, our group estimated the mutation rate of the entire mtGenome to be 5.8 × 10⁻⁸ mutations/site/year [39], whereas the nuclear genome has a rate of 5.0 × 10⁻¹⁰ mutations/site/year [40]. Unlike nuclear DNA, mtDNA is inherited maternally, meaning that, excluding de novo mutations, the mtDNA sequence of siblings and all maternal relatives is identical [41,42,43,44,45]. The lack of recombination means that known maternal relatives can provide reference samples for a direct comparison to the mtDNA haplotype, and that even distant maternal relatives can be used as reference samples [41,42,43, 46,47,48]. Certain features of mtDNA make it particularly suitable for the identification of historical remains, or of samples that have been exposed to harsh environmental conditions, such as extreme heat or cold. It is estimated that each mitochondrion contains up to 10 copies of mtDNA and each mammalian somatic cell contains up to 2000 mitochondria [49,50,51,52,53,54]. For this reason, mtDNA has been successfully amplified from compromised DNA samples where nuclear DNA testing was unsuccessful [55].
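
The figures quoted above can be combined in a short back-of-the-envelope calculation. The sketch below uses only the values stated in the text (genome length, the two mutation rates, and the upper limits for copy number); the per-lineage estimate additionally assumes a uniform rate across all sites, which is a simplification introduced here for illustration.

```python
# Values quoted in the text.
MT_GENOME_LENGTH_BP = 16_569        # approximate mtGenome length
MT_RATE = 5.8e-8                    # mutations/site/year, whole mtGenome
NUCLEAR_RATE = 5.0e-10              # mutations/site/year, nuclear genome
MAX_MTDNA_PER_MITOCHONDRION = 10    # upper estimate
MAX_MITOCHONDRIA_PER_CELL = 2000    # upper estimate

# How many times faster the mtGenome mutates per site.
rate_ratio = MT_RATE / NUCLEAR_RATE

# Upper limit on mtDNA copies in one somatic cell.
max_copies_per_cell = MAX_MTDNA_PER_MITOCHONDRION * MAX_MITOCHONDRIA_PER_CELL

# Expected mutations per mtGenome per year, assuming a uniform rate per site
# (an illustrative simplification, not a claim from the cited studies).
mutations_per_genome_per_year = MT_GENOME_LENGTH_BP * MT_RATE
years_per_mutation = 1.0 / mutations_per_genome_per_year

print(f"mtDNA / nuclear rate ratio:    {rate_ratio:.0f}x")
print(f"Max mtDNA copies per cell:     {max_copies_per_cell}")
print(f"Years per expected mutation:   {years_per_mutation:.0f}")
```

Under these assumptions the mtGenome mutates roughly 116 times faster per site than the nuclear genome, a cell may hold up to 20,000 mtDNA copies, and a maternal lineage is expected to accumulate about one mutation per thousand years, which is consistent with maternal relatives usually sharing an identical haplotype.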

Despite these characteristics, mtDNA analysis is not without limitations. A match using lineage markers does not provide the same level of discrimination power (or sensitivity) that standard nuclear DNA profiling provides [38, 44] because the profile is often not unique and may be shared by other, unrelated individuals [45]. For this reason, lineage markers may be more suitable for cases where the victims are expected to be unrelated [44], or for cases involving individuals from a genetically diverse population. Hence, mtDNA for human identification in isolated populations presents unique challenges that must be addressed to ensure accurate and reliable results.

Mitochondrial DNA Analysis in Isolated Populations

Population History and Social Factors

Work with isolated populations often necessitates careful consideration of the population history and social factors; it is crucial to account for the unique historical, social, and cultural contexts that have shaped their genetic makeup and understand the potential implications of mtDNA analysis on these communities. Isolated populations are often characterised by restricted gene flow with other groups, leading to a higher degree of genetic differentiation from other populations [56] and the potential for distinct mtDNA lineages to emerge. Historical factors such as colonisation, migration, isolation, and cultural practices specific to the population under study can significantly impact the genetic diversity [57, 58]. Understanding the population history of such groups is vital as it can shed light on past migration patterns, population bottlenecks, founder effects, and other demographic events that may have influenced the genetic diversity observed today.

When identifying individuals based on mtDNA analysis, it is important to recognise that these findings may have broader social and cultural implications for isolated populations. Genetic information, including ancestry, relatedness, and potential links to specific geographic regions, can influence individuals' self-perception, group identity, and cultural practices [59,60,61,62]. The knowledge of one's genetic heritage can have profound effects on a person's sense of belonging, cultural identity, and relationships within their community. It can also affect the perception of kinship, marriage patterns, and social hierarchies within the population [59,60,61,62,63,64]. Furthermore, the results of mtDNA analysis may intersect with ongoing social and political issues, such as indigenous rights, land claims, and cultural preservation [65]. Care must be taken to ensure that the interpretation and dissemination of genetic information are conducted ethically and responsibly, respecting the autonomy, privacy, and well-being of the individuals and communities involved. Prior informed consent, collaboration with local stakeholders, and adherence to ethical guidelines and legal frameworks may be necessary to mitigate any potential harm that could arise from the use and interpretation of mtDNA analysis results.

Homogeneity of Mitochondrial DNA

Small population size, founder effect and genetic drift are all factors that lead to the homogeneity of mtDNA [66, 67], which can impact its reliability and suitability for human identification in isolated populations. Founder effect occurs when a small group of individuals founds a new population, which can lead to a loss of genetic diversity due to the limited number of individuals contributing to the gene pool [68]. This reduction can lead to the fixation of certain mtDNA haplotypes in the population, making it challenging to distinguish between individuals based solely on their mtDNA profiles. Genetic drift refers to the random fluctuations in the frequency of genetic variants within a population, which can occur due to chance events such as natural disasters, migrations, or founder effects [69, 70]. When a small population becomes isolated, the genetic makeup of that population can become increasingly distinct from that of other populations. Over time, genetic drift can cause the frequency of specific genetic variants to increase or decrease, leading to a loss of genetic diversity within the population. As a result, the mtDNA of individuals within the isolated population may be very similar or identical, making it difficult to differentiate between individuals based on their mtDNA profiles. An example of this is observed within the Norfolk Island (NI) genetic isolate, which was founded by 11 Caucasian men and 12 Polynesian women. Among 225 individuals from the last four generations of the NI core pedigree, 87 (about 39%) shared the most common mtDNA profile observed. Small population size can also exacerbate the effects of genetic drift, leading to the fixation of specific genetic variants and the loss of genetic diversity over time [71]. These factors can make it difficult to establish an accurate baseline for the population and to identify individuals with mtDNA profiles that do not match the baseline.

In addition, other factors can contribute to the homogeneity of mtDNA within these populations. For example, cultural or religious practices that discourage intermarriage can contribute to genetic isolation and reduce the diversity of mtDNA within a population [72].
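
The Norfolk Island figures give a sense of how far homogeneity can erode discrimination. The sketch below uses only the two counts quoted above (87 of 225). Because the counts of the remaining haplotypes are not given here, it computes a lower bound on the chance that two individuals drawn from the sample share a profile, based on the most common haplotype alone. Treating pedigree members as independent draws is a simplifying assumption made for illustration.

```python
from fractions import Fraction


def pair_match_probability(count, sample_size):
    """Chance that two distinct individuals drawn at random from the sample
    both carry a haplotype that was observed `count` times."""
    if sample_size < 2 or not 0 <= count <= sample_size:
        raise ValueError("count must lie between 0 and sample_size (>= 2)")
    return Fraction(count * (count - 1), sample_size * (sample_size - 1))


SAMPLE_SIZE = 225        # individuals in the NI core pedigree sample
MOST_COMMON_COUNT = 87   # individuals sharing the most common profile

frequency = Fraction(MOST_COMMON_COUNT, SAMPLE_SIZE)

# Other shared haplotypes can only add to this value, so it is a lower bound
# on the overall random match probability within the sample.
lower_bound = pair_match_probability(MOST_COMMON_COUNT, SAMPLE_SIZE)

print(f"Frequency of most common profile: {float(frequency):.3f}")
print(f"Lower bound on pairwise match:    {float(lower_bound):.3f}")
```

Under these assumptions, the most common profile alone implies that two individuals from the sample match by chance about 15% of the time or more, before any other shared haplotype is counted.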

Lack of Representation in Reference Databases

Reference databases serve as repositories of known mtDNA sequences and are crucial for comparing and matching sequences obtained from a sample [73]. For example, the European DNA Profiling (EDNAP) Mitochondrial DNA Population Database (EMPOP) is an international forensic reference database for mtDNA haplotypes [74, 75] used for determining maternal lineage or assessing the rarity of a particular mtDNA profile found at a crime scene [74]. By referencing against EMPOP, or another forensic reference database, forensic analysts or geneticists can estimate the frequency of a particular mtDNA profile in the wider population to assess the likelihood and significance of a match [76,77,78]. Hence, the choice of population database is an essential factor in data interpretation [79]. Despite its wide-ranging collection of data from across the globe, EMPOP, like many databases, has certain limitations. One of the significant current gaps is in the collection of samples from Oceania and Africa [75]. Both are home to a vast array of ethnic and indigenous communities, many of which have lived in isolation for thousands of years, leading to distinctive genetic makeups. Additionally, reference databases such as EMPOP predominantly feature control region sequences, and the limited reference data currently available for the entire mtGenome further exacerbates the underrepresentation of isolated populations (mtDNA database v4: 38,361 control region haplotypes vs 4,289 complete mtGenome haplotypes) [75].
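
To make the link between database size and the reported rarity of a profile concrete, the following sketch estimates a haplotype frequency by counting and attaches a one-sided Clopper-Pearson upper confidence limit. This is one commonly used convention and is an assumption here; it is not necessarily the calculation EMPOP itself reports. Only the two database sizes are taken from the text. The function names are illustrative.

```python
import math


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p), with each term built in log space."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if x >= n else 0.0
    total = 0.0
    for k in range(x + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1)
                   - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)


def upper_confidence_limit(x, n, alpha=0.05):
    """One-sided Clopper-Pearson upper limit for a haplotype seen x times
    in a database of n haplotypes, found by bisection on the binomial CDF."""
    if not 0 <= x <= n or n < 1:
        raise ValueError("need 0 <= x <= n and n >= 1")
    if x == n:
        return 1.0
    lo, hi = x / n, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        # The CDF falls as p rises; move up while it is still above alpha.
        if binom_cdf(x, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


CONTROL_REGION_N = 38_361   # control region haplotypes (from the text)
MTGENOME_N = 4_289          # complete mtGenome haplotypes (from the text)

# A haplotype never observed in the database (x = 0).
ucl_control = upper_confidence_limit(0, CONTROL_REGION_N)
ucl_genome = upper_confidence_limit(0, MTGENOME_N)

print(f"Unobserved haplotype, control region DB: <= {ucl_control:.2e}")
print(f"Unobserved haplotype, mtGenome DB:       <= {ucl_genome:.2e}")
```

For a haplotype that has never been observed, the upper limit with the smaller complete-mtGenome database is roughly nine times higher than with the control region database, so the same observation supports a much weaker statement of rarity. The bound also presumes that the database represents the relevant population, which is the assumption that fails for underrepresented isolates.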

Isolated populations, by their very nature, have unique genetic characteristics and limited gene flow with other populations [80, 81,82,83]. Due to historical, geographic, or cultural factors, these populations may have developed distinct mtDNA lineages that are not well represented in existing reference databases. If the population from which the sample originates is underrepresented or absent in the database, the chance of finding an exact match is significantly reduced. Consequently, this increases the likelihood of encountering false positives or false negatives in the identification process. The limited diversity within the database can also result in a higher chance of finding a partial or imperfect match that incorrectly suggests a relationship between the sample and an individual in the database. In contrast, the absence of the specific mtDNA lineage from the database can lead to the erroneous conclusion that the sample does not match any individual within the database. False positives and false negatives in mtDNA-based human identification can have significant implications, particularly in legal and forensic contexts. False positives can wrongly implicate individuals or result in miscarriages of justice, while false negatives can hinder investigations and prevent the identification of individuals who may be linked to a crime or disaster. Hence, a database that does not adequately represent all global populations can inadvertently lead to incorrect or biased conclusions in forensic cases.

While EMPOP serves as an invaluable tool for forensic scientists, it is crucial to understand its limitations and continually strive to make such databases more inclusive. Efforts should be made to increase the representation of isolated populations in reference databases to ensure that their unique genetic lineages are adequately captured and accounted for. Isolated populations often have unique mtDNA variations that act like genetic ‘fingerprints’, which can be pivotal in distinguishing between populations and tracing ancestry. High-coverage NGS analysis provides a comprehensive view of the mtGenome, capturing low-level variation. By analysing mtGenomes from isolated populations using this technology, we can therefore unveil unique variations that might be overlooked with traditional methods such as Sanger sequencing. Integrating these data into reference databases like EMPOP would enrich their content, making them more inclusive and comprehensive tools. For forensic applications, this could mean increased accuracy in matching unknown samples to specific populations or individuals, enhancing the reliability of mtDNA as forensic evidence. The inclusion of these specific variations might also reduce the chance of false positives or ambiguities in interpretation, as the database would encompass a wider range of mtDNA variations. It would also strengthen the evidentiary value of a match, which could be presented with higher confidence given the increased specificity of the database. In essence, integrating high-coverage mtGenome NGS data from isolated populations ensures that reference databases are not just larger, but more precise and more representative of global human genetic diversity. Collaborative initiatives have involved researchers, forensic experts, and members of isolated communities, such as Iceland [84, 85], Finland [86], the Hutterites [87], and our work with Norfolk Island [39, 88,89,90]. Such initiatives are essential for establishing comprehensive and diverse reference databases that encompass the global human population. By addressing the lack of representation in reference databases, the accuracy and reliability of mtDNA-based human identification can be significantly improved, ultimately facilitating more precise forensic investigations, more accurate determination of familial relationships, and more reliable ancestral lineage determinations.

Data Interpretation and Appropriate Statistical Methods

Isolated populations are characterised by their deviation from Hardy-Weinberg equilibrium [91,92,93], leading to an elevated level of homozygosity [94, 95]. As such, the use of data from isolated populations can hinder the calculation of forensically significant statistics or introduce bias into them. Therefore, it is important to select statistical methods that suit the specific characteristics of the population being studied. This is difficult when reviews of evaluation methods for mtDNA analysis assume that the available databases are appropriate [96]. For instance, a common statistical approach used in human identification is the calculation of random match probabilities or likelihood ratios using reference databases [97]. These databases are constructed on the assumption that the population mates randomly and that genetic diversity is evenly distributed across the population. However, in isolated populations, these assumptions may not be valid due to factors such as limited gene flow, genetic drift, or founder effects. When applying statistical methods to analyse mtDNA data in isolated populations, it may be necessary to develop specific reference databases or adjust existing databases to account for the unique genetic characteristics of the isolated population. This could involve incorporating information about the population's history, demographic factors, or genetic structure. Additionally, alternative statistical approaches that are better suited to the specific circumstances of the isolated population may need to be considered. For example, methods that explicitly account for population structure, relatedness, or non-random mating patterns could be more appropriate.
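
One way to account for population structure with haploid lineage markers is a coancestry (θ) adjustment, in which the match probability for a haplotype with database frequency p becomes θ + (1 − θ)p. The sketch below illustrates that adjustment only. It is not a method prescribed by the works cited here, and the frequency and θ values are hypothetical; a suitable θ for a given isolate would have to be estimated from data.

```python
def theta_adjusted_match_probability(p, theta):
    """Match probability for a haploid lineage marker with a coancestry
    adjustment: theta + (1 - theta) * p."""
    if not 0.0 <= p <= 1.0 or not 0.0 <= theta <= 1.0:
        raise ValueError("p and theta must both lie in [0, 1]")
    return theta + (1.0 - theta) * p


# Hypothetical database frequency of the haplotype of interest.
haplotype_frequency = 0.001

# Hypothetical theta values: none, modest, and strong substructure.
results = {}
for theta in (0.0, 0.01, 0.05):
    match_probability = theta_adjusted_match_probability(haplotype_frequency, theta)
    # Likelihood ratio for "same maternal lineage" vs "unrelated individual".
    results[theta] = (match_probability, 1.0 / match_probability)
    print(f"theta={theta:.2f}  match probability={match_probability:.5f}  "
          f"LR={1.0 / match_probability:.1f}")
```

With these illustrative values, a haplotype that appears to support a likelihood ratio of 1000 when structure is ignored supports a ratio of only about 20 when θ = 0.05, which shows how strongly the weight of an mtDNA match can depend on the assumed degree of substructure.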

Conclusions

mtDNA analysis is a useful tool for human identification, particularly in cases where nuclear DNA is not available or is severely degraded. However, work involving isolated populations presents unique challenges that must be addressed to ensure accurate and reliable results, and it requires an interdisciplinary approach that integrates genetic, historical, and sociocultural perspectives. Careful consideration of the unique population history, social dynamics, and potential consequences of genetic information is crucial for understanding the full significance and implications of these findings, and is necessary to minimise the impact of genetic drift, founder effect, and the lack of representation in reference databases. Cultural and social factors must also be considered in the interpretation of results to ensure that they are respectful and do not infringe on the rights and practices of isolated populations. The statistical methods used for data analysis also need careful consideration, as the assumptions underlying traditional approaches may not hold true in isolated populations, leading to inaccurate results.