Introduction

Short tandem repeat (STR) loci are the most informative genetic markers used worldwide for human identification (HID). Although commercial kits that analyze 15 STRs resolve most forensic and paternity cases, pitfalls arise in situations such as the identification of disaster victims and missing persons, and in motherless paternity cases (Wurmb-Schwark et al. 2006; Coletti et al. 2008; Borovko et al. 2009; Li et al. 2012; Ziętkiewicz et al. 2012). In our experience, the lack of informative reference samples (first-degree relatives) is probably the most common problem when identifying unknown bodies. This is sometimes overcome by searching STR databases that include samples from missing persons and their relatives, criminals, victims, etc. (Álvarez-Cubero et al. 2012). Many countries have established local databases that include the 13 STR loci of the Combined DNA Index System (CODIS), the amelogenin locus for sex determination, and additional STRs depending on the HID kit employed (Collins et al. 2004). In 2011, the State of Mexico (in the Central region of Mexico) established a civil and criminal STR database containing approximately 2000 DNA profiles analyzed with the commercial PowerPlex 16 System (Krenke et al. 2002). Although many Mexican populations have been analyzed with autosomal STRs to support forensic casework (Rubi-Castellanos et al. 2009; Salazar-Flores et al. 2015), only the following states/regions of the country have been studied with the PowerPlex 16 kit: Guanajuato, Veracruz, Nayarit, Yucatan, Mexico City, and the Western region (Rangel-Villalobos et al. 2010; González-Herrera et al. 2010; Ramírez-Flores et al. 2014; Martínez-Sevilla et al. 2016). Furthermore, the population structure and genetic relationships of Mexican populations remain unexplored with the PowerPlex 16 System.

Therefore, we analyzed this STR genotype database from the State of Mexico in order to: i) report nine fortuitous matches, suggestive of first-degree kinship, detected during the addition of new DNA profiles to the database; ii) evaluate the presence of modal alleles in these cases; iii) estimate the expected probability of such fortuitous matches; iv) estimate statistical parameters of forensic efficiency in this previously unstudied population (State of Mexico); and v) evaluate the structure and genetic relationships among Mexican populations previously studied with the PowerPlex 16 system.

Material and methods

DNA extraction

Bones analyzed in this work were first decalcified with 0.5 M EDTA. Bone slices were then treated with a Proteinase K Digestion Solution prepared with Bone Incubation Buffer and Proteinase K solution at 21 mg/mL (Promega Corp., Madison, WI). Similarly, tissues (mainly muscle) were digested with Proteinase K and digestion buffer, whereas saliva swabs were centrifuged and DNA was extracted from the resulting pellets. The final step of DNA extraction employed the DNA IQ System according to the manufacturer’s instructions (Promega Corp., Madison, WI). Conversely, blood samples obtained from relatives, or reference samples included in the database, were spotted onto FTA cards and processed with FTA purification reagent (Whatman Inc., Clifton, NJ). One punch from each FTA card was used directly as the DNA template for PCR amplification. All individuals provided signed, written informed consent in accordance with the ethical guidelines of the Declaration of Helsinki. The anonymity of the recruited individuals was preserved. This project was authorized by the Ethics Committee of the Dirección de Servicios Periciales de la Fiscalía General de Justicia del Estado de México (PGJEM).

STR genotyping

The PowerPlex 16 System was used according to the supplier’s instructions (Promega Corp., Madison, WI). Amplified products were separated by capillary electrophoresis on the ABI Prism 310 Genetic Analyzer (Applied Biosystems, Foster City, CA). Alleles were called by comparison with the allelic ladder provided in the kit, using GeneMapper software (version 3.2).

STR database creation

DNA profiles based on the 15 STRs of the PowerPlex 16 System were individually uploaded into the Genetics Platform (Unix Solaris server, SPARC64 architecture, Oracle RDBMS engine, Enterprise Edition) developed by Grupo Empresarial Iberoamericano (GEI) (http://geigen.mx). In accordance with international recommendations (http://enfsi.eu/documents/), the inclusion criteria and upload process of DNA profiles to the STR database included quality control verification of the laboratory’s procedures (ENFSI DNA Working Group 2014). The laboratory has participated in the quality control exercise organized by the Grupo Iberoamericano de Trabajo en Análisis de DNA (GITAD: http://www.aicef.net/). The STR genotype database primarily contains profiles from unknown bodies, people searching for missing relatives, and biological evidence from criminal cases. The origin of each sample was recorded in the database for classification purposes. To estimate forensic parameters, we carefully selected a subpopulation of unrelated individuals (n = 493). For this purpose, we excluded genetic data from criminal samples, unknown bodies, and individuals sharing surnames, to avoid including relatives in this subpopulation sample.
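The selection of the unrelated subpopulation can be sketched as a simple filter. This is a minimal illustration, not the procedure run on the Genetics Platform: the `Record` layout, the origin labels `"criminal"` and `"unknown_body"`, and the rule that any surname occurring in more than one retained record excludes all of its carriers are hypothetical assumptions.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical record layout; the real database schema is not described in the text.
@dataclass(frozen=True)
class Record:
    sample_id: str
    origin: str       # assumed labels, e.g. "reference", "criminal", "unknown_body"
    surnames: tuple   # e.g. (paternal, maternal)


EXCLUDED_ORIGINS = {"criminal", "unknown_body"}  # assumed origin labels


def _norm(record):
    """Case-insensitive set of a record's surnames."""
    return {s.strip().upper() for s in record.surnames}


def select_unrelated(records):
    """Drop excluded origins, then drop anyone whose surname appears in another kept record."""
    candidates = [r for r in records if r.origin not in EXCLUDED_ORIGINS]
    # Number of candidate records in which each surname occurs
    counts = Counter(s for r in candidates for s in _norm(r))
    return [r for r in candidates if all(counts[s] == 1 for s in _norm(r))]


records = [
    Record("S1", "reference", ("López", "García")),
    Record("S2", "reference", ("Pérez", "López")),  # shares López with S1
    Record("S3", "reference", ("Ruiz", "Soto")),
    Record("S4", "criminal", ("Díaz", "Vega")),
    Record("S5", "reference", ("Nava", "Nava")),
]
print([r.sample_id for r in select_unrelated(records)])  # ['S3', 'S5']
```

The surname rule is deliberately conservative: sharing a common surname does not imply kinship, so it discards some unrelated individuals in exchange for lowering the chance of including relatives.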

Data analysis

Allele frequencies and the following statistical parameters of forensic importance were calculated using the Excel spreadsheet PowerStats (Tereba 2001): minimum allele frequency (MAF), probability of exclusion (PE), power of discrimination (PD), polymorphism information content (PIC), observed heterozygosity (Het), and typical paternity index (TPI). GDA software (version 1.1) was used to perform Hardy–Weinberg equilibrium (HWE) and linkage disequilibrium (LD) tests (Lewis and Zaykin 2002). For the interpopulation analyses, we included STR datasets from four Mexican populations (Rangel-Villalobos et al. 2010; González-Herrera et al. 2010; Ramírez-Flores et al. 2014; Martínez-Sevilla et al. 2016) and from four main American ethnic populations (Hill et al. 2013). The geographic location of the Mexican populations is shown in Fig. 1. Analysis of molecular variance (AMOVA), pairwise Fst distances, and their p-values were estimated with Arlequin 3.1 software (Excoffier et al. 2005). Distances were represented graphically in a multidimensional scaling (MDS) plot using SPSS (version 20.0 for Windows).
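The per-locus parameters listed above can be sketched in Python from a list of genotypes. This is a hedged reimplementation under stated assumptions, not the PowerStats spreadsheet itself: PD is taken as one minus the sum of squared observed genotype frequencies, PIC uses the standard formula, PE is assumed to follow the Brenner and Morris expression h²(1 − 2hH²) with observed heterozygosity h and homozygosity H, TPI is 1/(2H), and MAF is assumed to be the 5/(2N) floor recommended by NRC II. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import prod


def locus_stats(genotypes):
    """Forensic parameters for one locus from a list of (allele1, allele2) genotypes."""
    n = len(genotypes)
    allele_counts = Counter(a for g in genotypes for a in g)
    freqs = {a: c / (2 * n) for a, c in allele_counts.items()}
    het = sum(1 for a, b in genotypes if a != b) / n   # observed heterozygosity h
    hom = 1 - het                                       # observed homozygosity H
    # PD: 1 - sum of squared observed genotype frequencies (unordered genotypes)
    geno_counts = Counter(tuple(sorted(g)) for g in genotypes)
    pd = 1 - sum((c / n) ** 2 for c in geno_counts.values())
    # PIC: 1 - sum p_i^2 - sum_{i<j} 2 p_i^2 p_j^2
    p = list(freqs.values())
    pic = 1 - sum(x ** 2 for x in p) - sum(2 * x ** 2 * y ** 2 for x, y in combinations(p, 2))
    pe = het ** 2 * (1 - 2 * het * hom ** 2)            # assumed PE formula
    tpi = 1 / (2 * hom) if hom > 0 else float("inf")    # typical paternity index
    maf = 5 / (2 * n)                                   # assumed 5/(2N) floor
    return {"freqs": freqs, "Het": het, "PD": pd, "PIC": pic,
            "PE": pe, "TPI": tpi, "MAF": maf}


def combined_pd_pe(stats):
    """Combine per-locus PD and PE across loci assumed to be independent."""
    cpd = 1 - prod(1 - s["PD"] for s in stats)
    cpe = 1 - prod(1 - s["PE"] for s in stats)
    return cpd, cpe


toy = [(10, 11), (10, 10), (11, 12), (12, 12)]  # illustrative genotypes only
s = locus_stats(toy)
print(s["PD"], s["PE"], s["TPI"])  # 0.75 0.1875 1.0
```

Combining loci as 1 − ∏(1 − x) relies on the same independence assumption that the LD tests are meant to support.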

Fig. 1

Geographic location and sample size of the Mexican-Mestizo populations used for interpopulational analysis in this study

Likelihood ratios (LRs) representing paternity indexes with only one parent (motherless or fatherless) were computed either when a match was detected during the inclusion of new DNA profiles or when searching for missing persons within the STR database. Bayesian posterior probabilities assuming equal prior probabilities (0.5) for the two alternative hypotheses were calculated for this purpose (Gjertson et al. 2007), using the allele frequencies reported here and the Familias 3 software (Kling et al. 2014). In addition, we estimated a second LR, termed LRNRC, by applying a correction factor according to the size of the database (i.e., dividing the LR by N = 2000). This follows Recommendation 5.1 of the National Research Council (NRC II), which was endorsed by the FBI’s DNA Advisory Board in its February 2000 recommendations on statistical approaches (National Research Council (NRC) Committee on DNA Forensic Science 1996). To check whether the number of observed cases agreed with that expected by chance alone, we estimated, for each STR, the matching probability between all possible genotypes sharing at least one allele. For this purpose, Hardy–Weinberg and linkage equilibrium were assumed when estimating genotype and DNA profile frequencies, respectively. The combined matching probability was obtained by applying the product rule and the Bonferroni correction according to the database size (N = 2000), using Microsoft Excel 2007.
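For a single alleged parent and child, the per-locus LR and the quantities derived from it can be sketched as follows. This is a simplified illustration, not Familias 3: it assumes Hardy–Weinberg and linkage equilibrium, ignores mutation, silent alleles, and population substructure (θ = 0), and treats any locus without a shared allele as an exclusion (LR = 0). LRNRC is computed as the combined LR divided by the database size N, which is how we read the NRC II correction. All function names are hypothetical, and every allele in a profile must have a frequency entry.

```python
from math import prod


def duo_lr(child, parent, freqs):
    """Single-locus parent-child LR (no mutation, theta = 0, HWE). Returns 0 on exclusion."""
    c1, c2 = child
    if c1 == c2:
        # Parent must transmit c1; the child's other allele comes from the population
        x = 0.5 * sum(freqs[c1] for t in parent if t == c1)
        y = freqs[c1] ** 2
    else:
        # Parent transmits c1 (other allele c2 from population) or c2 (other allele c1)
        x = 0.5 * sum((freqs[c2] if t == c1 else 0.0) + (freqs[c1] if t == c2 else 0.0)
                      for t in parent)
        y = 2 * freqs[c1] * freqs[c2]
    return x / y


def combined_lr(child_profile, parent_profile, freqs_by_locus):
    """Product rule across loci (assumes linkage equilibrium)."""
    return prod(duo_lr(child_profile[locus], parent_profile[locus], freqs_by_locus[locus])
                for locus in freqs_by_locus)


def posterior(lr, prior=0.5):
    """Bayesian posterior probability of the kinship hypothesis."""
    return lr * prior / (lr * prior + (1 - prior))


def lr_nrc(lr, n_database=2000):
    """Database-size correction as read from the text: LR divided by N."""
    return lr / n_database
```

For example, a heterozygous child 10/11 and a parent 10/11 with allele frequencies 0.25 and 0.5 give (0.25 + 0.5)/(4 × 0.25 × 0.5) = 1.5 at that locus.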

Results and discussion

Observed cases of fortuitous matching

We report nine matches (presumed to be fortuitous) found during the search or inclusion of DNA profiles into the STR database of 2000 samples from the State of Mexico. According to our records, there were no biological relationships between these matching individuals.

Case 1

When the mother’s DNA profile was compared with the database during the search for a missing daughter, two positive matches suggesting first-degree kinship were found (Table 1). The LRNRC values distinguished the fortuitous match from the (presumed) real kinship with the missing daughter: the former did not support the maternity hypothesis (LRNRC = 0.08 versus 2.62 for the presumed real kinship). Eventually, the inclusion of the father revealed which match was fortuitous through three inconsistencies (D18S51, Penta E, and CSF1PO), unequivocally excluding a biological relationship for one sample (FM1, see Table 1). At the same time, the father’s inclusion increased the LR for the match with the other sample, supporting a more confident decision to return the corpse to its (presumably) real relatives.

Table 1 Cases of fortuitous matches (FM) with 15 STRs versus presumably real kinship between a mother (M) and/or father (F) searching for a missing daughter (MD) and a missing son (MS), respectively

Case 2

Two positive matches were found during the search for a missing son when the father’s DNA profile was compared with the STR database (Table 1). Although both matches yielded relatively low but positive LRNRC values (1.25 versus 16.48), one was clearly larger when the uncorrected LRs were compared (2495.8 versus 32,961.9). This difference therefore seemed helpful for deciding to return the corpse to the (presumed) biological father. In brief, low LRNRC values (~1 or <1) appear useful for flagging a fortuitous match, whereas comparing uncorrected LRs could help reach a final decision when two matches are observed during the search for a missing person within an STR database. However, we must keep in mind that, whenever possible, additional relatives or markers should be included in the DNA analysis, and anthropological and/or circumstantial data should anchor the DNA identification. Unfortunately, this was not possible when this case was solved some years ago.
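As a worked check of the Case 2 figures, assuming that LRNRC is the uncorrected LR divided by N = 2000 and that the prior of 0.5 makes the posterior probability W = LR/(LR + 1):

```latex
\begin{aligned}
\mathrm{LR}_{\mathrm{NRC}} &= \frac{\mathrm{LR}}{N}: & \frac{2495.8}{2000} &\approx 1.25, & \frac{32\,961.9}{2000} &\approx 16.48 \\
W &= \frac{\mathrm{LR}}{\mathrm{LR}+1}: & \frac{2495.8}{2496.8} &\approx 0.9996, & \frac{32\,961.9}{32\,962.9} &\approx 0.99997
\end{aligned}
```

Both posterior probabilities exceed 99.9%, so the posterior alone barely separates the two candidates, whereas the size-corrected values (about 1 versus about 16) make the difference explicit.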

Table 2 Fortuitous matching cases found during the inclusion of DNA profiles (PowerPlex® 16 System) into the STR database of the State of Mexico

Cases 3–9

Seven presumably fortuitous matches were detected between pairs of samples during the inclusion of DNA profiles into the STR database (Table 2). Comparison of surnames and/or available records indicated no biological relationship between the matching individuals. Although in most cases (6/7; 85.7%) the low LR values point to fortuitous matches (LRNRC < 2.2), one (case 7) displayed large LRs (LR = 2,320,000; LRNRC = 1160), suggesting a real biological relationship. Unfortunately, no further biological samples were available for additional genetic analyses (e.g., X-STRs, mtDNA) to confirm or discard this hypothesis. Again, low LRNRC values (~1 or <1) appear useful for identifying fortuitous matches.

Presence of modal alleles in fortuitous matching

We evaluated the presence of modal alleles in the nine fortuitous matches described here (Tables 1 and 2). By locus, the STRs whose modal alleles were found in most cases (>50%) were D5S818 (8/9), CSF1PO (7/9), D3S1358, D7S820, D16S539, and TPOX (6/9), and TH01 and D8S1179 (5/9) (Fig. 2a). To a lesser extent, the modal alleles of D21S11 and vWA were involved in four cases (4/9), followed by D13S317 and FGA (3/9). Conversely, the modal alleles of D18S51, Penta E, and Penta D were rarely observed in fortuitous matches (2/9).
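Counting modal-allele presence by locus and by case can be sketched as below. The criterion is an assumption, since the text does not define it precisely: a locus is counted when the modal (most frequent) allele in the database is among the alleles shared by the two matching profiles. Ties for the modal allele are broken by `max`, which keeps the first one encountered; all names and the toy data are hypothetical.

```python
from collections import Counter


def modal_alleles(freqs_by_locus):
    """Most frequent allele per locus (ties resolved by dictionary order)."""
    return {locus: max(f, key=f.get) for locus, f in freqs_by_locus.items()}


def modal_loci(profile_a, profile_b, modal):
    """Loci at which the modal allele is among the alleles shared by both profiles."""
    return [locus for locus, m in modal.items()
            if m in set(profile_a[locus]) & set(profile_b[locus])]


def tally(matches, modal):
    """Count modal-allele presence by locus (cf. Fig. 2a) and by case (cf. Fig. 2b)."""
    by_locus, by_case = Counter(), {}
    for case_id, (a, b) in matches.items():
        hits = modal_loci(a, b, modal)
        by_locus.update(hits)
        by_case[case_id] = len(hits)
    return by_locus, by_case


# Toy frequencies and profiles, not data from the State of Mexico database
modal = modal_alleles({"TPOX": {8: 0.5, 11: 0.3, 9: 0.2},
                       "TH01": {6: 0.25, 7: 0.4, 9.3: 0.35}})
matches = {
    "A": ({"TPOX": (8, 11), "TH01": (7, 9.3)}, {"TPOX": (8, 8), "TH01": (6, 9.3)}),
    "B": ({"TPOX": (8, 11), "TH01": (7, 7)}, {"TPOX": (8, 9), "TH01": (6, 7)}),
}
by_locus, by_case = tally(matches, modal)
```

In the toy data, case A shares only the non-modal TH01 allele 9.3, so it scores one locus (TPOX), while case B scores both loci.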

Fig. 2

Modal allele presence in the nine fortuitous matching cases observed in the Mexican STR database: a By locus; b By case

Interestingly, in most cases (5/9; 56%) modal alleles were observed at the majority of the 15 loci: 9 STRs in three cases and 10 STRs in two cases. In the remaining four cases (Cases 3, 5, 6, and 7), modal alleles were involved at seven or fewer STRs (<50% of loci) (Table 3; Fig. 2b). Although the small number of cases presented herein precludes statistical testing, the results point to the STRs whose allele distributions could favor fortuitous matches, given that modal alleles were present at more than half of the loci in most of the reported fortuitous matches (>55% of cases). In addition, these results highlight that more powerful HID systems (e.g., PowerPlex Fusion, GlobalFiler, and Investigator 24plex QS kits) and lineage markers (e.g., Y-STRs, X-STRs, or mtDNA) can be critical for confidently establishing biological relationships when searching STR databases for missing persons.

Table 3 Expected probability of sharing one/two allele(s) between two unrelated individuals for the PowerPlex 16 kit in the State of Mexico database (N = 2000)

Evaluating the probability of matching events in the database

Fortuitous matches can have serious legal implications, mainly in motherless paternity tests (Poetsch et al. 2006) and during the search for missing persons in STR databases, because few first-degree relatives are frequently available for testing (Ge et al. 2011). This problem could be accentuated in Mexico by the growing number of missing persons and homicide victims in recent years, along with the clandestine graves discovered in different parts of the country. Therefore, we estimated, for individual and combined STR loci, the matching probability between all possible genotypes sharing at least one allele (p = 0.00030617). The p-value corrected for database size (p = 0.00000015308) indicated that one fortuitous match between two DNA profiles is expected in 6,532,422 comparisons for this database (Table 3). This expected frequency differs from that observed in our study (p = 0.000004; Yates’ chi-square = 21.105). This finding resembles the motherless paternity cases described in Germany, where 26 pairs without STR mismatches were observed when 336 children were empirically compared with 348 men (Poetsch et al. 2006). Although we could not identify a direct explanation for this increase, it might involve neighboring indigenous individuals with higher inbreeding coefficients, who are constantly incorporated into the Mexican-Mestizo populations. This hypothesis would agree with descriptions of Native American ancestry throughout the country (Moreno-Estrada et al. 2014).
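The per-locus probability that two unrelated individuals share at least one allele, and its combination across loci, can be sketched as follows. This assumes Hardy–Weinberg equilibrium within loci and independence across loci (product rule). The division by N mirrors the correction by database size described in the Methods. The function names are hypothetical, and the allele frequencies in the test are illustrative rather than the State of Mexico frequencies, so they do not reproduce Table 3.

```python
from itertools import combinations_with_replacement
from math import prod


def genotype_freqs(freqs):
    """HWE genotype frequencies for one locus."""
    return {(a, b): freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]
            for a, b in combinations_with_replacement(sorted(freqs), 2)}


def p_share(freqs):
    """P(two unrelated individuals share at least one allele at this locus)."""
    g = genotype_freqs(freqs)
    return sum(pg * ph
               for gx, pg in g.items()
               for hx, ph in g.items()
               if set(gx) & set(hx))


def database_match_probability(freqs_by_locus, n_database=2000):
    """Combined probability (product rule), size-corrected p, and comparisons per expected match."""
    combined = prod(p_share(f) for f in freqs_by_locus.values())
    corrected = combined / n_database
    return combined, corrected, 1 / corrected
```

The double loop over genotype pairs grows with the fourth power of the number of alleles. This remains cheap for STR loci with a few dozen alleles and keeps the enumeration close to the "all possible genotypes" wording of the Methods.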

Concluding remarks regarding fortuitous matches

As expected, many recommendations of the International Society for Forensic Genetics (ISFG) for disaster victim identification (DVI) are useful for promoting the correct identification of missing persons (Prinz et al. 2007). Some ISFG recommendations that are critical for avoiding fortuitous matches are briefly listed here: #4) Multiple direct references and samples from first-degree relatives should be collected for each missing person; #6) Use of additional typing systems, such as mtDNA, Y-chromosomal STRs or SNP markers; #10) DNA based identification should whenever possible be anchored by anthropological and/or circumstantial data, a second identification modality, or multiple DNA references; #11) Use of LR that permit DNA results to be combined among multiple genetic systems or with other non-DNA evidence. LR threshold should be determined for when DNA data alone can suffice for an identification, which will be based on the size and circumstances of the event (Prinz et al. 2007). It should be noted that some of these recommendations are difficult to implement in Mexico and probably in other developing countries. For instance, whereas DNA testing currently constitutes the main tool for human identification, anthropological and circumstantial data are scarcely employed and underappreciated. However, as described in case 2 (Table 1), these data would have been helpful for reaching stronger conclusions; thus, they should be used whenever possible. In brief, strict application of these ISFG recommendations for DVI would have avoided the fortuitous matches described herein for missing persons identification.

Forensic parameters of the STR database

In agreement with ISFG recommendations for DVI (Prinz et al. 2007), we report statistical parameters of forensic efficiency for the PowerPlex 16 kit in the State of Mexico population (Additional file 1: Table S1). The forensic parameters were estimated for each STR locus based on 493 unrelated DNA profiles selected from the STR database according to the criteria described above. The HWE test showed that only D7S820 deviated from equilibrium after Bonferroni correction (p < 0.0033, i.e., 0.05/15). Similarly, exact tests detected LD in only two locus pairs, D13S317/Penta E and TH01/TPOX (data not shown). Altogether, these isolated findings indicate that DNA profile frequencies can be confidently estimated in the State of Mexico population. The combined power of discrimination (PD) and probability of exclusion (PE) in this Mexican population were >99.9%, which is sufficiently reliable to solve most forensic and paternity cases, respectively.
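GDA was used for the HWE tests. As a stand-in, a Monte Carlo permutation test that orders samples the same way as an exact test can be sketched as follows. This swapped-in technique shuffles the pooled alleles (keeping allele counts fixed), re-pairs them into genotypes, and reports the proportion of permuted samples whose conditional genotype probability is no greater than the observed one. The Bonferroni threshold for 15 loci is 0.05/15 ≈ 0.0033. The function names, the number of replicates, and the seed are hypothetical choices.

```python
import random
from collections import Counter
from math import lgamma, log


def _log_prob_stat(genotypes):
    """log P(genotypes | allele counts) up to a constant fixed by the allele counts."""
    counts = Counter(tuple(sorted(g)) for g in genotypes)
    n_het = sum(c for (a, b), c in counts.items() if a != b)
    return n_het * log(2) - sum(lgamma(c + 1) for c in counts.values())


def hwe_permutation_test(genotypes, reps=2000, seed=1):
    """Monte Carlo HWE test: share of allele permutations at most as probable as observed."""
    rng = random.Random(seed)
    pool = [a for g in genotypes for a in g]
    observed = _log_prob_stat(genotypes)
    hits = 0
    for _ in range(reps):
        rng.shuffle(pool)
        permuted = list(zip(pool[0::2], pool[1::2]))  # re-pair alleles into genotypes
        if _log_prob_stat(permuted) <= observed + 1e-9:
            hits += 1
    return (hits + 1) / (reps + 1)  # count the observed sample itself


BONFERRONI_ALPHA = 0.05 / 15  # 15 STR loci, about 0.0033
```

A locus would be flagged when `hwe_permutation_test(genotypes) < BONFERRONI_ALPHA`. The resolution of the p-value is limited to 1/(reps + 1), so more replicates are needed near that threshold.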

Interpopulation comparison

To our knowledge, this is the first time that all 15 STRs of the PowerPlex 16 System have been employed for interpopulation comparisons in Mexico, including five Mexican-Mestizo populations (the State of Mexico and those reported by Rangel-Villalobos et al. 2010; González-Herrera et al. 2010; Ramírez-Flores et al. 2014; Martínez-Sevilla et al. 2016) and four American populations (Hill et al. 2013) as reference. Interestingly, a central population cluster was formed by Mexico City, Veracruz, the State of Mexico, and Guanajuato (Fig. 3), the latter showing no differentiation from Mexico City. Pairwise Fst distances and p-values are given in Additional file 2: Table S2. Mexican-Mestizos from the Western region were closer to Hispanic Americans than the remaining Mexican populations were (probably owing to their greater European ancestry), consistent with the genetic structure previously described in Mexican-Mestizos based on the 13 CODIS STRs (Wurmb-Schwark et al. 2006; Rubi-Castellanos et al. 2009). AMOVA results suggest significant differentiation between Mexican and American populations (Fst = 1.45%; p = 0.0000), which is approximately seven times larger than that observed among the five Mexican populations studied herein (Fst = 0.198%; p = 0.0000), and 17 times larger than the interpopulation differentiation among the Central Mexican populations (excluding the Western region) (Fst = 0.198%; p = 0.0000). In brief, based on the PowerPlex 16 system, our results support a relative homogeneity among Mexican populations from the Central region, excluding the Western region from this cluster. This conclusion applies to Mestizos (admixed individuals), who constitute the largest proportion of the Mexican population (~90%) (Rubi-Castellanos et al. 2009; Salazar-Flores et al. 2015), but not to Native American groups, which display a particular genetic structure (Rangel-Villalobos et al. 2016).
The genetic structure described herein is important in forensic casework because it allows geneticists to use STR population data from genetically similar populations, given that most Mexican populations do not have their own STR databases. This paper follows the guidelines for publication of population data requested by the journal (Carracedo et al. 2013).

Fig. 3

MDS plot showing genetic relationships between Mexican-Mestizo and four American populations based on the PowerPlex 16 System
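The MDS projection can be approximated with classical (Torgerson) metric MDS on the pairwise Fst matrix. This is a swapped-in sketch, not the SPSS procedure used for Fig. 3, which may apply a different (for example, non-metric) scaling algorithm. The example matrix below is hypothetical. Negative Fst estimates, which Arlequin can report for nearly identical samples, are assumed to have been set to zero beforehand.

```python
import numpy as np


def classical_mds(dist, k=2):
    """Classical (Torgerson) metric MDS: k-dimensional coordinates from a distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n              # centering matrix
    b = -0.5 * j @ (d ** 2) @ j                      # double-centred squared distances
    vals, vecs = np.linalg.eigh(b)                   # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]               # keep the k largest
    scale = np.sqrt(np.clip(vals[order], 0.0, None)) # drop negative (non-Euclidean) parts
    return vecs[:, order] * scale


# Hypothetical Fst matrix for three populations (not values from Table S2)
fst = [[0.0, 0.001, 0.003],
       [0.001, 0.0, 0.002],
       [0.003, 0.002, 0.0]]
coords = classical_mds(fst)
```

Because eigenvectors are defined only up to sign, the axes may come out mirrored relative to a published plot. Only relative distances between points are meaningful.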

Conclusions

In brief, our results emphasize the importance of analyzing a sufficient number of relatives and/or HID systems to reach reliable conclusions when searching for relatives in STR databases for DVI, missing persons identification, and motherless paternity cases. When this is not possible, the concurrent presence of low LRNRC values (~1 or <1) and a high proportion of modal STR alleles should be examined to detect possible fortuitous matches. Notably, a higher-than-expected frequency of fortuitous matches was observed in the studied Mexican STR database. Finally, we observed a relative homogeneity among Mexican-Mestizos of the Central region based on the PowerPlex 16 system.