Exploring STR sequencing for forensic DNA intelligence databasing using the Austrian National DNA Database as an example

Hölzl-Müller, Petra; Bodner, Martin; Berger, Burkhard; Parson, Walther

doi:10.1007/s00414-021-02685-x

Exploring STR sequencing for forensic DNA intelligence databasing using the Austrian National DNA Database as an example

Original Article
Open access
Published: 26 August 2021

Volume 135, pages 2235–2246, (2021)
Cite this article

Download PDF

You have full access to this open access article

International Journal of Legal Medicine Aims and scope Submit manuscript

Exploring STR sequencing for forensic DNA intelligence databasing using the Austrian National DNA Database as an example

Download PDF

3706 Accesses
12 Citations
8 Altmetric
Explore all metrics

This article has been updated

Abstract

Here, we present the results from a population study that evaluated the performance of massively parallel sequencing (MPS) of short tandem repeats (STRs) with a particular focus on DNA intelligence databasing purposes. To meet this objective, 247 randomly selected reference samples, earlier being processed with conventional capillary electrophoretic (CE) STR sizing from the Austrian National DNA Database, were reanalyzed with the PowerSeq 46Y kit (Promega). This sample set provides MPS-based population data valid for the Austrian population to increase the body of sequence-based STR variation. The study addressed forensically relevant parameters, such as concordance and backward compatibility to extant amplicon-based genotypes, sequence-based stutter ratios, and relative marker performance. Of the 22 autosomal STR loci included in the PowerSeq 46GY panel, 99.98% of the allele calls were concordant between MPS and CE. Moreover, 25 new sequence variants from 15 markers were found in the Austrian dataset that are yet undescribed in the STRSeq online catalogue and were submitted for inclusion. Despite the high degree of concordance between MPS and CE derived genotypes, our results demonstrate the need for a harmonized allele nomenclature system that is equally applicable to both technologies, but at the same time can take advantage of the increased information content of MPS. This appears to be particularly important with regard to database applications in order to prevent false exclusions due to varying allele naming based on different analysis platforms and ensures backward compatibility.

Sequence investigation of 34 forensic autosomal STRs with massively parallel sequencing

Article Open access 01 May 2018

Massively parallel sequencing of 25 short tandem repeat loci including the SE33 marker in Koreans

Article 22 January 2021

A novel forensic panel of 186-plex SNPs and 123-plex STR loci based on massively parallel sequencing

Article 26 August 2020

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Introduction

Throughout the past decades, short tandem repeat (STR) loci have become the most important genetic markers in forensics. They can be analyzed at a reasonable cost/time ratio and provide high enough statistical discrimination power to identify individuals in the majority of crime and human identification cases [1]. Traditionally, STRs are detected by PCR-generated amplicon sizing (also known as capillary electrophoresis-based methodologies, CE) and only relatively recently, massively parallel sequencing (MPS) approaches were introduced [2,3,4,5]. These studies revealed a number of benefits of MPS-based STR analysis over conventional CE methods, including, but not limited to, a larger number of loci and different types of loci that can be amplified in a single assay, an increased discrimination power by detecting isometric alleles (that share the same size but differ in sequence), and an increased success rate in deconvoluting complex mixtures [6]. Limitations that speak against an immediate implementation of MPS-based STR analysis approaches in a routine environment involve cost considerations, lack of harmonized sequence nomenclature, and lack of user-friendly software analysis tools, amongst others [7]. An important component, yet largely ignored, is the evaluation of current MPS-based STR typing in the routine environment of a forensic DNA databasing laboratory, although individual validation studies have touched on this topic (e.g., [8]). In addition to crime scene samples that are the primary focus of so far published studies, reference samples from suspects or convicted felons are required to be analyzed with the same technology in order to take full advantage of the methodological benefits.

Here, we evaluated the performance of an MPS-based STR-typing system consisting of the PowerSeq 46GY panel (Promega, Madison, USA) analyzed on a MiSeq FGx sequencer (Verogen, San Diego, USA) for DNA intelligence databasing purposes using a random subset of the Austrian National DNA Database as an example. The PowerSeq 46GY kit includes all loci required for national and international DNA intelligence databasing in the United States and Europe. We evaluated concordance and backward compatibility to extant CE-based genotypes, stutter display, and heterozygote balance and provide new population data to increase the body of STR variation that is currently being collected and catalogued in various environments (e.g., [9, 10]).

Materials and methods

Samples

All 248 buccal swab reference samples included in this study derived from the Austrian National DNA Database in accordance with the Austrian Data Protection regime. They were analyzed in line with Austrian legislation and with permission of the Austrian Federal Ministry of the Interior. Legal requirements with respect to sample storage (buccal swabs and/or extracted DNA) as well as the permission to go back to DNA extracts for re-testing are regulated by the Austrian Federal Security Police Act, which also contains specific legal provisions for scientific purposes to provide biometric data, including DNA data, in anonymized form to universities for research. The samples were randomly selected by executive authorities of the Austrian Federal Ministry of the Interior. The selection criteria were based on male sex, Austrian nationality, and birthplaces. The samples were made anonymous to the analyzing laboratory by using barcode information.

DNA extraction

DNA was extracted from buccal swab samples using the Chelex 100 (Bio-Rad, Hercules, USA) method [11], and stored at − 20 °C for 1–13 years prior to DNA testing performed in this study. The selected samples were divided into three storage time groups as follows: group I: < 2 years; group II: 2 < 5 years, and group III: 5–13 years.

DNA quantification

To determine the amount of genomic DNA, a real-time quantitative PCR (qPCR) assay targeting specific AluYb8 sequences was used [12]. A spiked in vitro mutagenized and cloned part of the human retinoblastoma susceptibility protein 1 (RB1) gene was co-amplified as an internal amplification positive control (pRB1-IPC) according to [13], updated in [14]. Calibration curve analyses covered a DNA input range from 169.5 fg to 10 ng per reaction and were performed in duplicates. The final reaction volume of 10 µL consisted of 5 µL TaqMan Fast Universal PCR mix (Thermo Fisher Scientific [TFS], Waltham, USA), 3 µL primer probe premix (made in-house), and 2 µL extracted DNA. The amplification was carried out on an Applied Biosystems 7500 Fast Real-Time PCR instrument (TFS) applying 95 °C for 20 s, 40 cycles of 95 °C for 3 s, and 60 °C for 30 s. Data analysis was carried out with the HID Real-Time PCR Software v 2.3 (TFS). Kinetic information for the pRB1-IPC system yielded no indication of inhibition during DNA amplification.

Capillary electrophoresis

STR analysis was performed using the AmpFlSTR NGM SElect Express kit (TFS) [15] and the PowerPlex 16 System (Promega) [16] on all samples, resulting in a total of 23 autosomal STR (aSTR) loci plus amelogenin. As SE33 is not included in the PowerSeq 46GY panel, only length-based SE33 data could be considered. All remaining 22 aSTR markers as well as amelogenin are also included in the PowerSeq 46GY panel enabling a comparative view of the results. Amplifications were performed on ABI GeneAmp 9700 thermal cyclers (TFS) following the recommended protocols [15, 16]. PCR products were separated and detected using an Applied Biosystems Prism 3500XL Genetic Analyzer (TFS).

MPS workflow

The PowerSeq 46GY kit (Promega) was used to co-amplify 22 aSTRs (D1S1656, TPOX, D2S1338, D2S441, D3S1358, FGA, D5S818, CSF1PO, D7S820, D8S1179, D10S1248, TH01, vWA, D12S391, D13S317, Penta E, D16S539, D18S51, D19S433, D21S11, Penta D, and D22S1045), 23 Y-STRs (data not shown), and amelogenin. This extended STR panel aims to target forensic markers to comply with the European Standard Set (ESS) [17, 18] and the Combined DNA Index System (CODIS) recommendations [19,20,21]. Amplification, purification, library preparation, normalization, quantification, pooling, and sequencing were performed according to the manufacturer’s recommendations [22,23,24].

The 248 DNA samples were processed as five library batches of 4 × 48 and 1 × 56 samples (Table 1). Each library batch included a positive and a negative amplification control, resulting in 50 (4 ×) or 58 (1 ×) samples (incl. controls) that were assembled into one sequencing run. After sequencing and data analysis, one sample was excluded due to contamination during manual library preparation. Therefore, only data of the remaining 247 samples were further considered.

Table 1 Run and quality metrics information for five sequencing runs. Sequencing was performed using the PowerSeq 46GY kit (Promega, USA) on a MiSeq FGx instument (Verogen, USA) according to the manufacturer’s recommendations

Full size table

Amplification of STR fragments

All samples were diluted accordingly in molecular grade water to amplify 0.5 ng of template DNA according to [22]. Multiplex PCR was performed targeting 0.5 ng DNA using the PowerSeq 46GY kit [22] on an Applied Biosystems GeneAmp 9700 thermal cycler (TFS) according to the manufacturer’s recommendations [23]. Each amplification reaction was treated with 5 µL proteinase K solution [504 µg/mL] (Roche Diagnostics, Mannheim, Germany) and purified using AMPure XP beads (Beckman Coulter, CA, USA) following [22]. The concentrations of amplification products were estimated spectrophotometrically prior to library preparation by measuring the absorbance at 260 nm according to [25] and as recommended in [23] with a NanoDrop ND-1000 spectrophotometer and analyzed with software version 3.8.1 (both Peqlab Biotechnologie GmbH, Erlangen, Germany).

Library preparation and sequencing

End-repair, A-tailing, adaptor ligation, initial, and second purification were performed using the TruSeq DNA PCR-Free HT Library Prep Kit (Illumina, San Diego, USA; [24]) according to the manufacturer’s recommendations [22, 23], with the exception that supplied sample purification beads were replaced by AMPure XP beads (Beckman Coulter, CA, USA). To ensure balanced pooling, each library sample was quantified in duplicate by means of qPCR using the KAPA SYBR FAST Universal qPCR Kit (Roche) following [26] on an Applied Biosystems 7500 Fast Real-Time PCR instrument. Data analysis was performed using the HID Real-Time PCR Software v 2.3. Based on qPCR results, samples were diluted and normalized to 4 nM and equally pooled according to [22]. Sequencing was performed on the MiSeq FGx instrument (Verogen, [27]) using a 500 cycles MiSeq Reagent Kit v2 for 2 × 250 paired-end sequencing (Illumina, [28]) according to the manufacturer’s recommendations. The final library concentration was 12 pM with approx. 6.6% spiked PhiX control library (Illumina) following [22].

Data analysis

Capillary electrophoresis data

Size-based analysis of STR fragments was conducted using the GeneMapper ID-X software, version 1.2 (TFS) by applying in-house validated dye thresholds: blue — 50 relative fluorescence units (RFU), green — 80 RFU, yellow — 100 RFU, and red — 100 RFU.

MPS data

Sequencing results were monitored using the Sequencing Analysis Viewer software (Illumina; [29]) to review relevant quality metrics. Compressed FASTQ files were manually extracted for data analysis using the open-source STRait Razor v2s software tool [30, 31]. Sequences were aligned to human genome assembly GRCh38 and genomic coordinates for STR markers were determined by post processing of the mpileup output from SAMtools [32] as described in [30, 31]. STR genotypes were analyzed by applying an analytical threshold (AT) of 50 reads, referring to [33]. Alleles were called above the in-house defined interpretation threshold (IT) of 500 or 100 reads for homozygous or heterozygous genotypes, respectively. Sequence-based allele frequencies are shown in Table S1 considering all available sequence information.

MPS stutter ratios

Stutter analysis was restricted to a subset of 50 samples (app. 20% of the entire sample set) selected according to the total number of reads that was calculated by summarizing the intensity (read count) of a given STR profile for all 22 aSTRs. Stutter sequences were determined as one repeat unit smaller than the parental allele and calculated by dividing the intensity of stutter sequence by the intensity of the corresponding allele. The selected samples were divided in two equally sized groups comprising low (category I) and high performing samples (category II), respectively. Selection criteria were established to investigate the effect of sample performance (total number of reads per genotype) on the formation of stutter height and defined as follows: category I comprised only samples that fell below 63,500 reads, while category II included solely samples above 199,000 reads (Table S2).

Statistical analysis

Microsoft Excel workbooks, IBM SPSS software, version 24 [34], and GraphPad Prism, version 8 for Windows [35], were applied.

Forensic and population genetic parameters, including allele frequency, observed and expected heterozygosity, expected homozygosity, power of exclusion, power of discrimination, matching probability, typical paternity index, and exact Hardy–Weinberg equilibrium (HWE) tests for the population, were calculated from CE data using in-house software according to formulae listed in Table S3 and the STRAF software package [36].

Results and discussion

This population study comprised samples that were collected between August 2005 and July 2017. Size-based STR genotypes were generated in the course of routine forensic practice when the samples entered the laboratory using conventional CE.

MPS run parameters

To evaluate the data quality of each sequencing run, we extracted the provided quality metrics, e.g., cluster density, reads passing filter, Q30 scores, and data output. Table 1 shows the attained run parameters (recommended values in brackets): cluster density (1,000–1,200 K/mm²), cluster passing filter, phasing, pre-phasing, total number of reads, total number of reads passing filter, % ≥ Q30 (> 75%), and the number of samples per run.

One of the runs (run 3) generated optimal cluster density according to the manufacturer’s recommendation [37] (Table 1), whereas cluster densities for runs 1 and 5 as well as for runs 2 and 4 were diagnosed as under- and overclustered, respectively. Overclustering potentially introduces analytical problems, which might lead to poor run performance and decreases Q30 scores along with lowered data output. In contrary, underclustering does not inevitably harmfully affect data quality but predominantly lowers data output. Q-scores, also known as Phred quality scores, are commonly used metrics of base calling accuracy and to communicate very small error probabilities. For example, a Q-score of 30 assigned to a base call is identical to the probability of an incorrect base call 1 in 1,000 times, i.e., base call accuracy is 99.9% [38, 39]. In the current study, all sequencing runs exceeded the recommended Q30 score of > 75% (Table 1) indicating reliable base calling (mean: 83.2%; standard deviation (SD): 4.1%). In this study, we considered only MPS-based aSTR genotype calls that met the in-house defined interpretation threshold (see “MPS data” section).

Concordance

Concordance was evaluated for the 22 aSTR markers shared between the PowerSeq 46GY kit and the applied CE kits (AmpFlSTR NGM SElect Express Kit and PowerPlex 16 System) by comparing length-based allele calls recalculated from MPS results and those derived from CE. The only discordance between both technologies was caused by two allelic drop-out events: in one case, drop-out of the longer allele was found at D2S1338 (CE: 18/28; MPS: 18; 1750 reads) and in the other case, the shorter allele at Penta E (CE: 8/11; MPS: 11; 3029 reads) dropped out. Full concordance was observed between the applied CE kits for the eight autosomal STR loci included in both multiplex assays.

All results for the positive amplification controls were concordant to the known information, except a partial but otherwise correct profile of the positive control in run 2. Repeat-region sequence information was concordant for all typed positive amplification control alleles. No negative amplification control yielded a detectable signal, with the exception of one allelic drop-in in run 4 (TPOX: allele 11; 373 reads).

Allele concordance between MPS and CE was 99.98% (10,866 out of 10,868 alleles in total) and locus concordance amounted to 99.96% (5,432 out of a total of 5,434 STR loci; Table S4). These findings are comparable to earlier reported concordance studies [40,41,42]. Furthermore, concordance was similar to that observed between commonly used CE-STR kits [43, 44] indicating that the results obtained with the PowerSeq 46GY panel are highly compatible with those obtained by standard STR-typing technologies.

PowerSeq 46GY kit performance

DNA quantity and storage period

The mean DNA quantity of the 247 samples was 7.29 ng/µL (SD: 5.20, median: 6.23) with a minimum of 0.71 ng/µL and a maximum of 33.04 ng/µL (Table S5). The mean storage period for DNA samples used within the current study was 2.96 years (SD: 1.90; median: 3.12) and varied between 1 and 13 years (Table S5). As there were no changes in sampling and DNA extraction over the entire time period, the latter is the only remaining variable that could influence the DNA quantity and quality. As shown in Figure S1, storage time did not affect sample performance and was comparable for the majority of samples included in each storage time group (Figure S1). Although relative marker performance decreased slightly with increasing amplicon length, we did not observe differences between storage time groups I–III (Figure S2). The decline in relative marker performance, in relation to amplicon size, did not adversely affect downstream data analysis.

Relative marker performance

Relative marker performance was evaluated (for each sample and STR locus) by dividing the intensity of marker reads by the total number of reads for all markers of a single STR profile. Assuming equally performing STR markers, the expected relative marker performance value for each of the 23 markers (incl. amelogenin) of the PowerSeq 46GY panel would equal 4.35% of all reads. The effective mean relative marker performance values were found to lie close to the expected value and were comparable for the majority of markers included in the present MPS panel, except for D1S1656, D2S1338, TH01, D12S391, D16S539, Penta D, D22S1045, and amelogenin (Fig. 1). The average performances of the latter eight markers were outside the well-balanced range, which we defined as mean relative marker performance + / − one SD_{(marker mean)} (SD_{[marker mean]}: 1.23%; Table S5). These results indicated that the PowerSeq 46GY prototype library kit consists of a more balanced marker set compared to other prototype/early access MPS-based STR multiplexes, when only aSTRs are considered [45].

Heterozygote balance

Heterozygote balance (HB) is expressed (for each sample and STR locus) as ratio between the minor and the major allele intensity for heterozygote genotypes. On average, all markers showed HB ratios ≥ 0.80, except D12S391 (mean: 0.79; SD: 0.13), D19S433 (0.78; 0.14), Penta E (0.77; 0.15), and D2S1338 (0.70; 0.19) (Figure S3, Table S4). In seven samples (0.13% of the sample set), we observed highly imbalanced genotype calls (HB ratios ≤ 0.30) at D2S1338 (5 ×), D19S433 (1 ×), and D21S11 (1 ×) (Figure S3, Table S4). Generally speaking, HB ratios were similar to those obtained with CE-based STR kits [46, 47] or reported in earlier MPS studies [48, 49]. D2S1338 was found to be most susceptible to heterozygote imbalance, which has been described by [49] before. Since, high imbalances potentially lead to marker drop-out, further optimization steps, especially for D2S1338, are required as already indicated by [49]. We note that known imbalances at D22S1045 [2, 50,51,52] and D5S818 [51, 53,54,55] were not observed for genotypes amplified with the PowerSeq 46GY library kit. In one sample, heterozygote imbalance was found at D5S818 after CE analysis but not with MPS (typed alleles — CE: 10/11, HB: 20%; MPS: 10/11, HB: 72.1%). Sequence analysis of the latter sample unveiled the presence of an A > G transition (rs182073376), which was located 36 nucleotides upstream the repeat region of D5S818 [56] and was found only once within our dataset. To understand the reason for this imbalance, we examined the SNP location with respect to the previously published PowerPlex 16 (Promega) primer sequences [57]. Indeed, the observed A > G transition was located ten nucleotides upstream of the 3′ end within the forward primer’s binding site, which apparently decreased thermal stability of the primer-template complex and, therefore, reduced PCR performance (Figure S4) [58, 59].

Stutter ratios

Stutter products are commonly known artifacts in STR typing that are caused by strand slippage of the DNA polymerase during the extension phase of PCR. Strand slippage results in the addition or deletion of typically one repeat unit in the nascent DNA strand [60]. Besides of rarely seen stutter products in n + 4 and n − 8 positions, the most prominent observed stutters are one repeat unit shorter, i.e., n − 4 for tetranucleotide STR markers.

Note, by applying MPS for STR analysis, each repeating motif may result in the formation of a, i.e., n − 4, stutter product and is therefore multivariate [61]. Furthermore, the formation of stutter seems to be also influenced by adjacent nucleotides [61]. As only single source samples were analyzed, we decided to reduce the complexity of sequence-based stutter analysis by considering stutter products as “defined by length” (univariate; recalculated from MPS results) instead of as “defined by sequence” (multivariate). As expected, stutter ratios increased with the growing number of repeat units (allele size) [60, 62]. Locus-specific stutter analysis showed that stutter ratios for integer alleles were higher than those for intermediate alleles (Figure S5), which is in line with earlier reports [61]. We found no evidence of increased stochastic variation of stutter height for samples belonging to category I (samples with reads < 63,500; Table S2), which is in contrast to the findings described by [62].

Stutter ratios were found to be similar for category I and category II samples (Figure S5; Table 2). The majority of STRs included in the PowerSeq 46GY panel showed mean stutter ratios ranging from 10 to 15% for both categories (Table 2). Stutters appeared to be generally higher for MPS-based STR genotyping than for CE-based kits [46, 47, 63] (Table 2), which is in line with earlier reports [45, 64,65,66]. Rarely, we observed stutter values exceeding 20% (Table 2). However, as D22S1045 consists of a trinucleotide repeat, which is more prone to stutter artifacts, higher stutter values were expected and also known from earlier studies [44, 45, 67].

Table 2 Global stutter analysis was performed for 22 forensic markers on a subset of the Austrian population sample (n = 50). Samples were selected according to the total number of reads and divided into two categories (selection criteria: category I: ≤ 63,500 reads; category II: ≥ 199,000 reads). Bold numbers denote stutter values exceeding 20%. In general, mean stutter values were comparable to CE-based stutter heights

Full size table

Sequence variation

As expected, MPS genotyping increased the detection of genetic variation compared to the length-based procedure via CE. For the following 19 of 22 aSTRs, we observed sequence variation not detectable with CE technology: D1S1656, TPOX, D2S1338, D2S441, D3S1358, FGA, D5S818, CSF1PO, D7S820, D8S1179, TH01, vWA, D12S391, D13S317, D16S539, D18S51, D19S433, D21S11, and Penta D (Table 3, Figure S6). Details on sequence variation can be found in Table S6 that used the updated Forensic STR Sequence Structure Guide v5 [56], available on the STRidER (https://strider.online/) website [68], as template.

Table 3 Overview of sequence variation observed within the Austrian population using the PowerSeq 46GY kit: As expected, MPS techniques revealed increased genetic variation compared to length-based technologies. To characterize the location of sequence variation, we used repeat and flanking region definitions reported in the updated Forensic STR Sequence Structure Guide v5 (Phillips 2018). Due to the lack of a harmonized MPS allele nomenclature that would also define flanking region lengths, we considered the fully available sequence strings up- and downstream from the repeat region as flanking regions. Size-based STR analysis was performed using the AmpFlSTR NGM SElect Express kit (Thermo Fisher Scientific, USA) and the PowerPlex16 kit (Promega, USA)

Full size table

The loci D12S391 (3.3-fold), D2S1338 (3.0-fold), and D21S11 (2.6-fold) showed the most pronounced increase in the number of distinguishable alleles by sequence-based analysis (Figures S7a–c), which have already been found highly variable in earlier studies [6, 40,41,42, 49, 69, 70]. Interestingly, we observed sequence variation at TPOX (Table S1, Table S6, Figure S6). This differs from numerous earlier studies [40,41,42, 49, 70,71,72] but is consistent with three reports [6, 66, 69] that recorded sequence variation at TPOX. Our results were confirming those of [6, 66] who reported flanking region SNPs at TPOX, whereas [69] only observed sequence variation at TPOX within the repeat region. In line with earlier studies, no additional sequence variation was found for D10S1248 [40, 41, 49, 66, 71], Penta E [40], and D22S1045 [40, 42, 66] (Table 3).

Isometric alleles

Sequence-based analysis revealed an increase of heterozygosity relative to CE-based genotyping due to isometric alleles (alleles of identical size but different sequence). Using MPS, 181 of the 1075 (16.8%) homozygous allele pairs were unveiled as isometric heterozygotes at 13 aSTRs: D5S818 (34 ×), D3S1358 (25 ×), D13S317 (23 ×), D21S11 (16 ×), D7S820 (15 ×), D8S1179 (14 ×), D16S539 (14 ×), vWA (11 ×), D2S1338 (9 ×), D2S441 (8 ×), D12S391 (6 ×), D1S1656 (4 ×), and TPOX (2 ×) (Table 4). Our findings for increased heterozygosity were in line with an earlier report [42], except for TPOX. For example, Gettings et al. (2018) [69] observed a minimal increase in heterozygosity (< 1%) at TPOX, while Silva et al. (2020) [49] did not observe sequence variation at TPOX, D7S820, and D13S317. In two samples, MPS was able to identify isometric heterozygous genotypes at TPOX, allele 8 (repeat structure variants [AATG]8 vs. [AATG]8 containing a G > T transversion located in the flanking region [rs149212737]).

Table 4 Overview of STR markers showing the total number of observed alleles using two different STR analysis technologies. MPS increased heterozygosity by identifying 181 homozygous genotypes as isometric heterozygotes. Isometric alleles, also known as isoalleles, are alleles of identical length but different internal sequence

Full size table

Benefits and limitations of STR sequence data in DNA intelligence databasing

The primary intention of this study was to characterize the full sequence information of STRs with respect to forensic DNA intelligence databasing. Despite a high degree of concordance between MPS and conventional CE methodologies, we observed sequence variation that could potentially cause differences between the apparent CE length and allele calls based on counting repeat units in the MPS data as applied below. Furthermore, and as described earlier, sequence variation adjacent to the repeat region is known to affect allele nomenclature based on repeat unit counting [73]. In addition, Bodner and Parson (2020) [9] reported that SNP- and insertion/deletion-caused discrepancies between MPS and CE allele sizes were also a main reason of errors and discrepancies detected during STRidER quality control of MPS datasets.

Instances of flanking region deletions were observed at D2S441, D19S433, and Penta D, respectively (Table S6). At D2S441, we identified a single [T/-] deletion (rs888232687) of the first base downstream of the repeat block (CE: 9.3; MPS: 10) [56]. At D19S433, alignment revealed a single [CT/-] deletion (rs745607776) located in the 5′ flanking region (CE: 14.2; MPS: 15) that was previously described [42, 45, 56]. At Penta D, we identified a 13-nucleotide [AAGAAAGAAAAAA/-] deletion (rs1190908807) that results in the length-based allele 2.2 previously reported by [42, 56, 69]. Additionally, we observed a [A/-] deletion (rs536566765) located in the 3′ flanking region of Penta D (CE: 13.3; MPS: 14) that could cause discordance between MPS and CE as previously reported by [71, 74]. Considering sequence information, the two aforementioned samples revealed five and 14 full repeat units, respectively (Table S6). We note that all these described deletions would have caused seemingly shorter amplicons resulting in discrepant allele size estimations by mimicking intermediate alleles with CE-based detection methods (Table S6).

In addition, at D19S433, we identified 55 intermediate alleles (CE: X.2; representing 11.1% of all D19S433 alleles) that contained a well-characterized [TC/-] deletion (rs147936416; Table S6) [56]. This particular polymorphism is located inside the repeat region of D19S433, and spans the border of a counted repeat unit and an uncounted nucleotide block (for more information, see [56, 73]). This might lead to different micro-variant allele designation depending on the technology and counting method used. For example, an allele called 12.2 based on CE would result in an allele 12.3 based on repeat unit counting from MPS results (Table S6). Of note, alignment of these micro-variant alleles currently also differs in catalogues [10, 56]. As any kind of insertion/deletion can impact concordance between MPS and CE, it is inevitable to come up with a straight forward nomenclature system that maintains backward compatibility to the size-based STR profiles stored in national DNA databases. Based on this, it is important to collect and describe as many sequence variants as possible to increase sequence-based genotype accuracy [71].

Allele variation and frequencies, forensic and population genetic parameters, STRidER quality control, and databasing

All STR alleles found in the CE dataset have, in addition to confirmation by MPS, previously been reported on STRidER [68] or pop.STR [75] at similar frequencies, or were rare variants listed in the NIST STRBase [76]. Results for forensic and population genetic parameters revealed generally high diversity for all loci and thus their suitability for forensic applications. For all parameters, TPOX was the least and SE33 the most diverse locus. All loci met HWE expectations with p-values for deviations above 0.05 (Table S3). This is similar to other datasets [40, 42, 77]. Resulting STR allele frequencies, forensic and population genetic parameters calculated from the 245 complete CE genotypes are available from Table S3. The CE dataset was submitted to STRidER [68], including only complete genotypes according to its databasing requirements [9]. Two samples yielded three allelic genotype calls at SE33 in CE, using the AmpFlSTR NGM SElect Express kit [15], and were therefore excluded from the dataset submitted to STRidER. The 245 CE-based and 247 MPS-based datasets passed STRidER quality control [9] and were assigned accession numbers STR000249 (CE) and STR000337 (MPS). The Austrian allele frequencies will augment the quality-checked data available to the community from the STRidER online allele frequency database [68]. MPS-based allele frequencies of 22 autosomal STRs are shown in Table S1 and will enable statistical calculations from sequenced DNA profiles for the Austrian population. Note that these allele frequencies (Table S1) should be considered preliminary due to the lack of a generally recognized allele nomenclature and recommended sequence ranges. They will be presented on STRidER once these prerequisites for forensic population databasing are agreed on [78].

Data analysis revealed 25 novel sequence variants that have been undescribed in the STRSeq online catalogue so far [10] at loci D1S1656 (3 ×), TPOX (1 ×), D2S441 (1 ×), D3S1358 (2 ×), FGA (1 ×), CSF1PO (1 ×), D7S820 (1 ×), TH01 (1 ×), D12S391 (2 ×), D13S317 (1 ×), D16S539 (1 ×), D18S51 (1 ×), D19S433 (2 ×), D21S11 (3 ×), and Penta D (4 ×). They were submitted for inclusion in STRSeq [10] (Table S1, Table S7; BioProject accession: PRJNA380345; aSTR sequence data: Nucleotide (Genomic DNA) – 1386; date of database query: 28/01/2021).

Of note, this first genotype set where complete allelic data from both CE and MPS analysis were submitted to STRidER enabled direct comparison of the degree of identity of the resulting genotypes. From the perspective of a posteriori quality control [9], the findings further contribute to genuine assessment of applicability and pitfalls of “simply” using length-based allele calls recalculated from MPS results as pseudo-CE alleles for databasing.

Conclusions

This study investigates the performance of MPS for forensic STR intelligence databasing purposes taking a set of 247 randomly picked male individuals from the Austrian National DNA Database as example. The PowerSeq 46GY kit analyzed on the MiSeq FGx system resulted in reliable base calling irrespective of cluster density variations. The investigated aSTR genotypes were highly concordant compared to conventional CE sizing approaches with some exceptions that were also observed in earlier studies [42, 45, 71]. However, differences between CE- and MPS-based results can lead to false exclusions when only length-based alleles, recalculated from MPS results by repeat unit counting, are considered as pseudo-CE alleles and used for DNA intelligence databasing purposes. This would have been the case for two samples due to one mismatch each at D2S441 and D19S433 if our dataset had been imported into the Austrian National DNA Database in this way. From a technical point of view, this reinforces the requirement to use an error-tolerant search algorithm when comparing/searching STR genotypes in intelligence databases that already need to deal with discrepancies between different CE-based STR typing kits.

Sequence-based stutter analysis showed comparable ratios to CE for both low and high performing MPS samples with a small tendency for higher stutter in MPS data.

As expected, we observed substantial sequence variation located within the repeat motif and the flanking region for the majority of STR markers. Only few loci showed no gain in discrimination when comparing sequence-based with length-based allele calls. In general, our results were comparable to previously published population studies [6, 41, 42, 49, 69, 70, 72].

Change history

13 November 2021
Funding note has been added.

References

Jobling MA, Gill P (2004) Encoded evidence: DNA in forensic analysis. Nat Rev Genet 5:739–751. https://doi.org/10.1038/nrg1455
Article PubMed Google Scholar
Churchill JD, Schmedes SE, King JL, Budowle B (2016) Evaluation of the Illumina Beta Version ForenSeq DNA Signature Prep Kit for use in genetic profiling. Forensic Sci Int Genet 20:20–29. https://doi.org/10.1016/j.fsigen.2015.09.009
Article CAS PubMed Google Scholar
Just RS, Moreno LI, Smerick JB, Irwin JA (2017) Performance and concordance of the ForenSeq system for autosomal and Y chromosome short tandem repeat sequencing of reference-type specimens. Forensic Sci Int Genet 28:1–9. https://doi.org/10.1016/j.fsigen.2017.01.001
Article CAS PubMed Google Scholar
Guo F, Zhou Y, Liu F et al (2016) Evaluation of the Early Access STR Kit v1 on the Ion Torrent PGM platform. Forensic Sci Int Genet 23:111–120. https://doi.org/10.1016/j.fsigen.2016.04.004
Article CAS PubMed Google Scholar
Xavier C, Parson W (2017) Evaluation of the Illumina ForenSeq DNA Signature Prep Kit-MPS forensic application for the MiSeq FGx benchtop sequencer. Forensic Sci Int Genet 28:188–194. https://doi.org/10.1016/j.fsigen.2017.02.018
Article CAS PubMed Google Scholar
van der Gaag KJ, de Leeuw RH, Hoogenboom J et al (2016) Massively parallel sequencing of short tandem repeats—population data and mixture analysis results for the PowerSeq system. Forensic Sci Int Genet 24:86–96. https://doi.org/10.1016/j.fsigen.2016.05.016
Article CAS PubMed Google Scholar
Alonso A, Barrio PA, Müller P et al (2018) Current state-of-art of STR sequencing in forensic genetics. Electrophoresis. https://doi.org/10.1002/elps.20180003010.1002/elps.201800030
Article PubMed Google Scholar
Hollard C, Ausset L, Chantrel Y et al (2019) Automation and developmental validation of the ForenSeq DNA Signature Preparation kit for high-throughput analysis in forensic laboratories. Forensic Sci Int Genet 40:37–45. https://doi.org/10.1016/j.fsigen.2019.01.010
Article CAS PubMed Google Scholar
Bodner M, Parson W (2020) The STRidER report on two years of quality control of autosomal STR population datasets. Genes 11:901
Article CAS PubMed Central Google Scholar
Gettings KB, Borsuk LA, Ballard D et al (2017) STRSeq: a catalog of sequence diversity at human identification short tandem repeat loci. Forensic Sci Int Genet 31:111–117. https://doi.org/10.1016/j.fsigen.2017.08.017
Article CAS PubMed PubMed Central Google Scholar
Walsh PS, Metzger DA, Higuchi R (1991) Chelex 100 as a medium for simple extraction of DNA for PCR-based typing from forensic material. Biotechniques 10:506–513
CAS PubMed Google Scholar
Walker JA, Hedges DJ, Perodeau BP et al (2005) Multiplex polymerase chain reaction for simultaneous quantitation of human nuclear, mitochondrial, and male Y-chromosome DNA: application in human identification. Anal Biochem 337:89–97. https://doi.org/10.1016/j.ab.2004.09.036
Article CAS PubMed Google Scholar
Niederstätter H, Köchl S, Grubwieser P, Pavlic M, Steinlechner M, Parson W (2007) A modular real-time PCR concept for determining the quantity and quality of human nuclear and mitochondrial DNA. Forensic Sci Int Genet 1:29–34. https://doi.org/10.1016/j.fsigen.2006.10.007
Article CAS PubMed Google Scholar
Bauer CM, Niederstätter H, McGlynn G, Stadler H, Parson W (2013) Comparison of morphological and molecular genetic sex-typing on mediaeval human skeletal remains. Forensic Sci Int Genet 7:581–586. https://doi.org/10.1016/j.fsigen.2013.05.005
Article CAS PubMed PubMed Central Google Scholar
Thermo Fisher Scientific (2014) Applied Biosystems AmpFlSTR NGM SElect Express PCR Amplification Kit, Manual (Rev. C)
Promega (2016) PowerPlex 16 System, Technical Manual (Revised 5/16)
Schneider PM (2009) Expansion of the European Standard Set of DNA Database Loci—the current situation. Profiles in DNA: 6–7
Gill P, Fereday L, Morling N, Schneider PM (2006) The evolution of DNA databases—recommendations for new European STR loci. Forensic Sci Int 156:242–244. https://doi.org/10.1016/j.forsciint.2005.05.036
Article CAS PubMed Google Scholar
Hares DR (2012) Expanding the CODIS core loci in the United States. Forensic Sci Int Genet 6:e52–e54. https://doi.org/10.1016/j.fsigen.2011.04.012
Article CAS PubMed Google Scholar
Hares DR (2015) Selection and implementation of expanded CODIS core loci in the United States. Forensic Sci Int Genet 17:33–34. https://doi.org/10.1016/j.fsigen.2015.03.006
Article CAS PubMed Google Scholar
Moreno LI, Moretti TR (2019) Short tandem repeat genotypes of samples from eleven populations comprising the FBI’s population database. Forensic Sci Int Rep 1:100041. https://doi.org/10.1016/j.fsir.2019.100041
Article Google Scholar
Promega (2017) PowerSeq 46GY System, Technical Manual (Revised 8/17)
Promega (2016) PowerSeq Systems Prototype, Instructions for Use (Revised 6/16)
Illumina Inc. (2015) TruSeq DNA PCR-Free Library Prep Reference Guide (Part # 15036187, Rev. D)
Thermo Fisher Scientific (2010) NanoDrop 1000 Spectrophotometer V3.8 User´s Manual
Kapa Biosystems (2014) KAPA Library Quantification Technical Guide, v1.14
Verogen (2021) MiSeq FGx Sequencing System Reference Guide (Document # VD2018006 Rev. F)
Illumina inc. (2018) MiSeq System Guide-Denature and Dilute Libraries (Document # 15027617, v01)
Illumina Inc. (2014) Sequencing Analysis Viewer Software (#15020619 Rev F).
King JL, Wendt FR, Sun J, Budowle B (2017) STRait Razor v2s: advancing sequence-based STR allele reporting and beyond to other marker systems. Forensic Sci Int Genet 29:21–28. https://doi.org/10.1016/j.fsigen.2017.03.013
Article CAS PubMed Google Scholar
King JL (2017) STRait Razor Analysis Manual
Li H, Handsaker B, Wysoker A et al (2009) The sequence alignment/map format and SAMtools. Bioinformatics 25:2078–2079. https://doi.org/10.1093/bioinformatics/btp352
Article CAS PubMed PubMed Central Google Scholar
Scientific Working Group on DNA Analysis Methods (2019) Addendum to “SWGDAM Interpretation Guidelines for Autosomal STR Typing by Forensic DNA Testing Laboratories” to Address Next Generation Sequencing. 25
Corp IBM (2016) IBM SPSS Statistics for Windows, Version 240. IBM Corp, Armonk
Google Scholar
GraphPad Prism Software. GraphPad Prism version 8.4.3 for Windows. 8.4.3 for Windows ed
Gouy A, Zieger M (2017) STRAF—a convenient online tool for STR data evaluation in forensic genetics. Forensic Sci Int Genet 30:148–151. https://doi.org/10.1016/j.fsigen.2017.07.007
Article CAS PubMed Google Scholar
Illumina Inc. (2019) Cluster optimization, overview guide
Illumina Inc. (2011) Quality scores of next-generation sequencing, technical note
Illumina Inc. (2014) Understanding Illumina quality scores, technical note
Devesse L, Ballard D, Davenport L, Riethorst I, Mason-Buck G, Syndercombe Court D (2017) Concordance of the ForenSeq system and characterisation of sequence-specific autosomal STR alleles across two major population groups. Forensic Sci Int Genet 34:57–61. https://doi.org/10.1016/j.fsigen.2017.10.012
Article CAS PubMed Google Scholar
Devesse L, Davenport L, Borsuk L et al (2020) Classification of STR allelic variation using massively parallel sequencing and assessment of flanking region power. Forensic Sci Int Genet 48:102356. https://doi.org/10.1016/j.fsigen.2020.102356
Article CAS PubMed Google Scholar
Barrio PA, Martin P, Alonso A et al (2019) Massively parallel sequence data of 31 autosomal STR loci from 496 Spanish individuals revealed concordance with CE-STR technology and enhanced discrimination power. Forensic Sci Int Genet 42:49–55. https://doi.org/10.1016/j.fsigen.2019.06.009
Article CAS PubMed Google Scholar
Tucker VC, Hopwood AJ, Sprecher CJ et al (2012) Developmental validation of the PowerPlex ESX 16 and PowerPlex ESX 17 Systems. Forensic Sci Int Genet 6:124–131. https://doi.org/10.1016/j.fsigen.2011.03.009
Article CAS PubMed Google Scholar
Hill CR, Duewer DL, Kline MC et al (2011) Concordance and population studies along with stutter and peak height ratio analysis for the PowerPlex ESX 17 and ESI 17 Systems. Forensic Sci Int Genet 5:269–275. https://doi.org/10.1016/j.fsigen.2010.03.014
Article CAS PubMed Google Scholar
Müller P, Alonso A, Barrio PA et al (2018) Systematic evaluation of the early access applied biosystems precision ID Globalfiler mixture ID and Globalfiler NGS STR panels for the ion S5 system. Forensic Sci Int Genet 36:95–103. https://doi.org/10.1016/j.fsigen.2018.06.016
Article CAS PubMed Google Scholar
Green RL, Lagacé RE, Oldroyd NJ, Hennessy LK, Mulero JJ (2012) Developmental validation of the AmpFLSTR NGM SElect PCR Amplification Kit: a next-generation STR multiplex with the SE33 locus. Forensic Sci Int Genet
Ensenberger MG, Thompson J, Hill B et al (2010) Developmental validation of the PowerPlex 16 HS System: an improved 16-locus fluorescent STR multiplex. Forensic Sci Int Genet 4:257–264. https://doi.org/10.1016/j.fsigen.2009.10.007
Article CAS PubMed Google Scholar
Zeng X, King J, Hermanson S, Patel J, Storts DR, Budowle B (2015) An evaluation of the PowerSeq Auto System: a multiplex short tandem repeat marker kit compatible with massively parallel sequencing. Forensic Sci Int Genet 19:172–179. https://doi.org/10.1016/j.fsigen.2015.07.015
Article CAS PubMed Google Scholar
Silva DSBS, Scheible MK, Bailey SF et al (2020) Sequence-based autosomal STR characterization in four US populations using PowerSeq™ Auto/Y system. Forensic Sci Int Genet 48:102311. https://doi.org/10.1016/j.fsigen.2020.102311
Article CAS PubMed Google Scholar
Verogen (2018) ForenSeq Universal Analysis Software Guide (Document # VD2018007, Rev. A)
Müller P, Sell C, Hadrys T et al (2019) Inter-laboratory study on standardized MPS libraries: evaluation of performance, concordance, and sensitivity using mixtures and degraded DNA. Int J Legal Med. https://doi.org/10.1007/s00414-019-02201-2
Article PubMed PubMed Central Google Scholar
Jäger AC, Alvarez ML, Davis CP et al (2017) Developmental validation of the MiSeq FGx forensic genomics system for targeted next generation sequencing in forensic DNA casework and database laboratories. Forensic Sci Int Genet 28:52–70. https://doi.org/10.1016/j.fsigen.2017.01.011
Article CAS PubMed Google Scholar
Guo F, Yu J, Zhang L, Li J (2017) Massively parallel sequencing of forensic STRs and SNPs using the Illumina ForenSeq DNA Signature Prep Kit on the MiSeq FGx Forensic Genomics System. Forensic Sci Int Genet 31:135–148. https://doi.org/10.1016/j.fsigen.2017.09.003
Article CAS PubMed Google Scholar
Silvia AL, Shugarts N, Smith J (2017) A preliminary assessment of the ForenSeq FGx System: next generation sequencing of an STR and SNP multiplex. Int J Legal Med 131:73–86. https://doi.org/10.1007/s00414-016-1457-6
Article PubMed Google Scholar
Almalki N, Chow HY, Sharma V, Hart K, Siegel D, Wurmbach E (2017) Systematic assessment of the performance of Illumina’s MiSeq FGx forensic genomics system. Electrophoresis 38:846–854. https://doi.org/10.1002/elps.201600511
Article CAS PubMed Google Scholar
Phillips C, Gettings KB, King JL et al (2018) “The devil’s in the detail”: release of an expanded, enhanced and dynamically revised forensic STR Sequence Guide. Forensic Sci Int Genet 34:162–169. https://doi.org/10.1016/j.fsigen.2018.02.017
Article CAS PubMed Google Scholar
Krenke BE, Tereba A, Anderson SJ et al (2002) Validation of a 16-locus fluorescent multiplex system. J Forensic Sci 47:773–785
Article CAS PubMed Google Scholar
Kwok S, Kellogg DE, McKinney N et al (1990) Effects of primer-template mismatches on the polymerase chain reaction: human immunodeficiency virus type 1 model studies. Nucleic Acids Res 18:999–1005
Article CAS PubMed PubMed Central Google Scholar
Christopherson C, Sninsky J, Kwok S (1997) The effects of internal primer-template mismatches on RT-PCR: HIV-1 model studies. Nucleic Acids Res 25:654–658. https://doi.org/10.1093/nar/25.3.654
Article CAS PubMed PubMed Central Google Scholar
Walsh PS, Fildes NJ, Reynolds R (1996) Sequence analysis and characterization of stutter products at the tetranucleotide repeat locus vWA. Nucleic Acids Res 24:2807–2812
Article CAS PubMed PubMed Central Google Scholar
Woerner AE, King JL, Budowle B (2019) Compound stutter in D2S1338 and D12S391. Forensic Sci Int Genet 39:50–56. https://doi.org/10.1016/j.fsigen.2018.12.001
Article CAS PubMed Google Scholar
Brookes C, Bright JA, Harbison S, Buckleton J (2012) Characterising stutter in forensic STR multiplexes. Forensic Sci Int Genet 6:58–63. https://doi.org/10.1016/j.fsigen.2011.02.001
Article CAS PubMed Google Scholar
Ensenberger MG, Hill CR, McLaren RS, Sprecher CJ, Storts DR (2014) Developmental validation of the PowerPlex 21 System. Forensic Sci Int Genet 9:169–178. https://doi.org/10.1016/j.fsigen.2013.12.005
Article CAS PubMed Google Scholar
Aponte RA, Gettings KB, Duewer DL, Coble MD, Vallone PM (2015) Sequence-based analysis of stutter at STR loci: characterization and utility. Forensic Sci Int Gen 5:E456–E458. https://doi.org/10.1016/j.fsigss.2015.09.181
Article Google Scholar
Li R, Wu R, Li H et al (2020) Characterizing stutter variants in forensic STRs with massively parallel sequencing. Forensic Sci Int Genet 45:102225. https://doi.org/10.1016/j.fsigen.2019.102225
Article CAS PubMed Google Scholar
Moura-Neto R, King JL, Mello I et al (2021) Evaluation of Promega PowerSeq™ Auto/Y systems prototype on an admixed sample of Rio de Janeiro, Brazil: population data, sensitivity, stutter and mixture studies. Forensic Sci Int Genet 53:102516. https://doi.org/10.1016/j.fsigen.2021.102516
Article CAS PubMed Google Scholar
Bright J-A, Buckleton JS, Taylor D, Fernando MACSS, Curran JM (2014) Modeling forward stutter: toward increased objectivity in forensic DNA interpretation. Electrophoresis 35:3152–3157. https://doi.org/10.1002/elps.201400044
Article CAS PubMed Google Scholar
Bodner M, Bastisch I, Butler JM et al (2016) Recommendations of the DNA Commission of the International Society for Forensic Genetics (ISFG) on quality control of autosomal short tandem repeat allele frequency databasing (STRidER). Forensic Sci Int Genet 24:97–102. https://doi.org/10.1016/j.fsigen.2016.06.008
Article CAS PubMed Google Scholar
Gettings KB, Borsuk LA, Steffen CR, Kiesler KM, Vallone PM (2018) Sequence-based U.S. population data for 27 autosomal STR loci. Forensic Sci Int Genet 37:106–115. https://doi.org/10.1016/j.fsigen.2018.07.013
Article CAS PubMed PubMed Central Google Scholar
Hussing C, Bytyci R, Huber C, Morling N, Børsting C (2019) The Danish STR sequence database: duplicate typing of 363 Danes with the ForenSeq™ DNA Signature Prep Kit. Int J Legal Med 133:325–334. https://doi.org/10.1007/s00414-018-1854-0
Article CAS PubMed Google Scholar
Phillips C, Devesse L, Ballard D et al (2018) Global patterns of STR sequence variation: sequencing the CEPH human genome diversity panel for 58 forensic STRs using the Illumina ForenSeq DNA Signature Prep Kit. Electrophoresis 39:2708–2724. https://doi.org/10.1002/elps.201800117
Article CAS PubMed Google Scholar
Novroski NMM, King JL, Churchill JD, Seah LH, Budowle B (2016) Characterization of genetic sequence variation of 58 STR loci in four major population groups. Forensic Sci Int Genet 25:214–226. https://doi.org/10.1016/j.fsigen.2016.09.007
Article CAS PubMed Google Scholar
Parson W, Ballard D, Budowle B et al (2016) Massively parallel sequencing of forensic STRs: considerations of the DNA commission of the International Society for Forensic Genetics (ISFG) on minimal nomenclature requirements. Forensic Sci Int Genet 22:54–63. https://doi.org/10.1016/j.fsigen.2016.01.009
Article CAS PubMed Google Scholar
Churchill JD, Novroski NMM, King JL, Seah LH, Budowle B (2017) Population and performance analyses of four major populations with Illumina’s FGx Forensic Genomics System. Forensic Sci Int Genet 30:81–92. https://doi.org/10.1016/j.fsigen.2017.06.004
Article CAS PubMed Google Scholar
Amigo J, Phillips C, Salas T, Formoso LF, Carracedo Á, Lareu M (2009) pop.STR—an online population frequency browser for established and new forensic STRs. Forensic Sci Int Genet Suppl Ser 2:361–362. https://doi.org/10.1016/j.fsigss.2009.08.178
Article Google Scholar
Ruitberg CM, Reeder DJ, Butler JM (2001) STRBase: a short tandem repeat DNA database for the human identity testing community. Nucleic Acids Res 29:320–322
Article CAS PubMed PubMed Central Google Scholar
Hatzer-Grubwieser P, Berger B, Niederwieser D, Steinlechner M (2012) Allele frequencies and concordance study of 16 STR loci—including the new European Standard Set (ESS) loci—in an Austrian population sample. Forensic Sci Int Genet 6:e50–e51. https://doi.org/10.1016/j.fsigen.2011.04.006
Article CAS PubMed Google Scholar
Gettings KB, Ballard D, Bodner M et al (2019) Report from the STRAND Working Group on the 2019 STR sequence nomenclature meeting. Forensic Sci Int Genet 43:102165. https://doi.org/10.1016/j.fsigen.2019.102165
Article CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

The authors would like to thank the Austrian Ministry of the Interior for sampling and the team of the High Throughput DNA Database unit for sample preparation and CE-based STR analysis, Alexandra Kaindl-Lindinger, Daniela Niederwieser, and Lisa Schnaller (in alphabetical order, all: Institute of Legal Medicine, Medical University of Innsbruck, Austria) for technical support. Furthermore, the authors would like to thank Harald Niederstätter for reviewing the manuscript (Institute of Legal Medicine, Medical University of Innsbruck, Austria). The authors would like to thank Douglas Storts and Spencer Hermanson (both Promega, Madison, USA) for technical support. The authors would like to thank the team of the DNASeqEx consortium for helpful discussions.

Funding

Open access funding provided by University of Innsbruck and Medical University of Innsbruck. The DNASeqEx project has been funded with support from the European Commission (grant HOME/2014/ISFP/AG/LAWX/4000007135 under the Internal Security Funding Police programme of the European Commission-Directorate General Justice and Home Affairs).

Ethical approval for this study was obtained according to the Austrian Federal Security Police Act.

Author information

Authors and Affiliations

Institute of Legal Medicine, Medical University of Innsbruck, Müllerstrasse 44, 6020, Innsbruck, Austria
Petra Hölzl-Müller, Martin Bodner, Burkhard Berger & Walther Parson
Forensic Science Program, The Pennsylvania State University, State College, PA, USA
Walther Parson

Authors

Petra Hölzl-Müller
View author publications
You can also search for this author in PubMed Google Scholar
Martin Bodner
View author publications
You can also search for this author in PubMed Google Scholar
Burkhard Berger
View author publications
You can also search for this author in PubMed Google Scholar
Walther Parson
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Walther Parson.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Disclaimer

This publication reflects the views only of the authors, and the European Commission cannot be held responsible for any use, which may be made of the information contained therein.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 132 KB)

Supplementary file2 (PDF 134 KB)

Supplementary file3 (PDF 96 KB)

Supplementary file4 (PDF 175 KB)

Supplementary file5 (PDF 174 KB)

Supplementary file6 (PDF 138 KB)

Supplementary file7 (PDF 122 KB)

Supplementary file8 (XLSX 1015 KB)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Hölzl-Müller, P., Bodner, M., Berger, B. et al. Exploring STR sequencing for forensic DNA intelligence databasing using the Austrian National DNA Database as an example. Int J Legal Med 135, 2235–2246 (2021). https://doi.org/10.1007/s00414-021-02685-x

Download citation

Received: 04 June 2021
Accepted: 12 August 2021
Published: 26 August 2021
Issue Date: November 2021
DOI: https://doi.org/10.1007/s00414-021-02685-x

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Exploring STR sequencing for forensic DNA intelligence databasing using the Austrian National DNA Database as an example

Abstract

Similar content being viewed by others

Introduction

Materials and methods

Samples

DNA extraction

DNA quantification

Capillary electrophoresis

MPS workflow

Amplification of STR fragments

Library preparation and sequencing

Data analysis

Capillary electrophoresis data

MPS data

MPS stutter ratios

Statistical analysis

Results and discussion

MPS run parameters

Concordance

PowerSeq 46GY kit performance

DNA quantity and storage period

Relative marker performance

Heterozygote balance

Stutter ratios

Sequence variation

Isometric alleles

Benefits and limitations of STR sequence data in DNA intelligence databasing

Allele variation and frequencies, forensic and population genetic parameters, STRidER quality control, and databasing

Conclusions

Change history

13 November 2021

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Disclaimer

Additional information

Publisher's note

Supplementary Information

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation