Enhanced mixture interpretation with macrohaplotypes based on long-read DNA sequencing

  • Original Article
International Journal of Legal Medicine

Abstract

Deconvoluting mixture samples is one of the most challenging problems confronting forensic DNA laboratories. Efforts have been made to provide solutions for mixture interpretation. Probabilistic interpretation of short tandem repeat (STR) profiles has increased the number of complex mixtures that can be analyzed. Nevertheless, a portion of complex mixture profiles, particularly those with a high number of contributors, are still deemed uninterpretable. Novel forensic markers, such as single nucleotide variants (SNVs) and microhaplotypes, have also been proposed to allow for better mixture interpretation. However, these markers have lower discrimination power than STRs and are not compatible with CODIS or other national DNA databanks worldwide. Short-read sequencing (SRS) technologies can facilitate mixture interpretation by identifying intra-allelic variations within STRs. Unfortunately, the short size of the amplicons containing STR markers, and of the sequence reads themselves, limits the number of alleles that can be attained per STR. The latest long-read sequencing (LRS) technologies can overcome this limitation in some samples in which larger DNA fragments (including both STRs and SNVs) with definitive phasing are available. Based on LRS technologies, this study developed a novel CODIS-compatible forensic marker, called a macrohaplotype, which combines a CODIS STR with flanking variants to offer an extremely high number of haplotypes and hence very high discrimination power per marker. The macrohaplotype will substantially improve mixture interpretation capabilities. Based on publicly accessible data, a panel of 20 macrohaplotypes with sizes of ~8 kbp and maximal discrimination power was designed. The statistical evaluation demonstrates that these macrohaplotypes substantially outperform CODIS STRs for mixture interpretation, particularly for mixtures with a high number of contributors, as well as for other forensic applications.
Based on these results, efforts should be undertaken to build a complete workflow, both wet-lab and bioinformatics, to precisely call the variants and generate the macrohaplotypes based on the LRS technologies.
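
The link between the number of haplotypes at a marker and its discrimination power can be illustrated with a minimal sketch. It is not the authors' evaluation pipeline: it assumes Hardy-Weinberg proportions, uses the textbook definition of discrimination power as one minus the sum of squared genotype frequencies, and the equifrequent allele sets (10 alleles standing in for a typical STR, 100 for a macrohaplotype) are hypothetical values chosen only for illustration.

```python
from itertools import combinations


def random_match_probability(freqs):
    """Probability that two unrelated individuals share a genotype.

    Assumes Hardy-Weinberg proportions: a homozygote has frequency p^2
    and a heterozygote has frequency 2pq. `freqs` holds the allele (or
    haplotype) frequencies of one marker and must sum to 1.
    """
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygotes = sum(p ** 4 for p in freqs)  # (p^2)^2
    heterozygotes = sum((2 * p * q) ** 2 for p, q in combinations(freqs, 2))
    return homozygotes + heterozygotes


def discrimination_power(freqs):
    """Discrimination power = 1 - random match probability."""
    return 1.0 - random_match_probability(freqs)


# Hypothetical markers (illustrative numbers, not taken from the study).
str_like = [0.1] * 10               # 10 equally frequent alleles
macrohaplotype_like = [0.01] * 100  # 100 equally frequent haplotypes

print(f"10 alleles:     {discrimination_power(str_like):.6f}")
print(f"100 haplotypes: {discrimination_power(macrohaplotype_like):.6f}")
```

With equal frequencies, the random match probability for n alleles reduces to 2/n² − 1/n³, so adding flanking variants that split each STR allele into many haplotypes lowers the per-marker match probability roughly quadratically. Real allele frequencies are uneven, so actual gains depend on the population data.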

Funding

This study was funded with internal funds of the Center for Human Identification at the University of North Texas Health Science Center.

Author information

Corresponding author

Correspondence to Jianye Ge.

Ethics declarations

Research involving human participants and/or animals

All data used in this study are publicly accessible and anonymized. No human participants or animals were involved in this study.

Informed consent

All data used in this study are publicly accessible and anonymized. No consent form was needed in this study.

Conflict of interest

A patent application by the University of North Texas Health Science Center is pending.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Highlights

• A novel CODIS-compatible forensic marker, the macrohaplotype, was developed

• A panel of 20 macrohaplotypes with sizes of ~8 kbp and extremely high discrimination power was designed

• These macrohaplotypes substantially outperform CODIS STRs alone for mixture interpretation

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (XLSX 3590 KB)

About this article

Cite this article

Ge, J., King, J., Mandape, S. et al. Enhanced mixture interpretation with macrohaplotypes based on long-read DNA sequencing. Int J Legal Med 135, 2189–2198 (2021). https://doi.org/10.1007/s00414-021-02679-9

  • DOI: https://doi.org/10.1007/s00414-021-02679-9
