Abstract
To improve the accuracy and cost-efficiency of next-generation sequencing in ultralow-frequency mutation detection, we developed Paired-End and Complementary Consensus Sequencing (PECC-Seq), a PCR-free duplex consensus sequencing approach. PECC-Seq employed shear points as endogenous barcodes to identify consensus sequences from the overlap in the shortened, complementary DNA strand-derived paired-end reads for sequencing error correction. With the high accuracy of PECC-Seq, we identified the characteristic base substitution errors introduced by the end-repair process of mechanical fragmentation-based library preparations, which were prominent at the terminal 7 bp of the library fragments in the 5′-NpCpA-3′ and 5′-NpCpT-3′ trinucleotide contexts. As demonstrated at the human genome scale (TK6 cells), after removing these potential end-repair artifacts from the terminal 7 bp, PECC-Seq could reduce the sequencing error frequency to mid-10⁻⁷ with a relatively low sequencing depth. For TA base pairs, the background error rate could be suppressed to mid-10⁻⁸. In mutagen-treated TK6 cells (6 μg/mL methyl methanesulfonate or 12 μg/mL N-nitroso-N-ethylurea), treatment-related increases in mutant frequencies could be detected, indicating the potential of PECC-Seq for detecting genome-wide ultra-rare mutations. In addition, our findings on the patterns of end-repair artifacts may provide new insights into further reducing technical errors, not only for PECC-Seq but also for other next-generation sequencing techniques.
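The consensus logic summarized above can be illustrated with a minimal sketch. It is not the authors' pipeline (their code is in the Supplementary Material). It assumes that reads are already aligned, that both mates of a pair have been brought into reference orientation and overlap across the whole fragment, and that each strand contributes at most one read pair per shear-point pair (the library is PCR-free). The names `pair_consensus`, `duplex_consensus`, and `END_MASK` are hypothetical.

```python
from collections import defaultdict

END_MASK = 7  # terminal bases masked at each fragment end (value taken from the abstract)


def pair_consensus(seq_a, seq_b):
    """Keep a base where two equal-length sequences agree; write 'N' where they differ."""
    return "".join(a if a == b else "N" for a, b in zip(seq_a, seq_b))


def duplex_consensus(pairs, end_mask=END_MASK):
    """Build duplex consensus sequences keyed by shear points.

    pairs: iterable of (start, end, strand, mate1, mate2), where strand is '+' or '-'
    and mate1/mate2 are the overlapping mate sequences in reference orientation.
    (Assumed input layout, for illustration only.)
    """
    groups = defaultdict(dict)
    for start, end, strand, mate1, mate2 in pairs:
        # Shear points (start, end) act as the endogenous barcode.
        # Simplification: a later pair on the same strand overwrites an earlier one.
        groups[(start, end)][strand] = pair_consensus(mate1, mate2)

    result = {}
    for shear_points, by_strand in groups.items():
        if set(by_strand) != {"+", "-"}:
            continue  # no complementary-strand partner, so no duplex consensus
        seq = pair_consensus(by_strand["+"], by_strand["-"])
        if len(seq) <= 2 * end_mask:
            continue  # nothing would remain after masking both ends
        # Mask the terminal bases, where end-repair artifacts are reported to concentrate.
        core = seq[end_mask:len(seq) - end_mask]
        result[shear_points] = "N" * end_mask + core + "N" * end_mask
    return result
```

A base is reported only when both mates of a pair and both complementary strands agree, so a change seen on one strand alone is masked as `N` rather than called as a mutation.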
Acknowledgements
We thank Dr. Rajaguru Palanisamy for his critical reading of the manuscript and his suggestions.
Funding
This study was funded by the Project Research on Regulatory Harmonization and Evaluation of Pharmaceuticals, Medical Devices, Regenerative and Cellular Therapy Products, Gene Therapy Products, and Cosmetics from the Japan Agency for Medical Research and Development, AMED (Grant number 16mk0102010j0003).
Author information
Contributions
XY, ST and TS designed the study and performed the experiments. TS and YL supervised the study. XY analyzed and interpreted the data, and was a major contributor in writing the manuscript. MH provided the cell lines used in the study. WL, YC, MN and CF helped design the study and provided constructive suggestions during the study. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethics approval
The manuscript does not contain clinical studies or patient data.
Availability of data and material
The raw whole-genome sequencing data used in this study are available in the NCBI Sequence Read Archive under accession number PRJNA632709.
Code availability
The code used for data processing is available in the Supplementary Material.
About this article
Cite this article
You, X., Thiruppathi, S., Liu, W. et al. Detection of genome-wide low-frequency mutations with Paired-End and Complementary Consensus Sequencing (PECC-Seq) revealed end-repair-derived artifacts as residual errors. Arch Toxicol 94, 3475–3485 (2020). https://doi.org/10.1007/s00204-020-02832-0
DOI: https://doi.org/10.1007/s00204-020-02832-0