Abstract
Long-read DNA sequencing techniques such as nanopore sequencing are especially useful for characterizing complex sequence rearrangements, which occur in some genetic diseases and also during evolution. Analyzing the sequence data to understand such rearrangements is not trivial, because of sequencing errors, the intricacy of the rearrangements, and the abundance of similar repeated sequences in genomes.
The LAST and dnarrange software packages can resolve complex relationships between DNA sequences and characterize changes such as gene conversion, processed pseudogene insertion, and chromosome shattering. They can filter out the numerous rearrangements shared with controls (e.g., healthy humans compared with a patient) to focus on rearrangements unique to the patient. One useful ingredient is last-train, which learns the rates (probabilities) of deletions, insertions, and each kind of base match and mismatch. These probabilities are then used to find the most likely sequence relationships/alignments, which is especially useful for DNA with unusual rates, such as DNA from Plasmodium falciparum (malaria) with ~80% A+T. It is also useful for less-studied species that lack reference genomes, where the DNA reads must be compared to a different species’ genome. We also point out that a reference genome with ancestral alleles would be ideal.
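To illustrate why learned probabilities matter for A+T-rich DNA, the following minimal Python sketch converts base-pair probabilities into log-odds alignment scores. It is not how last-train estimates its rates and it does not call LAST; the function names (`toy_joint`, `log_odds_scores`), the 90% identity, and the base frequencies are all hypothetical choices made for illustration only.

```python
import math


def toy_joint(freqs, identity):
    """Toy model (an assumption, not LAST's model): each base is copied
    unchanged with probability `identity`, otherwise it is replaced by a
    base drawn from the background frequencies `freqs`.
    Returns joint[x][y], the probability of seeing x aligned to y."""
    return {
        x: {
            y: freqs[x] * (identity * (x == y) + (1.0 - identity) * freqs[y])
            for y in freqs
        }
        for x in freqs
    }


def log_odds_scores(joint):
    """Score each aligned pair as log(joint / (row marginal * column marginal)).
    Positive scores mean the pair is more likely in related sequences
    than in unrelated ones."""
    bases = sorted(joint)
    row = {x: sum(joint[x][y] for y in bases) for x in bases}
    col = {y: sum(joint[x][y] for x in bases) for y in bases}
    return {
        x: {y: math.log(joint[x][y] / (row[x] * col[y])) for y in bases}
        for x in bases
    }


# Hypothetical A+T-rich composition: 80% A+T, 20% C+G.
at_rich = {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}
scores = log_odds_scores(toy_joint(at_rich, identity=0.9))

for x in "ACGT":
    print(x, " ".join(f"{scores[x][y]:6.2f}" for y in "ACGT"))
```

Under these assumptions, a C:C or G:G match scores higher than an A:A or T:T match, because matching a rare base is stronger evidence of relatedness than matching a common one. A fixed scoring scheme that treats all matches equally would miss this.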
Acknowledgements
We thank Takeshi Mizuguchi, Kazuharu Misawa, and Naomichi Matsumoto for helping us to fix inefficiencies in dnarrange.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Frith, M.C., Mitsuhashi, S. (2023). Finding Rearrangements in Nanopore DNA Reads with LAST and dnarrange. In: Arakawa, K. (eds) Nanopore Sequencing. Methods in Molecular Biology, vol 2632. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2996-3_12
DOI: https://doi.org/10.1007/978-1-0716-2996-3_12
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-2995-6
Online ISBN: 978-1-0716-2996-3
eBook Packages: Springer Protocols