Finding small somatic structural variants in exome sequencing data: a machine learning approach

Kuhn, Matthias; Stange, Thoralf; Herold, Sylvia; Thiede, Christian; Roeder, Ingo

doi:10.1007/s00180-016-0674-2

Original Paper
Published: 10 August 2016

Volume 33, pages 1145–1158, (2018)

Computational Statistics

Matthias Kuhn (ORCID: orcid.org/0000-0003-2868-5155)¹, Thoralf Stange¹, Sylvia Herold²,³, Christian Thiede²,³ & Ingo Roeder¹

Abstract

Genetic variation forms the basis of diversity but can also be harmful and cause diseases such as tumors. Structural variants (SVs) are an example of complex genetic variation: they comprise many nucleotides, ranging up to several megabases. Recent developments in sequencing technology have made it feasible to determine the genetic state of a person's genes (i.e. the exome) or even of the complete genome. Here, a machine learning approach is presented to find small disease-related SVs with the help of sequencing data. The method uses differences in the characteristics of mapping patterns between tumor and normal samples at a genomic locus. In this way, the method aims to be directly applicable to exome sequencing data and to improve SV detection there, since dedicated SV detection methods for exome data are currently lacking. The method has been evaluated in a simulation study as well as with exome data from patients with acute myeloid leukemia. An implementation of the algorithm is available at https://github.com/lenz99-/svmod.
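
To make the idea of comparing mapping patterns between tumor and normal samples at one locus more concrete, the following Python sketch summarises the reads of each sample with a few per-locus features and takes their tumor-minus-normal difference. The features chosen here (fraction of soft-clipped reads, mean insert size, fraction of reads with low mapping quality), the `AlignedRead` record, the `low_mapq` threshold and the function names are hypothetical illustrations, not the feature set or code of svmod. In practice the per-read values would be extracted from aligned reads (e.g. a BAM file), and the difference vectors would be passed to a trained classifier, which is not shown.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class AlignedRead:
    """Hypothetical minimal summary of one aligned read at a locus."""
    insert_size: int    # absolute template length of the read pair
    soft_clipped: bool  # True if the alignment contains a soft clip
    mapq: int           # mapping quality


def locus_features(reads: List[AlignedRead], low_mapq: int = 20) -> Dict[str, float]:
    """Summarise the mapping pattern of one sample at one locus (illustrative features)."""
    if not reads:
        # No coverage: return neutral values so the difference is still defined.
        return {"frac_soft_clipped": 0.0, "mean_insert_size": 0.0, "frac_low_mapq": 0.0}
    n = len(reads)
    return {
        "frac_soft_clipped": sum(r.soft_clipped for r in reads) / n,
        "mean_insert_size": float(mean(r.insert_size for r in reads)),
        "frac_low_mapq": sum(r.mapq < low_mapq for r in reads) / n,
    }


def tumor_normal_difference(tumor: List[AlignedRead], normal: List[AlignedRead],
                            low_mapq: int = 20) -> Dict[str, float]:
    """Feature-wise difference tumor minus normal; large values hint at a somatic change."""
    t = locus_features(tumor, low_mapq)
    n = locus_features(normal, low_mapq)
    return {key: t[key] - n[key] for key in t}


if __name__ == "__main__":
    normal = [AlignedRead(300, False, 60) for _ in range(4)]
    tumor = [AlignedRead(300, False, 60), AlignedRead(300, False, 60),
             AlignedRead(450, True, 60), AlignedRead(460, True, 10)]
    print(tumor_normal_difference(tumor, normal))
```

Taking the difference against the matched normal sample is meant to cancel locus-specific effects such as capture or mapping artefacts that are present in both samples, so that only tumor-specific deviations remain as signal.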

Acknowledgments

We thank the anonymous reviewers for their suggestions, which helped to improve the manuscript. We also thank the MessAge group and the Bioinformatics Core Unit at IMB for providing additional computational resources when they were needed.

Author information

Authors and Affiliations

Faculty of Medicine Carl Gustav Carus, Institute for Medical Informatics and Biometry (IMB), Technische Universität, Dresden, Germany
Matthias Kuhn, Thoralf Stange & Ingo Roeder
Medizinische Klinik und Poliklinik I, Universitätsklinikum der Technischen Universität, Dresden, Germany
Sylvia Herold & Christian Thiede
Deutsches Konsortium für Translationale Krebsforschung, Deutsches Krebsforschungszentrum, Heidelberg, Germany
Sylvia Herold & Christian Thiede

Corresponding author

Correspondence to Matthias Kuhn.

Additional information

This work has been supported by the German Research Foundation (DFG) Grant RO3500/4-1 within the Research Unit FOR 1961 and by the German Federal Ministry of Research and Education, Grant 031A424 “HaematoOPT”.

About this article

Cite this article

Kuhn, M., Stange, T., Herold, S. et al. Finding small somatic structural variants in exome sequencing data: a machine learning approach. Comput Stat 33, 1145–1158 (2018). https://doi.org/10.1007/s00180-016-0674-2

Received: 30 April 2015
Accepted: 01 August 2016
Published: 10 August 2016
Issue Date: September 2018
DOI: https://doi.org/10.1007/s00180-016-0674-2