NGS Datenanalyse und Qualitätskontrolle

Weißmann, R.; Gilissen, C.

doi:10.1007/s11825-014-0448-6

NGS data analysis and quality control

Focus (Schwerpunkt)
Published: 12 June 2014

Volume 26, pages 239–245, (2014)
medizinische genetik

R. Weißmann¹ &
C. Gilissen Ph.D.²

Summary

Next generation sequencing (NGS) is being used more and more frequently in human genetics. Analyzing the resulting volumes of data, however, poses different and greater challenges than previously used methods. This article describes some basics intended to aid understanding of the data and analysis steps involved in NGS. A particular focus is on quality control.

Abstract

Next generation DNA sequencing (NGS) is rapidly becoming a pervasive technique within the human genetics community. The analysis of NGS data is, however, much more challenging than the analysis of data from previous genetic and genomic techniques. This article describes the basic data formats and analysis steps involved in any NGS DNA resequencing experiment. Special emphasis is placed on methods for quality control.

Abbreviations

“Base caller”:: Computer program that generates a nucleotide sequence (read) from the primary data.
BAM:: “Binary sequence alignment/map”. Vendor-independent de facto standard format for NGS reads.
dbSNP:: Database that collects known single nucleotide polymorphisms (SNPs).
“Flow cell/flow chip”:: Glass slide to which DNA fragments are attached during sequencing.
GRC:: Genome Reference Consortium.
Illumina®:: Manufacturer of NGS instruments.
Ion Torrent™:: NGS technology that measures pH values on a silicon chip instead of taking images.
Monoclonal:: Clusters on the flow cell that arose from a single DNA fragment. Opposite: polyclonal.
Polyclonal:: Clusters on the flow cell that arose from a mixture of two or more DNA fragments and cannot be analyzed further.
Pyrosequencing:: NGS method from Roche (454™ technology).
Read:: Short nucleotide sequence (26–1000 nt) produced by NGS.
“Read mapper”:: Computer program that assigns reads to a position in the reference genome.
SOLiD™:: Sequencing by oligonucleotide ligation and detection. NGS technology from ABI (Life Technologies).
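
The glossary describes a base caller as the program that turns primary data into a read. As a small illustration of the article's quality-control focus, the following Python sketch assumes that each base call carries a Phred-scaled quality value, Q = -10 · log10(P), stored as one ASCII character per base with an offset of 33 (a common convention for FASTQ files). Neither the Phred scale nor this encoding is stated in the text above, and all function names are hypothetical.

```python
from typing import List


def decode_qualities(quality_string: str, offset: int = 33) -> List[int]:
    """Decode an ASCII quality string into integer Phred scores.

    Assumption: one character per base, encoded as chr(Q + offset).
    """
    return [ord(char) - offset for char in quality_string]


def phred_to_error_prob(q: int) -> float:
    """Convert a Phred score Q into an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def expected_errors(quality_string: str, offset: int = 33) -> float:
    """Sum the per-base error probabilities of one read.

    The result estimates how many wrong base calls the read contains.
    """
    scores = decode_qualities(quality_string, offset)
    return sum(phred_to_error_prob(q) for q in scores)
```

Under these assumptions, a read whose expected number of errors exceeds a chosen threshold could be trimmed or discarded before read mapping. The threshold itself is a design choice that depends on the experiment.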

Acknowledgments

The authors thank Prof. Dr. Ute Felbor and Prof. Dr. Andreas W. Kuss for their support in preparing and revising the manuscript.

Compliance with ethical guidelines

Conflict of interest. R. Weißmann and C. Gilissen state that there is no conflict of interest.

This article does not contain any studies on humans or animals.

Author information

Authors and Affiliations

Institut für Humangenetik, Universitätsmedizin Greifswald, and Interfakultäres Institut für Genetik und Funktionelle Genomforschung, Universität Greifswald, Greifswald, Germany
R. Weißmann
Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
C. Gilissen Ph.D.

Corresponding author

Correspondence to C. Gilissen Ph.D.

About this article

Cite this article

Weißmann, R., Gilissen, C. NGS Datenanalyse und Qualitätskontrolle. medgen 26, 239–245 (2014). https://doi.org/10.1007/s11825-014-0448-6

Issue Date: June 2014
DOI: https://doi.org/10.1007/s11825-014-0448-6
