Critical role of bioinformatics in translating huge amounts of next-generation sequencing data into personalized medicine

Hong, HuiXiao; Zhang, WenQian; Shen, Jie; Su, ZhenQiang; Ning, BaiTang; Han, Tao; Perkins, Roger; Shi, LeMing; Tong, WeiDa

doi:10.1007/s11427-013-4439-7

Review
Special Topic
Open access
Published: 08 February 2013

Volume 56, pages 110–118 (2013)

Science China Life Sciences

HuiXiao Hong¹,
WenQian Zhang²,
Jie Shen¹,
ZhenQiang Su¹,
BaiTang Ning³,
Tao Han³,
Roger Perkins¹,
LeMing Shi¹ &
…
WeiDa Tong¹

An Erratum to this article was published on 23 March 2013

Abstract

Realizing personalized medicine requires integrating diverse data types with bioinformatics. At present, the most vital data are the genomic information of individuals, generated by advanced next-generation sequencing (NGS) technologies. These technologies continue to advance, with falling cost and rising sequencing speed accompanied by growth in the amount and complexity of the data. The prodigious data, together with the computational pipelines required for analysis and interpretation, strain both IT infrastructure and the scientists conducting the work. Bioinformatics is increasingly becoming the rate-limiting step, and numerous challenges must be overcome to translate NGS data into personalized medicine. We review key bioinformatics tasks, issues, and challenges in the context of IT requirements, data quality, analysis tools and pipelines, and biomarker validation.

Author information

Authors and Affiliations

Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
HuiXiao Hong, Jie Shen, ZhenQiang Su, Roger Perkins, LeMing Shi & WeiDa Tong
Beijing Genomic Institute, Beishan Industrial Zone, Shenzhen, 518083, China
WenQian Zhang
Division of Systems Biology, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
BaiTang Ning & Tao Han

Corresponding author

Correspondence to HuiXiao Hong.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Hong, H., Zhang, W., Shen, J. et al. Critical role of bioinformatics in translating huge amounts of next-generation sequencing data into personalized medicine. Sci. China Life Sci. 56, 110–118 (2013). https://doi.org/10.1007/s11427-013-4439-7

Received: 08 October 2012
Accepted: 29 November 2012
Published: 08 February 2013
Issue Date: February 2013
DOI: https://doi.org/10.1007/s11427-013-4439-7