Abstract
Realizing personalized medicine requires integrating diverse data types with bioinformatics. At present, the most vital of these data is the genomic information of individuals, generated by advanced next-generation sequencing (NGS) technologies. These technologies continue to advance, with falling cost and rising sequencing speed accompanied by a concomitant increase in the amount and complexity of the data. The prodigious data volumes, together with the computational pipelines required for analysis and interpretation, strain both the IT infrastructure and the scientists conducting the work. Bioinformatics is increasingly becoming the rate-limiting step, with numerous challenges to overcome in translating NGS data into personalized medicine. We review some key bioinformatics tasks, issues, and challenges in the contexts of IT requirements, data quality, analysis tools and pipelines, and validation of biomarkers.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Hong, H., Zhang, W., Shen, J. et al. Critical role of bioinformatics in translating huge amounts of next-generation sequencing data into personalized medicine. Sci. China Life Sci. 56, 110–118 (2013). https://doi.org/10.1007/s11427-013-4439-7
DOI: https://doi.org/10.1007/s11427-013-4439-7