Limitations of next-generation genome sequence assembly

Alkan, Can; Sajjadian, Saba; Eichler, Evan E

doi:10.1038/nmeth.1527

Perspective
Published: 21 November 2010

Nature Methods volume 8, pages 61–65 (2011)

Abstract

High-throughput sequencing technologies promise to transform the fields of genetics and comparative biology by delivering tens of thousands of genomes in the near future. Although it is feasible to construct de novo genome assemblies in a few months, relatively little attention has been paid to what is lost by relying solely on short sequence reads. We compared recent de novo assemblies, generated with the short oligonucleotide analysis package (SOAP) from the genomes of a Han Chinese individual and a Yoruban individual, to experimentally validated genomic features. We found that the de novo assemblies were 16.2% shorter than the reference genome and that 420.2 megabase pairs of common repeats and 99.1% of validated duplicated sequences were missing from the assemblies. Consequently, over 2,377 coding exons were completely missing. We conclude that high-quality sequencing approaches must be considered in conjunction with high-throughput sequencing for comparative genomics analyses and studies of genome evolution.
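The headline numbers above are ratios against the reference: how far total assembled length falls short of the reference length, and how much of a feature class (for example, validated duplicated sequence) is not covered by any assembled contig. The Python sketch below shows one simple way to compute such ratios. It assumes contigs have already been aligned to the reference and are available as half-open (chromosome, start, end) intervals. The function names and toy coordinates are hypothetical illustrations, not the authors' pipeline.

```python
"""Sketch: quantify what a de novo assembly misses relative to a reference.

Assumption: contig alignments and reference features are given as half-open
(chrom, start, end) intervals. Names and data here are hypothetical.
"""
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open


def percent_shorter(assembly_bp: int, reference_bp: int) -> float:
    """Percentage by which the total assembly length falls short of the reference."""
    return 100.0 * (reference_bp - assembly_bp) / reference_bp


def merge(intervals: Iterable[Interval]) -> Dict[str, List[Tuple[int, int]]]:
    """Merge overlapping or adjacent intervals separately for each chromosome."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    merged: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, spans in by_chrom.items():
        spans.sort()
        out: List[Tuple[int, int]] = []
        for start, end in spans:
            if out and start <= out[-1][1]:
                # Extend the previous span instead of double-counting bases.
                out[-1] = (out[-1][0], max(out[-1][1], end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def covered_bp(features: Iterable[Interval],
               aligned: Iterable[Interval]) -> Tuple[int, int]:
    """Return (feature bases covered by aligned contigs, total feature bases)."""
    feats = merge(features)
    hits = merge(aligned)
    covered = total = 0
    for chrom, f_spans in feats.items():
        h_spans = hits.get(chrom, [])
        for fs, fe in f_spans:
            total += fe - fs
            # Both span lists are disjoint after merging, so summing
            # pairwise overlaps gives the exact intersection size.
            for hs, he in h_spans:
                overlap = min(fe, he) - max(fs, hs)
                if overlap > 0:
                    covered += overlap
    return covered, total


if __name__ == "__main__":
    # Made-up toy coordinates: two duplicated segments, contigs covering part of one.
    dups = [("chr1", 100, 200), ("chr1", 500, 600)]
    contigs = [("chr1", 0, 150), ("chr1", 140, 160)]
    cov, tot = covered_bp(dups, contigs)
    print(f"{100 * (1 - cov / tot):.1f}% of feature bases missing")
    print(f"{percent_shorter(90, 100):.1f}% shorter")
```

With the toy inputs, the merged contig span chr1:0-160 covers 60 of the 200 duplicated bases, so 70.0% of the feature bases are reported missing. The pairwise overlap loop is quadratic and meant only for illustration; genome-scale data would call for a sorted sweep or an interval-tree library.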

**Figure 1: Summary of *de novo* genome assembly and new sequence analysis.**

Acknowledgements

We thank E. Karakoc and P. Sudmant for helpful discussions, T. Marques-Bonet and J.M. Kidd for providing the nonredundant gene table, and T. Brown for proofreading the manuscript. This work was partly supported by US National Institutes of Health grant HG002385 to E.E.E. E.E.E. receives funds as an Investigator of the Howard Hughes Medical Institute.

Author information

Authors and Affiliations

Department of Genome Sciences, University of Washington School of Medicine and Howard Hughes Medical Institute, Seattle, Washington, USA
Can Alkan, Saba Sajjadian & Evan E Eichler

Contributions

C.A. and E.E.E. conceived the study and wrote the manuscript. C.A. and S.S. analyzed the data.

Corresponding author

Correspondence to Evan E Eichler.

Ethics declarations

Competing interests

E.E.E. is a scientific advisory board member of Pacific Biosciences.

About this article

Cite this article

Alkan, C., Sajjadian, S. & Eichler, E. Limitations of next-generation genome sequence assembly. Nat Methods 8, 61–65 (2011). https://doi.org/10.1038/nmeth.1527

Issue Date: January 2011
Springer Nature America, Inc.

Supplementary information

Supplementary Text and Figures

Supplementary Table 1

Supplementary Table 3

Supplementary Table 4

Supplementary Table 5
