Building a Pangenome Reference for a Population

Nguyen, Ngan; Hickey, Glenn; Zerbino, Daniel R.; Raney, Brian; Earl, Dent; Armstrong, Joel; Haussler, David; Paten, Benedict

doi:10.1007/978-3-319-05269-4_17

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 8394)

Included in the following conference series:

International Conference on Research in Computational Molecular Biology

Abstract

A reference genome is a high-quality individual genome that is used as a coordinate system for the genomes of a population or of closely related subspecies. Given a set of genomes partitioned by homology into alignment blocks, we formalise the problem of ordering and orienting the blocks such that the resulting ordering maximally agrees with the underlying genomes’ ordering and orientation, creating a pangenome reference ordering. We show that this problem is NP-hard, but also demonstrate, empirically and in simulations, the performance of heuristic algorithms based upon a cactus graph decomposition that find locally maximal solutions. We describe an extension of our Cactus software to create a pangenome reference for whole genome alignments, and demonstrate how it can be used to create novel genome browser visualizations, using human variation data as a test.
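
To make the ordering-and-orientation problem concrete, the following is a minimal sketch and not the formulation or the heuristic used in the paper. It assumes a simplified agreement score that is not taken from the abstract: each genome is a list of signed block identifiers (a negative sign marks the reverse strand), each block occurs at most once per genome, and a pair of blocks counts as agreeing when the reference keeps both their relative order and their relative orientation, read on either strand. The function names `agreement` and `best_reference` are hypothetical. Because the problem is NP-hard, the exhaustive search shown here is only feasible for a handful of blocks.

```python
from itertools import combinations, permutations, product


def agreement(reference, genomes):
    """Count block pairs in the genomes whose order and orientation
    are preserved by the reference (toy score, assumed for illustration)."""
    pos = {abs(b): i for i, b in enumerate(reference)}
    sign = {abs(b): (1 if b > 0 else -1) for b in reference}
    score = 0
    for genome in genomes:
        # combinations() keeps list order, so a precedes b in this genome
        for a, b in combinations(genome, 2):
            ia, ib = abs(a), abs(b)
            if ia not in pos or ib not in pos:
                continue
            # +1 if the block has the same strand as in the reference, else -1
            ra = (1 if a > 0 else -1) * sign[ia]
            rb = (1 if b > 0 else -1) * sign[ib]
            forward = ra == 1 and rb == 1 and pos[ia] < pos[ib]
            reverse = ra == -1 and rb == -1 and pos[ia] > pos[ib]
            score += forward or reverse
    return score


def best_reference(genomes):
    """Exhaustively try every order and orientation (tiny inputs only)."""
    blocks = sorted({abs(b) for g in genomes for b in g})
    best, best_score = None, -1
    for order in permutations(blocks):
        for signs in product((1, -1), repeat=len(blocks)):
            ref = [s * b for s, b in zip(signs, order)]
            sc = agreement(ref, genomes)
            if sc > best_score:  # keep the first ordering with the top score
                best, best_score = ref, sc
    return best, best_score


# Three toy genomes; the third carries an inversion of block 2.
genomes = [[1, 2, 3], [1, 2, 3], [1, -2, 3]]
reference, score = best_reference(genomes)
print(reference, score)
```

Under this toy score a reference and its reverse complement are equivalent, so the search returns whichever of the two it encounters first. A heuristic approach such as the one the abstract describes would replace the exhaustive loop with a local search.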

Author information

Authors and Affiliations

Center for Biomolecular Science and Engineering, University of California Santa Cruz, CA, USA
Ngan Nguyen, Glenn Hickey, Brian Raney, Dent Earl, Joel Armstrong, David Haussler & Benedict Paten
Wellcome Trust Genome Campus, EMBL-EBI, Cambridge, CB10 1SD, UK
Daniel R. Zerbino
Howard Hughes Medical Institute, University of California, Santa Cruz, CA, 95064, USA
David Haussler

Editor information

Editors and Affiliations

School of Computer Science, Tel Aviv University, 69978, Tel Aviv, Israel
Roded Sharan

About this paper

Cite this paper

Nguyen, N. et al. (2014). Building a Pangenome Reference for a Population. In: Sharan, R. (eds) Research in Computational Molecular Biology. RECOMB 2014. Lecture Notes in Computer Science, vol 8394. Springer, Cham. https://doi.org/10.1007/978-3-319-05269-4_17

DOI: https://doi.org/10.1007/978-3-319-05269-4_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05268-7
Online ISBN: 978-3-319-05269-4
eBook Packages: Computer Science; Computer Science (R0)
