
Building a Pangenome Reference for a Population

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2014)

Abstract

A reference genome is a high-quality individual genome that is used as a coordinate system for the genomes of a population, or for the genomes of closely related subspecies. Given a set of genomes partitioned by homology into alignment blocks, we formalise the problem of ordering and orienting the blocks such that the resulting ordering maximally agrees with the underlying genomes’ ordering and orientation, creating a pangenome reference ordering. We show that this problem is NP-hard, but also demonstrate, empirically and in simulations, the performance of heuristic algorithms based upon a cactus graph decomposition that find locally maximal solutions. We describe an extension of our Cactus software to create a pangenome reference for whole genome alignments, and demonstrate how it can be used to create novel genome browser visualizations, using human variation data as a test.
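To make the ordering-and-orienting problem concrete, the following is a minimal sketch and not the formulation or algorithm from the paper. It assumes each genome is a list of signed block identifiers (a negative sign meaning reverse orientation) and uses a simple, hypothetical pairwise agreement score; the function names `pair_agreements`, `total_agreement`, and `brute_force_reference` are illustrative only. The exhaustive search is feasible only for a handful of blocks, which is consistent with the NP-hardness result motivating heuristics; the cactus graph heuristics themselves are not reproduced here.

```python
from itertools import combinations, permutations, product


def pair_agreements(reference, genome):
    """Count block pairs in `genome` whose relative order and orientation
    agree with `reference`, read on either strand (toy score, an assumption)."""
    # Map each block id to its position and orientation in the reference.
    pos = {abs(b): (i, b > 0) for i, b in enumerate(reference)}
    score = 0
    for x, y in combinations(genome, 2):  # x precedes y in the genome
        if abs(x) not in pos or abs(y) not in pos:
            continue
        (ix, sx), (iy, sy) = pos[abs(x)], pos[abs(y)]
        same_x = (x > 0) == sx
        same_y = (y > 0) == sy
        if same_x and same_y and ix < iy:
            # Forward strand: same orientations, same order.
            score += 1
        elif not same_x and not same_y and ix > iy:
            # Reverse strand: both orientations flipped, order reversed.
            score += 1
    return score


def total_agreement(reference, genomes):
    """Sum the toy agreement score over all genomes."""
    return sum(pair_agreements(reference, g) for g in genomes)


def brute_force_reference(genomes):
    """Try every order and orientation of the blocks; keep the first best.
    Exponential in the number of blocks, so only usable on tiny inputs."""
    blocks = sorted({abs(b) for g in genomes for b in g})
    best = None
    for order in permutations(blocks):
        for signs in product((1, -1), repeat=len(blocks)):
            candidate = [s * b for s, b in zip(signs, order)]
            score = total_agreement(candidate, genomes)
            if best is None or score > best[0]:
                best = (score, candidate)
    return best


if __name__ == "__main__":
    # Three toy genomes; the third carries an inversion of block 2.
    genomes = [[1, 2, 3], [1, 2, 3], [1, -2, 3]]
    print(brute_force_reference(genomes))
```

In this toy input the majority arrangement wins: the two collinear genomes outvote the one with the inverted block, and the reverse-complement ordering `[-3, -2, -1]` scores the same as `[1, 2, 3]` because the score treats both strands symmetrically.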




Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Nguyen, N. et al. (2014). Building a Pangenome Reference for a Population. In: Sharan, R. (eds) Research in Computational Molecular Biology. RECOMB 2014. Lecture Notes in Computer Science, vol 8394. Springer, Cham. https://doi.org/10.1007/978-3-319-05269-4_17


  • DOI: https://doi.org/10.1007/978-3-319-05269-4_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-05268-7

  • Online ISBN: 978-3-319-05269-4

  • eBook Packages: Computer Science, Computer Science (R0)
