Building a Pangenome Reference for a Population

  • Ngan Nguyen
  • Glenn Hickey
  • Daniel R. Zerbino
  • Brian Raney
  • Dent Earl
  • Joel Armstrong
  • David Haussler
  • Benedict Paten
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8394)

Abstract

A reference genome is a high-quality individual genome that is used as a coordinate system for the genomes of a population, or for the genomes of closely related subspecies. Given a set of genomes partitioned by homology into alignment blocks, we formalise the problem of ordering and orienting the blocks such that the resulting ordering maximally agrees with the ordering and orientation of the underlying genomes, creating a pangenome reference ordering. We show that this problem is NP-hard, and we demonstrate, both on empirical data and in simulations, the performance of heuristic algorithms based upon a cactus graph decomposition that find locally maximal solutions. We describe an extension of our Cactus software that creates a pangenome reference for whole-genome alignments, and demonstrate how it can be used to create novel genome browser visualizations, using human variation data as a test.
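
To make the ordering-and-orientation problem concrete, the following toy sketch scores a candidate reference against a set of genomes and searches all candidates exhaustively. It is an illustration only: the pairwise agreement score, the signed-integer encoding of oriented blocks, and the function names `agreement` and `best_reference` are assumptions made for this example, not the objective function or the cactus-graph heuristics of the paper. It also assumes that every block occurs exactly once in each genome.

```python
from itertools import permutations, product


def agreement(reference, genomes):
    """Count block pairs whose order and orientation in a genome match the reference.

    Blocks are signed integers: +3 is block 3 on the forward strand,
    -3 is block 3 on the reverse strand (an encoding assumed for this sketch).
    """
    pos = {abs(b): i for i, b in enumerate(reference)}
    ref_sign = {abs(b): (1 if b > 0 else -1) for b in reference}
    total = 0
    for genome in genomes:
        for i in range(len(genome)):
            for j in range(i + 1, len(genome)):
                a, b = genome[i], genome[j]  # a precedes b in this genome
                # Orientation of each block relative to the reference.
                ra = (1 if a > 0 else -1) * ref_sign[abs(a)]
                rb = (1 if b > 0 else -1) * ref_sign[abs(b)]
                # The pair agrees if it reads the same way on the forward
                # strand, or as a consistent reverse-strand traversal.
                forward = ra == 1 and rb == 1 and pos[abs(a)] < pos[abs(b)]
                reverse = ra == -1 and rb == -1 and pos[abs(a)] > pos[abs(b)]
                total += forward or reverse
    return total


def best_reference(blocks, genomes):
    """Exhaustive search over all orders and orientations (toy sizes only)."""
    best, best_score = None, -1
    for order in permutations(blocks):
        for signs in product((1, -1), repeat=len(order)):
            candidate = [s * b for s, b in zip(signs, order)]
            score = agreement(candidate, genomes)
            if score > best_score:
                best, best_score = candidate, score
    return best, best_score


# Three genomes over three blocks; the third carries an inversion of block 2.
genomes = [[1, 2, 3], [1, 2, 3], [1, -2, 3]]
reference, score = best_reference([1, 2, 3], genomes)
```

The exhaustive search examines n! · 2^n candidates for n blocks, so it is usable only for a handful of blocks. That growth is consistent with the abstract's NP-hardness result and with its use of heuristics that settle for locally maximal solutions.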

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Ngan Nguyen (1)
  • Glenn Hickey (1)
  • Daniel R. Zerbino (2)
  • Brian Raney (1)
  • Dent Earl (1)
  • Joel Armstrong (1)
  • David Haussler (1, 3)
  • Benedict Paten (1)
  1. Center for Biomolecular Science and Engineering, University of California Santa Cruz, USA
  2. Wellcome Trust Genome Campus, EMBL-EBI, Cambridge, UK
  3. Howard Hughes Medical Institute, University of California, Santa Cruz, USA
