Extending Maximal Perfect Haplotype Blocks to the Realm of Pangenomics

  • Lucia Williams
  • Brendan Mumey
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12099)


Recent work provides the first method for measuring the relative fitness of genomic variants within a population that scales to large numbers of genomes. A key component of the computation is finding conserved haplotype blocks, which can be done in linear time. Here, we extend the notion of conserved haplotype blocks to pangenomes, which can store more complex variation than a single reference genome. We define a maximal perfect pangenome haplotype block and give a linear-time, suffix-tree-based approach for finding all such blocks in a set of pangenome haplotypes. We demonstrate the method by applying it to a pangenome built from yeast strains.
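To make the underlying notion concrete, the following is a minimal sketch of the classic, single-reference setting that the abstract starts from, not the pangenome definition or the suffix-tree algorithm of this paper. It assumes haplotypes are given as equal-length aligned strings, and it takes a block to be a set of at least two rows together with a column interval on which those rows are identical and which cannot be extended left, right, or by adding rows. The function name `maximal_perfect_blocks` and the 0-based inclusive column convention are illustrative choices, and the enumeration is brute force rather than linear time.

```python
from collections import defaultdict


def maximal_perfect_blocks(haps):
    """Brute-force enumeration of maximal perfect haplotype blocks.

    haps: list of equal-length strings (aligned haplotypes).
    Returns a list of (rows, i, j) with 0-based inclusive columns i..j.
    Illustrative sketch only; this is not a linear-time method.
    """
    n = len(haps[0])
    assert all(len(h) == n for h in haps), "haplotypes must be aligned"
    blocks = []
    for i in range(n):
        for j in range(i, n):
            # Group rows by their content on columns i..j; each group is
            # row-maximal because every row sharing the substring is included.
            groups = defaultdict(list)
            for row, h in enumerate(haps):
                groups[h[i:j + 1]].append(row)
            for rows in groups.values():
                if len(rows) < 2:
                    continue
                # Left-maximal: at the first column, or rows disagree at i-1.
                left_max = i == 0 or len({haps[r][i - 1] for r in rows}) > 1
                # Right-maximal: at the last column, or rows disagree at j+1.
                right_max = j == n - 1 or len({haps[r][j + 1] for r in rows}) > 1
                if left_max and right_max:
                    blocks.append((tuple(rows), i, j))
    return blocks


if __name__ == "__main__":
    for block in maximal_perfect_blocks(["0011", "0010", "1010"]):
        print(block)
```

On the three toy haplotypes in the usage lines, the sketch reports three blocks: rows 0 and 1 on columns 0 to 2, all three rows on columns 1 to 2, and rows 1 and 2 on columns 1 to 3.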


Keywords: Population genomics, Haplotype block, Pangenomics



Support provided by US National Science Foundation grants DBI-1759522 and DBI-1661530. We thank the anonymous reviewers for their thoughtful feedback and questions.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Montana State University, Bozeman, USA
