Extending Maximal Perfect Haplotype Blocks to the Realm of Pangenomics
Recent work provides the first method that scales to large numbers of genomes for measuring the relative fitness of genomic variants within a population. A key component of the computation is finding conserved haplotype blocks, which can be done in linear time. Here, we extend the notion of conserved haplotype blocks to pangenomes, which can store more complex variation than a single reference genome. We define a maximal perfect pangenome haplotype block and give a linear-time, suffix-tree-based approach to find all such blocks from a set of pangenome haplotypes. We demonstrate the method by applying it to a pangenome built from yeast strains.
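To make the notion of a conserved haplotype block concrete, the following is a minimal sketch of the fixed-position (single-reference) case only, not the pangenome definition or the linear-time suffix tree method described above. It assumes haplotypes are given as equal-length strings and takes a block to be a set of at least two haplotypes that agree on an interval, where the interval cannot be extended left or right and no further haplotype can be added. The function name `maximal_perfect_blocks` is hypothetical, and the brute-force enumeration is chosen for readability rather than efficiency.

```python
from collections import defaultdict


def maximal_perfect_blocks(haplotypes):
    """Brute-force enumeration of maximal perfect haplotype blocks.

    Assumes equal-length strings (one per haplotype). Returns a list of
    (rows, i, j) with 0-based inclusive interval [i, j] and a tuple of row
    indices. This is an illustrative sketch, not a linear-time algorithm.
    """
    if not haplotypes:
        return []
    n = len(haplotypes[0])
    if any(len(h) != n for h in haplotypes):
        raise ValueError("all haplotypes must have the same length")

    blocks = []
    for i in range(n):
        for j in range(i, n):
            # Group rows by their content on [i, j]; taking the whole group
            # guarantees that no further row can be added to the block.
            groups = defaultdict(list)
            for row, h in enumerate(haplotypes):
                groups[h[i:j + 1]].append(row)

            for rows in groups.values():
                if len(rows) < 2:
                    continue
                # Left-maximal: at the first column, or rows disagree at i - 1.
                left_max = i == 0 or len({haplotypes[r][i - 1] for r in rows}) > 1
                # Right-maximal: at the last column, or rows disagree at j + 1.
                right_max = j == n - 1 or len({haplotypes[r][j + 1] for r in rows}) > 1
                if left_max and right_max:
                    blocks.append((tuple(rows), i, j))
    return blocks
```

For the three toy haplotypes `"0110"`, `"0100"`, and `"1110"`, this sketch reports four blocks: rows (0, 1) on columns 0 to 1, all three rows on column 1, rows (0, 2) on columns 1 to 3, and all three rows on column 3. Because it inspects every interval, it takes time quadratic in the haplotype length and is only suitable for small inputs.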
Keywords: Population genomics, Haplotype block, Pangenomics
Support provided by US National Science Foundation grants DBI-1759522 and DBI-1661530. We thank the anonymous reviewers for their thoughtful feedback and questions.