Modeling coverage gaps in haplotype frequencies via Bayesian inference to improve stem cell donor selection

  • Original Article
Immunogenetics

Abstract

Regardless of sampling depth, accurate genotype imputation is limited in regions of high polymorphism, which often have a heavy-tailed haplotype frequency distribution, so many rare haplotypes go unobserved. Statistical methods that improve imputation by extending reference haplotype distributions, using linkage disequilibrium patterns that relate allele and haplotype frequencies, have not yet been explored. In unrelated stem cell transplantation, imputation of the highly polymorphic human leukocyte antigen (HLA) genes is used to identify the best-matched stem cell donor when searching large registries that together hold over 28,000,000 donors worldwide. Despite these registry sizes, a significant proportion of searched patients present novel HLA haplotypes. Consistent with this observation, HLA population genetic models indicate that many extant HLA haplotypes remain unobserved. These absent haplotypes are a significant cause of error in haplotype matching. We apply a Bayesian inference methodology for extending haplotype frequency distributions, using a model in which new haplotypes are created by recombination of observed alleles. Using five-locus HLA frequency data from the National Marrow Donor Program registry, we illustrate that this joint probability model offers a significant improvement in frequency distribution estimates over the best existing alternative methods. Transplant matching algorithms and disease association studies that involve phasing and imputation of rare variants may benefit from this statistical inference framework.
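
The idea of extending a haplotype frequency distribution with recombinants of observed alleles can be illustrated with a toy sketch. This is not the Bayesian model of the article, whose details are not given in the abstract. It is a simplified stand-in that reserves a fixed probability mass for unobserved haplotypes and shares it out in proportion to the product of the marginal allele frequencies. The function name `extend_haplotype_frequencies` and the parameter `unseen_mass` are hypothetical, and enumerating every allele combination is only feasible for small examples, not for five-locus HLA data.

```python
from collections import defaultdict
from itertools import product


def extend_haplotype_frequencies(observed, unseen_mass=0.05):
    """Toy extension of a haplotype frequency table (illustrative only).

    observed    : dict mapping haplotype tuples (one allele per locus)
                  to frequencies that sum to 1.
    unseen_mass : assumed total probability reserved for haplotypes
                  that were never observed (a hypothetical parameter).
    """
    n_loci = len(next(iter(observed)))

    # Marginal allele frequency at each locus.
    marginals = [defaultdict(float) for _ in range(n_loci)]
    for hap, freq in observed.items():
        for locus, allele in enumerate(hap):
            marginals[locus][allele] += freq

    # Candidate recombinants: combinations of observed alleles that were
    # not themselves observed, weighted by the product of their marginals.
    weights = {}
    for hap in product(*(m.keys() for m in marginals)):
        if hap not in observed:
            weight = 1.0
            for locus, allele in enumerate(hap):
                weight *= marginals[locus][allele]
            weights[hap] = weight

    total = sum(weights.values())
    if total == 0:
        # Every combination was already observed; nothing to extend.
        return dict(observed)

    # Shrink observed frequencies and spread the reserved mass.
    extended = {hap: f * (1.0 - unseen_mass) for hap, f in observed.items()}
    for hap, weight in weights.items():
        extended[hap] = unseen_mass * weight / total
    return extended


if __name__ == "__main__":
    toy = {("A1", "B1"): 0.6, ("A2", "B2"): 0.4}
    for hap, freq in sorted(extend_haplotype_frequencies(toy).items()):
        print(hap, round(freq, 4))
```

In this two-locus toy input, the two unobserved recombinants have equal weight, so each receives half of the reserved mass while the observed haplotypes are scaled down to keep the total at 1.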

Author information

Corresponding author

Correspondence to Yoram Louzoun.

Electronic supplementary material

  • ESM 1 (PDF 718 kb)
  • Fig. S1 (PDF 154 kb)
  • Fig. S3 (PDF 660 kb)
  • Fig. S4 (PDF 447 kb)
  • Fig. S5 (PDF 212 kb)
  • Fig. S7 (PDF 153 kb)

About this article

Cite this article

Louzoun, Y., Alter, I., Gragert, L. et al. Modeling coverage gaps in haplotype frequencies via Bayesian inference to improve stem cell donor selection. Immunogenetics 70, 279–292 (2018). https://doi.org/10.1007/s00251-017-1040-4

  • DOI: https://doi.org/10.1007/s00251-017-1040-4
