A pruning strategy of reference panels for fast SNP genotype imputation


In recent genome-wide association studies, imputing the genotypes of missing SNPs is a common procedure for increasing the power of the observed genetic markers. Imputation usually relies on publicly available resources, such as the International HapMap Project data or the 1000 Genomes Project data, as a reference panel. However, the volume of these resources is growing rapidly as high-throughput genotyping technology matures. Learning from large reference panels therefore often requires heavy computation, which leads to long imputation times. To address this problem, we propose a pruning strategy for constructing imputation reference panels: the panel is reduced in size by excluding (pruning) redundant samples, identified from estimates of the kinship coefficients between samples. For evaluation, the approach was implemented within the Beagle framework and tested on two real datasets, Mao et al.’s prostate cancer data and KNIH’s diabetes data. Our experimental results show that the proposed pruning strategy can shorten imputation time without loss of imputation accuracy.
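The pruning idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which kinship estimator, threshold, or selection order was used. The sketch assumes genotypes coded as 0/1/2 minor-allele counts with no missing values, a kinship estimate in the style of the robust between-family estimator of Manichaikul et al. (reference 18), and a simple greedy pass. The names `kinship` and `prune_panel` and the threshold value are hypothetical.

```python
def kinship(g_i, g_j):
    """Pairwise kinship estimate from two genotype vectors coded 0/1/2.

    Assumed form: 0.5 - sum((g_i - g_j)^2) / (4 * min(het_i, het_j)),
    where het_x is the number of heterozygous sites in sample x.
    """
    het_i = sum(1 for a in g_i if a == 1)
    het_j = sum(1 for b in g_j if b == 1)
    min_het = min(het_i, het_j)
    if min_het == 0:
        # Estimator is undefined without heterozygous sites;
        # this sketch treats such pairs as unrelated (an assumption).
        return 0.0
    sq_diff = sum((a - b) ** 2 for a, b in zip(g_i, g_j))
    return 0.5 - sq_diff / (4.0 * min_het)


def prune_panel(genotypes, threshold):
    """Greedy pruning: keep a sample only if its estimated kinship with
    every already-kept sample is below `threshold`. Returns kept indices."""
    kept = []
    for idx, g in enumerate(genotypes):
        if all(kinship(g, genotypes[k]) < threshold for k in kept):
            kept.append(idx)
    return kept


# Toy panel: samples 0 and 1 are identical, sample 2 is very different.
panel = [
    [0, 1, 2, 1, 0, 1],
    [0, 1, 2, 1, 0, 1],
    [2, 1, 0, 1, 2, 0],
]
kept = prune_panel(panel, threshold=0.2)  # threshold chosen arbitrarily
print(kept)
```

In this toy panel the duplicate sample is dropped and the other two are kept. The pruned panel would then be passed to the imputation tool in place of the full panel. A greedy pass depends on sample order, so a real pipeline might prefer a different selection rule.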



  1. Lewis, C.M. Genetic association studies: design, analysis and interpretation. Brief. Bioinform. 3, 146–153 (2002).

  2. Tanaka, T. International HapMap project. Nihon Rinsho 63, 29–34 (2005).

  3. Thorisson, G.A., Smith, A.V., Krishnan, L. & Stein, L.D. The international HapMap project web site. Genome Res. 15, 1592–1593 (2005).

  4. The 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 (2010).

  5. Howie, B., Marchini, J. & Stephens, M. Genotype imputation with thousands of genomes. G3 (Bethesda) 1, 457–470 (2011).

  6. Huang, L. et al. The relationship between imputation error and statistical power in genetic association studies in diverse populations. Am. J. Hum. Genet. 85, 692–698 (2009).

  7. Pasaniuc, B. et al. A generic coalescent-based framework for the selection of a reference panel for imputation. Genet. Epidemiol. 34, 773–782 (2010).

  8. Jostins, L., Morley, K.I. & Barrett, J.C. Imputation of low-frequency variants using the HapMap3 benefits from large, diverse reference sets. Eur. J. Hum. Genet. 19, 662–666 (2011).

  9. Sung, Y.J., Wang, L., Rankinen, T., Bouchard, C. & Rao, D.C. Performance of genotype imputations using data from the 1000 genomes project. Hum. Hered. 73, 18–25 (2012).

  10. Huang, L. et al. Genotype-imputation accuracy across worldwide human populations. Am. J. Hum. Genet. 84, 235–250 (2009).

  11. Mao, X. et al. Distinct genomic alterations in prostate cancers in Chinese and Western populations suggest alternative pathways of prostate carcinogenesis. Cancer Res. 70, 5207–5212 (2010).

  12. Danford, T., Rolfe, A. & Gifford, D. GSE: a comprehensive database system for the representation, retrieval, and analysis of microarray data. Pac. Symp. Biocomput. 539–550 (2008).

  13. Edgar, R., Domrachev, M. & Lash, A.E. Gene expression omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 30, 207–210 (2002).

  14. Browning, B.L. & Browning, S.R. A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals. Am. J. Hum. Genet. 84, 210–223 (2009).

  15. Browning, S.R. Missing data imputation and haplotype phase inference for genome-wide association studies. Hum. Genet. 124, 439–450 (2008).

  16. Browning, S.R. & Browning, B.L. Haplotype phasing: existing methods and new developments. Nat. Rev. Genet. 12, 703–714 (2011).

  17. Sung, Y.J., Wang, L., Rankinen, T., Bouchard, C. & Rao, D.C. Performance of genotype imputations using data from the 1000 genomes project. Hum. Hered. 73, 18–25 (2012).

  18. Manichaikul, A. et al. Robust relationship inference in genome-wide association studies. Bioinformatics 26, 2867–2873 (2010).


Author information



Corresponding author

Correspondence to Miyoung Shin.


Cite this article

Jadamba, E., Shin, M., Chung, M. et al. A pruning strategy of reference panels for fast SNP genotype imputation. BioChip J 7, 6–10 (2013). https://doi.org/10.1007/s13206-013-7102-2



  • SNP imputation
  • Reference panel pruning
  • Kinship coefficient