An Accurate Method for Inferring Relatedness in Large Datasets of Unphased Genotypes via an Embedded Likelihood-Ratio Test

  • Jesse M. Rodriguez
  • Serafim Batzoglou
  • Sivan Bercovici
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7821)

Abstract

Studies that map disease genes rely on accurate annotations that indicate whether individuals in the studied cohorts are related to each other or not. For example, in genome-wide association studies, the cohort members are assumed to be unrelated to one another. Investigators can correct for individuals in a cohort with previously unknown shared familial descent by detecting genomic segments that are shared between them, which are considered to be identical by descent (IBD). Alternatively, elevated frequencies of IBD segments near a particular locus among affected individuals can be indicative of a disease-associated gene. As genotyping studies grow to use increasingly large sample sizes and meta-analyses begin to include many data sets, accurate and efficient detection of hidden relatedness becomes a challenge. To enable disease-mapping studies of increasingly large cohorts, a fast and accurate method to detect IBD segments is required.

We present PARENTE, a novel method for detecting related pairs of individuals and shared haplotypic segments within these pairs. PARENTE is a computationally efficient method based on an embedded likelihood ratio test. As demonstrated by the results of our simulations, our method exhibits better accuracy than the current state of the art, and can be used for the analysis of large genotyped cohorts. PARENTE’s accuracy advantage becomes even more pronounced in more challenging scenarios, such as detecting shorter IBD segments or when an extremely low false-positive rate is required. PARENTE is publicly and freely available at http://parente.stanford.edu/.
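To give a rough sense of what a likelihood ratio test on unphased genotypes can look like, the sketch below scores a window of SNPs for one pair of individuals by comparing "the pair shares one haplotype IBD" against "the pair is unrelated". It is only an illustration under simplifying assumptions and is not PARENTE's actual model: the function names, the assumption of independent SNPs in Hardy-Weinberg equilibrium, and the small mixing weight `eps` (a crude stand-in for genotyping error) are all hypothetical choices made here.

```python
import math

def hw_prob(g, p):
    """Hardy-Weinberg probability of genotype g (copies of allele B, 0..2)
    when allele B has population frequency p."""
    q = 1.0 - p
    return (q * q, 2.0 * p * q, p * p)[g]

def ibd1_prob(g1, g2, p):
    """P(g1, g2 | the pair shares exactly one haplotype IBD), no errors.
    Enumerates the shared allele and each individual's other allele."""
    total = 0.0
    for shared in (0, 1):
        for other1 in (0, 1):
            for other2 in (0, 1):
                if shared + other1 == g1 and shared + other2 == g2:
                    prob = 1.0
                    for allele in (shared, other1, other2):
                        prob *= p if allele == 1 else 1.0 - p
                    total += prob
    return total

def window_log_lr(genos1, genos2, freqs, eps=1e-3):
    """Sum of per-SNP log likelihood ratios (IBD1 vs. unrelated) over a window.
    eps mixes in the unrelated model so that opposite homozygotes, which are
    impossible under error-free IBD1, give a finite penalty (an assumption)."""
    score = 0.0
    for g1, g2, p in zip(genos1, genos2, freqs):
        unrelated = hw_prob(g1, p) * hw_prob(g2, p)
        related = (1.0 - eps) * ibd1_prob(g1, g2, p) + eps * unrelated
        score += math.log(related / unrelated)
    return score

# Toy usage: two individuals who both carry rare alleles at the same sites.
freqs = [0.05, 0.10, 0.20]
print(window_log_lr([1, 2, 1], [1, 2, 1], freqs))
```

A positive score favors IBD in the window and a negative score favors the unrelated model; a practical method would also have to choose a decision threshold and account for linkage disequilibrium between neighboring SNPs, neither of which this sketch attempts.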

Keywords

Population genetics, IBD, relatedness



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Jesse M. Rodriguez (1, 2)
  • Serafim Batzoglou (1)
  • Sivan Bercovici (1)
  1. Department of Computer Science, Stanford University, USA
  2. Biomedical Informatics Program, Stanford University, USA
