Identification and Frequency Estimation of Inversion Polymorphisms from Haplotype Data

  • Suzanne S. Sindi
  • Benjamin J. Raphael
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5541)

Abstract

Structural rearrangements, including copy-number alterations and inversions, are increasingly recognized as important contributors to human genetic variation. Copy number variants are readily measured via array-based techniques like comparative genomic hybridization, but copy-neutral variants such as inversion polymorphisms remain difficult to identify without whole genome sequencing. We introduce a method to identify inversion polymorphisms and estimate their frequency in a population using readily available single nucleotide polymorphism (SNP) data. Our method uses a probabilistic model to describe a population as a mixture of forward and inverted chromosomes and identifies putative inversions by characteristic differences in haplotype frequencies around inversion breakpoints. On simulated data, our method accurately predicts inversions with frequencies as low as 25% in the population and reliably estimates inversion frequencies over a wide range. On the human HapMap Phase 2 data, we predict between 88 and 142 inversion polymorphisms with frequencies ranging from 20% to 92%. Many of these correspond to known inversions or have other evidence supporting them, and the predicted inversion frequencies largely agree with the limited information presently available.
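
The mixture idea in the abstract can be illustrated with a toy sketch. The Python below is not the authors' method. It assumes, purely for illustration, that the haplotype frequencies on forward and on inverted chromosomes near a breakpoint are already known (the hypothetical dictionaries `p_forward` and `p_inverted`), and it uses a standard expectation-maximization update to estimate only the mixing weight, that is, the inversion frequency. The function name, its arguments, and the haplotype labels are all assumptions made for this example.

```python
from collections import Counter


def estimate_inversion_frequency(haplotypes, p_forward, p_inverted,
                                 n_iter=200, tol=1e-10):
    """Estimate the mixing weight of a two-component haplotype mixture.

    haplotypes : iterable of observed haplotype labels (hypothetical data)
    p_forward  : dict, haplotype -> probability on forward chromosomes (assumed known)
    p_inverted : dict, haplotype -> probability on inverted chromosomes (assumed known)
    Returns the estimated fraction of inverted chromosomes.
    """
    counts = Counter(haplotypes)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no haplotypes observed")

    pi = 0.5  # initial guess for the inversion frequency
    for _ in range(n_iter):
        # E-step: expected number of observed chromosomes that are inverted.
        expected_inverted = 0.0
        for hap, count in counts.items():
            w_inv = pi * p_inverted.get(hap, 0.0)
            w_fwd = (1.0 - pi) * p_forward.get(hap, 0.0)
            total = w_inv + w_fwd
            if total == 0.0:
                raise ValueError(f"haplotype {hap!r} is impossible under both components")
            expected_inverted += count * w_inv / total

        # M-step: the new mixing weight is the expected inverted fraction.
        new_pi = expected_inverted / n
        converged = abs(new_pi - pi) < tol
        pi = new_pi
        if converged:
            break
    return pi


if __name__ == "__main__":
    # Hypothetical example: the two orientations carry partly overlapping haplotypes.
    forward = {"AB": 0.9, "ab": 0.1}
    inverted = {"AB": 0.1, "ab": 0.9}
    sample = ["AB"] * 74 + ["ab"] * 26
    print(estimate_inversion_frequency(sample, forward, inverted))
```

When the two orientations carry disjoint haplotype sets, the estimate reduces to the observed fraction of inverted-type haplotypes. The iterative update only matters when the component distributions overlap, as in the example at the bottom of the block.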

Keywords

Haplotype Block; HapMap Data; Deletion Polymorphism; Single Nucleotide Polymorphism Data; Inversion Polymorphism
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Suzanne S. Sindi (1, 3)
  • Benjamin J. Raphael (2, 3)
  1. Division of Applied Mathematics, USA
  2. Department of Computer Science, USA
  3. Center for Computational Molecular Biology, Brown University, Providence, USA
