eALPS: Estimating Abundance Levels in Pooled Sequencing Using Available Genotyping Data

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2013)

Abstract

Recent advances in high-throughput sequencing technologies offer the potential for a better characterization of genetic variation in humans and other organisms. On many occasions, either by design or by necessity, sequencing is performed on a pool of DNA samples with different abundances, where the abundance of each sample is unknown. Such a scenario occurs naturally in metagenomic analysis, where a pool of bacteria is sequenced, and in population studies that pool DNA by design. In particular, various pooling designs have recently been suggested that can identify carriers of rare alleles in large cohorts, dramatically reducing the cost of such large-scale sequencing projects.

A fundamental problem with such approaches for population studies is that uncertainty in the DNA proportions of different individuals in the pools might lead to spurious associations. Fortunately, the genotype data of at least some of the individuals in the pool are often known. Here, we propose a method (eALPS) that uses the genotype data in conjunction with the pooled sequence data to accurately estimate the proportions of the samples in the pool, even when not all individuals in the pool were genotyped (eALPS-LD). Using real data from a pooled sequencing study of Non-Hodgkin’s Lymphoma, we demonstrate that estimating the proportions is crucial, since otherwise there is a risk of false discoveries. Additionally, we demonstrate that our approach is also applicable to the problem of quantifying species in metagenomic samples (eALPS-BCR), and is particularly suitable for metagenomic quantification of closely related species.
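To make the estimation problem concrete, the following is a minimal sketch and not the eALPS algorithm itself, whose details are not given in this abstract. It assumes a simplified model in which the expected alternate-allele frequency of the pool at each SNP is the proportion-weighted sum of the individuals' known genotypes, and it recovers the proportions by ordinary least squares followed by clipping and renormalization. The function names, the simulated data, and the choice of least squares are all illustrative assumptions.

```python
import numpy as np


def simulate_pool(n_snps, true_p, depth, rng):
    """Simulate toy genotypes and pooled read counts (hypothetical data)."""
    n_ind = len(true_p)
    # Assumed allele frequencies, one per SNP.
    maf = rng.uniform(0.05, 0.5, size=n_snps)
    # Alternate-allele fraction per individual: 0, 0.5, or 1.
    G = rng.binomial(2, maf[:, None], size=(n_snps, n_ind)) / 2.0
    # Expected pooled frequency is the proportion-weighted genotype sum.
    pool_freq = np.clip(G @ true_p, 0.0, 1.0)
    # Observed alternate-allele read counts at a fixed depth per SNP.
    alt_reads = rng.binomial(depth, pool_freq)
    return G, alt_reads


def estimate_proportions(G, alt_reads, depth):
    """Least-squares estimate of sample proportions (illustrative only)."""
    observed = alt_reads / depth
    p, *_ = np.linalg.lstsq(G, observed, rcond=None)
    # Proportions must be non-negative and sum to one.
    p = np.clip(p, 0.0, None)
    return p / p.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_p = np.array([0.40, 0.25, 0.20, 0.10, 0.05])
    G, alt_reads = simulate_pool(2000, true_p, depth=200, rng=rng)
    print(estimate_proportions(G, alt_reads, depth=200))
```

In this toy setting every individual in the pool is genotyped; the case with ungenotyped individuals (the eALPS-LD setting) would need additional modeling that this sketch does not attempt.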

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Eskin, I. et al. (2013). eALPS: Estimating Abundance Levels in Pooled Sequencing Using Available Genotyping Data. In: Deng, M., Jiang, R., Sun, F., Zhang, X. (eds) Research in Computational Molecular Biology. RECOMB 2013. Lecture Notes in Computer Science, vol 7821. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37195-0_4

  • DOI: https://doi.org/10.1007/978-3-642-37195-0_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37194-3

  • Online ISBN: 978-3-642-37195-0

  • eBook Packages: Computer Science, Computer Science (R0)