This study uses simulations to explore statistical power and false-positive rates for eQTL mapping in allopolyploid organisms and provides guidelines for applying eQTL mapping in these organisms.
In recent years, RNA-seq has become the dominant technology for eQTL studies. However, most work has been done in diploid organisms. Many species of economic and environmental importance are polyploid, and approaches for eQTL mapping in polyploids are not well developed. High sequence similarity between duplicated genes in polyploids leads to misassignment of sequence reads, which may produce false-positive results and/or reduce the power to detect eQTL. In this paper, we first explore the similarity of homoeologous transcripts in polyploid organisms. We find that 5–20% of genes (depending on the organism) in important agricultural plants such as wheat, soybean, and switchgrass are not sufficiently diverged between duplicated genomes to allow unambiguous assignment of reads. Second, we examine the impact of misassigned reads on eQTL mapping and show that both false-positive and false-negative rates can be greatly inflated. Third, we compare four strategies for dealing with ambiguous reads: (1) dividing ambiguous reads evenly between homoeologous transcripts, (2) assigning them proportionally, (3) using all reads for all genes, and (4) discarding ambiguous reads. We find that discarding ambiguous reads gives the best balance of false-positive and false-negative rates for most genes. However, for genes that are very similar between genomes, using all reads is the only option; this reduces power, but false-positive rates are maintained. We also discuss QTL mapping in polyploids using allele-specific expression (ASE) and show how the proportion of ASE-informative reads varies with the divergence between homoeologous genes.
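The four strategies for ambiguous reads can be expressed as simple allocation rules for a single homoeologous pair. The sketch below is a minimal illustration, not the study's implementation, and it assumes that reads have already been classified as unique to copy A, unique to copy B, or ambiguous (mapping equally well to both). The function name `allocate_ambiguous` and the strategy labels are hypothetical, "proportional" is interpreted here as proportional to each copy's unique-read count, and the even-split fallback when neither copy has unique reads is an assumption. Only a pair is handled; a wheat triad (three subgenomes) would need a three-way version.

```python
from typing import Tuple

# Hypothetical labels for the four strategies described in the abstract.
STRATEGIES = ("even", "proportional", "all", "discard")


def allocate_ambiguous(unique_a: float, unique_b: float, ambiguous: float,
                       strategy: str) -> Tuple[float, float]:
    """Return (count_a, count_b) for a homoeologous pair A/B in one sample.

    unique_a, unique_b: reads that map unambiguously to copy A or copy B.
    ambiguous: reads that map equally well to both copies.
    """
    if min(unique_a, unique_b, ambiguous) < 0:
        raise ValueError("read counts must be non-negative")

    if strategy == "even":
        # (1) Split every ambiguous read 50/50 between the two copies.
        half = ambiguous / 2
        return float(unique_a + half), float(unique_b + half)

    if strategy == "proportional":
        # (2) Split in proportion to the unique-read counts (assumed weighting).
        # With no unique reads to guide the split, fall back to 50/50 (assumption).
        total_unique = unique_a + unique_b
        share_a = unique_a / total_unique if total_unique > 0 else 0.5
        return (float(unique_a + ambiguous * share_a),
                float(unique_b + ambiguous * (1 - share_a)))

    if strategy == "all":
        # (3) Credit every ambiguous read to both copies (reads are double-counted).
        return float(unique_a + ambiguous), float(unique_b + ambiguous)

    if strategy == "discard":
        # (4) Keep only the unambiguous reads.
        return float(unique_a), float(unique_b)

    raise ValueError(f"unknown strategy: {strategy!r}")


if __name__ == "__main__":
    # One sample: 30 reads unique to A, 10 unique to B, 20 ambiguous.
    for name in STRATEGIES:
        print(name, allocate_ambiguous(30, 10, 20, name))
```

With 30 reads unique to A, 10 unique to B and 20 ambiguous, the even split gives (40.0, 20.0), proportional assignment gives (45.0, 15.0), using all reads gives (50.0, 30.0), and discarding gives (30.0, 10.0). The even and proportional rules guess where each ambiguous read came from, so reads that originated from one copy can be credited to the other; discarding avoids any such misassignment at the cost of fewer reads per gene. Under the all-reads rule each copy's count includes every shared read, so for near-identical homoeologs both counts largely reflect the combined expression of the pair.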
This project was supported by US Department of Energy Grant DE-SC0010743.
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Communicated by Christine A Hackett.
Cite this article
Fan, KH., Devos, K.M. & Schliekelman, P. Strategies for eQTL mapping in allopolyploid organisms. Theor Appl Genet 133, 2477–2497 (2020). https://doi.org/10.1007/s00122-020-03612-1