Abstract
As RNA-seq replaces gene expression microarrays for assessing genome-wide transcription abundance, gene expression quantitative trait locus (eQTL) studies using RNA-seq have emerged. RNA-seq delivers two novel features that are important for eQTL studies. First, it provides information on allele-specific expression (ASE), which is not available from gene expression microarrays. Second, it generates unprecedentedly rich data for studying RNA-isoform expression. In this paper, we review current methods for eQTL mapping using ASE and discuss some future directions. We also review existing work that uses RNA-seq data to study RNA-isoform expression, and we discuss the gaps between this work and isoform-specific eQTL mapping.
Acknowledgements
We appreciate constructive comments and suggestions from an associate editor and an anonymous reviewer.
Wei Sun’s research is supported in part by NIH Grant R01MH090936 and an EPA Grant for the Carolina Center for Computational Toxicology (RD-83382501). Dr. Hu’s research is supported in part by an internal grant from Emory University.
Cite this article
Sun, W., Hu, Y. eQTL Mapping Using RNA-seq Data. Stat Biosci 5, 198–219 (2013). https://doi.org/10.1007/s12561-012-9068-3
DOI: https://doi.org/10.1007/s12561-012-9068-3