Abstract
RNA-seq technology enables large-scale studies of allele-specific expression (ASE), or the expression difference between maternal and paternal alleles. Here, we study ASE in animals for which parental RNA-seq data are available. While most methods for determining ASE rely on read alignment, read alignment either leads to reference bias or requires knowledge of genomic variants in each parental strain. When RNA-seq data are available for both parental strains of a hybrid animal, it is possible to infer ASE with minimal reference bias and without knowledge of parental genomic variants. Our approach first uses parental RNA-seq reads to discover maternal and paternal versions of transcript sequences. Using these alternative transcript sequences as features, we estimate abundance levels of transcripts in the hybrid animal using a modified lasso linear regression model.
We tested our methods on synthetic data from the mouse transcriptome and compared our results with those of Trinity, a state-of-the-art de novo RNA-seq assembler. Our methods achieved high sensitivity and specificity in identifying both expressed transcripts and transcripts exhibiting ASE. We also ran our methods on real RNA-seq mouse data from two F1 samples with wild-derived parental strains. On these data we were able to validate known genes exhibiting ASE and to confirm the expected maternal contribution ratios, both across all genes and for genes on the X chromosome.
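The regression step described above can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify how the lasso is modified, so the sketch assumes a plain non-negative lasso solved by coordinate descent, uses k-mer counts as the shared representation of transcripts and reads, and defines the maternal contribution ratio as maternal abundance over total abundance. The sequences, the value of `k`, the penalty `lam`, and all function names are hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every length-k substring of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def design_matrix(transcripts, k):
    """Rows are k-mers, columns are candidate (allele-specific) transcripts."""
    profiles = [kmer_counts(t, k) for t in transcripts]
    kmers = sorted(set().union(*profiles))
    X = [[p[km] for p in profiles] for km in kmers]
    return kmers, X


def nonneg_lasso(X, y, lam, n_iter=200):
    """Coordinate descent for min 0.5*||y - Xb||^2 + lam*sum(b), with b >= 0.

    Assumption: a plain non-negative lasso stands in for the paper's
    modified lasso, whose details the abstract does not give.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = [float(v) for v in y]  # residual y - Xb, with b = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Correlation of column j with the residual that excludes b[j].
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n))
            new_bj = max(0.0, (rho - lam) / col_sq[j])
            delta = new_bj - b[j]
            if delta:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


# Hypothetical maternal and paternal versions of one transcript (one SNP apart).
maternal = "ACGTTGCAAGGCTTAC"
paternal = "ACGTTGCATGGCTTAC"
k = 5

kmers, X = design_matrix([maternal, paternal], k)

# Simulated hybrid k-mer profile: maternal allele expressed 3x, paternal 1x.
hybrid = Counter()
for km, c in kmer_counts(maternal, k).items():
    hybrid[km] += 3 * c
for km, c in kmer_counts(paternal, k).items():
    hybrid[km] += 1 * c
y = [hybrid[km] for km in kmers]

abundances = nonneg_lasso(X, y, lam=0.1)
# Assumed definition: maternal abundance divided by total abundance.
maternal_ratio = abundances[0] / (abundances[0] + abundances[1])
print(abundances, maternal_ratio)
```

In this toy setting the k-mers that overlap the SNP are what separate the two columns of the design matrix; k-mers shared by both alleles only constrain the sum of the two abundances. The L1 penalty shrinks both estimates slightly toward zero, so the recovered abundances fall just below the simulated values of 3 and 1.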
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Fu, CP., Jojic, V., McMillan, L. (2014). An Alignment-Free Regression Approach for Estimating Allele-Specific Expression Using RNA-Seq Data. In: Sharan, R. (ed.) Research in Computational Molecular Biology. RECOMB 2014. Lecture Notes in Computer Science, vol 8394. Springer, Cham. https://doi.org/10.1007/978-3-319-05269-4_6
DOI: https://doi.org/10.1007/978-3-319-05269-4_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05268-7
Online ISBN: 978-3-319-05269-4
eBook Packages: Computer Science, Computer Science (R0)