An Alignment-Free Regression Approach for Estimating Allele-Specific Expression Using RNA-Seq Data

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2014)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 8394)

Abstract

RNA-seq technology enables large-scale studies of allele-specific expression (ASE), or the expression difference between maternal and paternal alleles. Here, we study ASE in animals for which parental RNA-seq data are available. While most methods for determining ASE rely on read alignment, read alignment either leads to reference bias or requires knowledge of genomic variants in each parental strain. When RNA-seq data are available for both parental strains of a hybrid animal, it is possible to infer ASE with minimal reference bias and without knowledge of parental genomic variants. Our approach first uses parental RNA-seq reads to discover maternal and paternal versions of transcript sequences. Using these alternative transcript sequences as features, we estimate abundance levels of transcripts in the hybrid animal using a modified lasso linear regression model.
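The regression step described above can be illustrated with a small sketch. This is not the authors' implementation. The abstract does not say which features are extracted from the reads or how the lasso is modified, so the sketch assumes k-mer counts as features and a non-negativity constraint on abundances. The sequences, the names (`geneA_maternal`, `geneA_paternal`, `geneB`), the value of `K`, and the penalty `lam` are all hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every k-mer (substring of length k) in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def nonneg_lasso(X, y, lam, n_iter=200):
    """Coordinate descent for min 0.5*||y - X b||^2 + lam*sum(b), with b >= 0."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - X b; equals y while b is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Correlation of column j with the residual that excludes j itself.
            rho = sum(X[i][j] * resid[i] for i in range(n)) + col_sq[j] * b[j]
            new = max(0.0, (rho - lam) / col_sq[j])  # shrink, then clip at zero
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new
    return b


# Hypothetical candidate transcripts: two allelic versions of one gene that
# differ at a single base, plus an unrelated transcript that is not expressed.
candidates = {
    "geneA_maternal": "ACGTACGGTCA",
    "geneA_paternal": "ACGTACTGTCA",
    "geneB": "TTTTGGGGCCCC",
}
K = 4
names = list(candidates)
profiles = {name: kmer_counts(seq, K) for name, seq in candidates.items()}
vocab = sorted(set().union(*profiles.values()))

# Design matrix: one row per k-mer, one column per candidate transcript.
X = [[profiles[name][kmer] for name in names] for kmer in vocab]

# Synthetic "hybrid" k-mer counts: maternal allele at 3 copies, paternal at 1.
y = [3 * profiles["geneA_maternal"][kmer] + profiles["geneA_paternal"][kmer]
     for kmer in vocab]

abundance = dict(zip(names, nonneg_lasso(X, y, lam=0.1)))
maternal_ratio = abundance["geneA_maternal"] / (
    abundance["geneA_maternal"] + abundance["geneA_paternal"]
)
print(abundance, maternal_ratio)
```

On this toy input the fit recovers roughly the 3-to-1 maternal-to-paternal ratio and assigns zero abundance to the unexpressed transcript. The k-mers that cover the differing base are what let the regression separate the two alleles without aligning any read.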

We tested our methods on synthetic data from the mouse transcriptome and compared our results with those of Trinity, a state-of-the-art de novo RNA-seq assembler. Our methods achieved high sensitivity and specificity in identifying both expressed transcripts and transcripts exhibiting ASE. We also ran our methods on real RNA-seq mouse data from two F1 samples with wild-derived parental strains and were able to validate known genes exhibiting ASE, as well as confirm the expected maternal contribution ratios both across all genes and for genes on the X chromosome.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Fu, CP., Jojic, V., McMillan, L. (2014). An Alignment-Free Regression Approach for Estimating Allele-Specific Expression Using RNA-Seq Data. In: Sharan, R. (eds) Research in Computational Molecular Biology. RECOMB 2014. Lecture Notes in Computer Science, vol 8394. Springer, Cham. https://doi.org/10.1007/978-3-319-05269-4_6

  • DOI: https://doi.org/10.1007/978-3-319-05269-4_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-05268-7

  • Online ISBN: 978-3-319-05269-4

  • eBook Packages: Computer Science (R0)
