A Robust Method for Transcript Quantification with RNA-seq Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 7262)

Abstract

The advent of high-throughput RNA-seq technology allows deep sampling of the transcriptome, making it possible to characterize both the diversity and the abundance of transcript isoforms. Accurate abundance estimation, or transcript quantification, of isoforms is critical for downstream differential analysis (e.g., healthy vs. diseased cells), but it remains a challenging problem for several reasons. First, while various types of algorithms have been developed for abundance estimation, short reads often do not uniquely identify the transcript isoforms from which they were sampled. As a result, the quantification problem may not be identifiable, i.e., it may lack a unique solution even if the reads map uniquely to the reference genome. In this paper, we develop a general linear model for transcript quantification that leverages reads spanning multiple splice junctions to improve identifiability. Second, RNA-seq reads sampled from the transcriptome exhibit unknown position-specific and sequence-specific biases. We extend our method to learn bias parameters simultaneously during transcript quantification to improve accuracy. Third, transcript quantification is often provided with a candidate set of isoforms, not all of which are likely to be significantly expressed in a given tissue type or condition. By solving the linear system with LASSO, our approach can infer an accurate set of dominantly expressed transcripts, whereas existing methods tend to assign positive expression to every candidate isoform. On simulated RNA-seq datasets, our method demonstrated better quantification accuracy than existing methods. Application of our method to real data demonstrated experimentally that transcript quantification is effective for differential analysis of transcriptomes.
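
The identifiability and sparsity points in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the toy gene, the four candidate isoforms, the feature rows, the noise-free coverage vector, the penalty value, and the function name `nonneg_lasso` are all assumptions made for illustration, and bias parameters are left out entirely. The sketch treats quantification as a linear system (features by isoforms) and solves it with a non-negative LASSO by cyclic coordinate descent.

```python
import numpy as np

# Toy gene with five exons A..E; B and D are cassette exons (hypothetical).
# Columns are four hypothetical candidate isoforms:
#   T1 = A-B-C-D-E, T2 = A-C-E, T3 = A-B-C-E, T4 = A-C-D-E
EXON_ROWS = np.array([
    [1, 1, 1, 1],  # exon A: present in every isoform
    [1, 0, 1, 0],  # exon B: T1, T3
    [1, 1, 1, 1],  # exon C
    [1, 0, 0, 1],  # exon D: T1, T4
    [1, 1, 1, 1],  # exon E
], dtype=float)

# Features for reads that span two splice junctions at once (assumed to be
# observable, e.g. because exon C is short).
MULTI_JUNCTION_ROWS = np.array([
    [1, 0, 0, 0],  # path B-C-D: only T1
    [0, 0, 1, 0],  # path B-C-E: only T3
], dtype=float)


def nonneg_lasso(A, y, lam, n_sweeps=2000):
    """Minimise 0.5*||y - A x||^2 + lam*sum(x) subject to x >= 0
    using cyclic coordinate descent (illustrative helper)."""
    n_isoforms = A.shape[1]
    x = np.zeros(n_isoforms)
    col_sq = (A ** 2).sum(axis=0)
    residual = y - A @ x
    for _ in range(n_sweeps):
        for j in range(n_isoforms):
            if col_sq[j] == 0.0:
                continue  # isoform covers no feature; leave it at zero
            residual += A[:, j] * x[j]  # remove isoform j's contribution
            x[j] = max(0.0, (A[:, j] @ residual - lam) / col_sq[j])
            residual -= A[:, j] * x[j]  # add the updated contribution back
    return x


A = np.vstack([EXON_ROWS, MULTI_JUNCTION_ROWS])
print("rank with exon features only:", np.linalg.matrix_rank(EXON_ROWS))
print("rank with multi-junction features:", np.linalg.matrix_rank(A))

# Assumed ground truth: only T1 and T3 are expressed.
x_true = np.array([10.0, 0.0, 5.0, 0.0])
x_hat = nonneg_lasso(A, A @ x_true, lam=0.5)
print("estimated abundances:", np.round(x_hat, 3))
```

In this toy setting the exon-only matrix has rank 3 for four isoforms, so abundances are not uniquely determined, while adding the multi-junction rows raises the rank to 4. The L1 penalty then leaves the two unexpressed candidates at exactly zero, at the cost of slightly shrinking the estimates for the expressed ones.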

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Huang, Y. et al. (2012). A Robust Method for Transcript Quantification with RNA-seq Data. In: Chor, B. (ed.) Research in Computational Molecular Biology. RECOMB 2012. Lecture Notes in Computer Science, vol 7262. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29627-7_12

  • DOI: https://doi.org/10.1007/978-3-642-29627-7_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29626-0

  • Online ISBN: 978-3-642-29627-7

  • eBook Packages: Computer Science, Computer Science (R0)
