Modeling and analysis of RNA-seq data: a review from a statistical perspective
Abstract
Background
Since their invention, next-generation RNA sequencing (RNA-seq) technologies have become a powerful tool for studying the presence and quantity of RNA molecules in biological samples and have revolutionized transcriptomic studies. The analysis of RNA-seq data at four different levels (samples, genes, transcripts, and exons) involves multiple statistical and computational questions, some of which remain challenging to date.
Results
We review RNA-seq analysis tools at the sample, gene, transcript, and exon levels from a statistical perspective. We also highlight the biological and statistical questions of greatest practical relevance.
Conclusions
Keywords
RNA-seq; statistical modeling; differentially expressed genes; alternatively spliced exons; isoform reconstruction and quantification
Notes
Acknowledgements
This work was supported by the following grants: National Science Foundation DMS-1613338, NIH/NIGMS R01GM120507, PhRMA Foundation Research Starter Grant in Informatics, Johnson & Johnson WiSTEM2D Award, and Sloan Research Fellowship (to J.J.L.) and the UCLA Dissertation Year Fellowship (to W.V.L.). The authors would like to thank Dr. Lior Pachter at California Institute of Technology and Dr. Michael I. Love at University of North Carolina at Chapel Hill for their insightful feedback.