QuaPra: Efficient transcript assembly and quantification using quadratic programming with Apriori algorithm
RNA sequencing (RNA-seq) has greatly facilitated exploration of the transcriptome landscape in diverse organisms. However, transcriptome reconstruction remains challenging owing to the limitations of current tools and sequencing technologies. Here, we introduce QuaPra (Quadratic Programming combined with Apriori), an efficient tool for accurate transcriptome assembly and quantification. On simulated data, QuaPra detected at least 26.5% more low-abundance (0.1–1 FPKM) transcripts than other currently popular tools, with increases of over 2.1% in sensitivity and precision. Moreover, on real sequencing data, QuaPra correctly assembled around one-quarter more known transcripts than other assemblers. QuaPra is freely available at www.megabionet.org/QuaPra/.
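The abstract does not describe how QuaPra applies the Apriori algorithm, and the sketch below is not the QuaPra implementation. As a hypothetical illustration only, it applies the classic level-wise Apriori frequent-itemset algorithm to reads represented as sets of the exons they cover. Exon combinations supported by at least `min_support` reads then appear as candidate building blocks for transcripts. The function name `apriori`, the exon labels, and the support threshold are all assumptions made for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset found in >= min_support transactions.

    Hypothetical use: each transaction is the set of exon IDs covered by one read.
    This is an assumption for illustration, not QuaPra's documented model.
    """
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Number of transactions (reads) that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}

    k = 1
    while current:
        # Join step: combine frequent k-itemsets into (k+1)-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k + 1}
        # Prune step (Apriori property): every k-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k))
        }
        # Count support and keep only the candidates that meet the threshold.
        current = set()
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current.add(c)
                frequent[c] = s
        k += 1
    return frequent


if __name__ == "__main__":
    # Toy reads, each given as the (hypothetical) exons it spans.
    reads = [{"E1", "E2"}, {"E2", "E3"}, {"E1", "E2", "E3"}, {"E1", "E2"}, {"E3", "E4"}]
    result = apriori(reads, min_support=2)
    for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
```

With a threshold of two reads, the pair `{E1, E2}` (three reads) and `{E2, E3}` (two reads) are kept. `{E1, E3}` is supported by only one read, so the triple `{E1, E2, E3}` is pruned without being counted. This pruning is what keeps Apriori tractable as itemsets grow.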
Keywords: RNA-Seq, transcriptome reconstruction, transcript assembly, transcript quantification
This work was supported by the National High Technology Research and Development Program of China (2015AA020108), the National Key Research and Development Program of China (2016YFC0902100), the China Human Proteome Project (2014DFB30010 and 2014DFB30030), the National Science Foundation of China (31671377, 31401133, 31771460 and 91629103) and the Program of Introducing Talents of Discipline to Universities of China (B14019). We thank Dr. Jiannan Lin, Huanlong Liu, Yimin Ma, Yan Shi, Jiwei Chen, Jun Tang and Qing Zhou for their extensive help with this manuscript. We also thank the Graduate School and the Supercomputer Center of East China Normal University.