QuaPra: Efficient transcript assembly and quantification using quadratic programming with Apriori algorithm

Ji, Xiangjun; Tong, Weida; Ning, Baitang; Mason, Christopher E.; Kreil, David P.; Labaj, Pawel P.; Chen, Geng; Shi, Tieliu

doi:10.1007/s11427-018-9433-3

Research Paper
Published: 22 May 2019

Volume 62, pages 937–946, (2019)
Science China Life Sciences

Xiangjun Ji¹,
Weida Tong²,
Baitang Ning²,
Christopher E. Mason³,⁴,⁵,
David P. Kreil⁶,
Pawel P. Labaj⁶,⁷,⁸,
Geng Chen¹ &
Tieliu Shi¹,⁹

Abstract

RNA sequencing (RNA-seq) has greatly facilitated the exploration of the transcriptome landscape of diverse organisms. However, transcriptome reconstruction remains challenging because of various limitations of current tools and sequencing technologies. Here, we introduce an efficient tool, QuaPra (Quadratic Programming combined with Apriori), for accurate transcriptome assembly and quantification. On simulated data, QuaPra detected at least 26.5% more low-abundance (0.1–1 FPKM) transcripts than other currently popular tools, with an increase of over 2.1% in both sensitivity and precision. Moreover, on real sequencing data, QuaPra correctly assembled around one-quarter more known transcripts than other assemblers. QuaPra is freely available at https://doi.org/www.megabionet.org/QuaPra/.
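The quantities named in the abstract (FPKM, sensitivity and precision) follow standard definitions, which may be easier to read as a short sketch. The Python code below is illustrative only and is not taken from QuaPra: the function names, the example counts and the exact-match criterion on transcript identifiers are assumptions made for this example, and the paper's own evaluation criteria may differ.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    # Standard definition: fragments / (length in kb * total mapped in millions).
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def is_low_abundance(value, low=0.1, high=1.0):
    """True if an FPKM value falls in the 0.1-1 FPKM range quoted in the abstract."""
    return low <= value <= high


def sensitivity_precision(assembled, reference):
    """Compare assembled transcripts against a reference set.

    Assumption: a transcript counts as correct when its identifier appears
    in both sets (a hypothetical stand-in for a structural match).
    """
    assembled, reference = set(assembled), set(reference)
    true_positives = len(assembled & reference)
    sensitivity = true_positives / len(reference) if reference else 0.0
    precision = true_positives / len(assembled) if assembled else 0.0
    return sensitivity, precision


# Hypothetical numbers, chosen only to exercise the functions.
example_fpkm = fpkm(fragments=50, transcript_length_bp=2000,
                    total_mapped_fragments=50_000_000)
example_scores = sensitivity_precision({"t1", "t2", "t3"},
                                       {"t1", "t2", "t4", "t5"})
```

With these made-up inputs the transcript falls inside the low-abundance range, two of the four reference transcripts are recovered, and two of the three assembled transcripts are correct.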

Acknowledgements

This work was supported by the National High Technology Research and Development Program of China (2015AA020108), the National Key Research and Development Program of China (2016YFC0902100), the China Human Proteome Project (2014DFB30010 and 2014DFB30030), the National Science Foundation of China (31671377, 31401133, 31771460 and 91629103) and the Program of Introducing Talents of Discipline to Universities of China (B14019). We thank Dr. Jiannan Lin, Huanlong Liu, Yimin Ma, Yan Shi, Jiwei Chen, Jun Tang and Qing Zhou for their extensive help with this manuscript. We also thank the Graduate School and the Supercomputer Center of East China Normal University.

Author information

Authors and Affiliations

The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China
Xiangjun Ji, Geng Chen & Tieliu Shi
National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR RD, Jefferson, AR, 72079, USA
Weida Tong & Baitang Ning
Department of Physiology and Biophysics, Weill Cornell Medicine, New York, 10065, USA
Christopher E. Mason
The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, 10021, USA
Christopher E. Mason
Feil Family Brain & Mind Research Institute, New York, 10065, USA
Christopher E. Mason
Chair of Bioinformatics Research Group, Boku University, Vienna, A-1190, Austria
David P. Kreil & Pawel P. Labaj
Malopolska Centre of Biotechnology, Jagiellonian University, Krakow, 30-387, Poland
Pawel P. Labaj
APART Fellow, Austrian Academy of Science, Vienna, A-1190, Austria
Pawel P. Labaj
National Center for International Research of Biological Targeting Diagnosis and Therapy, Guangxi Key Laboratory of Biological Targeting Diagnosis and Therapy Research, Collaborative Innovation Center for Targeting Tumor Diagnosis and Therapy, Guangxi Medical University, Nanning, 530021, China
Tieliu Shi

Corresponding authors

Correspondence to Geng Chen or Tieliu Shi.

Supporting Information

Supplementary material, approximately 17.9 KB.

About this article

Cite this article

Ji, X., Tong, W., Ning, B. et al. QuaPra: Efficient transcript assembly and quantification using quadratic programming with Apriori algorithm. Sci. China Life Sci. 62, 937–946 (2019). https://doi.org/10.1007/s11427-018-9433-3

Received: 23 September 2018
Accepted: 17 October 2018
Published: 22 May 2019
Issue Date: July 2019
DOI: https://doi.org/10.1007/s11427-018-9433-3
