A Flexible Automated Pipeline Engine for Transcript-Level Quantification from RNA-seq

  • Conference paper
Advances in Conceptual Modeling (ER 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13650)

Abstract

Advances in massively parallel sequencing technologies (i.e., Next-Generation Sequencing) have enabled RNA sequencing (RNA-seq). The analysis of RNA-seq data requires a large amount of computational resources and is very time-consuming. Processing is usually performed on a large set of samples, so it is convenient to design an automated pipeline to eliminate downtime. Such pipelines are advantageous; however, they are difficult to customize or to use outside the specific context for which they were tested.

In this paper, we propose FAPE (Flexible Automated Pipeline Engine), a software platform for configuring and deploying automated pipelines. FAPE models a pipeline from a given template, whose organization is easy to understand and manipulate, to meet the operator’s need for customization. In addition, a scientist can model an in-house custom pipeline that executes any tool providing a command-line interface (CLI). FAPE supports both parallel and iterative processes in order to analyze whole datasets. We tested our solution on a pipeline for transcript-level quantification from RNA-seq, based on HISAT2, SAMtools, and StringTie. It exhibited high robustness as well as inherent flexibility in supporting any pipeline modeled to specification. Furthermore, it proved inexpensive in terms of memory and did not introduce significant latency during execution, compared with a pipeline executed through a shell script. In addition, during the test, FAPE’s parallel statement reduced the total elapsed time by \(\sim 6.5\%\).

- https://github.com/pietrocinaglia/fape
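
The idea of a template-driven engine that iterates over samples and optionally runs them in parallel can be illustrated with a minimal sketch. It is not FAPE's implementation and does not use FAPE's template format: the `TEMPLATE` layout, its keys (`samples`, `parallel`, `workers`, `params`, `steps`), the function names, and the file names are all hypothetical. Only the tool names and the basic options shown (`hisat2 -x/-U/-S`, `samtools sort -o`, `stringtie -G/-o`) come from the tools themselves.

```python
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical template (NOT FAPE's schema): each step is a CLI command
# with placeholders that are filled in once per sample.
TEMPLATE = {
    "samples": ["sampleA", "sampleB"],          # illustrative sample names
    "parallel": True,                           # process samples concurrently
    "workers": 2,
    "params": {"index": "genome_index", "annotation": "genes.gtf"},
    "steps": [
        "hisat2 -x {index} -U {sample}.fastq -S {sample}.sam",
        "samtools sort -o {sample}.bam {sample}.sam",
        "stringtie {sample}.bam -G {annotation} -o {sample}.gtf",
    ],
}


def render_steps(template, sample):
    """Fill the placeholders for one sample and split each step into argv."""
    params = dict(template.get("params", {}), sample=sample)
    return [shlex.split(step.format(**params)) for step in template["steps"]]


def run_sample(template, sample, runner=subprocess.run):
    """Run the steps of one sample in order; check=True stops at a failure."""
    for argv in render_steps(template, sample):
        runner(argv, check=True)
    return sample


def run_pipeline(template, runner=subprocess.run):
    """Iterate over all samples, in parallel if the template asks for it."""
    samples = template["samples"]
    if template.get("parallel"):
        with ThreadPoolExecutor(max_workers=template.get("workers", 2)) as pool:
            return list(pool.map(lambda s: run_sample(template, s, runner), samples))
    return [run_sample(template, s, runner) for s in samples]


if __name__ == "__main__":
    # Dry run: print the rendered commands instead of executing them.
    for sample in TEMPLATE["samples"]:
        for argv in render_steps(TEMPLATE, sample):
            print(shlex.join(argv))
```

Calling `run_pipeline(TEMPLATE)` with the default `runner` would execute the commands and therefore requires the three tools on the `PATH` and the input files in place. Passing a custom `runner` keeps the sketch testable without them.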


Author information

Corresponding author

Correspondence to Pietro Cinaglia.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Cinaglia, P., Cannataro, M. (2022). A Flexible Automated Pipeline Engine for Transcript-Level Quantification from RNA-seq. In: Guizzardi, R., Neumayr, B. (eds) Advances in Conceptual Modeling. ER 2022. Lecture Notes in Computer Science, vol 13650. Springer, Cham. https://doi.org/10.1007/978-3-031-22036-4_5

  • DOI: https://doi.org/10.1007/978-3-031-22036-4_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-22035-7

  • Online ISBN: 978-3-031-22036-4

  • eBook Packages: Computer Science, Computer Science (R0)
