MOSCA: An Automated Pipeline for Integrated Metagenomics and Metatranscriptomics Data Analysis

  • Conference paper
Practical Applications of Computational Biology and Bioinformatics, 12th International Conference (PACBB2018 2018)

Abstract

Metagenomics (MG) and metatranscriptomics (MT) approaches open new perspectives on the interpretation of biological systems composed of complex microbial communities. Handling large sequencing datasets, extracting the desired information, and interpreting the results are major challenges in meta-omics studies. Several bioinformatics pipelines exist for MG data analysis, and fewer for MT. To date, none performs a complete analysis integrating both MG and MT data, including the assembly of reads into contigs, functional and taxonomic annotation of identified genes, differential gene expression analysis, and the comparison of multiple samples. Here, we present Meta-Omics Software for Community Analysis (MOSCA), which was designed for this purpose. It integrates RNA-Seq analysis with Whole Genome Sequencing as reference. Raw sequencing reads undergo preprocessing for quality trimming and rRNA removal and are assembled into contigs, which are then annotated against a reference database. MOSCA performs differential gene expression analysis and provides graphical visualization of the results and comparison of multiple samples. The pipeline was validated, and its reproducibility assessed, using simulated MG and MT datasets.
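The stage order described in the abstract can be illustrated with a minimal sketch. This is not MOSCA's actual code: the names `Stage`, `PIPELINE`, and `plan` are hypothetical, and the mapping of stages to data types is an assumption (MG reads are assembled and annotated to serve as the reference, while MT reads are preprocessed and then used for differential expression). The abstract states only the order of the steps.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One step of the workflow (hypothetical model, not MOSCA's API)."""
    name: str
    applies_to: frozenset[str]  # data types ("MG", "MT") the step runs on


# Order follows the abstract. Which data type each stage applies to is an
# ASSUMPTION made for illustration; the abstract does not spell it out.
PIPELINE = (
    Stage("quality_trimming", frozenset({"MG", "MT"})),
    Stage("rrna_removal", frozenset({"MG", "MT"})),
    Stage("assembly", frozenset({"MG"})),
    Stage("annotation", frozenset({"MG"})),
    Stage("differential_expression", frozenset({"MT"})),
)


def plan(data_type: str) -> list[str]:
    """Return the ordered stage names that would run for one data type."""
    if data_type not in {"MG", "MT"}:
        raise ValueError(f"unknown data type: {data_type!r}")
    return [stage.name for stage in PIPELINE if data_type in stage.applies_to]


if __name__ == "__main__":
    for data_type in ("MG", "MT"):
        print(data_type, "->", ", ".join(plan(data_type)))
```

Modelling the stages as data rather than as hard-coded function calls keeps the ordering explicit and makes it easy to check which steps a given sample type would pass through.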

Notes

  1. Available at github.com/iquasere/MOSCA.

Acknowledgements

This study was supported by the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of the UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684) and the BioTecNorte operation (NORTE-01-0145-FEDER-000004), funded by the European Regional Development Fund under the scope of Norte2020 - Programa Operacional Regional do Norte, and by the European Research Council under the European Union’s Seventh Framework Programme (FP/2007-2013)/ERC Grant Agreement no. 323009.

Author information

Corresponding author

Correspondence to Andreia Ferreira Salvador.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Sequeira, J.C., Rocha, M., Madalena Alves, M., Salvador, A.F. (2019). MOSCA: An Automated Pipeline for Integrated Metagenomics and Metatranscriptomics Data Analysis. In: Fdez-Riverola, F., Mohamad, M., Rocha, M., De Paz, J., González, P. (eds) Practical Applications of Computational Biology and Bioinformatics, 12th International Conference. PACBB2018 2018. Advances in Intelligent Systems and Computing, vol 803. Springer, Cham. https://doi.org/10.1007/978-3-319-98702-6_22
