Abstract
Metagenomics (MG) and metatranscriptomics (MT) approaches open new perspectives on the interpretation of biological systems composed of complex microbial communities. Handling large sequencing datasets, extracting the desired information, and interpreting the results are major challenges in meta-omics studies. Several bioinformatics pipelines exist for MG data analysis, and fewer for MT. To date, none performs a complete analysis integrating both MG and MT data, including the assembly of reads into contigs, functional and taxonomic annotation of identified genes, differential gene expression analysis, and the comparison of multiple samples. Here, we present Meta-Omics Software for Community Analysis (MOSCA), which was designed for this purpose. It integrates RNA-Seq analysis with Whole Genome Sequencing as reference. Raw sequencing reads are preprocessed for quality trimming and rRNA removal and then assembled into contigs, which are subsequently annotated using a reference database. MOSCA performs differential gene expression analysis and provides graphical visualization of the results and comparison of multiple samples. The pipeline was validated, and its reproducibility assessed, using simulated MG and MT datasets.
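The stage ordering described in the abstract can be pictured with a minimal Python sketch. It is not MOSCA's code and calls no external tools. All names (`Sample`, `plan_stages`, `plan_run`, the stage labels) are hypothetical and only paraphrase the abstract. Three points are assumptions rather than statements from the source: that both MG and MT reads go through the same preprocessing, that MT reads are aligned to the MG contigs and counted per gene, and that differential expression is only planned when at least two MT samples are present.

```python
from dataclasses import dataclass
from typing import Dict, List

# Stage labels paraphrase the abstract; they are NOT MOSCA identifiers.
PREPROCESSING = ["quality_trimming", "rrna_removal"]
JOINT_KEY = "joint"  # hypothetical key for steps that span several samples


@dataclass(frozen=True)
class Sample:
    name: str
    kind: str  # "MG" (metagenomics) or "MT" (metatranscriptomics)


def plan_stages(sample: Sample) -> List[str]:
    """Return the ordered stage labels for one sample."""
    stages = list(PREPROCESSING)  # assumption: applied to MG and MT alike
    if sample.kind == "MG":
        # Reads are assembled into contigs, which are then annotated.
        stages += ["assembly", "annotation"]
    elif sample.kind == "MT":
        # Assumption: MT reads use the MG assembly as reference.
        stages += ["align_to_mg_contigs", "count_reads_per_gene"]
    else:
        raise ValueError(f"unknown sample kind: {sample.kind!r}")
    return stages


def plan_run(samples: List[Sample]) -> Dict[str, List[str]]:
    """Plan per-sample stages plus the joint comparison steps."""
    plan = {s.name: plan_stages(s) for s in samples}
    mt_count = sum(1 for s in samples if s.kind == "MT")
    if mt_count >= 2:  # assumption: a comparison needs two or more MT samples
        plan[JOINT_KEY] = ["differential_expression", "visualization"]
    return plan


if __name__ == "__main__":
    run = plan_run([Sample("mg1", "MG"), Sample("mt1", "MT"), Sample("mt2", "MT")])
    for name, stages in run.items():
        print(name, "->", ", ".join(stages))
```

Keeping the plan as plain data separates the ordering of the steps from whatever tools execute them, so the sketch stays valid regardless of the software used at each stage.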
Notes
1. Available at github.com/iquasere/MOSCA.
Acknowledgements
This study was supported by the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684) and BioTecNorte operation (NORTE-01-0145-FEDER-000004) funded by the European Regional Development Fund under the scope of Norte2020 - Programa Operacional Regional do Norte, and by the European Research Council under the European Union’s Seventh Framework Programme (FP/2007-2013)/ERC Grant Agreement no. 323009.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Sequeira, J.C., Rocha, M., Madalena Alves, M., Salvador, A.F. (2019). MOSCA: An Automated Pipeline for Integrated Metagenomics and Metatranscriptomics Data Analysis. In: Fdez-Riverola, F., Mohamad, M., Rocha, M., De Paz, J., González, P. (eds) Practical Applications of Computational Biology and Bioinformatics, 12th International Conference. PACBB2018 2018. Advances in Intelligent Systems and Computing, vol 803. Springer, Cham. https://doi.org/10.1007/978-3-319-98702-6_22
DOI: https://doi.org/10.1007/978-3-319-98702-6_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98701-9
Online ISBN: 978-3-319-98702-6
eBook Packages: Intelligent Technologies and Robotics (R0)