Obtaining the Most Accurate de novo Transcriptomes for Non-model Organisms: The Case of Castanea sativa

Espigares, Marina; Seoane, Pedro; Bautista, Rocío; Quintana, Julia; Gómez, Luis; Claros, M. Gonzalo

doi:10.1007/978-3-319-56154-7_44

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 10209)

Included in the following conference series: International Conference on Bioinformatics and Biomedical Engineering

Abstract

Gene expression analyses of non-model organisms must start with the construction of a highly accurate de novo transcriptome as a reference. The best way to determine the suitability of any de novo transcriptome assembly is to compare it with well-known “reference” transcriptomes. In this study, we took six complete plant transcriptomes (Arabidopsis thaliana, Vitis vinifera, Zea mays, Populus trichocarpa, Triticum aestivum and Oryza sativa) and compared them using a set of metrics in a principal component analysis (PCA), which showed that A. thaliana and P. trichocarpa were the best references. This comparison has been automated using AutoFlow. Primary assemblies of Castanea sativa short reads from the Illumina platform (50 nt, single reads) and long reads from Roche/454 technology were performed separately, using k-mers from 25 to 35 and different assemblers (Oases v2, SOAPdenovoTrans, RAY, MIRA4 and MINIMUS). The resulting contigs were then reconciled with the aim of obtaining the best transcriptome. Oases and SOAPdenovoTrans were used to assemble the short reads; MIRA and MINIMUS were used to assemble the long reads or to perform the reconciliations; and RAY, which can compute de novo transcript assemblies from heterogeneous (long and short reads) next-generation sequencing data, was included to avoid the reconciliation step. A total of 90 different assemblies were generated in a single run of the pipeline. A hierarchical clustering on principal components (HCPC) was implemented to identify the best assembly strategies automatically: the assembly with the shortest distance in HCPC to the two plant reference transcriptomes is selected. In this approach, reconciliation of Roche/454 long reads with Illumina contigs produces more complete and accurate gene reconstructions than other combinations. Surprisingly, reconstructions based only on Illumina reads and those created with RAY seem to be less accurate.
For this specific study, the most complete and accurate transcriptome corresponds to the Illumina contigs obtained with SOAPdenovoTrans and reassembled with 454 long reads using MIRA4. This is only one example of transcriptome building. Many other assemblies can be produced simply by changing parameters, k-mers, sequencing technology, assemblers, reference organisms, etc. The pipeline in AutoFlow is easily customizable for those purposes.
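
The selection idea in the abstract (project every transcriptome onto principal components, then prefer the assembly closest to the references) can be illustrated with a minimal Python sketch. This is not the authors' AutoFlow pipeline or their HCPC implementation: it replaces the hierarchical clustering with a plain Euclidean distance to the centroid of the references in PCA space. The function name, the three metrics and all numbers are hypothetical toy values chosen only for illustration.

```python
import numpy as np


def rank_by_reference_distance(metrics, names, reference_names, n_components=2):
    """Rank candidate assemblies by distance to the reference centroid in PCA space.

    Simplified stand-in for HCPC-based selection (assumption): no clustering is
    performed, only a distance ranking on the first principal components.
    """
    X = np.asarray(metrics, dtype=float)
    # Standardise each metric so that no single scale dominates the PCA.
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - X.mean(axis=0)) / std
    # PCA via singular value decomposition of the standardised matrix.
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    # Centroid of the reference transcriptomes in the reduced space.
    is_ref = np.array([name in reference_names for name in names])
    centroid = scores[is_ref].mean(axis=0)
    dist = np.linalg.norm(scores - centroid, axis=1)
    candidates = [
        (names[i], float(dist[i])) for i in range(len(names)) if not is_ref[i]
    ]
    return sorted(candidates, key=lambda item: item[1])


# Hypothetical toy metrics per transcriptome:
# [transcripts (thousands), mean transcript length (nt), fraction of complete genes]
names = ["ref_A", "ref_B", "asm_1", "asm_2", "asm_3"]
metrics = [
    [30, 1500, 0.90],  # ref_A (made-up reference values)
    [32, 1450, 0.88],  # ref_B (made-up reference values)
    [31, 1400, 0.85],  # asm_1
    [80, 600, 0.40],   # asm_2
    [55, 900, 0.60],   # asm_3
]

ranking = rank_by_reference_distance(metrics, names, {"ref_A", "ref_B"})
for name, distance in ranking:
    print(f"{name}\t{distance:.3f}")
```

The first entry of `ranking` is the candidate that lies closest to the references. A fuller treatment would cluster the principal component scores, as HCPC does, rather than rely on a single centroid distance.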

Acknowledgments

This work has been supported by co-funding from the ERDF (European Regional Development Fund) 2014-2020 “Programa Operativo de Crecimiento Inteligente” to the grant RTA2013-00068-C03 of the Spanish INIA and MINECO. The authors also gratefully acknowledge the computer resources and the technical support provided by the Plataforma Andaluza de Bioinformática of the University of Málaga.

Author information

Authors and Affiliations

Departamento de Biología Molecular y Bioquímica, Universidad de Málaga, Malaga, Spain
Marina Espigares, Pedro Seoane & M. Gonzalo Claros
Plataforma Andaluza de Bioinformática, Universidad de Málaga, Malaga, Spain
Rocío Bautista
Departamento de Sistemas y Recursos Naturales, ETSI Forestal, de Montes y del Medio Natural, Universidad Politécnica de Madrid, 28040, Madrid, Spain
Julia Quintana & Luis Gómez
CBGP UPM-INIA, Universidad Politécnica de Madrid, Campus de Montegancedo, 28223, Pozuelo de Alarcón, Madrid, Spain
Luis Gómez

Corresponding author

Correspondence to M. Gonzalo Claros.

Editor information

Editors and Affiliations

Universidad de Granada, Granada, Spain
Ignacio Rojas
Universidad de Granada, Granada, Spain
Francisco Ortuño

Cite this paper

Espigares, M., Seoane, P., Bautista, R., Quintana, J., Gómez, L., Claros, M.G. (2017). Obtaining the Most Accurate de novo Transcriptomes for Non-model Organisms: The Case of Castanea sativa. In: Rojas, I., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2017. Lecture Notes in Computer Science, vol 10209. Springer, Cham. https://doi.org/10.1007/978-3-319-56154-7_44

DOI: https://doi.org/10.1007/978-3-319-56154-7_44
Published: 01 April 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-56153-0
Online ISBN: 978-3-319-56154-7
eBook Packages: Computer Science, Computer Science (R0)
