High-throughput sequencing (HTS) provides new research opportunities for work on non-model organisms, such as differential expression studies between populations exposed to different environmental conditions. However, such transcriptomic studies first require the production of a reference assembly, for which the choice of sampling procedure, sequencing strategy and assembly workflow is crucial. To develop a reliable reference transcriptome for Triatoma brasiliensis, the major Chagas disease vector in Northeastern Brazil, different de novo assembly protocols were tested using various datasets and software. Both 454 and Illumina sequencing technologies were applied to RNA extracted from antennae and mouthparts of single or pooled individuals. The 454 library yielded 278 Mb. Fifteen Illumina libraries were constructed and yielded nearly 360 million RNA-seq single reads and 46 million RNA-seq paired-end reads, for a total of nearly 45 Gb. For the 454 reads, we used three assemblers (Newbler, CAP3 and MIRA); for the Illumina reads, we used the Trinity assembler. Ten assembly workflows were compared using these programs separately or in combination. The assemblies were compared using quantitative and qualitative criteria, including contig length, N50, contig number and the percentage of chimeric contigs. Completeness of the assemblies was estimated using the CEGMA pipeline. The best assembly (57,657 contigs, completeness of 80 %, <1 % chimeric contigs) was a hybrid assembly. This result leads us to recommend (1) using a single individual with a large representation of biological tissues, (2) merging long reads and short paired-end Illumina reads, and (3) using several assemblers in order to combine the specific advantages of each.
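The quantitative criteria named above (contig number, contig length and N50) can be illustrated with a minimal sketch. It assumes contig lengths are already available as a list of integers. The function names `n50` and `assembly_summary` and the example lengths are hypothetical and are not taken from the study, which does not describe how these statistics were computed.

```python
from typing import Dict, List


def n50(lengths: List[int]) -> int:
    """Return the N50: the largest length L such that contigs of
    length >= L together hold at least half of the assembled bases."""
    if not lengths:
        return 0
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def assembly_summary(lengths: List[int]) -> Dict[str, float]:
    """Collect basic quantitative criteria for one assembly."""
    return {
        "contig_number": len(lengths),
        "total_bases": sum(lengths),
        "mean_length": sum(lengths) / len(lengths) if lengths else 0.0,
        "max_length": max(lengths, default=0),
        "n50": n50(lengths),
    }


# Hypothetical contig lengths in bases, for illustration only.
example_lengths = [100, 200, 300, 400, 500]
summary = assembly_summary(example_lengths)
print(summary)
```

Computing the same summary for each workflow puts the assemblies on a common footing. These figures say nothing about chimeric contigs or completeness, which need separate checks.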
Keywords: HTS; De novo assembly; Non-model organisms; Chimeric contigs; Chagas disease vector; Triatoma
We would like to thank Rachel Legendre and Claire Toffano of Institut de Génétique et Microbiologie CNRS - UMR 8621, who gave us the script for 454 contig correction. We thank Marie-Christine François (iEES, INRA Versailles, France) for help with the T. brasiliensis RNA extractions. The authors are also very grateful to the engineers of the bioinformatics platforms Genouest at the University of Rennes 1 and eBio of the University Paris Sud for technical support. This work has benefited from the facilities and expertise of the HTS platform of IMAGIF (Centre de Recherche de Gif - www.imagif.cnrs.fr). This study was funded by the French Agence Nationale de la Recherche (ADAPTANTHROP project, ANR-097-PEXT-009) and supported by the labex Biodiversité, Agroécosystèmes, Société, Climat (BASC; University Paris Saclay, France). Marchant A. was funded by the Idex Paris Saclay, France.
Conflict of interest
The authors declare that they have no financial relationship with the organization that sponsored the research and that they have no conflict of interest.