Volume 143, Issue 2, pp 225–239

De novo transcriptome assembly for a non-model species, the blood-sucking bug Triatoma brasiliensis, a vector of Chagas disease

  • A. Marchant
  • F. Mougel
  • C. Almeida
  • E. Jacquin-Joly
  • J. Costa
  • M. Harry


Abstract

High-throughput sequencing (HTS) provides new research opportunities for work on non-model organisms, such as differential expression studies between populations exposed to different environmental conditions. However, such transcriptomic studies first require the production of a reference assembly. The choice of sampling procedure, sequencing strategy and assembly workflow is crucial. To develop a reliable reference transcriptome for Triatoma brasiliensis, the major Chagas disease vector in Northeastern Brazil, several de novo assembly protocols were implemented using various datasets and software. Both 454 and Illumina sequencing technologies were applied to RNA extracted from the antennae and mouthparts of single or pooled individuals. The 454 library yielded 278 Mb. Fifteen Illumina libraries were constructed and yielded nearly 360 million RNA-seq single reads and 46 million RNA-seq paired-end reads, for a total of nearly 45 Gb. For the 454 reads, we used three assemblers (Newbler, CAP3 and MIRA); for the Illumina reads, we used the Trinity assembler. Ten assembly workflows were compared using these programs separately or in combination. To compare the assemblies, quantitative and qualitative criteria were used, including contig length, N50, contig number and the percentage of chimeric contigs. Completeness of the assemblies was estimated using the CEGMA pipeline. The best assembly (57,657 contigs, completeness of 80%, <1% chimeric contigs) was a hybrid assembly. On this basis, we recommend (1) using a single individual with a large representation of biological tissues, (2) merging long reads with short paired-end Illumina reads, and (3) using several assemblers in order to combine the specific advantages of each.
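The quantitative criteria named in the abstract (contig number, contig length, N50) can be illustrated with a short sketch. This is not the authors' script; the function name `assembly_stats` and the example lengths are hypothetical. It assumes the usual definition of N50: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembled bases.

```python
def assembly_stats(contig_lengths):
    """Summarise an assembly from a list of contig lengths (in bp).

    Hypothetical helper for illustration only; it does not read FASTA
    files and does not assess chimerism or completeness.
    """
    if not contig_lengths:
        raise ValueError("at least one contig length is required")

    # Sort from longest to shortest so we can accumulate the largest contigs first.
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)

    # Walk down the sorted list until the running sum reaches half the total.
    running = 0
    n50 = 0
    for length in lengths:
        running += length
        if 2 * running >= total:
            n50 = length
            break

    return {
        "contigs": len(lengths),
        "total_bp": total,
        "mean_bp": total / len(lengths),
        "n50": n50,
    }


# Toy example with made-up contig lengths.
stats = assembly_stats([100, 200, 300, 400, 500])
print(stats)
```

With such a summary computed for each workflow, assemblies can be ranked side by side; N50 alone is not sufficient, which is why the abstract also reports chimeric contig percentage and CEGMA completeness.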


Keywords

  • HTS
  • De novo assembly
  • Non-model organisms
  • Chimeric contigs
  • Chagas disease vector
  • Triatoma



Acknowledgments

We would like to thank Rachel Legendre and Claire Toffano of the Institut de Génétique et Microbiologie (CNRS, UMR 8621), who gave us the script for 454 contig correction. We thank Marie-Christine François (iEES, INRA Versailles, France) for help with the T. brasiliensis RNA extractions. The authors are also very grateful to the engineers of the bioinformatics platforms Genouest at the University of Rennes 1 and eBio at the University Paris Sud for technical support. This work benefited from the facilities and expertise of the HTS platform of IMAGIF (Centre de Recherche de Gif). This study was funded by the French Agence Nationale de la Recherche (ADAPTANTHROP project, ANR-097-PEXT-009) and supported by the labex Biodiversité, Agroécosystèmes, Société, Climat (BASC; University Paris Saclay, France). A. Marchant was funded by the Idex Paris Saclay, France.

Conflict of interest

The authors declare that they have no financial relationship with the organization that sponsored the research and that they have no conflict of interest.

Supplementary material

  • Supplementary material 1: 10709_2014_9790_MOESM1_ESM.tar (TAR, 329 kb)
  • Supplementary material 2: 10709_2014_9790_MOESM2_ESM.tar (TAR, 282 kb)
  • Supplementary material 3: 10709_2014_9790_MOESM3_ESM.tar (TAR, 266 kb)
  • Supplementary material 4: 10709_2014_9790_MOESM4_ESM.tif (TIFF, 13369 kb)
  • Supplementary material 5: 10709_2014_9790_MOESM5_ESM.docx (DOCX, 12 kb)



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Laboratoire Evolution, Génomes et Spéciation (LEGS), UPR 9034, CNRS, Gif-sur-Yvette, France
  2. Université Paris Sud, Orsay, France
  3. Departamento de Ciências Biológicas, Faculdade de Ciências Farmacêuticas, UNESP, Araraquara, Brazil
  4. INRA, UMR 1392, Institut d’Ecologie et des Sciences de l’Environnement de Paris, Versailles, France
  5. Laboratório de Biodiversidade Entomológica, Instituto Oswaldo Cruz, Fiocruz, Rio de Janeiro, Brazil
