Global Assembly of Expressed Sequence Tags
The method for the construction of Expressed Sequence Tag (EST) assemblies described here uses reads generated from 454 pyrosequencing and Sanger and Illumina (Solexa) sequencing technologies as input. It is consistent with and parallels many established EST assembly protocols, for example the TIGR Gene Indices. Reads that are used as input to the EST assembly process usually come from both internal and external sources. Thus, in addition to internally generated EST reads, expressed transcripts are collected from dbEST and also the NCBI GenBank nucleotide database (full-length and partial cDNAs). “Virtual” transcript sequences derived from whole genome annotation projects can be excluded, depending on the needs of the project. Currently, in most cases, 454-derived sequences can be treated similar to Sanger-derived ESTs. In contrast, the shorter Solexa-derived sequences will have to undergo a round of either de novo assembly or an “align-then-assemble” approach against a reference genome, if available, before these transcripts can be used for the purpose of a global EST assembly that combines a mixture of Sanger and next-generation sequencing technologies.
Key wordsExpressed Sequenced Tags Sequencing EST assembly 454 pyrosequencing Sanger Illumina Solexa
- 4.Samuel Yang S, Cheung F, Lee J, Ha M, Wei N, Sze S, Stelly D, Thaxton P, Triplett B, Town C, Jeffrey Chen Z (2006) Accumulation of genome-specific transcripts, transcription factors and phytohormonal regulators during early stages of fiber cell development in allotetraploid cotton. Plant J 47:761–775PubMedCrossRefGoogle Scholar
- 5.Nishiyama T, Fujita T, Shin-I T, Seki M, Nishide H, Uchiyama I, Kamiya A, Carninci P, Hayashizaki Y, Shinozaki K, Kohara Y, Hasebe M (2003) Comparative genomics of Physcomitrella patens gametophytic transcriptome and Arabidopsis thaliana: implication for land plant evolution. Proc Natl Acad Sci USA 100:8007–8012PubMedCrossRefGoogle Scholar
- 10.Kuhl JC, Cheung F, Yuan Q, Martin W, Zewdie Y, McCallum J, Catanach A, Rutherford P, Sink KC, Jenderek M, Prince JP, Town CD, Havey MJ (2004) A unique set of 11,008 onion expressed sequence tags reveals expressed sequence and genomic differences between the monocot orders Asparagales and Poales. Plant Cell 16:114–125PubMedCrossRefGoogle Scholar
- 24.Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q, Chen Z, Mauceli E, Hacohen N, Gnirke A, Rhind N, di Palma F, Birren BW, Nusbaum C, Lindblad-Toh K, Friedman N, Regev A (2011) Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol 29:644–652PubMedCrossRefGoogle Scholar