Transcriptome Sequencing Goals, Assembly, and Assessment
Transcriptome sequencing provides quick, direct access to the mRNA. With this information, one can design PCR primers for thousands of different genes, SNP markers, and probes for microarrays and qPCR, or simply use the sequence data itself in comparative studies. Although transcriptome sequencing is getting cheaper, it remains an expensive endeavor, and data quality and assembly are infrequently examined in depth. Here, we outline many of the important issues that we think need consideration when starting a transcriptome sequencing project. We also walk the reader through a detailed analysis of an example transcriptome dataset, highlighting the importance of both within-dataset analysis and comparative inferences. Our hope is that, with greater attention focused on assessing assembly performance, advances in transcriptome assembly will accelerate as prices continue to drop and new technologies, such as Illumina sequencing, come into use.
Key words: Next generation sequencing, cDNA, Transcriptome, Assembly
The authors would like to thank many of our colleagues over the past couple of years who have shared their experiences with us. CWW would additionally like to thank W. Stephan, R. Butlin, E. Randi, and D. Tautz for invitations to speak at, and learn from, various Next Generation Sequencing workshops over the past year. CWW would also like to thank Jim Marden for his initial experience with 454 sequencing, as it was he who decided that 454 sequencing could be used for the transcriptome. Support for this work comes from the Max Planck Gesellschaft and Finnish Academy Grant 131155.