RNA Sequencing and Quantitation Using the Helicos Genetic Analysis System
The recent transition in gene expression analysis technology to ultra-high-throughput cDNA sequencing offers more sensitive quantitation across a wider dynamic range than was previously possible. Sensitivity of detection is mostly a function of the number of sequence reads generated. Typically, RNA is converted to cDNA using random hexamers and the cDNA is then sequenced (RNA-Seq). With this approach, more reads are generated for long transcripts than for short ones. This length bias means that very high read numbers are needed to achieve sensitive quantitation of short genes expressed at low levels. To eliminate the length bias, we have developed an ultra-high-throughput sequencing approach in which only a single read is generated for each transcript molecule: single-molecule sequencing Digital Gene Expression (smsDGE). For example, smsDGE can achieve quantitation accuracy for the yeast transcriptome equivalent to that of RNA-Seq using only 25% of the reads that RNA-Seq would require. For sample preparation, RNA is first reverse-transcribed into single-stranded cDNA using oligo-dT as a primer. A poly-A tail is then added to the 3′ ends of the cDNA so that the sample can hybridize to the Helicos® single-molecule sequencing flow cell, on which a poly-dT oligonucleotide serves as the substrate for subsequent sequencing by synthesis. No PCR, size-selection, or ligation steps are required, which avoids the biases that such manipulations may introduce. Each tailed cDNA sample is injected into one of 50 flow-cell channels and sequenced on the Helicos® Genetic Analysis System. Thus, 50 samples are sequenced simultaneously, generating on average 10–20 million sequence reads per sample channel. The sequence reads can then be aligned to the reference of choice, such as the transcriptome for quantitation of known transcripts, or the genome for novel transcript discovery.
This chapter provides a summary of the methods required for smsDGE.
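The length bias described above can be illustrated with a minimal sketch. It assumes a deliberately simple model that is not taken from the chapter: with random-hexamer RNA-Seq, the expected number of reads for a transcript is proportional to copy number times transcript length, whereas with smsDGE it is proportional to copy number alone (one read per molecule). The toy transcriptome, the names, and the function `expected_read_share` are hypothetical and chosen only for illustration. The sketch does not attempt to reproduce the 25% figure quoted for yeast.

```python
from fractions import Fraction

# Hypothetical toy transcriptome: (name, length in nt, copies per cell).
# These values are illustrative assumptions, not data from the chapter.
TRANSCRIPTS = [
    ("short_low", 500, 10),
    ("long_low", 5000, 10),
    ("short_high", 500, 1000),
    ("long_high", 5000, 1000),
]


def expected_read_share(transcripts, length_weighted):
    """Return each transcript's expected fraction of all reads.

    length_weighted=True  -> reads proportional to copies * length (RNA-Seq-like model)
    length_weighted=False -> reads proportional to copies only (one read per molecule)
    """
    weights = {
        name: Fraction(copies * (length if length_weighted else 1))
        for name, length, copies in transcripts
    }
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


rna_seq = expected_read_share(TRANSCRIPTS, length_weighted=True)
sms_dge = expected_read_share(TRANSCRIPTS, length_weighted=False)

# Print the expected share of reads under each model.
for name, length, copies in TRANSCRIPTS:
    print(
        f"{name:10s} length={length:5d} copies={copies:5d} "
        f"RNA-Seq share={float(rna_seq[name]):.5f} "
        f"smsDGE share={float(sms_dge[name]):.5f}"
    )
```

Under this model, the two low-abundance transcripts receive equal shares of reads with smsDGE, while in the length-weighted case the short one receives one tenth of the share of the long one, so more total reads are needed to sample it to the same depth.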
Key words: Single-molecule sequencing, smsDGE, Expression analysis