Global Assembly of Expressed Sequence Tags

  • Protocol
RNA Abundance Analysis

Part of the book series: Methods in Molecular Biology (MIMB, volume 883)

Abstract

The method for constructing Expressed Sequence Tag (EST) assemblies described here takes as input reads generated by Sanger, 454 pyrosequencing, and Illumina (Solexa) sequencing technologies. It is consistent with, and parallels, many established EST assembly protocols, such as the TIGR Gene Indices. Input reads usually come from both internal and external sources: in addition to internally generated EST reads, expressed transcripts are collected from dbEST and from the NCBI GenBank nucleotide database (full-length and partial cDNAs). “Virtual” transcript sequences derived from whole-genome annotation projects can be excluded, depending on the needs of the project. Currently, in most cases, 454-derived sequences can be treated similarly to Sanger-derived ESTs. In contrast, the shorter Solexa-derived sequences must first undergo a round of assembly, either de novo or by an “align-then-assemble” approach against a reference genome if one is available, before the resulting transcripts can be used in a global EST assembly that combines Sanger and next-generation sequencing data.
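
The input-handling logic described in the abstract can be sketched in Python. This is a minimal, hypothetical illustration under stated assumptions, not the author's pipeline: the `TranscriptRead` record, its `source` and `platform` labels, and the two assembler callables are invented stand-ins for real data files and external tools. The sketch pools reads from all sources, optionally drops annotation-derived "virtual" transcripts, passes Sanger, 454 and cDNA sequences straight through, and sends Illumina (Solexa) reads through a pre-assembly round, choosing align-then-assemble when a reference genome is available and de novo assembly otherwise.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional

# Hypothetical record type: field names and label values are illustrative
# assumptions, not a format defined by the protocol.
@dataclass(frozen=True)
class TranscriptRead:
    read_id: str
    source: str    # assumed labels: "internal", "dbEST", "GenBank", "annotation"
    platform: str  # assumed labels: "sanger", "454", "illumina", "cdna"
    seq: str

# Platforms whose sequences enter the global assembly without pre-assembly.
DIRECT_PLATFORMS = {"sanger", "454", "cdna"}

def pool_inputs(reads: Iterable[TranscriptRead],
                exclude_virtual: bool = True) -> List[TranscriptRead]:
    """Merge internal and external reads, optionally dropping 'virtual'
    transcripts that come from whole-genome annotation."""
    return [r for r in reads
            if not (exclude_virtual and r.source == "annotation")]

def prepare_for_global_assembly(
        reads: Iterable[TranscriptRead],
        reference: Optional[str],
        align_then_assemble: Callable[[List[str], str], List[str]],
        de_novo_assemble: Callable[[List[str]], List[str]],
) -> List[str]:
    """Return the sequences that feed the global EST assembly."""
    direct: List[str] = []
    short: List[str] = []
    for r in reads:
        if r.platform in DIRECT_PLATFORMS:
            direct.append(r.seq)   # Sanger, 454 and cDNAs are used as-is
        elif r.platform == "illumina":
            short.append(r.seq)    # short reads need a pre-assembly round
        else:
            raise ValueError(f"unknown platform: {r.platform}")
    if short:
        # Use the reference-guided route only when a reference genome exists.
        if reference is not None:
            contigs = align_then_assemble(short, reference)
        else:
            contigs = de_novo_assemble(short)
        direct.extend(contigs)
    return direct

# Placeholder assemblers standing in for real tools (hypothetical).
def fake_align_then_assemble(reads: List[str], reference: str) -> List[str]:
    return ["REF_GUIDED_CONTIG"]

def fake_de_novo(reads: List[str]) -> List[str]:
    return ["DE_NOVO_CONTIG"]

reads = [
    TranscriptRead("est1", "internal", "sanger", "ACGTACGT"),
    TranscriptRead("r454", "internal", "454", "TTGACCA"),
    TranscriptRead("db1", "dbEST", "sanger", "GGGCCC"),
    TranscriptRead("v1", "annotation", "cdna", "AAAA"),
    TranscriptRead("s1", "internal", "illumina", "ACGTAC"),
]
pooled = pool_inputs(reads)
# No reference genome here, so the Illumina read goes through de novo assembly.
print(prepare_for_global_assembly(pooled, None, fake_align_then_assemble, fake_de_novo))
# prints ['ACGTACGT', 'TTGACCA', 'GGGCCC', 'DE_NOVO_CONTIG']
```

Passing the assemblers in as functions keeps the routing decision separate from the choice of tool; in a real pipeline each callable would wrap an external short-read aligner or assembler.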

Acknowledgments

The author wishes to acknowledge funding and support from the NIH/NHLBI Center for Human Immunology.

Author information

Corresponding author

Correspondence to Foo Cheung.

Copyright information

© 2012 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Cheung, F. (2012). Global Assembly of Expressed Sequence Tags. In: Jin, H., Gassmann, W. (eds) RNA Abundance Analysis. Methods in Molecular Biology, vol 883. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-61779-839-9_15

  • DOI: https://doi.org/10.1007/978-1-61779-839-9_15

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-61779-838-2

  • Online ISBN: 978-1-61779-839-9

  • eBook Packages: Springer Protocols