6. Conclusions
In this chapter we aimed to give a guide to the state of the art in statistical methods for SAGE analysis. We only touched on some issues in order to stay focused on the problem of detecting differential expression, but we hope the main ideas are useful for following the original literature. We saw that estimating a tag's abundance need not be as simple as dividing the observed count by the total number of sequenced tags; it can receive more sophisticated treatments, such as multinomial estimation, correction of potential sequencing errors, and incorporation of a priori knowledge. Given an (assumed) error-corrected data set, one can search for tags that are differentially expressed among conditions. Several methods for this were mentioned, but we stress the importance of experimental designs with biological replication, so that the results capture general information. Finally, we want to point out that only the accumulation of experimental data in public databases, with biological replication, together with the use of good statistics, can improve the usefulness of SAGE, MPSS or EST counting data in general, helping to answer basic and applied questions about gene expression.
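As a small illustration of the point about abundance estimation, the sketch below contrasts the naive estimate (observed count divided by sequenced total) with one simple way of incorporating prior knowledge. It assumes a binomial sampling model for a single tag and a Beta(a, b) prior, so that the posterior mean is (count + a) / (total + a + b). The uniform prior (a = b = 1), the function names and the counts are illustrative assumptions, not the specific estimators or data discussed in the chapter.

```python
def naive_abundance(count, total):
    """Observed tag count divided by the total number of sequenced tags."""
    if total <= 0:
        raise ValueError("total must be positive")
    return count / total


def posterior_mean_abundance(count, total, a=1.0, b=1.0):
    """Posterior mean abundance under a Beta(a, b) prior and a binomial
    sampling model: (count + a) / (total + a + b).
    The default a = b = 1 (uniform prior) is an assumption for illustration."""
    if total <= 0:
        raise ValueError("total must be positive")
    return (count + a) / (total + a + b)


# Hypothetical counts for one tag in two libraries (illustrative only).
x_a, n_a = 0, 50_000    # tag not observed in library A
x_b, n_b = 12, 60_000   # tag observed 12 times in library B

naive_a = naive_abundance(x_a, n_a)            # zero for an unobserved tag
smooth_a = posterior_mean_abundance(x_a, n_a)  # small but strictly positive
smooth_b = posterior_mean_abundance(x_b, n_b)

# With the naive estimate the B/A ratio is undefined (division by zero);
# with the smoothed estimates it is finite.
ratio = smooth_b / smooth_a
```

A ratio of point estimates like this says nothing about significance on its own. Deciding whether a tag is differentially expressed still requires one of the tests or credibility-interval approaches mentioned above, ideally applied to biologically replicated libraries.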
© 2006 Springer Science+Business Media, Inc.
Cite this chapter
Vêncio, R.Z.N., Brentani, H. (2006). Statistical Methods in Serial Analysis of Gene Expression (SAGE). In: Zhang, W., Shmulevich, I. (eds) Computational and Statistical Approaches to Genomics. Springer, Boston, MA. https://doi.org/10.1007/0-387-26288-1_11
DOI: https://doi.org/10.1007/0-387-26288-1_11
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-26287-1
Online ISBN: 978-0-387-26288-8
eBook Packages: Biomedical and Life Sciences (R0)