Introduction to Genomic and Proteomic Data Analysis

Berrar, Daniel; Granzow, Martin; Dubitzky, Werner

doi:10.1007/978-0-387-47509-7_1

Daniel Berrar, Martin Granzow & Werner Dubitzky

Chapter

Abstract

Genomics can be broadly defined as the systematic study of genes, their functions, and their interactions. Analogously, proteomics is the study of proteins, protein complexes, their localization, their interactions, and their posttranslational modifications. Some years ago, genomics and proteomics studies focused on one gene or one protein at a time. With the advent of high-throughput technologies in biology and biotechnology, this has changed dramatically. We are currently witnessing a paradigm shift from traditional hypothesis-driven research to data-driven research. The activity and interaction of thousands of genes and proteins can now be measured simultaneously. Technologies for genome- and proteome-wide investigations have led to new insights into the mechanisms of living systems. There is a broad consensus that these technologies will revolutionize the study of complex human diseases such as Alzheimer's disease, HIV/AIDS, and particularly cancer. With its ability to describe the clinical and histopathological phenotypes of cancer at the molecular level, gene expression profiling based on microarrays holds the promise of patient-tailored therapy. Recent advances in high-throughput mass spectrometry allow the profiling of proteomic patterns in biofluids such as blood and urine, and complement the genomic portrait of diseases.

Author information

Authors and Affiliations

Systems Biology Research Group, University of Ulster, Northern Ireland, UK
Daniel Berrar & Werner Dubitzky
quantiom bioinformatics GmbH & Co. KG, Ringstrasse 61, D-76356, Weingarten, Germany
Martin Granzow

Editor information

Editors and Affiliations

University of Ulster, Coleraine, Northern Ireland
Werner Dubitzky & Daniel Berrar
Quantiom Bioinformatics GmbH & Co. KG, Weingarten/Baden, Germany
Martin Granzow

About this chapter

Cite this chapter

Berrar, D., Granzow, M., Dubitzky, W. (2007). Introduction to Genomic and Proteomic Data Analysis. In: Dubitzky, W., Granzow, M., Berrar, D. (eds) Fundamentals of Data Mining in Genomics and Proteomics. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-47509-7_1

DOI: https://doi.org/10.1007/978-0-387-47509-7_1
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-47508-0
Online ISBN: 978-0-387-47509-7
eBook Packages: Biomedical and Life Sciences (R0)