Abstract
Genomics can be broadly defined as the systematic study of genes, their functions, and their interactions. Analogously, proteomics is the study of proteins, protein complexes, their localization, their interactions, and their posttranslational modifications. Until some years ago, genomic and proteomic studies focused on one gene or one protein at a time. The advent of high-throughput technologies in biology and biotechnology has changed this dramatically: the activity and interaction of thousands of genes and proteins can now be measured simultaneously. We are currently witnessing a paradigm shift from traditionally hypothesis-driven research to data-driven research. Technologies for genome- and proteome-wide investigations have led to new insights into the mechanisms of living systems. There is a broad consensus that these technologies will revolutionize the study of complex human diseases such as Alzheimer's disease, HIV infection, and particularly cancer. Because it can describe the clinical and histopathological phenotypes of cancer at the molecular level, microarray-based gene expression profiling holds the promise of patient-tailored therapy. Recent advances in high-throughput mass spectrometry allow proteomic patterns to be profiled in biofluids such as blood and urine, complementing the genomic portrait of diseases.
Copyright information
© 2007 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Berrar, D., Granzow, M., Dubitzky, W. (2007). Introduction to Genomic and Proteomic Data Analysis. In: Dubitzky, W., Granzow, M., Berrar, D. (eds) Fundamentals of Data Mining in Genomics and Proteomics. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-47509-7_1
DOI: https://doi.org/10.1007/978-0-387-47509-7_1
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-47508-0
Online ISBN: 978-0-387-47509-7
eBook Packages: Biomedical and Life Sciences (R0)