Statistical Methods and Models for Bridging Omics Data Levels

Part of the Methods in Molecular Biology book series (MIMB, volume 719)


Multiple Omics datasets (for example, high-throughput mRNA and protein measurements for the same set of genes) are beginning to appear more widely within the fields of bioinformatics and computational biology. There are many tools available for the analysis of single datasets, but two (or more) sets of coupled observations present more of a challenge. I describe some of the methods available for linking Omics data levels, from classical statistical techniques to more recent advances from the fields of Machine Learning and Pattern Recognition, with particular focus on transcriptomics and proteomics profiles.

Key words

Data integration; Clustering; Classification; Multi-view learning



Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. Inference Research Group, Department of Computing Science, University of Glasgow, Glasgow, UK
