
Biostatistics Methods in Cancer Research: Cluster Analysis of Gene Expression Data

Cancer Genetics and Psychotherapy

Abstract

In recent years, gene chips (microarrays) have been widely applied in basic research, for example to identify biomarkers or genes related to cancer. It is therefore important for biologists to understand the biostatistical methods used to analyze biological data such as gene expression levels. Many statistical methods exist for investigating factors associated with cancer, and gene mutation is one of the important factors in cancer. Microarray data are used to detect genes that are more highly expressed in patients, so modeling and classifying cancer-related genes is important. Cluster analysis is a capable biostatistical method for classifying genes based on their expression levels. Many clustering techniques exist, and they are built on the distance (or similarity) between pairs of observations (e.g., genes). In this chapter, we explain six distance (similarity) measures and two clustering algorithms.
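The abstract does not list the six distance measures the chapter covers. As a hedged illustration only, the sketch below implements three measures that are commonly used for gene expression profiles (Euclidean distance, Manhattan distance, and one minus the Pearson correlation) in plain Python. The function names and the three toy gene profiles are hypothetical, and these measures are not necessarily among the chapter's six.

```python
import math


def euclidean(x, y):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """City-block distance: sum of absolute differences across samples."""
    return sum(abs(a - b) for a, b in zip(x, y))


def pearson_distance(x, y):
    """1 - Pearson correlation: 0 for an identical shape, 2 for a reversed shape."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return 1.0 - cov / math.sqrt(sxx * syy)


# Hypothetical expression levels of three genes measured in four samples.
gene_a = [2.0, 4.0, 6.0, 8.0]
gene_b = [10.0, 11.0, 12.0, 13.0]  # same rising pattern as gene_a, higher level
gene_c = [8.0, 6.0, 4.0, 2.0]      # reversed pattern, similar level

for label, dist in [("Euclidean", euclidean),
                    ("Manhattan", manhattan),
                    ("1 - Pearson", pearson_distance)]:
    print(f"{label:<12} a-b: {dist(gene_a, gene_b):7.3f}   a-c: {dist(gene_a, gene_c):7.3f}")
```

On these toy profiles, Euclidean and Manhattan distance rank gene_c as closer to gene_a than gene_b is, because gene_b sits at a much higher expression level. The correlation distance instead gives 0 for gene_a and gene_b (same rising pattern) and 2 for gene_a and gene_c (exactly reversed pattern). This is why the choice of distance measure can change which genes end up clustered together.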
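The abstract also leaves the two clustering algorithms unnamed. As one hedged example of a distance-based clustering algorithm, the sketch below implements k-means (Lloyd's alternating assign-and-update iteration) with Euclidean distance in plain Python. The `kmeans` function, its `init_idx` parameter, and the toy profiles are hypothetical; fixed starting centroids are used only to keep the result reproducible, whereas practical analyses usually compare several random starts.

```python
import math


def euclidean(x, y):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def kmeans(profiles, init_idx, max_iter=100):
    """Lloyd-style k-means on a list of gene expression profiles.

    init_idx gives the row indices used as starting centroids, so k = len(init_idx).
    Returns (labels, centroids), where labels[i] is the cluster of profiles[i].
    """
    k = len(init_idx)
    centroids = [list(profiles[i]) for i in init_idx]
    labels = None
    for _ in range(max_iter):
        # Assignment step: put each gene in the cluster with the nearest centroid.
        new_labels = [min(range(k), key=lambda j: euclidean(p, centroids[j]))
                      for p in profiles]
        if new_labels == labels:
            break  # no gene changed cluster, so the partition has converged
        labels = new_labels
        # Update step: move each centroid to the mean profile of its member genes.
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical profiles: six genes measured in three samples, two obvious groups.
names = ["g1", "g2", "g3", "g4", "g5", "g6"]
profiles = [
    [1.0, 1.2, 0.9],
    [1.1, 0.8, 1.0],
    [0.9, 1.0, 1.1],
    [5.0, 5.2, 4.8],
    [5.1, 4.9, 5.0],
    [4.8, 5.0, 5.2],
]

labels, centroids = kmeans(profiles, init_idx=[0, 3])
for name, lab in zip(names, labels):
    print(name, "-> cluster", lab)
```

Here g1 to g3 end up in cluster 0 and g4 to g6 in cluster 1. Because the update step uses the mean, k-means pairs naturally with Euclidean distance; standardizing each gene's profile first (subtracting its mean and dividing by its standard deviation) makes Euclidean distance rank gene pairs the same way as the correlation distance.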


Abbreviations

FISH:

Fluorescence in situ hybridization

IHC:

Immunohistochemistry

HER2:

Human epidermal growth factor receptor 2

p:

p value

DLBCL:

Diffuse large B cell lymphoma

FL:

Follicular lymphoma

CLL:

Chronic lymphocytic leukemia

Min:

Minimum

Max:

Maximum



Author information


Correspondence to Ali Reza Soltanian.


Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Bahreini, F., Soltanian, A.R. (2017). Biostatistics Methods in Cancer Research: Cluster Analysis of Gene Expression Data. In: Mehdipour, P. (eds) Cancer Genetics and Psychotherapy. Springer, Cham. https://doi.org/10.1007/978-3-319-64550-6_25
