Abstract
In recent years, gene chips have been widely applied in basic research (e.g., identifying biomarkers or genes related to cancer). It is therefore important for biologists to understand the biostatistical methods used to analyze biological data (e.g., gene expression levels). Many statistical methods exist for investigating factors associated with cancer, and gene mutation is one important such factor. Microarray data are used to detect genes that are more highly expressed in patients. Hence, modeling and classifying cancer-related genes is important. Cluster analysis is a biostatistical method well suited to grouping genes by expression level. There are many techniques for assigning genes to clusters, and they are built on the distance between pairs of observations (e.g., genes). In this chapter, we explain six distance (similarity) measures and two clustering algorithms.
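As a rough illustration of distance-based clustering of genes, the sketch below computes three common distances between expression profiles and groups genes by single-linkage agglomerative clustering. The abstract does not name the six measures or the two algorithms covered in the chapter, so the Euclidean, Manhattan, and correlation distances, the single-linkage procedure, the function names, and the toy expression values are all assumptions chosen for illustration only.

```python
import math

# Toy expression matrix (hypothetical values): one profile per gene,
# one value per sample.
expression = {
    "gene_a": [2.0, 4.0, 6.0, 8.0],
    "gene_b": [2.1, 4.2, 5.9, 8.3],
    "gene_c": [9.0, 7.0, 3.0, 1.0],
    "gene_d": [8.5, 7.2, 2.8, 1.1],
}

def euclidean(x, y):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """Sum of absolute differences between two expression profiles."""
    return sum(abs(a - b) for a, b in zip(x, y))

def correlation_distance(x, y):
    """1 - Pearson correlation; small when profiles rise and fall together.

    Assumes neither profile is constant (non-zero variance).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (norm_x * norm_y)

def single_linkage(data, distance, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters.

    The distance between two clusters is the smallest pairwise distance
    between their members (single linkage).
    """
    clusters = [[name] for name in data]
    while len(clusters) > n_clusters:
        best = None  # (distance, index_i, index_j) of the closest pair so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(
                    distance(data[p], data[q])
                    for p in clusters[i]
                    for q in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(cluster) for cluster in clusters]

clusters = single_linkage(expression, euclidean, n_clusters=2)
print(clusters)
```

Passing `manhattan` or `correlation_distance` instead of `euclidean` changes only the notion of closeness, not the merging procedure, which is why the choice of distance measure matters for the resulting gene groups.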
Abbreviations
- FISH: Fluorescence in situ hybridization
- IHC: Immunohistochemistry
- HER2: Human epidermal growth factor receptor 2
- p: p-value
- DLBCL: Diffuse large B cell lymphoma
- FL: Follicular lymphoma
- CLL: Chronic lymphocytic leukemia
- Min: Minimum
- Max: Maximum
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Bahreini, F., Soltanian, A.R. (2017). Biostatistics Methods in Cancer Research: Cluster Analysis of Gene Expression Data. In: Mehdipour, P. (eds) Cancer Genetics and Psychotherapy. Springer, Cham. https://doi.org/10.1007/978-3-319-64550-6_25
DOI: https://doi.org/10.1007/978-3-319-64550-6_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64548-3
Online ISBN: 978-3-319-64550-6
eBook Packages: Biomedical and Life Sciences (R0)