Biostatistics Methods in Cancer Research: Cluster Analysis of Gene Expression Data

Bahreini, Fatemeh; Soltanian, Ali Reza

doi:10.1007/978-3-319-64550-6_25


Abstract

In recent years, gene chips have been widely applied in basic research (e.g., to identify biomarkers and genes related to cancer). It is therefore important for biologists to understand the biostatistical methods used to analyze biological data such as gene expression levels. Many statistical methods exist for investigating factors associated with cancer, and gene mutation is one of the important factors. Microarray data are used to detect genes that are more highly expressed in patients, so modeling and classifying cancer-related genes is important. Cluster analysis is a capable biostatistical method for grouping genes by expression level. Many techniques exist for assigning genes to clusters, and they are built on the distance between pairs of observations (e.g., genes). In this chapter, we explain six distance (similarity) measures and two clustering algorithms.
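As a rough illustration of distance-based gene clustering, the sketch below computes two pairwise distances between expression profiles and then merges genes agglomeratively. The abstract does not name the six measures or the two algorithms, so the choice of Euclidean distance, correlation-based distance, and average-linkage merging is an assumption made here, and the four gene profiles are invented toy values, not data from the chapter.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def correlation_distance(x, y):
    """1 - Pearson correlation: 0 for identical shape, 2 for opposite shape."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)

def agglomerate(profiles, dist, k):
    """Average-linkage agglomerative clustering, stopped at k clusters."""
    clusters = [[name] for name in profiles]

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        total = sum(dist(profiles[a], profiles[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        # Merge the two closest clusters.
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical expression levels of four genes across four samples.
expression = {
    "gene_A": [2.0, 4.0, 6.0, 8.0],
    "gene_B": [2.1, 4.2, 5.9, 8.1],
    "gene_C": [8.0, 6.0, 4.0, 2.0],
    "gene_D": [7.9, 6.1, 4.2, 1.8],
}

clusters = agglomerate(expression, euclidean, k=2)
print(clusters)
```

Passing `correlation_distance` instead of `euclidean` groups genes by the shape of their profile rather than by absolute expression level, which is one reason the choice of distance measure matters in this kind of analysis.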


Abbreviations

FISH: Fluorescence in situ hybridization
IHC: Immunohistochemistry
HER2: Human epidermal growth factor receptor 2
p: p-value
DLBCL: Diffuse large B cell lymphoma
FL: Follicular lymphoma
CLL: Chronic lymphocytic leukemia
Min: Minimum
Max: Maximum


Author information

Authors and Affiliations

Department of Molecular Medicine and Genetics, School of Medicine, Hamadan University of Medical Sciences, Hamadan, Iran
Fatemeh Bahreini
Department of Biostatistics, School of Public Health and Modeling of Noncommunicable Diseases Research Center, Hamadan University of Medical Sciences, Hamadan, Iran
Ali Reza Soltanian


Corresponding author

Correspondence to Ali Reza Soltanian.

Editor information

Editors and Affiliations

Department of Medical Genetics, School of Medicine, Tehran University of Medical Sciences, Tehran, Iran
Parvin Mehdipour


About this chapter

Cite this chapter

Bahreini, F., Soltanian, A.R. (2017). Biostatistics Methods in Cancer Research: Cluster Analysis of Gene Expression Data. In: Mehdipour, P. (eds) Cancer Genetics and Psychotherapy. Springer, Cham. https://doi.org/10.1007/978-3-319-64550-6_25


DOI: https://doi.org/10.1007/978-3-319-64550-6_25
Published: 22 September 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64548-3
Online ISBN: 978-3-319-64550-6
eBook Packages: Biomedical and Life Sciences (R0)
