Analysis of breast cancer progression using principal component analysis and clustering

Alexe, G.; Dalgin, G. S.; Ganesan, S.; DeLisi, C.; Bhanot, G.

doi:10.1007/s12038-007-0102-4

Published: 06 November 2007

Volume 32 (Suppl 1), pages 1027–1039 (2007)

Journal of Biosciences

G. Alexe^1,2, G. S. Dalgin^3, S. Ganesan^4, C. DeLisi^5 & G. Bhanot^2,4,5,6

Abstract

We develop a new technique to analyse microarray data that combines principal component analysis and consensus ensemble k-clustering to find robust clusters and gene markers in the data. We apply our method to a public breast cancer microarray dataset containing gene expression levels in normal samples as well as in three pathological stages of disease, namely atypical ductal hyperplasia (ADH), ductal carcinoma in situ (DCIS) and invasive ductal carcinoma (IDC). Our method averages over clustering techniques and data perturbations to find stable, robust clusters and gene markers. We identify the clusters and their pathways with distinct subtypes of breast cancer (Luminal, Basal and Her2+). We confirm that the cancer phenotype develops early (at the early hyperplasia, or ADH, stage) and find that each subtype progresses from ADH to DCIS to IDC along its own specific pathway, as if each were a distinct disease.
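The abstract describes combining PCA with consensus clustering over data perturbations. The sketch below is a minimal, hypothetical illustration of that general idea, not the authors' implementation. It assumes plain k-means (k-means++ seeding) as the only base clusterer, random 80% subsampling of samples as the perturbation, and a cut of the consensus matrix at 0.5 via connected components; the paper instead averages over several clustering techniques, and every function name and parameter value here is an illustrative assumption.

```python
import numpy as np


def pca_project(X, n_components):
    """Project samples (rows of X) onto the leading principal components via SVD."""
    Xc = X - X.mean(axis=0)  # center each gene (column)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T  # sample scores on the top PCs


def kmeans(Z, k, rng, n_iter=100):
    """Lloyd's k-means with k-means++ seeding; returns one label per row of Z."""
    centers = [Z[rng.integers(len(Z))]]
    for _ in range(k - 1):
        # Squared distance from each point to its nearest already-chosen center.
        d2 = ((Z[:, None, :] - np.array(centers)[None]) ** 2).sum(-1).min(axis=1)
        p = d2 / d2.sum() if d2.sum() > 0 else None
        centers.append(Z[rng.choice(len(Z), p=p)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((Z[:, None, :] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def consensus_matrix(X, k, n_components=3, n_resamples=50, frac=0.8, seed=0):
    """Fraction of resamples in which each pair of samples is clustered together."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    together = np.zeros((n, n))
    sampled = np.zeros((n, n))
    for _ in range(n_resamples):
        idx = rng.choice(n, size=int(frac * n), replace=False)  # perturb: subsample
        labels = kmeans(pca_project(X[idx], n_components), k, rng)
        same = (labels[:, None] == labels[None, :]).astype(float)
        together[np.ix_(idx, idx)] += same
        sampled[np.ix_(idx, idx)] += 1.0
    return np.divide(together, sampled, out=np.zeros_like(together), where=sampled > 0)


def consensus_clusters(C, threshold=0.5):
    """Label connected components of the graph with edges where C[i, j] > threshold."""
    n = len(C)
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(C[i] > threshold):
                if labels[j] < 0:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


if __name__ == "__main__":
    # Hypothetical toy data: 60 samples x 50 genes in three groups,
    # each group up-regulating a different block of 10 genes.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(60, 50))
    X[:20, :10] += 6
    X[20:40, 10:20] += 6
    X[40:, 20:30] += 6
    print(consensus_clusters(consensus_matrix(X, k=3)))
```

Averaging co-membership over many perturbed runs is what makes the final partition "robust" in this sketch: a pair of samples is grouped only if the base clusterer puts them together in most resamples, so an occasional bad k-means run does not change the result.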

Abbreviations

ADH: atypical ductal hyperplasia
DCIS: ductal carcinoma in situ
FDR: false discovery rate
IDC: invasive ductal carcinoma
PCA: principal component analysis
SNR: signal-to-noise ratio
WV: weighted voting
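The abbreviations name a signal-to-noise ratio (SNR) marker score and false discovery rate (FDR) control. As a hedged sketch only, the code below assumes the common SNR form (mean_in − mean_out) / (sd_in + sd_out) used by Golub and colleagues for class prediction, per-gene p-values from label permutations, and the Benjamini–Hochberg step-up procedure for FDR control. The page does not state the paper's exact scoring, null model or thresholds, so the function names, `n_perm` and `q` are illustrative assumptions; the weighted-voting (WV) classifier is not sketched.

```python
import numpy as np


def snr(X, in_group):
    """Per-gene signal-to-noise score: (mean_in - mean_out) / (sd_in + sd_out).

    X is samples x genes; in_group is a boolean mask over samples.
    """
    in_group = np.asarray(in_group, dtype=bool)
    a, b = X[in_group], X[~in_group]
    return (a.mean(axis=0) - b.mean(axis=0)) / (a.std(axis=0, ddof=1) + b.std(axis=0, ddof=1))


def permutation_pvalues(X, in_group, n_perm=1000, seed=0):
    """Two-sided per-gene p-values for |SNR| against shuffled class labels."""
    rng = np.random.default_rng(seed)
    in_group = np.asarray(in_group, dtype=bool)
    observed = np.abs(snr(X, in_group))
    exceed = np.zeros(X.shape[1])
    for _ in range(n_perm):
        exceed += np.abs(snr(X, rng.permutation(in_group))) >= observed
    return (exceed + 1) / (n_perm + 1)  # add-one correction avoids p = 0


def benjamini_hochberg(pvals, q=0.05):
    """Boolean mask of hypotheses rejected by the Benjamini-Hochberg step-up rule."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    passed = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.flatnonzero(passed).max()  # largest rank i with p_(i) <= q * i / m
        reject[order[: k + 1]] = True     # reject every hypothesis up to that rank
    return reject
```

Because the rule is step-up, a p-value that misses its own threshold is still rejected when a larger p-value passes at a higher rank; for example, p-values 0.03, 0.04 and 0.045 at q = 0.05 are all rejected.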

Author information

Authors and Affiliations

The Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, MA, 02142, USA
G. Alexe
The Simons Center for Systems Biology, Institute for Advanced Study, Princeton, NJ, 08540, USA
G. Alexe & G. Bhanot
Molecular Biology, Cell Biology and Biochemistry Program, Boston University, Boston, MA, 02215, USA
G. S. Dalgin
Cancer Institute of New Jersey, 195 Little Albany Street, New Brunswick, NJ, 08903, USA
S. Ganesan & G. Bhanot
Center for Advanced Genomic Technology, Department of Biomedical Engineering, Boston University, Boston, MA, 02215, USA
C. DeLisi & G. Bhanot
BioMaPS Institute and Department of Biomedical Engineering, Rutgers University, Piscataway, NJ, 08854, USA
G. Bhanot

Corresponding authors

Correspondence to C. DeLisi or G. Bhanot.

About this article

Cite this article

Alexe, G., Dalgin, G.S., Ganesan, S. et al. Analysis of breast cancer progression using principal component analysis and clustering. J Biosci 32 (Suppl 1), 1027–1039 (2007). https://doi.org/10.1007/s12038-007-0102-4

Issue Date: August 2007
