SignatureClust: a tool for landmark gene-guided clustering

Chopra, Pankaj; Shin, Hanjun; Kang, Jaewoo; Lee, Sunwon

doi:10.1007/s00500-011-0725-0

Focus
Published: 03 May 2011

Volume 16, pages 411–418, (2012)
Soft Computing

Pankaj Chopra¹,
Hanjun Shin²,
Jaewoo Kang² &
…
Sunwon Lee²

Abstract

Over the last several years, many clustering algorithms have been applied to gene expression data. However, most clustering algorithms force the user to accept a single set of clusters, which restricts the biological interpretation of gene function. Such a restrictive view of microarray expression data makes it difficult to interpret complex biological regulatory mechanisms and genetic interactions. The software package SignatureClust allows users to select a group of functionally related genes (called ‘Landmark Genes’) and to project the gene expression data onto these genes. Compared to existing algorithms and software in this domain, the package offers two unique benefits. First, by selecting different sets of landmark genes, the user can cluster the microarray data from multiple biological perspectives, which encourages data exploration and the discovery of new gene associations. Second, whereas most clustering packages provide internal validation measures, SignatureClust validates the biological significance of the new clusters by retrieving significant ontology and pathway terms associated with them. SignatureClust is a free software tool that gives biologists multiple views of the microarray data and highlights new gene associations that were not found using a traditional clustering algorithm. The software package ‘SignatureClust’ and the user manual can be downloaded from http://infos.korea.ac.kr/sigclust.php.
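The idea of projecting expression data onto landmark genes can be illustrated with a minimal sketch. The abstract does not say how the projection is computed, so the example assumes each gene is re-described by its Pearson correlation with every landmark gene. The function names, gene names, and toy values are hypothetical and are not taken from SignatureClust itself.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: a flat profile has no defined correlation; report 0.
        return 0.0
    return cov / sqrt(var_x * var_y)


def project_onto_landmarks(expression, landmarks):
    """Describe every gene by its correlations with the landmark genes.

    expression: dict mapping gene name -> list of expression values
    landmarks:  list of gene names chosen by the user
    Returns a dict mapping gene name -> list with one correlation per landmark.
    """
    return {
        gene: [pearson(profile, expression[lm]) for lm in landmarks]
        for gene, profile in expression.items()
    }


# Toy expression matrix (hypothetical values): 4 genes measured in 4 samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 2.0, 4.0],
}

# Choosing a different landmark set yields a different view of the same data.
landmarks = ["geneA", "geneC"]
signatures = project_onto_landmarks(expression, landmarks)

for gene, signature in signatures.items():
    print(gene, [round(value, 2) for value in signature])
```

The resulting vectors could then be passed to any standard clustering routine. Rerunning the projection with another landmark set changes the space in which genes are compared, which is one way to read the "multiple biological perspectives" described above.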

Abbreviations

GO: Gene Ontology
KEGG: Kyoto Encyclopedia of Genes and Genomes
PFAM: Protein Families

Acknowledgments

This work was supported by the Second Brain Korea 21 Project Grant, a Microsoft Research Asia Grant, a Korea Research Foundation Grant funded by the Korean Government (MOEHRD, Basic Research Promotion Fund) (KRF-2008-331-D00481), a National Research Foundation of Korea (NRF) grant funded by the Korean government (MEST) (2010-0015713, 2010-0027793, 2010-0027592), and a National IT Industry Promotion Agency (NIPA) grant funded by the Korean government (ITAC1810100200160001000100100).

Author information

Authors and Affiliations

Department of Human Genetics, School of Medicine, Emory University, Atlanta, GA, USA
Pankaj Chopra
Department of Computer Science, College of Information and Communication, Korea University, Seoul, South Korea
Hanjun Shin, Jaewoo Kang & Sunwon Lee

Corresponding author

Correspondence to Jaewoo Kang.

Additional information

P. Chopra and H. Shin contributed equally to this work.

About this article

Cite this article

Chopra, P., Shin, H., Kang, J. et al. SignatureClust: a tool for landmark gene-guided clustering. Soft Comput 16, 411–418 (2012). https://doi.org/10.1007/s00500-011-0725-0

Published: 03 May 2011
Issue Date: March 2012
DOI: https://doi.org/10.1007/s00500-011-0725-0
