A Novel Approach for Effective Learning of Cluster Structures with Biological Data Applications

Shin, Miyoung

doi:10.1007/11960669_2

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4316)

Included in the following conference series:

VLDB Workshop on Data Mining and Bioinformatics

Abstract

DNA microarray gene expression studies have recently been used extensively to mine, in a systematic way, the biological knowledge hidden in large volumes of gene expression data. In particular, the problem of finding groups of co-expressed genes or samples has been widely investigated because it is useful for characterizing unknown gene functions and for more sophisticated tasks such as modeling biological pathways. Nevertheless, identifying good clusters remains difficult in practice, since many clustering methods require the user to select the number of target clusters arbitrarily. In this paper we propose a novel approach that systematically identifies good candidate cluster numbers, so that arbitrariness in cluster generation is minimized. Our experimental results on both a synthetic dataset and a real gene expression dataset show the applicability and usefulness of this approach in microarray data mining.
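
To make the cluster-number problem concrete, the following is a minimal sketch of one generic way to rank candidate cluster numbers: run k-means for each candidate k and score the result with the mean silhouette width. This is not the approach proposed in the paper, whose details are not given in the abstract; the silhouette criterion is swapped in purely for illustration, and every name here (`kmeans`, `silhouette`, `rank_cluster_numbers`, the toy data) is hypothetical.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=50):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def silhouette(points, labels):
    """Mean silhouette width; singleton clusters contribute 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for m, other in clusters.items()
            if m != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)


def rank_cluster_numbers(points, candidates):
    """Return {k: silhouette score} for each candidate cluster number."""
    return {k: silhouette(points, kmeans(points, k)) for k in candidates}


# Toy stand-in for expression profiles: three well-separated 2-D groups.
rng = random.Random(0)
data = [
    (cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
    for cx, cy in [(0, 0), (10, 0), (0, 10)]
    for _ in range(20)
]
scores = rank_cluster_numbers(data, range(2, 7))
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

Ranking every candidate, rather than returning a single k, mirrors the abstract's goal of producing good candidates of cluster numbers instead of one arbitrary choice. Real expression data would need normalization and a similarity measure suited to the experiment.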

Author information

Authors and Affiliations

School of Electrical Engineering and Computer Science, Kyungpook National University, 1370 Sankyuk-dong, Buk-gu, Daegu, 702-701, Korea
Miyoung Shin

Editor information

Editors and Affiliations

School of Informatics, Indiana University, 901 E. 10th Street, 47408, Bloomington, IN, USA
Mehmet M. Dalkilic & Sun Kim
EECS Department, Case Western Reserve Univ., 10900 Euclid Ave, 44106, Cleveland, OH, USA
Jiong Yang

About this paper

Cite this paper

Shin, M. (2006). A Novel Approach for Effective Learning of Cluster Structures with Biological Data Applications. In: Dalkilic, M.M., Kim, S., Yang, J. (eds) Data Mining and Bioinformatics. VDMB 2006. Lecture Notes in Computer Science, vol 4316. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11960669_2

DOI: https://doi.org/10.1007/11960669_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68970-6
Online ISBN: 978-3-540-68971-3
eBook Packages: Computer Science, Computer Science (R0)
