Iterative Clustering Analysis for Grouping Missing Data in Gene Expression Profiles

Kim, Dae-Won; Kang, Bo-Yeong

doi:10.1007/11731139_17

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3918)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Clustering is a popular technique for finding groups of genes that show similar expression patterns under multiple experimental conditions. Because a clustering method requires a complete data matrix as input, the missing values must first be estimated with an imputation method in the preprocessing step. However, a common limitation of this conventional approach is that once the estimates of the missing values are fixed in the preprocessing step, they are not changed during the subsequent clustering process. Badly estimated missing values obtained in preprocessing are likely to degrade the quality and reliability of the clustering results. Thus, a new clustering method is required that improves the estimates of the missing values during the iterative clustering process.
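
The abstract contrasts impute-once preprocessing with re-estimating the missing values while clustering iterates. The following is a minimal sketch of that general idea, not the authors' algorithm, which the abstract does not specify. It assumes plain k-means as a stand-in clustering method, a per-gene mean as the initial estimate, and a deterministic farthest-first choice of starting centroids; the function name `iterative_impute_kmeans` and its parameters are hypothetical.

```python
import numpy as np


def iterative_impute_kmeans(X, k, n_iter=20):
    """Alternate k-means updates with re-estimation of missing entries.

    X is a 2-D float array (genes x conditions) with np.nan marking
    missing values. Returns (filled matrix, labels, centroids).
    Illustrative stand-in only; not the method proposed in the paper.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Initial estimate: mean of the observed values in each row (gene).
    # Assumes every row has at least one observed value.
    row_means = np.nanmean(X, axis=1, keepdims=True)
    filled = np.where(missing, row_means, X)

    # Farthest-first initialisation (an assumption made for determinism):
    # start from row 0, then repeatedly add the row farthest from the
    # centroids chosen so far.
    centroids = filled[[0]].copy()
    while len(centroids) < k:
        d = ((filled[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        centroids = np.vstack([centroids, filled[d.min(axis=1).argmax()]])

    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        d = ((filled[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)

        # Update step: recompute the centroid of each non-empty cluster.
        for j in range(k):
            members = filled[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)

        # Re-estimation step: overwrite only the missing entries with the
        # matching coordinate of the assigned centroid; observed values
        # are never modified.
        filled = np.where(missing, centroids[labels], X)

    return filled, labels, centroids
```

Unlike a one-off imputation, the estimates here are revised on every pass, so a poor initial guess can be corrected once the gene settles into a cluster. Calling `iterative_impute_kmeans(X, k=2)` on a small matrix containing `np.nan` entries returns a completed matrix whose observed entries are unchanged.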

Author information

Authors and Affiliations

School of Computer Science and Engineering, Chung-Ang University, Heukseok-dong, Dongjak-gu, 155-756, Seoul, Korea
Dae-Won Kim
Center of Healthcare Ontology R&D, Seoul National University, Yeongeon-dong, Jongro-gu, Seoul, Korea
Bo-Yeong Kang

Editor information

Editors and Affiliations

Nanyang Technological University, Singapore
Wee-Keong Ng
Institute of Industrial Science, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, 153-8505, Tokyo, Japan
Masaru Kitsuregawa
School of Computer Science and Technology, Heilongjiang University, China
Jianzhong Li
School of Computer Engineering, Nanyang Technological University, 639798, Singapore, Singapore
Kuiyu Chang

About this paper

Cite this paper

Kim, DW., Kang, BY. (2006). Iterative Clustering Analysis for Grouping Missing Data in Gene Expression Profiles. In: Ng, WK., Kitsuregawa, M., Li, J., Chang, K. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2006. Lecture Notes in Computer Science, vol 3918. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11731139_17

DOI: https://doi.org/10.1007/11731139_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33206-0
Online ISBN: 978-3-540-33207-7
eBook Packages: Computer Science (R0)
