Abstract
cDNA microarrays permit massively parallel gene expression analysis and have spawned a new paradigm in the study of molecular biology. One of the significant challenges in this genomic revolution is to develop sophisticated approaches that facilitate the visualization, analysis, and interpretation of the vast amounts of multi-dimensional gene expression data. We have applied the self-organizing map (SOM) to meet these challenges. In essence, we use the U-matrix and component planes for microarray data visualization and introduce a general procedure for assessing the significance of a cluster detected from the U-matrix. Our case studies consist of two data sets. In the first, we analyze a data set containing 13,824 genes in 14 breast cancer cell lines. In the second, we show an example of the SOM applied to drug treatment of prostate cancer cells. Our results indicate that (1) the SOM can help find certain biologically meaningful clusters, (2) clustering algorithms could be used to find a set of potential predictor genes for classification purposes, and (3) comparing and visualizing the effects of different drugs is straightforward with the SOM. In summary, the SOM provides an excellent format for visualization and analysis of gene microarray data, and is likely to facilitate extraction of biologically and medically useful information.
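For readers unfamiliar with the terms, the following is a minimal sketch of how a SOM, a U-matrix, and a component plane can be computed. It is not the authors' implementation. The function names (`train_som`, `u_matrix`, `component_plane`), the map size, the learning-rate and neighbourhood schedules, and the synthetic input are all assumptions made for illustration. The U-matrix here is a simplified variant that stores, for each map unit, the mean distance to its grid neighbours.

```python
import numpy as np


def train_som(data, rows=6, cols=6, n_iter=2000, lr0=0.5, sigma0=3.0, seed=0):
    """Train a rectangular-grid SOM with the online (sequential) update rule."""
    rng = np.random.default_rng(seed)
    n_features = data.shape[1]
    # Initialise codebook vectors from randomly chosen data points.
    weights = data[rng.integers(0, len(data), rows * cols)].astype(float)
    weights = weights.reshape(rows, cols, n_features)
    # Grid coordinates of every map unit, shape (rows, cols, 2).
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)              # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5  # shrinking neighbourhood radius
        x = data[rng.integers(len(data))]
        # Best-matching unit: the unit whose codebook vector is closest to x.
        dists = np.linalg.norm(weights - x, axis=-1)
        bmu = np.unravel_index(np.argmin(dists), dists.shape)
        # Gaussian neighbourhood around the BMU, measured on the map grid.
        grid_d2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
        weights += lr * h[..., None] * (x - weights)
    return weights


def u_matrix(weights):
    """Mean distance from each unit's codebook vector to its 4 grid neighbours."""
    rows, cols, _ = weights.shape
    um = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            neighbours = [
                (i + di, j + dj)
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1))
                if 0 <= i + di < rows and 0 <= j + dj < cols
            ]
            um[i, j] = np.mean(
                [np.linalg.norm(weights[i, j] - weights[a, b]) for a, b in neighbours]
            )
    return um


def component_plane(weights, k):
    """Values of input dimension k (e.g. one sample) across all map units."""
    return weights[:, :, k]


if __name__ == "__main__":
    # Synthetic stand-in for a genes-by-samples expression matrix (hypothetical data).
    demo = np.random.default_rng(1).normal(size=(500, 14))
    som = train_som(demo)
    print(u_matrix(som).shape, component_plane(som, 0).shape)
```

High values in the U-matrix mark map regions where neighbouring codebook vectors differ strongly, which is why low-valued areas bounded by high-valued ridges are read as candidate clusters. A component plane shows how a single input dimension varies over the same map.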
Cite this article
Hautaniemi, S., Yli-Harja, O., Astola, J. et al. Analysis and Visualization of Gene Expression Microarray Data in Human Cancer Using Self-Organizing Maps. Machine Learning 52, 45–66 (2003). https://doi.org/10.1023/A:1023941307670
DOI: https://doi.org/10.1023/A:1023941307670