Projection Based Clustering of Gene Expression Data

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNBI, volume 6160)

Abstract

DNA microarray technologies have given researchers the ability to examine, discover, and monitor thousands of genes in a single experiment. Nonetheless, the tremendous amount of data obtained from microarray studies presents a challenge for data analysis, mainly because of its very high dimensionality. A particular class of clustering algorithms, which exploits information derived from Principal Component Analysis, has been very successful in dealing with such data. In this paper, we investigate the application of recently proposed projection-based hierarchical clustering algorithms to gene expression microarray data. Apart from identifying the clusters present in a data set, these algorithms also determine the number of clusters and thus require no prior knowledge about the data.
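
As a rough illustration of the idea summarised above, the sketch below projects the samples onto their first principal component, estimates the density of the projected values with a Gaussian kernel, and splits at the deepest density minimum. The recursion stops when the projected density shows no interior minimum, so the number of clusters is a by-product rather than an input. This is not taken from the paper: the function names, the rule-of-thumb bandwidth, and the grid-based minimum search are all assumptions made for this example.

```python
import numpy as np


def first_principal_projection(X):
    """Project the rows of X onto the first principal direction of X."""
    centered = X - X.mean(axis=0)
    # The first right singular vector of the centered data is the
    # direction of maximum variance (first principal component).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[0]


def density_cut(proj, grid_size=200):
    """Return the location of the deepest interior local minimum of a
    Gaussian kernel density estimate of proj, or None if there is none."""
    n = proj.size
    spread = proj.std()
    if n < 3 or spread == 0.0:
        return None
    # Rule-of-thumb bandwidth; an assumption, not the paper's choice.
    bandwidth = 1.06 * spread * n ** (-0.2)
    grid = np.linspace(proj.min(), proj.max(), grid_size)
    z = (grid[:, None] - proj[None, :]) / bandwidth
    # Unnormalised density: only the position of the minima matters here.
    density = np.exp(-0.5 * z ** 2).sum(axis=1)
    inner = np.arange(1, grid_size - 1)
    is_min = (density[inner] < density[inner - 1]) & (
        density[inner] < density[inner + 1]
    )
    minima = inner[is_min]
    if minima.size == 0:
        return None
    return grid[minima[np.argmin(density[minima])]]


def projection_clustering(X):
    """Divisive clustering: split each group at the density minimum of its
    principal projection until no group can be split further."""
    X = np.asarray(X, dtype=float)
    labels = np.empty(len(X), dtype=int)
    pending = [np.arange(len(X))]
    next_label = 0
    while pending:
        idx = pending.pop()
        proj = first_principal_projection(X[idx]) if idx.size >= 3 else None
        cut = density_cut(proj) if proj is not None else None
        if cut is None:
            # No interior density minimum: treat the group as one cluster.
            labels[idx] = next_label
            next_label += 1
        else:
            pending.append(idx[proj <= cut])
            pending.append(idx[proj > cut])
    return labels
```

Calling `projection_clustering(X)` on a samples-by-genes matrix returns one integer label per sample. Because the stopping rule depends on the bandwidth, a smaller bandwidth tends to produce more clusters and a larger one fewer, so the bandwidth choice deserves care on real expression data.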

Keywords

  • Unsupervised Clustering
  • Cluster Analysis
  • Principal Component Analysis
  • Kernel Density Estimation
  • Bioinformatics
  • Gene Expression Analysis

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tasoulis, S.K., Plagianakos, V.P., Tasoulis, D.K. (2010). Projection Based Clustering of Gene Expression Data. In: Masulli, F., Peterson, L.E., Tagliaferri, R. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2009. Lecture Notes in Computer Science, vol 6160. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14571-1_17

  • DOI: https://doi.org/10.1007/978-3-642-14571-1_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14570-4

  • Online ISBN: 978-3-642-14571-1

  • eBook Packages: Computer Science, Computer Science (R0)