
Clustering High-Dimensional Data

  • Conference paper
Clustering High-Dimensional Data (CHDD 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7627)


Abstract

This chapter introduces the task of clustering, that is, defining a structure that aggregates the data, and discusses the challenges of applying it to the unsupervised analysis of high-dimensional data. Many approaches to this problem have been proposed in the recent literature. Developing efficient clustering methods for high-dimensional data is a major challenge for Machine Learning, and it is of vital importance for achieving safer decision-making processes and better decisions from the Big Data available today, which can translate into greater operational efficiency, lower costs and reduced risk.
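The abstract refers to the challenges of clustering high-dimensional data without listing them. One well-known challenge, often discussed under Bellman's "curse of dimensionality", is distance concentration: as the number of dimensions grows, the distances from a point to its nearest and farthest neighbours become nearly equal, so distance-based notions of "closeness" lose contrast. The sketch below is an illustration added here, not code from the chapter; it assumes synthetic data drawn uniformly from the unit hypercube, Euclidean distance, and a hypothetical helper named `relative_contrast`.

```python
import math
import random


def relative_contrast(dim: int, n_points: int = 200, seed: int = 0) -> float:
    """Hypothetical helper: return (D_max - D_min) / D_min, where D_max and D_min
    are the largest and smallest Euclidean distances from one random query point
    to n_points random points, all drawn uniformly from [0, 1]^dim."""
    rng = random.Random(seed)  # seeded so the experiment is reproducible
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))  # Euclidean distance (Python 3.8+)
    d_min, d_max = min(distances), max(distances)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # The contrast is expected to shrink as the dimensionality grows.
    for dim in (2, 10, 100, 1000):
        print(f"dim={dim:5d}  relative contrast={relative_contrast(dim):.3f}")
```

With uniform data the contrast is typically large in two dimensions and small in a thousand, which is one reason why plain distance-based clustering can degrade on high-dimensional data; the exact values depend on the assumed distribution, sample size and seed.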




Author information

Correspondence to Francesco Masulli.


Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Masulli, F., Rovetta, S. (2015). Clustering High-Dimensional Data. In: Masulli, F., Petrosino, A., Rovetta, S. (eds) Clustering High-Dimensional Data. CHDD 2012. Lecture Notes in Computer Science, vol 7627. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48577-4_1


  • DOI: https://doi.org/10.1007/978-3-662-48577-4_1


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-48576-7

  • Online ISBN: 978-3-662-48577-4

  • eBook Packages: Computer Science, Computer Science (R0)
