Clustering High-Dimensional Data

Masulli, Francesco; Rovetta, Stefano

doi:10.1007/978-3-662-48577-4_1


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7627)

Included in the following conference series:

International Workshop on Clustering High-Dimensional Data


Abstract

This chapter introduces the task of clustering, that is, defining a structure that aggregates the data, and the challenges of applying it to the unsupervised analysis of high-dimensional data. Many approaches to this problem have been proposed in the recent literature, since developing efficient clustering methods for high-dimensional data is a major challenge for Machine Learning. Such methods are of vital importance for obtaining safer decision-making processes and better decisions from the Big Data available today, which can translate into greater operational efficiency, cost reduction and risk reduction.
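The abstract does not spell out which challenges arise in high dimensions. One widely discussed example, assumed here for illustration and not taken from the chapter text, is the concentration of distances: as the number of dimensions grows, the nearest and farthest neighbours of a point become almost equally distant, which weakens distance-based definitions of a cluster. The following Python sketch measures this with a relative-contrast ratio, (d_max - d_min) / d_min, on uniformly random points. The function name `relative_contrast` and the choice of uniform data in the unit hypercube are hypothetical choices made only for this illustration.

```python
import math
import random


def relative_contrast(n_points: int, dim: int, seed: int = 0) -> float:
    """Return (d_max - d_min) / d_min for the Euclidean distances from one
    random query point to n_points random points in the unit hypercube [0, 1]^dim.

    Hypothetical helper for illustration: uniform data is an assumption,
    not a model of any dataset discussed in the chapter.
    """
    rng = random.Random(seed)  # seeded so the result is reproducible
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance (Python 3.8+)
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # The contrast between nearest and farthest neighbour typically shrinks
    # as dimensionality grows; exact values depend on the seed and sample size.
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(n_points=500, dim=dim), 3))
```

Running the script prints one line per dimensionality. With this setup the ratio is typically large in 2 dimensions and far below 1 in 1000 dimensions, meaning that "near" and "far" points become hard to tell apart by Euclidean distance alone.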

Author information

Authors and Affiliations

Dipartimento di Informatica, Bioingegneria, Robotica e Ingegneria dei Sistemi DIBRIS, Università di Genova, Genova, Italy
Francesco Masulli & Stefano Rovetta
Center for Biotechnology, Temple University, Philadelphia, USA
Francesco Masulli

Corresponding author

Correspondence to Francesco Masulli.

Editor information

Editors and Affiliations

DIBRIS, University of Genoa, Genoa, Italy
Francesco Masulli
University of Naples "Parthenope", Naples, Italy
Alfredo Petrosino
DIBRIS, University of Genoa, Genoa, Italy
Stefano Rovetta

About this paper

Cite this paper

Masulli, F., Rovetta, S. (2015). Clustering High-Dimensional Data. In: Masulli, F., Petrosino, A., Rovetta, S. (eds) Clustering High-Dimensional Data. CHDD 2012. Lecture Notes in Computer Science, vol 7627. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48577-4_1

DOI: https://doi.org/10.1007/978-3-662-48577-4_1
Published: 25 November 2015
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-48576-7
Online ISBN: 978-3-662-48577-4
eBook Packages: Computer Science, Computer Science (R0)
