Clustering Very Large Data Sets with Principal Direction Divisive Partitioning

Littau, D.; Boley, D.

doi:10.1007/3-540-28349-8_4


Summary

We present a method to cluster data sets too large to fit in memory, based on a Low-Memory Factored Representation (LMFR). The LMFR represents the original data in a factored form that requires much less memory, while preserving the individuality of each of the original samples. The scalable clustering algorithm Principal Direction Divisive Partitioning (PDDP) can use the factored form in a natural way to obtain a clustering of the original data set.

The resulting algorithm is the PieceMeal PDDP (PMPDDP) method. The scalability of PMPDDP is demonstrated with a complexity analysis and experimental results. A discussion of the practical use of this method by a casual user is provided.
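
For orientation, the basic splitting step of plain PDDP can be sketched as follows. This is a minimal illustration and not the chapter's PMPDDP implementation: it works on the full in-memory data rather than on an LMFR, it approximates the principal direction by power iteration instead of a dedicated SVD routine, and the function names `principal_direction` and `pddp_split` are hypothetical. It assumes the usual PDDP rule of centering the samples and dividing them by the sign of their projection onto the leading principal direction.

```python
import math


def principal_direction(centered, iters=100):
    """Approximate the leading right singular vector of the centered data
    by power iteration on A^T A (an assumption made for this sketch)."""
    dim = len(centered[0])
    v = [1.0 / math.sqrt(dim)] * dim  # uniform starting guess
    for _ in range(iters):
        # Compute w = A^T (A v) without forming A^T A explicitly.
        proj = [sum(a * b for a, b in zip(row, v)) for row in centered]
        w = [sum(p * row[j] for p, row in zip(proj, centered)) for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # degenerate data or an unlucky starting vector
        v = [x / norm for x in w]
    return v


def pddp_split(samples):
    """Split samples (a list of equal-length feature lists) into two index
    lists according to the sign of the projection onto the principal direction."""
    n = len(samples)
    dim = len(samples[0])
    centroid = [sum(row[j] for row in samples) / n for j in range(dim)]
    centered = [[row[j] - centroid[j] for j in range(dim)] for row in samples]
    v = principal_direction(centered)
    left, right = [], []
    for i, row in enumerate(centered):
        score = sum(a * b for a, b in zip(row, v))
        (left if score <= 0 else right).append(i)
    return left, right


if __name__ == "__main__":
    # Two well-separated groups along the first coordinate.
    data = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
    print(pddp_split(data))
```

Applying such a split recursively to the resulting groups yields a divisive hierarchy. The sign of the principal direction is arbitrary, so which group is reported first carries no meaning.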

Author information

Authors and Affiliations

University of Minnesota, Minneapolis, MN, 55455, USA
D. Littau & D. Boley

Authors

D. Littau
D. Boley

Editor information

Editors and Affiliations

Department of Mathematics and Statistics, University of Maryland Baltimore County, 1000 Hilltop Circle, Baltimore, Maryland, 21250, USA
Jacob Kogan
Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County, 1000 Hilltop Circle, Baltimore, Maryland, 21250, USA
Jacob Kogan
Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County, 1000 Hilltop Circle, Baltimore, Maryland, 21250, USA
Charles Nicholas
School of Mathematical Sciences, Tel-Aviv University, Ramat Aviv, Tel-Aviv, 69978, Israel
Marc Teboulle

About this chapter

Cite this chapter

Littau, D., Boley, D. (2006). Clustering Very Large Data Sets with Principal Direction Divisive Partitioning. In: Kogan, J., Nicholas, C., Teboulle, M. (eds) Grouping Multidimensional Data. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-28349-8_4

DOI: https://doi.org/10.1007/3-540-28349-8_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28348-5
Online ISBN: 978-3-540-28349-2
eBook Packages: Computer Science, Computer Science (R0)
