Clustering method based on data division and partition

Lu, Zhi-mao; Liu, Chen; Massinanke, S.; Zhang, Chun-xiang; Wang, Lei

doi:10.1007/s11771-014-1932-5

Published: 01 March 2014

Volume 21, pages 213–222, (2014)

Journal of Central South University

Zhi-mao Lu (卢志茂)¹,
Chen Liu (刘晨)¹,
S. Massinanke¹,
Chun-xiang Zhang (张春祥)² &
…
Lei Wang (王蕾)³

Abstract

Many classical clustering algorithms perform well under their own prerequisites but do not scale well when applied to very large data sets (VLDS). In this work, a novel division and partition clustering method (DP) is proposed to solve this problem. DP cuts the source data set into data blocks and extracts the eigenvector of each data block to form the local feature set. The local feature set is then used in a second round of feature aggregation over the source data to find the global eigenvector. Finally, the data are assigned according to the global eigenvector by the minimum-distance criterion. The experimental results show that DP is more robust than conventional clustering methods. Its insensitivity to data dimensionality, data distribution, and the number of natural clusters gives it a wide range of applications in clustering VLDS.
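
The two-round structure described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the eigenvector of each block is extracted, so this sketch substitutes plain k-means centroids for both the local and the global features; that substitution, and the names `kmeans`, `assign_min_distance`, `divide_and_partition`, `n_blocks`, `k_local` and `k_global`, are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def assign_min_distance(points, centers):
    """Label each point with the index of its nearest center (Euclidean)."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; an assumed stand-in for feature extraction."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Fancy indexing copies, so updating centers never touches the input.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign_min_distance(points, centers)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers

def divide_and_partition(data, n_blocks, k_local, k_global, seed=0):
    """Two-round sketch: local features per block, then global features."""
    data = np.asarray(data, dtype=float)
    # Round 1: cut the data into blocks and summarize each block separately.
    blocks = np.array_split(data, n_blocks)
    local_features = np.vstack([kmeans(b, k_local, seed=seed) for b in blocks])
    # Round 2: aggregate the (much smaller) local feature set.
    global_features = kmeans(local_features, k_global, seed=seed)
    # Final step: assign every source point by minimum distance.
    labels = assign_min_distance(data, global_features)
    return global_features, labels
```

Under these assumptions only one block, or the small local feature set, is clustered at a time; the full data set is touched once more for the final minimum-distance assignment. Each block must contain at least `k_local` points.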

Author information

Authors and Affiliations

School of Information and Communication Engineering, Harbin Engineering University, Harbin, 150001, China
Zhi-mao Lu (卢志茂), Chen Liu (刘晨) & S. Massinanke
School of Software, Harbin University of Science and Technology, Harbin, 150001, China
Chun-xiang Zhang (张春祥)
School of Computer Science and Technology, Harbin Engineering University, Harbin, 150001, China
Lei Wang (王蕾)

Corresponding author

Correspondence to Chen Liu (刘晨).

Additional information

Foundation item: Projects(60903082, 60975042) supported by the National Natural Science Foundation of China; Project(20070217043) supported by the Research Fund for the Doctoral Program of Higher Education of China

About this article

Cite this article

Lu, Zm., Liu, C., Massinanke, S. et al. Clustering method based on data division and partition. J. Cent. South Univ. 21, 213–222 (2014). https://doi.org/10.1007/s11771-014-1932-5

Received: 20 August 2012
Accepted: 25 March 2013
Published: 01 March 2014
Issue Date: January 2014
DOI: https://doi.org/10.1007/s11771-014-1932-5
