The Journal of Supercomputing, Volume 69, Issue 1, pp 452–467

A parallel clustering method combined information bottleneck theory and centroid-based clustering

Article

DOI: 10.1007/s11227-014-1174-1

Cite this article as:
Sun, Z., Fox, G., Gu, W. et al. J Supercomput (2014) 69: 452. doi:10.1007/s11227-014-1174-1

Abstract

Clustering is an important research topic in data mining. Clustering methods based on information bottleneck theory are well suited to complicated clustering problems because their information loss metric can measure arbitrary statistical relationships between samples, and they have been widely applied in many areas. As information technology develops, the scale of electronic data keeps growing. The classical information bottleneck clustering method cannot handle large-scale datasets because of its high computational cost. Parallel clustering based on the MapReduce model is the most efficient approach to large-scale, data-intensive clustering problems. This paper develops a parallel clustering method based on the MapReduce model. In the method, a parallel information bottleneck clustering method based on MapReduce determines the initial cluster centers, an objective method determines the final number of clusters automatically, and a parallel centroid-based clustering method determines the final clustering result. The clustering results are visualized with the interpolation MDS dimension reduction method. The efficiency of the method is illustrated with a practical DNA clustering example.
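The information loss metric mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard agglomerative information bottleneck formulation, in which merging two clusters costs their combined prior times the weighted Jensen-Shannon divergence of their conditional distributions. The abstract does not state the exact formula used in the paper, and the names `kl_divergence` and `merge_cost` are hypothetical, chosen only for this illustration.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p[i] == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def merge_cost(weight_i, dist_i, weight_j, dist_j):
    """Information loss from merging two clusters (assumed agglomerative IB form).

    weight_*: cluster prior p(c); dist_*: conditional distribution p(y | c).
    Returns the loss and the conditional distribution of the merged cluster.
    """
    total = weight_i + weight_j
    pi_i, pi_j = weight_i / total, weight_j / total
    # The merged cluster's distribution is the prior-weighted mixture.
    merged = [pi_i * a + pi_j * b for a, b in zip(dist_i, dist_j)]
    # Weighted Jensen-Shannon divergence between the two distributions.
    js = pi_i * kl_divergence(dist_i, merged) + pi_j * kl_divergence(dist_j, merged)
    return total * js, merged
```

Under this assumption, an agglomerative procedure would repeatedly merge the pair of clusters with the smallest `merge_cost`. Two clusters with identical conditional distributions merge at zero loss, while clusters with disjoint support incur the largest loss.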

Keywords

Clustering; Information bottleneck theory; MapReduce; Centroid-based clustering

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Zhanquan Sun (1)
  • Geoffrey Fox (2)
  • Weidong Gu (1)
  • Zhao Li (3)
  1. Key Laboratory for Computer Network of Shandong Province, Shandong Computer Science Center, Jinan, China
  2. School of Informatics and Computing, Pervasive Technology Institute, Indiana University Bloomington, Bloomington, USA
  3. School of Software Engineering, Beijing Jiaotong University, Beijing, China
