Chapter

Machine Learning and Knowledge Discovery in Databases

Volume 5212 of the series Lecture Notes in Computer Science pp 374-389

Parallel Spectral Clustering

  • Yangqiu Song (Department of Automation, Tsinghua University; Google Research)
  • Wen-Yen Chen (Department of Computer Science, University of California; Google Research)
  • Hongjie Bai (Google Research)
  • Chih-Jen Lin (Department of Computer Science, National Taiwan University; Google Research)
  • Edward Y. Chang (Google Research)

Abstract

Spectral clustering algorithms have been shown to be more effective at finding clusters than most traditional algorithms. However, spectral clustering scales poorly in both memory use and computation time when the dataset is large. To cluster large datasets, we propose parallelizing both memory use and computation across distributed computers. Through an empirical study on a large document dataset of 193,844 instances and a large photo dataset of 637,137 instances, we demonstrate that our parallel algorithm can effectively alleviate the scalability problem.
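
To make the scalability problem concrete, the following is a minimal single-machine sketch of two-way spectral clustering, not the parallel algorithm the chapter proposes. It assumes a dense Gaussian similarity matrix and the symmetric normalized Laplacian; the function names, the `sigma` parameter, and the toy data are illustrative assumptions, not taken from the chapter.

```python
import numpy as np


def dense_similarity_bytes(n, bytes_per_entry=8):
    """Bytes needed to hold a dense n x n similarity matrix (assumed float64)."""
    return n * n * bytes_per_entry


def spectral_bipartition(X, sigma=1.0):
    """Split the rows of X into two clusters (hypothetical helper, dense version)."""
    # Gaussian similarity between every pair of points: O(n^2) memory.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    S = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(S, 0.0)

    # Symmetric normalized Laplacian L = I - D^{-1/2} S D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order; the eigenvector of the
    # second-smallest eigenvalue carries the two-way split.
    _, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1] * d_inv_sqrt

    # Threshold at zero to obtain cluster labels 0 and 1.
    return (fiedler > 0).astype(int)


# Toy data: two groups of four points each (illustrative only).
group_a = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
group_b = group_a + 4.0
X = np.vstack([group_a, group_b])

labels = spectral_bipartition(X)
print(labels)
print(dense_similarity_bytes(193_844))
```

Under these assumptions, a dense similarity matrix for the 193,844-instance document dataset alone would need about 300 GB at 8 bytes per entry, which illustrates why memory as well as computation has to be distributed.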

Keywords

Parallel spectral clustering, distributed computing