Machine Learning and Knowledge Discovery in Databases
Volume 5212 of the series Lecture Notes in Computer Science pp 374-389
Parallel Spectral Clustering
- Yangqiu Song (Department of Automation, Tsinghua University; Google Research)
- Wen-Yen Chen (Department of Computer Science, University of California; Google Research)
- Hongjie Bai (Google Research)
- Chih-Jen Lin (Department of Computer Science, National Taiwan University; Google Research)
- Edward Y. Chang (Google Research)
Abstract
Spectral clustering algorithms have been shown to be more effective in finding clusters than most traditional algorithms. However, spectral clustering suffers from a scalability problem in both memory use and computational time when the dataset is large. To perform clustering on large datasets, we propose to parallelize both memory use and computation across distributed computers. Through an empirical study on a large document dataset of 193,844 instances and a large photo dataset of 637,137 instances, we demonstrate that our parallel algorithm can effectively alleviate the scalability problem.
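To give a rough sense of the memory side of the scalability problem, the sketch below estimates the size of a similarity matrix for the two dataset sizes quoted in the abstract. It rests on assumptions that the abstract does not state: the matrix is stored densely with 8-byte floating-point entries, and rows are split evenly across machines. The function names and the machine count of 64 are hypothetical and chosen only for illustration; this is not the authors' partitioning scheme.

```python
def dense_similarity_bytes(n, bytes_per_entry=8):
    """Bytes needed to hold a dense n x n similarity matrix (assumed storage)."""
    return n * n * bytes_per_entry


def per_machine_bytes(n, machines, bytes_per_entry=8):
    """Bytes held by the most loaded machine when rows are split evenly."""
    rows = -(-n // machines)  # ceiling division: size of the largest row block
    return rows * n * bytes_per_entry


if __name__ == "__main__":
    # Dataset sizes taken from the abstract; 64 machines is a hypothetical choice.
    for name, n in [("documents", 193_844), ("photos", 637_137)]:
        total = dense_similarity_bytes(n)
        share = per_machine_bytes(n, machines=64)
        print(f"{name}: {total / 1e9:.1f} GB dense, "
              f"{share / 1e9:.1f} GB per machine on 64 machines")
```

Under these assumptions the dense matrix grows quadratically with the number of instances, which is why a single machine runs out of memory long before the computation itself becomes the bottleneck.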
Keywords
Parallel spectral clustering, distributed computing
- Title
- Parallel Spectral Clustering
- Book Title
- Machine Learning and Knowledge Discovery in Databases
- Book Subtitle
- European Conference, ECML PKDD 2008, Antwerp, Belgium, September 15-19, 2008, Proceedings, Part II
- Pages
- pp 374-389
- Copyright
- 2008
- DOI
- 10.1007/978-3-540-87481-2_25
- Print ISBN
- 978-3-540-87480-5
- Online ISBN
- 978-3-540-87481-2
- Series Title
- Lecture Notes in Computer Science
- Series Volume
- 5212
- Series ISSN
- 0302-9743
- Publisher
- Springer Berlin Heidelberg
- Copyright Holder
- Springer-Verlag Berlin Heidelberg
- Authors
- Yangqiu Song (1) (4)
- Wen-Yen Chen (2) (4)
- Hongjie Bai (4)
- Chih-Jen Lin (3) (4)
- Edward Y. Chang (4)
- Author Affiliations
- 1. Department of Automation, Tsinghua University, Beijing, China
- 2. Department of Computer Science, University of California, Santa Barbara, USA
- 3. Department of Computer Science, National Taiwan University, Taipei, Taiwan
- 4. Google Research, USA/China