High-Performance Cluster Estimation Using Many-Core Models
This paper presents an implementation of high-performance subtractive clustering on a single instruction, multiple data (SIMD) many-core processor. Because there is no general consensus on which grain size of a many-core processor yields the maximum performance, the paper explores the effects of varying the number of processing elements (PEs) and the amount of memory by introducing image data per processing element (IDPE), the amount of image data directly mapped to each PE, as a design variable. Five PE configurations (IDPEs = 16, 64, 256, 1,024, and 4,096) are evaluated in terms of execution time and system utilization. The proposed approach is also compared with a CPU-based implementation to show its potential for improved performance. Experimental results show that the proposed approach achieves a 16.73× speedup over the CPU-based implementation at PEs = 4,096.
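For readers unfamiliar with the technique, the following is a minimal sequential sketch of subtractive clustering in its commonly cited form (potentials built from Gaussian-weighted distances, followed by repeated peak selection and subtraction). It is not the SIMD implementation evaluated in the paper. The function names, the radius `ra`, the factor `rb_factor`, the stopping threshold `stop_ratio`, and the helper `pes_for_image` are illustrative assumptions, and the data are assumed to be normalized to a unit range.

```python
import math


def subtractive_clustering(points, ra=0.5, rb_factor=1.5,
                           stop_ratio=0.15, max_centers=10):
    """Estimate cluster centers from a list of equal-length tuples.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    alpha = 4.0 / (ra * ra)            # neighbourhood weight for the potential
    rb = rb_factor * ra                # wider radius used when subtracting
    beta = 4.0 / (rb * rb)

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Potential of each point: how densely it is surrounded by other points.
    potentials = [sum(math.exp(-alpha * sqdist(p, q)) for q in points)
                  for p in points]

    centers = []
    first_peak = max(potentials)
    while len(centers) < max_centers:
        k = max(range(len(points)), key=potentials.__getitem__)
        peak = potentials[k]
        if peak < stop_ratio * first_peak:
            break                      # remaining peaks are too weak
        centers.append(points[k])
        # Remove the influence of the new center from every potential.
        potentials = [pot - peak * math.exp(-beta * sqdist(p, points[k]))
                      for p, pot in zip(points, potentials)]
    return centers


def pes_for_image(num_pixels, idpe):
    """Hypothetical helper: PEs needed when each PE holds `idpe` pixels."""
    return math.ceil(num_pixels / idpe)
```

The potential computation is independent for each point, which is what makes the method a natural fit for data-parallel execution. As a purely hypothetical illustration of the IDPE variable, `pes_for_image(256 * 256, 16)` gives 4,096 PEs for an assumed 256 × 256 image; the image size used in the paper is not stated here.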
Keywords: Cluster estimation, Fuzzy c-means, Image segmentation, Many-core architecture
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. NRF-2013R1A2A2A05004566).