GPU accelerated radio astronomy signal convolution
As radio astronomy interferometer arrays grow, the associated correlation computation scales quadratically with the number of array signals. Efficient use of alternative processing architectures should therefore be explored in order to meet this computational challenge. Affordable parallel processors have become available to the general scientific community in the form of the commodity graphics card. This work investigates the use of the Graphics Processing Unit (GPU) to parallelise the combined conjugate multiply and accumulation stage of a correlator for a radio astronomy array. Using NVIDIA’s Compute Unified Device Architecture (CUDA), our testing shows processing speeds one to two orders of magnitude faster than a Central Processing Unit (CPU) approach.
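To make the quadratic scaling concrete, the following is a minimal sketch of a conjugate multiply and accumulate stage, written in plain Python rather than CUDA or Fortran. It is not the authors' implementation. The function names, the `spectra[t][a][c]` data layout, and the choice to compute cross-correlations only (no autocorrelations) are assumptions made for illustration.

```python
from itertools import combinations


def baseline_count(n_signals):
    """Number of distinct signal pairs: N(N-1)/2, quadratic in N."""
    return n_signals * (n_signals - 1) // 2


def cross_multiply_accumulate(spectra):
    """Accumulate X_i * conj(X_j) for every signal pair and channel.

    Assumed (hypothetical) layout: spectra[t][a][c] is the complex
    spectral value at time step t, signal a, frequency channel c.
    Returns a dict mapping each pair (i, j), i < j, to a list of
    accumulated complex values, one per channel.
    """
    n_signals = len(spectra[0])
    n_channels = len(spectra[0][0])
    pairs = list(combinations(range(n_signals), 2))
    visibilities = {pair: [0j] * n_channels for pair in pairs}

    for frame in spectra:                      # accumulate over time
        for i, j in pairs:                     # N(N-1)/2 pairs
            acc = visibilities[(i, j)]
            for c in range(n_channels):        # independent per channel
                acc[c] += frame[i][c] * frame[j][c].conjugate()
    return visibilities


# Two identical time steps, three signals, two channels.
frame = [
    [1 + 1j, 2 + 0j],
    [1 - 1j, 0 + 1j],
    [2 + 0j, 1 + 0j],
]
result = cross_multiply_accumulate([frame, frame])
print(len(result), baseline_count(3))
```

Each (pair, channel) accumulation in the inner loops is independent of the others, which is the property that makes this stage a natural candidate for a data-parallel mapping. Doubling the number of signals roughly quadruples the number of pairs, for example from `baseline_count(4) == 6` to `baseline_count(8) == 28`.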
Keywords: Correlation, CUDA, Data parallel, Radio astronomy
The authors thank Frank Briggs for providing the Fortran code of an FX correlator, which was used as a reference for both the CPU and GPU implementations. The authors thank Paul Bourke for producing the diagrams in Fig. 2.