Abstract
A recent development in radio astronomy is to replace traditional dishes with many small antennas. The signals are combined to form one large, virtual telescope. The enormous data streams are cross-correlated to filter out noise. This is especially challenging, since the computational demands grow quadratically with the number of data streams. Moreover, the correlator is not only computationally intensive, but also very I/O intensive. The LOFAR telescope, for instance, will produce over 100 terabytes per day. The future SKA telescope will even require on the order of exaflops, and petabits/s of I/O. A recent trend is to correlate in software instead of dedicated hardware, to increase flexibility and to reduce development effort. We evaluate the correlator algorithm on multi-core CPUs and many-core architectures, such as NVIDIA and ATI GPUs, and the Cell/B.E. The correlator is a streaming, real-time application, and is much more I/O intensive than applications that are typically implemented on many-core hardware today. We compare with the LOFAR production correlator on an IBM Blue Gene/P supercomputer. We investigate performance, power efficiency, and programmability. We identify several important architectural problems that cause architectures to perform suboptimally. Our findings are applicable to data-intensive applications in general. The processing power and memory bandwidth of current GPUs are highly imbalanced for correlation purposes. While the production correlator on the Blue Gene/P achieves a superb 96% of the theoretical peak performance, the figure is only 16% on ATI GPUs and 32% on NVIDIA GPUs. The Cell/B.E. processor, in contrast, achieves an excellent 92%. We found that the Cell/B.E. and NVIDIA GPUs are the most energy-efficient solutions: they run the correlator at least 4 times more energy efficiently than the Blue Gene/P. The research presented is an important pathfinder for next-generation telescopes.
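The quadratic growth mentioned above can be illustrated with a minimal sketch. It is not the LOFAR production correlator or any of the implementations evaluated in the paper. The names `baseline_count` and `correlate` are hypothetical, the input is assumed to be one list of complex samples per data stream, and only pairs of distinct streams are counted (a real correlator may also compute each stream's correlation with itself).

```python
from itertools import combinations


def baseline_count(num_streams):
    """Number of distinct stream pairs (i, j) with i < j (assumption: no self-pairs)."""
    return num_streams * (num_streams - 1) // 2


def correlate(samples):
    """Accumulate s_i[t] * conj(s_j[t]) over time for every pair i < j.

    samples[i][t] is the complex sample of stream i at time step t.
    Returns a dict mapping (i, j) to the accumulated complex product.
    """
    visibilities = {}
    for i, j in combinations(range(len(samples)), 2):
        acc = 0j
        for a, b in zip(samples[i], samples[j]):
            acc += a * b.conjugate()
        visibilities[(i, j)] = acc
    return visibilities
```

Under these assumptions, doubling the number of streams from 32 to 64 raises the pair count from 496 to 2016, roughly a factor of four, while the input volume only doubles.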
Acknowledgements
This work was performed in the context of the NWO STARE AstroStream project. We gratefully acknowledge NVIDIA, and in particular Dr. David Luebke, for freely providing some of the GPU cards used in this work. Finally, we thank Chris Broekema, Jan David Mol, and Alexander van Amesfoort for their comments on an earlier version of this paper.
Open Access
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
Cite this article
van Nieuwpoort, R.V., Romein, J.W. Correlating Radio Astronomy Signals with Many-Core Hardware. Int J Parallel Prog 39, 88–114 (2011). https://doi.org/10.1007/s10766-010-0144-3
DOI: https://doi.org/10.1007/s10766-010-0144-3