Kernel density estimation in accelerators

Lopez-Novoa, Unai; Mendiburu, Alexander; Miguel-Alonso, Jose

doi:10.1007/s11227-015-1577-7

Kernel density estimation in accelerators: Implementation and performance evaluation

Published: 08 December 2015

Volume 72, pages 545–566 (2016)

The Journal of Supercomputing

Unai Lopez-Novoa, Alexander Mendiburu & Jose Miguel-Alonso


Abstract

Kernel density estimation (KDE) is a popular technique used to estimate the probability density function of a random variable. KDE is considered a fundamental data smoothing algorithm, and it is a common building block in many scientific applications. In previous work we presented S-KDE, an efficient algorithmic approach to compute KDE that outperformed other state-of-the-art implementations, providing accurate results in much shorter execution times. Its parallel implementation targeted multi- and many-core processors. In this work we present an OpenCL implementation of S-KDE that targets modern accelerators in a portable way. We test our implementation on three accelerators from different manufacturers, achieving speedups of around \(5\times\) compared to a hand-tuned serial version of S-KDE. We also analyze the performance of the code on these accelerators to determine to what extent it exploits their capabilities.
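For readers unfamiliar with the estimator itself, the following is a minimal sketch of a naive one-dimensional KDE with a Gaussian kernel, where the density at a point x is the average of kernels centred on each sample: f(x) = 1/(n·h) · Σ K((x − x_i)/h). It is not S-KDE and not the OpenCL code the abstract describes; the function name `gaussian_kde_1d`, the fixed scalar bandwidth, and the choice of a Gaussian kernel are all assumptions made for illustration.

```python
import math


def gaussian_kde_1d(samples, query_points, bandwidth):
    """Naive O(n*m) Gaussian KDE in one dimension (illustrative only).

    samples      -- observed values x_i
    query_points -- positions x at which the density is estimated
    bandwidth    -- smoothing parameter h (assumed fixed and positive)
    """
    if not samples:
        raise ValueError("at least one sample is required")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")

    n = len(samples)
    # Normalisation: 1 / (n * h * sqrt(2*pi)) for the Gaussian kernel.
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    densities = []
    for x in query_points:
        total = 0.0
        for s in samples:
            u = (x - s) / bandwidth
            total += math.exp(-0.5 * u * u)
        densities.append(norm * total)
    return densities


if __name__ == "__main__":
    # Hypothetical data, chosen only to exercise the function.
    data = [0.0, 1.0, 2.5]
    grid = [-1.0, 0.0, 1.0, 2.0, 3.0]
    for x, d in zip(grid, gaussian_kde_1d(data, grid, bandwidth=0.5)):
        print(f"x = {x:5.2f}  density = {d:.4f}")
```

Every query point visits every sample, so the cost grows with the product of the two sizes. That quadratic cost is the usual motivation for faster algorithms and for parallel hardware such as the accelerators considered in this article.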


References

Agosta G, Barenghi A, Di Federico A, Pelosi G (2015) OpenCL performance portability for general-purpose computation on graphics processor units: an exploration on cryptographic primitives. Concurr Comput Pract Exp 27(14):3633–3660
AMD (2013) APP OpenCL programming guide. http://developer.amd.com/tools/hc/AMDAPPSDK/assets/AMD_Accelerated_Parallel_Processing_OpenCL_Programming_Guide.pdf
Cramer T, Schmidl D, Klemm M, an Mey D (2012) OpenMP programming on Intel Xeon Phi coprocessors: an early performance comparison. In: Proceedings of the many-core applications research community symposium, pp 38–44
Danalis A, Marin G, McCurdy C, Meredith JS, Roth PC, Spafford K, Tipparaju V, Vetter JS (2010) The scalable heterogeneous computing (SHOC) benchmark suite. In: Proceedings of the 3rd workshop on general-purpose computation on graphics processing units, ACM, New York, GPGPU ’10, pp 63–74
Elgammal A, Duraiswami R, Davis L (2003) Efficient kernel density estimation using the fast Gauss transform with applications to color modeling and tracking. IEEE Trans Pattern Anal Mach Intell 25(11):1499–1504
Fukunaga K (1990) Introduction to statistical pattern recognition, 2nd edn. Academic Press Professional Inc, San Diego
Jeffers J, Reinders J (2013) Intel Xeon Phi Coprocessor High Performance Programming, 1st edn. Morgan Kaufmann Publishers Inc., San Francisco
Jia H, Zhang Y, Long G, Xu J, Yan S, Li Y (2012) GPURoofline: a model for guiding performance optimizations on GPUs. Euro-Par Parallel Processing, Lecture Notes in Computer Science, vol 7484. Springer, Berlin, pp 920–932
Khronos OpenCL Working Group, Munshi A (ed) (2008) The OpenCL specification. Khronos Group, Beaverton, OR
Kim KH, Kim K, Park QH (2011) Performance analysis and optimization of three-dimensional FDTD on GPU using roofline model. Comput Phys Commun 182(6):1201–1207
Kirk DB, Hwu WW (2010) Programming Massively Parallel Processors: A Hands-on Approach, 1st edn. Morgan Kaufmann Publishers Inc., San Francisco
Lattner C, Adve V (2004) LLVM: a compilation framework for lifelong program analysis & transformation. In: Proceedings of the international symposium on code generation and optimization, CGO, pp 75–86
Lee VW, Kim C, Chhugani J, Deisher M, Kim D, Nguyen AD, Satish N, Smelyanskiy M, Chennupaty S, Hammarlund P, Singhal R, Dubey P (2010) Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU. SIGARCH Comput Archit News 38(3):451–460
Lopez-Novoa U, Mendiburu A, Miguel-Alonso J (2015a) A survey of performance modeling and simulation techniques for accelerator-based computing. IEEE Trans Parallel Distrib Syst 26(1):272–281
Lopez-Novoa U, Sáenz J, Mendiburu A, Miguel-Alonso J (2015b) An efficient implementation of kernel density estimation for multi-core and many-core architectures. Int J High Perform Comput Appl 29(3):331–347
Lopez-Novoa U, Sáenz J, Mendiburu A, Miguel-Alonso J, Errasti I, Esnaola G, Ezcurra A, Ibarra-Berastegi G (2015c) Multi-objective environmental model evaluation by means of multidimensional kernel density estimators: efficient and multi-core implementations. Environ Model Softw 63:123–136
Munshi A, Gaster B, Mattson TG, Fung J, Ginsburg D (2011) OpenCL Programming Guide, 1st edn. Addison-Wesley Professional, USA
Nickolls J, Dally W (2010) The GPU computing era. IEEE Micro 30(2):56–69
NVIDIA (2012) OpenCL best practices guide. http://www.nvidia.com/content/cudazone/CUDABrowser/downloads/papers/NVIDIA_OpenCL_BestPracticesGuide.pdf
Pennycook S, Hammond S, Wright S, Herdman J, Miller I, Jarvis S (2013) An investigation of the performance portability of OpenCL. J Parallel Distrib Comput 73(11):1439–1450
Seo S, Lee J, Jo G, Lee J (2013) Automatic OpenCL work-group size selection for multicore CPUs. In: Proceedings of the 22nd international conference on parallel architectures and compilation techniques (PACT), pp 387–397
Sheather SJ (2004) Density estimation. Statist Sci 588–597
Silverman BW (1986) Density estimation for statistics and data analysis. Chapman & Hall, London
Torres Y, Gonzalez-Escribano A, Llanos DR (2013) uBench: exposing the impact of CUDA block geometry in terms of performance. J Supercomput 65(3):1150–1163
Wang Y, Qin Q, See SCW, Lin J (2013) Performance portability evaluation for OpenACC on Intel Knights Corner and NVIDIA Kepler. In: HPC China 2013
Weissbach R (2006) A general kernel functional estimator with general bandwidth-strong consistency and applications. J Nonparam Stat 18(1):1–12
Williams S, Waterman A, Patterson D (2009) Roofline: an insightful visual performance model for multicore architectures. Commun ACM 52(4):65–76


Author information

Unai Lopez-Novoa
Present address: Deusto Institute of Technology, DeustoTech, University of Deusto, Avenida de las Universidades 24, 48007, Bilbao, Spain

Authors and Affiliations

Department of Computer Architecture and Technology, Intelligent Systems Group, University of the Basque Country UPV/EHU, P. Manuel Lardizabal 1, 20018, San Sebastián, Gipuzkoa, Spain
Unai Lopez-Novoa, Alexander Mendiburu & Jose Miguel-Alonso

Corresponding author

Correspondence to Unai Lopez-Novoa.

Additional information

This work has been partially supported by the Saiotek and Research Groups 2013-2018 (IT-609-13) programs (Basque Government), TIN2013-41272P (Ministry of Science and Technology), COMBIOMED-RD07/0067/0003 network in computational biomedicine (Carlos III Health Institute) and by the NICaiA Project PIRSES-GA-2009-247619 (European Commission).

About this article

Cite this article

Lopez-Novoa, U., Mendiburu, A. & Miguel-Alonso, J. Kernel density estimation in accelerators. J Supercomput 72, 545–566 (2016). https://doi.org/10.1007/s11227-015-1577-7

Issue Date: February 2016
