Emerging Architectures Enable to Boost Massively Parallel Data Mining Using Adaptive Sparse Grids

  • Alexander Heinecke
  • Dirk Pflüger


Gaining knowledge from vast datasets is a key challenge in today's data-driven applications. Sparse grids provide a numerical method for both classification and regression in data mining that scales only linearly in the number of data points and is thus well-suited for huge amounts of data. Due to the recursive nature of sparse grid algorithms and their classically random memory access pattern, they pose a challenge for parallelization on modern hardware architectures such as accelerators. In this paper, we present the parallelization on several current task- and data-parallel platforms, covering multi-core CPUs with vector units, GPUs, and hybrid systems. We demonstrate that an implementation which is less efficient from an algorithmic point of view can be beneficial if it enables vectorization and a higher degree of parallelism instead. Furthermore, we analyze the suitability of parallel programming languages for the implementation. Regarding hardware, we restrict ourselves to the x86 platform with SSE and AVX vector extensions and to NVIDIA's Fermi architecture for GPUs. We consider both multi-core CPU and GPU architectures independently, as well as hybrid systems with up to 12 cores and 2 Fermi GPUs. With respect to parallel programming, we examine both the open standard OpenCL and Intel Array Building Blocks, a recently introduced high-level programming approach, and comment on their ease of use. As the baseline, we use the best results obtained with classically parallelized sparse grid algorithms and their OpenMP-parallelized intrinsics counterparts (SSE and AVX instructions), reporting both single- and double-precision measurements. The huge datasets we use comprise a real-life dataset stemming from astrophysics as well as artificial ones, all of which exhibit challenging properties. In all settings, we achieve excellent results, obtaining speedups of up to 188× using single precision on a hybrid system.
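To illustrate the kernel in question, the following sketch (our own illustration, not the authors' code; all function and variable names are hypothetical) evaluates a spatially adaptive sparse-grid surrogate at many data points at once using the hierarchical hat basis. Iterating over the data points in a vectorized fashion, rather than recursing over the grid hierarchy, is the kind of algorithmically less efficient but streaming-friendly formulation that maps well to SIMD units and GPUs, and the cost is linear in the number of data points:

```python
import numpy as np

def hat(level, index, x):
    """1-D hierarchical hat basis: phi_{l,i}(x) = max(0, 1 - |2^l * x - i|)."""
    return np.maximum(0.0, 1.0 - np.abs(2.0 ** level * x - index))

def evaluate(levels, indices, alpha, X):
    """Evaluate u(x) = sum_j alpha_j * prod_d phi_{l_j,i_j}(x_d) at all rows of X.

    levels, indices: (n_gridpoints, dim) integer arrays describing the
    (possibly adaptive) sparse grid; alpha: hierarchical coefficients;
    X: (n_points, dim) data matrix. Cost: O(n_points * n_gridpoints),
    i.e. linear in the number of data points.
    """
    result = np.zeros(X.shape[0])
    for l, i, a in zip(levels, indices, alpha):
        phi = np.ones(X.shape[0])
        for d in range(X.shape[1]):
            phi *= hat(l[d], i[d], X[:, d])  # tensor product over dimensions
        result += a * phi  # accumulate this grid point's contribution
    return result
```

In a production kernel, the inner loops over grid points and data points would be blocked, vectorized with SSE/AVX intrinsics or expressed in OpenCL/ArBB, and distributed across CPU cores and GPUs, but the data-parallel structure is the same.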


Sparse grids · Adaptivity · SIMD · Many-core · Multi-core · Accelerators · GPGPU · Hybrid acceleration





Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. Technische Universität München, Fakultät für Informatik, Garching, Germany
