Empirical Evaluation of the Parallel Distribution Sweeping Framework on Multicore Architectures

  • Deepak Ajwani
  • Nodari Sitchinava
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8125)


In this paper, we perform an empirical evaluation of the Parallel External Memory (PEM) model in the context of geometric problems. In particular, we implement the parallel distribution sweeping framework of Ajwani, Sitchinava and Zeh to solve the batched 1-dimensional stabbing max problem. While modern processors have sophisticated memory systems (multiple levels of caches, set associativity, TLB, prefetching), we show empirically that algorithms designed in simple models, which focus on minimizing the I/O transfers between shared memory and a single-level cache, can lead to efficient software on current multicore architectures. Our implementation exhibits significantly fewer accesses to slow DRAM and therefore outperforms traditional approaches based on plane sweep and two-way divide and conquer.
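
For orientation, the sequential plane-sweep baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation and not the PEM algorithm. It assumes the usual statement of the problem: each interval carries a weight, and for each query point we want the largest weight among the intervals containing it. The function name `stabbing_max_sweep`, the `(left, right, weight)` tuple layout, and the closed-interval convention are assumptions made for this example.

```python
import heapq


def stabbing_max_sweep(intervals, queries):
    """Batched 1-D stabbing max by a left-to-right sweep (illustrative sketch).

    intervals: list of (left, right, weight) tuples, treated as closed intervals.
    queries:   list of query points.
    Returns a list with, for each query, the maximum weight among intervals
    containing it, or None if no interval contains it.
    """
    # Process queries in sorted order, remembering their original positions.
    order = sorted(range(len(queries)), key=lambda i: queries[i])
    by_left = sorted(intervals, key=lambda iv: iv[0])

    heap = []  # max-heap on weight, stored as (-weight, right endpoint)
    answers = [None] * len(queries)
    nxt = 0  # next interval (by left endpoint) not yet activated

    for qi in order:
        q = queries[qi]
        # Activate every interval that starts at or before the sweep position.
        while nxt < len(by_left) and by_left[nxt][0] <= q:
            _, right, weight = by_left[nxt]
            heapq.heappush(heap, (-weight, right))
            nxt += 1
        # Lazy deletion: discard heaviest intervals that already ended before q.
        while heap and heap[0][1] < q:
            heapq.heappop(heap)
        if heap:
            answers[qi] = -heap[0][0]
    return answers


intervals = [(0, 10, 1), (2, 5, 7), (4, 8, 3)]
queries = [1, 3, 6, 9, 11, 4.5]
print(stabbing_max_sweep(intervals, queries))  # [1, 7, 3, 1, None, 7]
```

The sweep touches a priority queue on every event, which gives an irregular memory access pattern; that access behaviour is the kind of cost the paper's I/O-oriented design aims to reduce.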


Shared Memory, Query Point, Recursive Call, Cache Line, Multicore Architecture
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




References

  1. Aggarwal, A., Vitter, J.S.: The input/output complexity of sorting and related problems. Communications of the ACM 31(9), 1116–1127 (1988)
  2. Ajwani, D., Sitchinava, N.: Empirical evaluation of the parallel distribution sweeping framework on multicore architectures. CoRR abs/1306.4521 (2013)
  3. Ajwani, D., Sitchinava, N., Zeh, N.: Geometric algorithms for private-cache chip multiprocessors. In: de Berg, M., Meyer, U. (eds.) ESA 2010, Part II. LNCS, vol. 6347, pp. 75–86. Springer, Heidelberg (2010)
  4. Ajwani, D., Sitchinava, N., Zeh, N.: I/O-optimal distribution sweeping on private-cache chip multiprocessors. In: IPDPS, pp. 1114–1123 (2011)
  5. Arge, L., Goodrich, M.T., Nelson, M.J., Sitchinava, N.: Fundamental parallel algorithms for private-cache chip multiprocessors. In: SPAA, pp. 197–206 (2008)
  6. Bender, M.A., Farach-Colton, M., Fineman, J.T., Fogel, Y.R., Kuszmaul, B.C., Nelson, J.: Cache-oblivious streaming B-trees. In: SPAA, pp. 81–92 (2007)
  7. Bentley, J.L., Ottmann, T.A.: Algorithms for reporting and counting geometric intersections. IEEE Transactions on Computers 28(9), 643–647 (1979)
  8. Blelloch, G.E.: Prefix sums and their applications. In: Reif, J.H. (ed.) Synthesis of Parallel Algorithms, pp. 35–60. Morgan Kaufmann Publishers (1993)
  9. Blelloch, G.E., Chowdhury, R.A., Gibbons, P.B., Ramachandran, V., Chen, S., Kozuch, M.: Provably good multicore cache performance for divide-and-conquer algorithms. In: SODA, pp. 501–510 (2008)
  10. Blelloch, G.E., Fineman, J.T., Gibbons, P.B., Simhadri, H.V.: Scheduling irregular parallel computations on hierarchical caches. In: SPAA, pp. 355–366. ACM (2011)
  11. Brodal, G.S., Fagerberg, R., Vinther, K.: Engineering a cache-oblivious sorting algorithm. ACM Journal of Experimental Algorithmics 12 (2007)
  12. Chowdhury, R.A., Ramachandran, V.: The cache-oblivious gaussian elimination paradigm: Theoretical framework, parallelization and experimental evaluation. In: SPAA, pp. 71–80 (2007)
  13. Chowdhury, R.A., Ramachandran, V.: Cache-efficient dynamic programming for multicores. In: SPAA, pp. 207–216 (2008)
  14. Goodrich, M.T., Tsay, J.J., Vengroff, D.E., Vitter, J.S.: External-memory computational geometry. In: FOCS, pp. 714–723 (1993)
  15. Kang, S., Ediger, D., Bader, D.A.: Algorithm engineering challenges in multicore and manycore systems. IT - Information Technology 53(6), 266–273 (2011)
  16. Mehlhorn, K., Sanders, P.: Scanning multiple sequences via cache memory. Algorithmica 35, 75–93 (2003), doi:10.1007/s00453-002-0993-2
  17. Shamos, M.I., Hoey, D.: Geometric intersection problems. In: FOCS, pp. 208–215. IEEE Computer Society Press (1976)
  18. Singler, J., Sanders, P., Putze, F.: MCSTL: The multi-core standard template library. In: Kermarrec, A.-M., Bougé, L., Priol, T. (eds.) Euro-Par 2007. LNCS, vol. 4641, pp. 682–694. Springer, Heidelberg (2007)
  19. Sitchinava, N., Zeh, N.: A parallel buffer tree. In: SPAA, pp. 214–223 (2012)
  20. Tang, Y., Chowdhury, R.A., Kuszmaul, B.C., Luk, C.K., Leiserson, C.E.: The Pochoir stencil compiler. In: SPAA, pp. 117–128 (2011)

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Deepak Ajwani (1)
  • Nodari Sitchinava (2, 3)

  1. Bell Laboratories Ireland, Dublin, Ireland
  2. Karlsruhe Institute of Technology, Karlsruhe, Germany
  3. University of Hawaii, Manoa, USA
