New Data Structures to Handle Speculative Parallelization at Runtime

  • Alvaro Estebanez
  • Diego R. Llanos
  • Arturo Gonzalez-Escribano

Abstract

Software-based thread-level speculation (TLS) is a technique that optimistically executes in parallel those loops whose fully-parallel semantics cannot be guaranteed at compile time. Modern TLS libraries make it possible to handle arbitrary data structures speculatively. This desirable feature comes at a high cost in local store and/or remote recovery times: the easier the local store, the harder the remote recovery. Unfortunately, both operations lie on the critical path of any TLS system. In this paper we propose a solution that performs local stores in constant time and recovers values in a time on the order of \(T\), where \(T\) is the number of threads. As we will see, this solution, together with some additional improvements, makes the difference between slowdowns and noticeable speedups in the speculative parallelization of non-synthetic, pointer-based applications on a real system. Our experimental results show gains of 3.58\(\times \) to 28\(\times \) with respect to the baseline system, and a relative efficiency of up to, on average, 65% with respect to a TLS implementation specifically tailored to the benchmarks used.
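The store/recovery trade-off described above can be illustrated with a minimal sketch: each thread keeps its own hash-indexed version table, so a speculative store is a single probe into the thread's local table (expected constant time), while a speculative load walks the tables of the current thread and its logical predecessors, from most to least recent (at most \(T\) lookups). This is not the data structure proposed in the paper; all names (`spec_store`, `spec_load`, table sizes, and the assumption that a higher thread id means a logically later iteration) are hypothetical simplifications for illustration.

```c
#include <stdint.h>

#define NTHREADS   4
#define TABLE_SIZE 256  /* power of two; per-thread version table */

typedef struct {
    uintptr_t addr;   /* speculatively written address (0 = empty slot) */
    long      value;  /* most recent speculative value for that address */
} Slot;

/* One open-addressing version table per thread. */
static Slot table[NTHREADS][TABLE_SIZE];

static unsigned hash_addr(uintptr_t addr) {
    return (unsigned)((addr >> 3) * 2654435761u) & (TABLE_SIZE - 1);
}

/* Local store: expected O(1) - one hash probe into the thread's own table. */
void spec_store(int tid, uintptr_t addr, long value) {
    unsigned h = hash_addr(addr);
    while (table[tid][h].addr && table[tid][h].addr != addr)
        h = (h + 1) & (TABLE_SIZE - 1);   /* linear probing on collision */
    table[tid][h].addr  = addr;
    table[tid][h].value = value;
}

/* Remote recovery: O(T) - walk predecessor threads, most recent first.
 * Returns 1 and fills *out if some predecessor (or the thread itself)
 * speculatively wrote addr; returns 0 if the caller should read memory. */
int spec_load(int tid, uintptr_t addr, long *out) {
    for (int t = tid; t >= 0; t--) {
        unsigned h = hash_addr(addr);
        while (table[t][h].addr) {
            if (table[t][h].addr == addr) {
                *out = table[t][h].value;
                return 1;
            }
            h = (h + 1) & (TABLE_SIZE - 1);
        }
    }
    return 0;
}
```

In a real TLS runtime the thread-to-iteration mapping wraps around a sliding window and commits/squashes must also be handled, but the asymmetry is the same: the store touches one table, the load may touch up to \(T\).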

Keywords

Thread-level speculation · Speculative parallelism · Memory improvements


Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Alvaro Estebanez (1)
  • Diego R. Llanos (1)
  • Arturo Gonzalez-Escribano (1)
  1. Departamento de Informática, Universidad de Valladolid, Valladolid, Spain