Automatic Skeleton-Driven Memory Affinity for Transactional Worklist Applications

Góes, Luís Fabrício Wanderley; Ribeiro, Christiane Pousa; Castro, Márcio; Méhaut, Jean-François; Cole, Murray; Cintra, Marcelo

doi:10.1007/s10766-013-0253-x

Published: 31 May 2013

Volume 42, pages 365–382, (2014)

International Journal of Parallel Programming

Abstract

Memory affinity has become a key element in achieving scalable performance on multi-core platforms. Mechanisms such as thread scheduling, page allocation and cache prefetching are commonly employed to enhance memory affinity, which keeps data close to the cores that access it. In particular, software transactional memory (STM) applications exhibit irregular memory access behavior that makes it harder to determine which data will be needed by each core, and when. Additionally, existing STM runtime systems are decoupled from issues such as thread and memory management. In this paper, we thus propose a skeleton-driven mechanism to improve memory affinity in STM applications that fit the worklist pattern, employing a two-level approach. First, it addresses memory affinity at the DRAM level by automatically selecting page allocation policies. Then it employs data prefetching helper threads to improve affinity at the cache level. The mechanism relies on a skeleton framework to exploit the application pattern in order to provide automatic memory page allocation and cache prefetching. Our experimental results on the STAMP benchmark suite show that the proposed mechanism can achieve performance improvements of up to 46%, with an average of 11%, over a baseline version on two NUMA multi-core machines.
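
To make the DRAM-level step more concrete, the following toy model compares two common NUMA page allocation policies, bind (all pages on one node) and interleave (pages spread round-robin across nodes). It is only an illustrative sketch and not the mechanism evaluated in the article: the function names (`place_pages`, `remote_fraction`, `controller_load`) and the access trace are hypothetical, and the model ignores caches and latencies entirely.

```python
from collections import Counter


def place_pages(num_pages, num_nodes, policy, home_node=0):
    """Return, for each page index, the NUMA node that holds it (toy model)."""
    if policy == "bind":
        # Every page is allocated on a single node.
        return [home_node] * num_pages
    if policy == "interleave":
        # Pages are spread round-robin over all nodes.
        return [page % num_nodes for page in range(num_pages)]
    raise ValueError(f"unknown policy: {policy}")


def remote_fraction(accesses, placement):
    """Fraction of accesses whose page lives on a node other than the accessor's.

    `accesses` is a list of (node_of_accessing_core, page_index) pairs.
    """
    if not accesses:
        return 0.0
    remote = sum(1 for node, page in accesses if placement[page] != node)
    return remote / len(accesses)


def controller_load(accesses, placement):
    """Count how many accesses each node's memory has to serve."""
    return Counter(placement[page] for _node, page in accesses)


if __name__ == "__main__":
    # Assumed trace: cores on 2 nodes each touch all 4 shared pages once,
    # a crude stand-in for the irregular sharing of a worklist application.
    accesses = [(node, page) for node in range(2) for page in range(4)]
    for policy in ("bind", "interleave"):
        placement = place_pages(4, 2, policy)
        print(policy,
              remote_fraction(accesses, placement),
              dict(controller_load(accesses, placement)))
```

In this assumed trace both policies produce the same share of remote accesses, but interleave divides the requests evenly between the two nodes while bind sends all of them to one. This is one reason a policy choice can depend on the application's access pattern.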

Notes

Remote read latency divided by local read latency (obtained from BenchIT).

Author information

Authors and Affiliations

PPGEE, GSDC Group, Pontifícia Universidade Católica de Minas Gerais, Belo Horizonte, Brazil
Luís Fabrício Wanderley Góes
INRIA, CEA, LIG Laboratory, Grenoble University, Grenoble, France
Christiane Pousa Ribeiro, Márcio Castro & Jean-François Méhaut
School of Informatics, ICSA, CARD Group, University of Edinburgh, Edinburgh, UK
Murray Cole & Marcelo Cintra

Corresponding author

Correspondence to Luís Fabrício Wanderley Góes.

About this article

Cite this article

Góes, L.F.W., Ribeiro, C.P., Castro, M. et al. Automatic Skeleton-Driven Memory Affinity for Transactional Worklist Applications. Int J Parallel Prog 42, 365–382 (2014). https://doi.org/10.1007/s10766-013-0253-x

Received: 25 January 2013
Accepted: 21 May 2013
Published: 31 May 2013
Issue Date: April 2014
DOI: https://doi.org/10.1007/s10766-013-0253-x
