A flexible and dynamic page migration infrastructure based on hardware counters
Performance counters, also known as hardware counters, are a powerful monitoring mechanism included in the Performance Monitoring Unit (PMU) of most modern microprocessors. Their use is gaining popularity as an analysis and validation tool for profiling, since their impact on the monitored application is virtually imperceptible and their precision has noticeably increased thanks to the new Precise Event-Based Sampling (PEBS) features.
In this paper, we present and evaluate a novel user-level tool, based on hardware counters, for monitoring and migrating pages dynamically. The tool supports different migration strategies and can attach to and monitor a target application without requiring any modification to it. Page migration is performed in a timely manner, and its overhead is outweighed by the benefit of the improved data locality.
As a case study, an access-based migration algorithm was implemented and integrated into our tool. Performance results on a NUMA system show a noticeable reduction of remote accesses and execution time, achieving speedups of up to ∼21 % in a multiprogrammed environment.
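To make the idea of an access-based strategy concrete, the following is a minimal sketch and not the algorithm evaluated in this paper. It assumes that hardware-counter samples have already been reduced to (address, accessing node) pairs, and it proposes moving a page to the node that issued the majority of its sampled accesses. The names `plan_migrations`, `min_samples`, `min_share` and the 4 KiB page size are illustrative assumptions.

```python
from collections import Counter, defaultdict

PAGE_SIZE = 4096  # assumed page size in bytes (hypothetical; platform dependent)


def page_of(address):
    """Return the page number that contains the given byte address."""
    return address // PAGE_SIZE


def plan_migrations(samples, home_node, min_samples=4, min_share=0.5):
    """Propose a destination node for each page that is mostly accessed remotely.

    samples     -- iterable of (address, accessing_node) pairs (assumed input format)
    home_node   -- dict mapping page number to the node currently holding the page
    min_samples -- ignore pages with fewer sampled accesses than this (assumed knob)
    min_share   -- fraction of accesses the winning node must exceed (assumed knob)
    """
    # Build a per-page histogram of accesses by node.
    per_page = defaultdict(Counter)
    for address, node in samples:
        per_page[page_of(address)][node] += 1

    plan = {}
    for page, counts in per_page.items():
        total = sum(counts.values())
        if total < min_samples:
            continue  # too few samples to justify the cost of a migration
        node, hits = counts.most_common(1)[0]
        # Migrate only if a different node clearly dominates the accesses.
        if node != home_node.get(page) and hits / total > min_share:
            plan[page] = node
    return plan


if __name__ == "__main__":
    samples = [(0x1000, 1)] * 5 + [(0x1008, 0)] + [(0x2000, 0)] * 6
    print(plan_migrations(samples, {1: 0, 2: 0}))
```

On Linux, a plan of this kind could then be applied with the `move_pages` system call; that step is omitted here because it requires a NUMA machine. The thresholds trade migration overhead against locality: higher values migrate fewer pages but with more confidence.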
Keywords: Hardware counters, Page migration, NUMA
This work has been partially supported by Hewlett-Packard under contract 2008/CE377, by the Ministry of Education and Science of Spain, FEDER funds under contract TIN 2010-17541 and by the Xunta de Galicia (Spain) under contract 2010/28 and project 09TIC002CT. This work is in the frame of the Spanish network CAPAP-H. The authors also wish to thank the supercomputer facilities provided by CESGA.