Evaluating the impacts of hugepage on virtual machines

Abstract

Modern applications often require large amounts of memory. Conventional 4KB pages lead to large page tables and thus exert high pressure on TLB address translation. This pressure is even more pronounced in a virtualized system, which adds an extra layer of address translation. Page walks caused by TLB misses can incur significant performance overhead. One way to reduce this overhead is to use hugepages. The Linux kernel has supported transparent hugepage since version 2.6.38, which provides an alternative, larger page size. In general, hugepages offer better performance on address translation and page table modification. This paper first analyzes the impact of hugepages on a native system, and then compares their impact under different memory virtualization approaches: hardware-assisted paging (HAP), shadow paging, and para-virtualization. We observe that the current implementation of transparent hugepage is inefficient and cannot exploit the full performance advantage of hugepages. Worse yet, its conservative allocation strategy may conflict with existing OS functions, which can lead to performance degradation. We therefore propose a new memory allocation strategy, alignment-based hugepage (ABH), which promotes hugepage allocation. We apply ABH to different paging modes in virtualized systems. The results show that the new allocation strategy significantly reduces TLB misses, eliminates up to 90% of the page walk cycles caused by TLB misses, and thus improves the performance of real-world applications.
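
To make the transparent hugepage mechanism concrete, the following minimal C sketch (our illustration, not code from the paper) shows how a Linux application can ask the kernel to back an anonymous region with 2MB transparent hugepages via madvise(MADV_HUGEPAGE); the kernel may still decline the hint, for example when the region is not suitably aligned.

    /* Minimal sketch: hinting the kernel to back an anonymous region with
     * transparent hugepages.  Requires CONFIG_TRANSPARENT_HUGEPAGE. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    #define REGION_SIZE (64UL << 20)   /* 64MB of anonymous memory */

    int main(void) {
        void *p = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        /* Hint that this range should be backed by 2MB hugepages when
         * possible; the kernel is free to ignore the hint. */
        if (madvise(p, REGION_SIZE, MADV_HUGEPAGE) != 0)
            perror("madvise(MADV_HUGEPAGE)");

        memset(p, 0, REGION_SIZE);     /* touch the pages so they are faulted in */
        printf("region at %p, %lu MB\n", p, REGION_SIZE >> 20);
        munmap(p, REGION_SIZE);
        return 0;
    }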

Highlights

Today's applications demand ever larger amounts of memory. Conventional 4KB pages cause excessive address translation overhead, and the problem is even more pronounced in virtualized environments, which add an extra layer of address translation. One way to reduce this overhead is to use hugepages. Compared with ordinary 4KB pages, hugepages generally perform better in page table accesses and page fault handling. The Linux kernel has supported transparent hugepage since version 2.6.38, which allocates hugepages for a program and improves its performance without modifying user applications. However, transparent hugepage has a drawback: hugepages impose extra alignment requirements that the current implementation cannot satisfy.
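
To see why the alignment requirement matters (again our own illustration, not taken from the paper): a 2MB transparent hugepage can only back a virtual range that covers a whole 2MB-aligned extent, so an unaligned region loses hugepage coverage at its head and tail. The sketch below counts the full 2MB extents inside an arbitrary range.

    /* Counting how many whole 2MB-aligned extents fit inside a virtual
     * address range.  Only these extents can be backed by 2MB transparent
     * hugepages; the unaligned head and tail fall back to 4KB pages. */
    #include <stdint.h>
    #include <stdio.h>

    #define HUGE_SIZE (2UL << 20)      /* 2MB hugepage on x86-64 */

    static unsigned long huge_extents(uintptr_t start, size_t len) {
        uintptr_t end   = start + len;
        uintptr_t first = (start + HUGE_SIZE - 1) & ~(HUGE_SIZE - 1); /* round up   */
        uintptr_t last  = end & ~(HUGE_SIZE - 1);                     /* round down */
        return last > first ? (last - first) / HUGE_SIZE : 0;
    }

    int main(void) {
        /* A 10MB region that starts 4KB past a 2MB boundary covers only
         * four full 2MB extents instead of five. */
        printf("%lu\n", huge_extents(0x200000UL + 0x1000UL, 10UL << 20));
        return 0;
    }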

This paper first analyzes memory performance on Linux and in virtualized environments, as well as the effectiveness of transparent hugepage. We find that, because of the address alignment restriction, transparent hugepage achieves a utilization of less than 25% in many cases. We then propose an alignment-based memory management scheme that raises the hugepage usage ratio and improves application performance.
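
The ABH scheme itself is an in-kernel allocation strategy described in the full paper; the user-space C sketch below only approximates its core idea, assuming 2MB hugepages on x86-64: over-allocate an anonymous mapping, trim it so the usable region starts on a 2MB boundary, and then hint the kernel with madvise(MADV_HUGEPAGE).

    /* User-space approximation of the alignment idea behind ABH. */
    #define _GNU_SOURCE
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>

    #define HUGE_SIZE (2UL << 20)

    static void *alloc_huge_aligned(size_t len) {
        size_t padded = len + HUGE_SIZE;     /* slack so we can realign */
        char *raw = mmap(NULL, padded, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (raw == MAP_FAILED) return NULL;

        uintptr_t base    = (uintptr_t)raw;
        uintptr_t aligned = (base + HUGE_SIZE - 1) & ~(HUGE_SIZE - 1);

        /* Return the unused head and tail of the mapping to the kernel. */
        if (aligned > base)
            munmap(raw, aligned - base);
        size_t tail = (base + padded) - (aligned + len);
        if (tail > 0)
            munmap((void *)(aligned + len), tail);

        madvise((void *)aligned, len, MADV_HUGEPAGE);  /* best-effort hint */
        return (void *)aligned;
    }

    int main(void) {
        void *p = alloc_huge_aligned(32UL << 20);
        printf("2MB-aligned region: %p\n", p);
        return 0;
    }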

We evaluate the new memory management scheme on Linux and under several virtualization environments. The experimental results show that the new scheme reduces the overhead of page table accesses (page walks) by up to 90%; among the virtualized configurations, KVM's shadow paging mode delivers the best performance; Xen's shadow paging mode currently cannot use hugepages, but could achieve better performance if hugepage support were added.
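
As a simple way to check how much of a workload's memory is actually backed by transparent hugepages (a hypothetical verification aid, not the measurement methodology used in the paper), one can sum the Rss and AnonHugePages fields of /proc/self/smaps:

    /* Estimate what fraction of resident memory is backed by 2MB
     * transparent hugepages by scanning /proc/self/smaps. */
    #include <stdio.h>

    int main(void) {
        FILE *f = fopen("/proc/self/smaps", "r");
        if (!f) { perror("smaps"); return 1; }

        char line[256];
        unsigned long rss_kb = 0, thp_kb = 0, v;
        while (fgets(line, sizeof line, f)) {
            if (sscanf(line, "Rss: %lu kB", &v) == 1)
                rss_kb += v;
            else if (sscanf(line, "AnonHugePages: %lu kB", &v) == 1)
                thp_kb += v;
        }
        fclose(f);

        printf("resident: %lu kB, THP-backed: %lu kB (%.1f%%)\n",
               rss_kb, thp_kb, rss_kb ? 100.0 * thp_kb / rss_kb : 0.0);
        return 0;
    }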

Author information

Correspondence to Yingwei Luo.

Cite this article

Wang, X., Luo, T., Hu, J. et al. Evaluating the impacts of hugepage on virtual machines. Sci. China Inf. Sci. 60, 012103 (2017). https://doi.org/10.1007/s11432-015-0764-7

Keywords

  • hugepage
  • memory management
  • translation lookaside buffer
  • virtualization
  • performance
