IOMMU: Strategies for Mitigating the IOTLB Bottleneck

  • Nadav Amit
  • Muli Ben-Yehuda
  • Ben-Ami Yassour
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6161)


The input/output memory management unit (IOMMU) was recently introduced into mainstream computer architecture when both Intel and AMD added IOMMUs to their chipsets. An IOMMU provides memory protection from I/O devices by enabling system software to control which areas of physical memory an I/O device may access. However, this protection incurs additional direct memory access (DMA) overhead due to the required address resolution and validation.

IOMMUs include an input/output translation lookaside buffer (IOTLB) to speed up address resolution, but every IOTLB miss still substantially increases DMA latency and degrades the performance of DMA-intensive workloads. In this paper we first demonstrate the potential negative impact of IOTLB misses on workload performance. We then propose both system software and hardware enhancements that reduce the IOTLB miss rate and accelerate address resolution. These enhancements can reduce the IOTLB miss rate by over 60% for common I/O-intensive workloads.
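To make the notion of an IOTLB miss rate concrete, the following is a minimal sketch of a toy IOTLB model. It is not the authors' simulator or any real hardware design: the class name `ToyIotlb`, the fully associative LRU policy, the 4 KiB page size, and the ring-buffer trace are all assumptions chosen only for illustration.

```python
from collections import OrderedDict

PAGE_SHIFT = 12  # assumption: 4 KiB I/O pages


class ToyIotlb:
    """Hypothetical fully associative IOTLB with LRU replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # (device_id, page number) -> present
        self.hits = 0
        self.misses = 0

    def translate(self, device_id, io_addr):
        """Look up one DMA address; return True on a hit, False on a miss."""
        key = (device_id, io_addr >> PAGE_SHIFT)
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return True
        # Miss: real hardware would walk the I/O page tables here.
        self.misses += 1
        self.entries[key] = True
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return False

    def miss_rate(self):
        total = self.hits + self.misses
        return self.misses / total if total else 0.0


def run(trace, capacity):
    """Replay a trace of (device_id, io_addr) pairs and return the miss rate."""
    tlb = ToyIotlb(capacity)
    for device_id, io_addr in trace:
        tlb.translate(device_id, io_addr)
    return tlb.miss_rate()


# Assumed workload: one device cycling 4 times over a ring of 8 buffers,
# each buffer on its own page.
ring_trace = [(0, i * 4096) for i in range(8)] * 4

small = run(ring_trace, capacity=4)  # ring does not fit: every access misses
large = run(ring_trace, capacity=8)  # ring fits: only the first pass misses
```

In this toy setup a cyclic access pattern larger than the cache defeats LRU entirely, while a cache that holds the whole ring misses only on the first pass. The sketch says nothing about the specific enhancements or the measured results of the paper.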


Virtual Machine, Direct Memory Access, Device Driver, Memory Region, Page Table
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Nadav Amit (1)
  • Muli Ben-Yehuda (2)
  • Ben-Ami Yassour (2)

  1. Technion - Israel Institute of Technology, Haifa, Israel
  2. IBM R&D Labs in Israel, Haifa, Israel
