The Cost of TLB Consistency
When paged virtual memory is supported as part of the memory hierarchy in a shared-memory multiprocessor system, translation-lookaside buffers (TLBs) are often used to cache copies of virtual-to-physical address translation information. This translation information is also stored in data structures called page tables. Since there can be multiple images of the translation information for a page accessible by processors, the modification of one image can result in inconsistency among the other images stored in TLBs and the page table. This TLB consistency problem can cause a processor to use stale translation information, which may result in incorrect program execution.
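The inconsistency described above can be illustrated with a toy model. The sketch below is not taken from the paper: the `PageTable` and `Processor` classes, the page and frame numbers, and the policy of the modifying processor invalidating only its own TLB entry are all assumptions chosen to make the stale-translation case visible.

```python
class PageTable:
    """Hypothetical shared page table: virtual page -> physical frame."""
    def __init__(self):
        self.entries = {}


class Processor:
    """Hypothetical processor with a private TLB that caches translations."""
    def __init__(self, page_table):
        self.page_table = page_table
        self.tlb = {}

    def translate(self, vpage):
        # On a TLB miss, load the translation from the shared page table.
        if vpage not in self.tlb:
            self.tlb[vpage] = self.page_table.entries[vpage]
        return self.tlb[vpage]


page_table = PageTable()
page_table.entries[7] = 100          # virtual page 7 maps to frame 100

cpu0 = Processor(page_table)
cpu1 = Processor(page_table)
cpu0.translate(7)                    # both processors now cache 7 -> 100
cpu1.translate(7)

# cpu0 remaps the page and invalidates only its own TLB entry.
page_table.entries[7] = 200
cpu0.tlb.pop(7, None)

fresh = cpu0.translate(7)            # reloaded from the page table
stale = cpu1.translate(7)            # still served from cpu1's TLB
```

After the remapping, `cpu0` sees the new frame while `cpu1` continues to use the old one until something removes or updates its cached entry, which is the situation a TLB consistency solution must prevent.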
Keeping TLBs consistent carries a performance overhead, manifested as the processor time attributable, explicitly or implicitly, to the adopted solution. Some solutions have been shown to be effective in small-scale multiprocessor systems but are unlikely to be satisfactory for large-scale systems. In the absence of performance data, this paper examines the performance costs associated with solutions to the TLB consistency problem and endeavors to delineate the characteristics of solutions that are desirable in terms of performance in large-scale systems.
In summary, a solution should:
- enlist the participation of a processor only when it will use inconsistent information,
- place necessary locks on the smallest possible data entities,
- not introduce serialization,
- keep extra communication to a minimum, and
- have an insignificant impact on network traffic.
Two solutions are described that meet the first four criteria but that may have an impact on network traffic.
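To make the first criterion concrete, the following sketch contrasts invalidating every processor's TLB with invalidating only those that actually cache the affected translation. It is an illustration only and does not represent either of the two solutions the paper describes; the `holders` record, the function names, and the processor count are hypothetical.

```python
def make_system(num_cpus):
    """Create one empty TLB per processor plus a hypothetical record of
    which processors cache each virtual page."""
    tlbs = [{} for _ in range(num_cpus)]
    holders = {}                      # vpage -> set of processor ids
    return tlbs, holders


def load(tlbs, holders, cpu, vpage, frame):
    """Cache a translation in one processor's TLB and record the holder."""
    tlbs[cpu][vpage] = frame
    holders.setdefault(vpage, set()).add(cpu)


def broadcast_invalidate(tlbs, vpage):
    """Involve every processor, whether or not it caches the page.
    Returns the number of processors enlisted."""
    for tlb in tlbs:
        tlb.pop(vpage, None)
    return len(tlbs)


def selective_invalidate(tlbs, holders, vpage):
    """Involve only processors recorded as caching the page.
    Returns the number of processors enlisted."""
    targets = holders.pop(vpage, set())
    for cpu in targets:
        tlbs[cpu].pop(vpage, None)
    return len(targets)


# Eight processors, of which only two cache the translation for page 7.
tlbs_a, holders_a = make_system(8)
tlbs_b, holders_b = make_system(8)
for tlbs, holders in ((tlbs_a, holders_a), (tlbs_b, holders_b)):
    load(tlbs, holders, 2, 7, 100)
    load(tlbs, holders, 5, 7, 100)

broadcast_cost = broadcast_invalidate(tlbs_a, 7)
selective_cost = selective_invalidate(tlbs_b, holders_b, 7)
```

Both approaches leave no TLB holding the stale entry, but the selective version enlists two processors instead of eight. Maintaining the `holders` record is itself extra bookkeeping and communication, which is the kind of trade-off the remaining criteria are meant to weigh.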