The complexity gap in the static analysis of cache accesses grows if procedure calls are added

Formal Methods in System Design

Abstract

The static analysis of cache accesses consists in correctly predicting which accesses are hits or misses. While there exist good exact and approximate analyses for caches implementing the least recently used (LRU) replacement policy, such analyses were harder to find for other replacement policies. A theoretical explanation was found: for an appropriate setting of analysis over control-flow graphs, cache analysis is PSPACE-complete for all common replacement policies (FIFO, PLRU, NMRU) except for LRU, for which it is only NP-complete. In this paper, we show that if procedure calls are added to the control flow, then the gap widens: analysis remains NP-complete for LRU, but becomes EXPTIME-complete for the three other policies. For this, we improve on earlier results on the complexity of reachability problems on Boolean programs with procedure calls. In addition, for the LRU policy we derive a backtracking algorithm as well as an approach for using it as a last resort after other analyses have failed to conclude.

Notes

  1. Absint’s aiT is one such tool, used in industries such as avionics, automotive, energy and space. https://www.absint.com/ait/ Non-commercial tools include OTAWA. http://www.otawa.fr/

  2. Static analysis tools may perform more refined analyses, such as persistence analysis, refinements according to execution paths or loop indices, etc. We do not cover these here. Our goal is to study difficulty even in the simplest, most easily understood analysis.

  3. This could be incorrect if we were considering complex microarchitectures with cache prefetching etc., since the availability of data in a cache set may result in loads being made or not made to other cache sets. Again, we consider a simple setting here. Separate analysis may however be used for safe over-approximations of the behavior of the system.

  4. A safe, constant-time, approximate static analysis is to answer “unknown” to any request. In order to study complexity, some form of minimal precision must be imposed. It is unclear what metric should be used for this; thus our choice to require exactness.

  5. For complexity theoretical purposes, we assume that the input is the program to be analyzed, as a set of procedures consisting of explicitly represented control-flow graphs labeled with array accesses, preceded by the associativity \(K\) of the cache written in unary notation.

  6. This is the same definition as Esparza et al. [4] except we keep a word alphabet.

  7. Succinctly described circuits, and the EXPTIME-completeness of their value problem, have long been known (Papadimitriou [14, Ch. 20]). We nevertheless recall how to establish this result, both for completeness and to make it easier to see how we turn successive reductions showing P-completeness of explicitly described problems into successive reductions showing EXPTIME-completeness of succinctly described problems.

References

  1. Al-Zoubi H, Milenkovic A, Milenkovic M (2004) Performance evaluation of cache replacement policies for the SPEC CPU2000 benchmark suite. In: Proceedings of the 42nd annual southeast regional conference, ACM-SE 42, pp 267–272, New York, ACM. https://doi.org/10.1145/986537.986601

  2. Berg C (2006) PLRU cache domino effects. In: Mueller F (ed) 6th international workshop on worst-case execution time analysis (WCET’06), volume 4 of OpenAccess Series in Informatics (OASIcs), pages 69–71, Dagstuhl, Germany. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik. https://doi.org/10.4230/OASIcs.WCET.2006.672

  3. Bouajjani A, Esparza J, Maler O (1997) Reachability analysis of pushdown automata: application to model-checking. In: Mazurkiewicz AW, Winkowski J (eds) CONCUR ’97: Concurrency Theory, 8th International Conference, Warsaw, Poland, July 1-4, Proceedings, volume 1243 of Lecture Notes in Computer Science, pp 135–150. Springer, Berlin https://doi.org/10.1007/3-540-63141-0_10

  4. Esparza J, Hansel D, Rossmanith P, Schwoon S (2000) Efficient algorithms for model checking pushdown systems. In: Emerson EA, Sistla AP (eds) Computer aided verification, 12th international conference, CAV 2000, Chicago, IL, USA, July 15–19, Proceedings. volume 1855 of Lecture Notes in Computer Science, pp 232–247. Springer https://doi.org/10.1007/10722167_20

  5. Ferdinand C, Wilhelm R (1999) Efficient and precise cache behavior prediction for real-time systems. Real-Time Syst 17(2–3):131–181. https://doi.org/10.1023/A:1008186323068

  6. Galperin H, Wigderson A (1983) Succinct representations of graphs. Inf Control 56(3):183–198. https://doi.org/10.1016/S0019-9958(83)80004-7

  7. Godefroid P, Yannakakis M (2013) Analysis of Boolean programs. In: Piterman N, Smolka SA (eds) Tools and algorithms for the construction and analysis of systems (TACAS), volume 7795 of Lecture Notes in Computer Science, pp 214–229. Springer, Berlin https://doi.org/10.1007/978-3-642-36742-7_16

  8. Goldschlager LM (1977) The monotone and planar circuit value problems are log space complete for P. SIGACT News 9(2):25–29. https://doi.org/10.1145/1008354.1008356

  9. Greenlaw R, Hoover HJ, Ruzzo WL (1995) Limits to parallel computation: P-completeness theory. Oxford University Press. https://homes.cs.washington.edu/~ruzzo/papers/limits.pdf

  10. Heckmann R, Langenbach M, Thesing S, Wilhelm R (2003) The influence of processor architecture on the design and the results of WCET tools. Proc IEEE 91(7):1038–1054. https://doi.org/10.1109/JPROC.2003.814618

  11. Ladner RE (1975) The circuit value problem is log space complete for P. SIGACT News 7(1):18–20. https://doi.org/10.1145/990518.990519

  12. Malamy A, Patel RN, Hayes NM (1994) Methods and apparatus for implementing a pseudo-LRU cache memory replacement scheme with a locking feature. US patent 5,353,425, US Patent Office, October. https://patents.google.com/patent/US5353425

  13. Monniaux D, Touzeau V (2019) On the complexity of cache analysis for different replacement policies. J ACM 66(6):41:1–41:22. https://doi.org/10.1145/3366018

  14. Papadimitriou CH (1993) Computational complexity. Addison-Wesley, Boston

  15. Papadimitriou CH, Yannakakis M (1986) A note on succinct representations of graphs. Inf Control 71(3):181–185. https://doi.org/10.1016/S0019-9958(86)80009-2

  16. Reineke J (2008) Caches in WCET analysis: predictability, competitiveness, sensitivity. PhD thesis, Universität des Saarlandes. http://www.rw.cdl.uni-saarland.de/~reineke/publications/DissertationCachesInWCETAnalysis.pdf

  17. Touzeau V, Maïza C, Monniaux D, Reineke J (2017) Ascertaining uncertainty for efficient exact cache analysis. In: Kuncak V, Majumdar R (eds) Computer aided verification (CAV). Springer, Berlin

  18. Touzeau V, Maïza C, Monniaux D, Reineke J (2019) Fast and exact analysis for LRU caches. Proc ACM Program Lang 3:54:1–54:29. https://doi.org/10.1145/3290367

Author information

Corresponding author

Correspondence to David Monniaux.

Appendix A: Alternative proof for EXPTIME-completeness of reachability in Boolean programs

Godefroid and Yannakakis [7] claim EXPTIME-hardness for reachability in Boolean programs (Theorem 4), but they refer the reader to a full version of their article, which is available only by request to the authors. We thus provide, in the next subsections, an independent proof of EXPTIME-hardness for Boolean programs.

Note that EXPTIME membership is easily established. A Boolean register machine with procedure calls may be expanded into an equivalent pushdown system, at the cost of exponential blowup: consider one control location in the pushdown system for each pair consisting of a control location in the Boolean register machine and one of the (exponentially many) vectors of values of the registers; then apply Theorem 1.

A.1 Succinctly represented problems

We have seen how a reachability problem involving Boolean registers can be expanded into a reachability problem not involving registers, that is, a reachability problem in a directed graph, at the cost of exponential blowup. This is an instance of a more general pattern relating the complexity of problems represented as explicit lists of transitions to their complexity when represented as “implicit” lists of transitions, for instance involving registers, in the same way that a small Boolean formula is a succinct representation of a much larger explicit truth table.

Galperin and Wigderson [6] studied the complexity of various problems on graphs when these graphs are succinctly represented, by which they mean that graph vertices are labeled by vectors of bits, and the adjacency relation is defined by a Boolean circuit taking as inputs two vectors of bits and answering one bit: whether the vertices labeled by these two vectors are connected. Papadimitriou and Yannakakis [15] generalized their results: an NP-complete (respectively, P-complete; NLOGSPACE-complete) problem on explicitly represented graphs, under some fairly permissive condition on the reduction used for showing this completeness property, becomes NEXPTIME-complete (respectively, EXPTIME-complete; PSPACE-complete) on succinctly represented graphs. A well-known example of this phenomenon is the reachability problem: given two vertices in a directed graph, say whether one is reachable from the other. It is NLOGSPACE-complete on explicitly represented graphs, and becomes PSPACE-complete on succinctly represented graphs, where it is also known as the reachability problem in implicit-state model checking.
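To make the notion of a succinctly represented graph concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a construction from the cited papers: the adjacency "circuit" is modeled as an ordinary Python predicate, and the names `reachable`, `to_int`, `to_bits` and `successor_edge` are hypothetical. The naive search enumerates all \(2^n\) vertices, which shows where the exponential blowup of the explicit representation comes from.

```python
from collections import deque
from itertools import product

def reachable(n_bits, edge, source, target):
    """Explicit-state BFS over a succinctly represented directed graph.

    Vertices are tuples of n_bits Booleans; edge(u, v) stands in for the
    Boolean circuit deciding adjacency. The search may visit all
    2**n_bits vertices.
    """
    all_vertices = list(product((False, True), repeat=n_bits))
    seen = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return True
        for v in all_vertices:
            if v not in seen and edge(u, v):
                seen.add(v)
                queue.append(v)
    return False

def to_int(bits):
    """Little-endian bit tuple to integer."""
    return sum(1 << i for i, b in enumerate(bits) if b)

def to_bits(x, n_bits):
    """Integer to little-endian bit tuple of length n_bits."""
    return tuple(bool((x >> i) & 1) for i in range(n_bits))

def successor_edge(u, v):
    # Hypothetical adjacency relation: an edge from u to u + 1, no wrap-around.
    return to_int(v) == to_int(u) + 1

forward = reachable(3, successor_edge, to_bits(0, 3), to_bits(7, 3))
backward = reachable(3, successor_edge, to_bits(5, 3), to_bits(2, 3))
```

The description of `successor_edge` has constant size, whereas the graph it denotes has \(2^n\) vertices and a path of length \(2^n - 1\).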

The reachability problem for explicitly represented pushdown systems, which are very close to Boolean register machines with procedure calls but no registers, is known to be P-complete. We can thus hope that it becomes EXPTIME-complete for succinctly represented pushdown systems. We cannot use Papadimitriou and Yannakakis’ results directly, because they pertain solely to graph problems, but we can follow the same general approach as their hardness proof: analyze the reduction from the acceptance problem for polynomial-time Turing machines to the problem for explicitly represented pushdown systems, which are close to Boolean programs without registers, and construct a reduction from the acceptance problem for exponential-time Turing machines to the problem for succinctly represented pushdown systems, which are close to Boolean programs with registers.

It takes four reduction steps to show that the reachability problem for Boolean register machines with procedure calls and 0 registers is P-hard: (i) from the acceptance problem for polynomial-time Turing machines to the circuit value problem (CVP) (Greenlaw et al. [9, 4.2]); (ii) from the CVP to the monotone circuit value problem (Greenlaw et al. [9, A.1.3]); (iii) from the monotone CVP to the emptiness problem for context-free grammars (Greenlaw et al. [9, A.7.2]); (iv) from emptiness in context-free grammars to reachability in Boolean programs with local variables.

A.2 Reductions for explicit descriptions

The circuit value problem (CVP) is: given a Boolean circuit, using logical gates \(\wedge \), \(\vee \), \(\lnot \), with known inputs, compute its output. The first reduction step [11] [9, Th. 4.2.2] encodes the bounded deterministic execution of a Turing machine into a circuit in much the same way that one encodes the bounded nondeterministic execution of a Turing machine into a Boolean satisfiability problem: the value \(c_{i,j}\) of the cell at position j on the tape at time \(i > 0\) is defined as a function of \(c_{i-1,j-1}\), \(c_{i-1,j}\) and \(c_{i-1,j+1}\), with a different value depending on whether the read/write head is on the cell; these values \(c_{i,j}\) are then encoded into vectors of bits (of size logarithmic in the size of the tape alphabet and the number of control states), and one obtains a circuit. It then suffices to add initialization for the values \(c_{0,j}\) of the cells at time 0, and a test for a reachability condition.
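The locality of this encoding can be illustrated by a small Python sketch, in which each cell of the tableau at time i is computed only from the three neighbouring cells at time i − 1. This is an illustration under assumptions, not the construction of [9] or [11]: a cell is modeled as a pair (symbol, state), with state `None` when the head is elsewhere, and the names `next_cell`, `tableau`, the blank symbol `"_"` and the example machine `flip` are hypothetical. The sketch also assumes the head never moves left from position 0.

```python
def next_cell(left, mid, right, delta):
    """Cell at time i, position j, from the cells at time i-1, positions j-1, j, j+1.

    delta maps (state, symbol) to (new_state, new_symbol, move), move in {-1, +1}.
    """
    sym, state = mid
    if state is not None:
        if (state, sym) not in delta:
            return mid                      # no transition: configuration is frozen
        _, new_sym, _ = delta[(state, sym)]
        return (new_sym, None)              # head writes and leaves the cell
    for (nsym, nstate), arriving_move in ((left, +1), (right, -1)):
        if nstate is not None and (nstate, nsym) in delta:
            new_state, _, move = delta[(nstate, nsym)]
            if move == arriving_move:
                return (sym, new_state)     # head arrives from a neighbour
    return mid                              # cell is unchanged

def tableau(tape, state, delta, steps, blank="_"):
    """Rows 0..steps of the computation tableau, head initially at position 0."""
    width = len(tape) + steps + 1
    row = [(tape[j] if j < len(tape) else blank, None) for j in range(width)]
    row[0] = (row[0][0], state)
    border = (blank, None)
    rows = [row]
    for _ in range(steps):
        padded = [border] + rows[-1] + [border]
        rows.append([next_cell(padded[j], padded[j + 1], padded[j + 2], delta)
                     for j in range(width)])
    return rows

# Hypothetical machine: flip every bit while moving right, stop on a blank.
flip = {("q", "0"): ("q", "1", +1), ("q", "1"): ("q", "0", +1)}
rows = tableau("0110", "q", flip, steps=5)
final_tape = "".join(sym for sym, _ in rows[-1]).rstrip("_")
```

Because `next_cell` is the same function at every position and time step, the circuit obtained by unrolling it is highly regular, a property that the succinct encodings later in this appendix rely on.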

The monotone CVP is: given a Boolean circuit, using logical gates \(\wedge \) and \(\vee \) with known inputs, compute its output. Obviously it is a special case of the general CVP. A general CVP can be encoded into a monotone CVP by using “dual rail encoding” [8] [9, Th. 6.2.2]: each wire b in the original circuit is encoded into two wires \(b_0\) and \(b_1\), where \(b_0\) is 1 if b is 0 and 0 if b is 1, and \(b_1\) is 1 if b is 1 and 0 if b is 0. Each \(\wedge \) or \(\vee \) gate of the original circuit is simulated by two monotone gates (for instance, \(b = c \wedge d\) becomes \(b_1 = c_1 \wedge d_1\) and \(b_0 = c_0 \vee d_0\)); \(\lnot \) gates map to swapping the two wires.
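A minimal Python sketch of the dual rail encoding follows. The circuit representation (a dictionary from wire names to gate tuples), the gate kind `"wire"` used to express the swap performed by negation, and the names `dual_rail`, `evaluate` and `xor_circuit` are assumptions made for this illustration only.

```python
def dual_rail(circuit):
    """Translate a circuit with and/or/not gates into a monotone one.

    Each wire w becomes two wires (w, 1) and (w, 0), carrying the value of w
    and its complement respectively.
    """
    mono = {}
    for w, gate in circuit.items():
        kind = gate[0]
        if kind == "const":
            mono[(w, 1)] = ("const", gate[1])
            mono[(w, 0)] = ("const", 1 - gate[1])
        elif kind == "and":
            _, a, b = gate
            mono[(w, 1)] = ("and", (a, 1), (b, 1))
            mono[(w, 0)] = ("or", (a, 0), (b, 0))    # De Morgan
        elif kind == "or":
            _, a, b = gate
            mono[(w, 1)] = ("or", (a, 1), (b, 1))
            mono[(w, 0)] = ("and", (a, 0), (b, 0))   # De Morgan
        elif kind == "not":
            a = gate[1]
            mono[(w, 1)] = ("wire", (a, 0))          # negation swaps the rails
            mono[(w, 0)] = ("wire", (a, 1))
        else:
            raise ValueError(f"unknown gate kind: {kind}")
    return mono

def evaluate(circuit, wire):
    """Naive recursive evaluation (fine for small examples)."""
    gate = circuit[wire]
    kind = gate[0]
    if kind == "const":
        return gate[1]
    if kind == "wire":
        return evaluate(circuit, gate[1])
    if kind == "not":
        return 1 - evaluate(circuit, gate[1])
    x, y = evaluate(circuit, gate[1]), evaluate(circuit, gate[2])
    return x & y if kind == "and" else x | y

# Example: out = a xor b, with a = 1 and b = 0.
xor_circuit = {
    "a": ("const", 1), "b": ("const", 0),
    "na": ("not", "a"), "nb": ("not", "b"),
    "t1": ("and", "a", "nb"), "t2": ("and", "na", "b"),
    "out": ("or", "t1", "t2"),
}
mono = dual_rail(xor_circuit)
```

The translated circuit has exactly twice as many wires as the original and contains no negation; the rail `("out", 1)` carries the original output and `("out", 0)` its complement.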

Let us now encode the monotone CVP into the context-free grammar emptiness problem [9, A.7.2, crediting Martin Tompa]. To each wire \(w_i\) in the circuit one associates a nonterminal \(\nu _i\). If \(w_i\) is initialized to 1, then we add a rule \(\nu _i \rightarrow \varepsilon \) (meaning that \(\nu _i\) accepts the empty word; equivalently one may introduce a terminal a and have a rule \(\nu _i \rightarrow a\)). We add no rule if \(w_i\) is initialized to 0. If \(w_i\) is defined as \(w_j \vee w_k\), then we add two rules \(\nu _i \rightarrow \nu _j\) and \(\nu _i \rightarrow \nu _k\). If \(w_i\) is defined as \(w_j \wedge w_k\), then we add a rule \(\nu _i \rightarrow \nu _j \nu _k\). The nonterminal \(\nu _1\) to test for emptiness is the one that corresponds to the output wire of the monotone circuit: its language is nonempty if and only if the circuit outputs 1.
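The translation from monotone circuits to grammars can be sketched in a few lines of Python. The data representation (a dictionary of gate tuples for the circuit, pairs `(lhs, rhs)` for rules) and the names `circuit_to_grammar` and `productive` are assumptions for this illustration; the emptiness test is the standard least-fixpoint computation of productive nonterminals, which here coincides with evaluating the circuit.

```python
def circuit_to_grammar(circuit):
    """One nonterminal per wire; rules follow the gate defining the wire."""
    rules = []
    for w, gate in circuit.items():
        kind = gate[0]
        if kind == "const":
            if gate[1] == 1:
                rules.append((w, ()))            # w -> epsilon
            # no rule at all for a wire initialized to 0
        elif kind == "or":
            rules.append((w, (gate[1],)))        # w -> j
            rules.append((w, (gate[2],)))        # w -> k
        elif kind == "and":
            rules.append((w, (gate[1], gate[2])))  # w -> j k
        else:
            raise ValueError("not a monotone gate: " + kind)
    return rules

def productive(rules):
    """Nonterminals generating at least one word (least fixpoint)."""
    prod = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs not in prod and all(s in prod for s in rhs):
                prod.add(lhs)
                changed = True
    return prod

# Example monotone circuit: out = (x or y) and x, dead = (x or y) and y.
circuit = {
    "x": ("const", 1), "y": ("const", 0),
    "u": ("or", "x", "y"),
    "out": ("and", "u", "x"),
    "dead": ("and", "u", "y"),
}
nonempty = productive(circuit_to_grammar(circuit))
```

In this example `nonempty` contains exactly the wires whose value is 1, so testing the nonterminal of the output wire for nonemptiness answers the circuit value problem.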

Finally, let us encode the context-free grammar emptiness problem into the reachability problem for a Boolean program without registers. This is the well-known relationship between context-free grammars and procedure calls in structured programs. Each nonterminal in the grammar becomes a procedure. A derivation rule \(L \rightarrow R_1 \dots R_n\) becomes a sequence of calls to the procedures corresponding to nonterminals \(R_1\) to \(R_n\), starting in the initial control location of the procedure associated with nonterminal L and ending in the final location of that procedure.
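As a hedged illustration of this last step, the following Python sketch builds, for each nonterminal, a procedure given as a small control-flow graph whose edges are labeled with calls, and then decides which procedures can reach their final location. The representation (locations `"entry"` and `"exit"`, edges `(source, callee, destination)`, a `None` callee for an empty rule) and the function names are assumptions; the reachability check is a naive summary-based fixpoint, not an optimized pushdown algorithm.

```python
def grammar_to_program(rules):
    """Map each nonterminal to a procedure: a list of edges (src, callee, dst).

    A rule L -> R1 ... Rn contributes a chain of n call edges from "entry"
    to "exit" in the procedure for L; an empty rule contributes a skip edge.
    """
    program = {}
    for idx, (lhs, rhs) in enumerate(rules):
        edges = program.setdefault(lhs, [])
        for sym in rhs:
            program.setdefault(sym, [])      # procedures without rules never return
        if not rhs:
            edges.append(("entry", None, "exit"))
            continue
        src = "entry"
        for k, callee in enumerate(rhs):
            dst = "exit" if k == len(rhs) - 1 else (idx, k)
            edges.append((src, callee, dst))
            src = dst
    return program

def procedures_that_return(program):
    """Procedures whose "exit" is reachable from "entry" (fixpoint on summaries)."""
    returning = set()
    changed = True
    while changed:
        changed = False
        for proc, edges in program.items():
            if proc in returning:
                continue
            reached = {"entry"}
            grew = True
            while grew:
                grew = False
                for src, callee, dst in edges:
                    passable = callee is None or callee in returning
                    if src in reached and dst not in reached and passable:
                        reached.add(dst)
                        grew = True
            if "exit" in reached:
                returning.add(proc)
                changed = True
    return returning

# Example grammar: S -> A B, A -> epsilon, B -> A, C -> C.
rules = [("S", ("A", "B")), ("A", ()), ("B", ("A",)), ("C", ("C",))]
program = grammar_to_program(rules)
returning = procedures_that_return(program)
```

Here `S`, `A` and `B` can return, which corresponds to their languages being nonempty, whereas `C` only ever calls itself and never reaches its final location.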

A.3 Lifting reductions to implicitly represented problems

In the above reductions, circuits are described as a list of gates. The first reduction step, from Turing machines to CVP, is however highly repetitive: the same construction is applied for all \(i > 0\) and j. We thus use the notion of succinctly described circuit (Papadimitriou [14, Ch. 20]): wires \(w_i\) are identified by their index i written in binary, and gates in succinctly represented circuits are introduced by rules of the form \(C(i,j,k): w_i = w_j \wedge w_k\), \(C(i,j,k): w_i = w_j \vee w_k\), \(C(i,j): w_i = \lnot w_j\), where C is a condition over the binary encodings of the indices i, j, k, itself expressed as a Boolean circuit, that constrains for which indices the gate is created. The notion of succinctly described monotone circuit is defined similarly.

The encodings described above for turning a reachability problem on the execution of a polynomially bounded Turing machine into an explicitly described CVP of polynomial size, then into an explicitly described monotone CVP of polynomial size, can be applied to turn a reachability problem on the execution of an exponentially bounded Turing machine into a succinctly described CVP of polynomial size, then into a succinctly described monotone CVP of polynomial size (see Note 7).

We define similarly the notion of a succinctly represented context-free grammar. A succinct rule \(C(i,j,k): \nu _i \rightarrow \nu _j \nu _k\) (for arity 2; other arities are similarly defined), where C is a Boolean circuit over the binary encodings of i, j and k, encodes a family of rules \(\nu _i \rightarrow \nu _j \nu _k\) for all i, j, k such that \(C(i,j,k)\) returns 1. As with explicitly described monotone CVPs, a succinctly described monotone CVP can be transformed into a succinctly represented context-free grammar emptiness problem.
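The gap between a succinct grammar and the explicit grammar it denotes can be seen on a toy example. In this Python sketch, which rests on assumptions made for illustration, the condition C is a Python predicate over integer indices rather than a Boolean circuit over their binary encodings, and the names `expand`, `base_rules` and `step_rules` are hypothetical.

```python
from itertools import product

def expand(n_bits, arity, condition):
    """Explicit rules (i, (j, k, ...)) denoted by one succinct rule.

    Indices range over 0 .. 2**n_bits - 1; condition stands in for the
    circuit C and receives arity + 1 indices.
    """
    size = 1 << n_bits
    return [(idx[0], idx[1:])
            for idx in product(range(size), repeat=arity + 1)
            if condition(*idx)]

N_BITS = 4
LAST = (1 << N_BITS) - 1

# Succinct rule of arity 0: nu_i -> epsilon when i is the last index.
base_rules = expand(N_BITS, 0, lambda i: i == LAST)

# Succinct rule of arity 2: nu_i -> nu_j nu_k when j = k = i + 1.
step_rules = expand(N_BITS, 2, lambda i, j, k: j == i + 1 and k == i + 1)

explicit_rules = base_rules + step_rules
```

Two succinct rules of constant size expand here into \(2^n\) explicit rules for n-bit indices, and deriving a word from \(\nu _0\) requires a chain of \(2^n - 1\) rule applications.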

The variables i, j etc. are binary encodings. For the final reduction to Boolean register machines with procedures, we put these Boolean encodings into the local variables of the Boolean programs.

Cite this article

Monniaux, D. The complexity gap in the static analysis of cache accesses grows if procedure calls are added. Form Methods Syst Des 59, 1–20 (2021). https://doi.org/10.1007/s10703-022-00392-w

  • DOI: https://doi.org/10.1007/s10703-022-00392-w
