On the Smoothness of Paging Algorithms

Abstract

We study the smoothness of paging algorithms. How much can the number of page faults increase due to a perturbation of the request sequence? We call a paging algorithm smooth if the maximal increase in page faults is proportional to the number of changes in the request sequence. We also introduce quantitative smoothness notions that measure the smoothness of an algorithm. We derive lower and upper bounds on the smoothness of deterministic and randomized demand-paging and competitive algorithms. Among strongly-competitive deterministic algorithms, LRU matches the lower bound, while FIFO matches the upper bound. Well-known randomized algorithms such as Partition, Equitable, or Mark are shown not to be smooth. We introduce two new randomized algorithms, called Smoothed-LRU and LRU-Random. Smoothed-LRU allows sacrificing competitiveness for smoothness, where the trade-off is controlled by a parameter. LRU-Random is at least as competitive as any deterministic algorithm but smoother.

Keywords

Paging · Caching · Smoothness · Online algorithms · Real-time systems

1 Introduction

Due to their strong influence on system performance, paging algorithms have been studied extensively since the 1960s. Early studies were based on probabilistic request models [2, 6, 19]. In their seminal work, Sleator and Tarjan [26] introduced the notion of competitiveness, which relates the performance of an online algorithm to that of the optimal offline algorithm. By now, the competitiveness of well-known deterministic and randomized paging algorithms is well understood, and various strongly-competitive online algorithms [1, 20, 26] have been identified.

In this paper, we study the smoothness of paging algorithms. We seek to answer the following question: How strongly may the performance of a paging algorithm change when the sequence of memory requests is slightly perturbed? This question is relevant in various domains: Can the cache performance of an algorithm suffer significantly due to the occasional execution of interrupt handling code? Can the execution time of a safety-critical real-time application be safely and tightly bounded in the presence of interference on the cache? Can secret-dependent memory requests have a significant influence on the number of cache misses of a cryptographic protocol and thus give rise to a timing side-channel attack?

We formalize the notion of smoothness by identifying the performance of a paging algorithm with the number of page faults and the magnitude of a perturbation with the edit distance between two request sequences.

We show that for any deterministic, demand-paging or competitive algorithm, a single additional memory request may cause k + 1 additional faults, where k is the size of the cache. Least-recently-used (LRU) matches this lower bound, indicating that there is no trade-off between competitiveness and smoothness for deterministic algorithms. In contrast, First-in first-out (FIFO) is shown to be least smooth among all strongly-competitive deterministic algorithms. Interestingly, our model shows a significant difference between these two algorithms, whose theoretical performance has proven difficult to separate.

Randomized algorithms have been shown to be more competitive than deterministic ones. We derive lower bounds for the smoothness of randomized, demand-paging and randomized strongly-competitive algorithms that indicate that randomization might also help with smoothness. However, we show that none of the well-known randomized algorithms Mark, Equitable, and Partition is smooth. The simple randomized algorithm that evicts one of the cached pages uniformly at random is shown to be as smooth as LRU, but not more.

We then introduce a new parameterized randomized algorithm, Smoothed-LRU, that allows sacrificing competitiveness for smoothness. For some parameter values Smoothed-LRU is smoother than any randomized strongly-competitive algorithm can possibly be, indicating a trade-off between smoothness and competitiveness for randomized algorithms. This leaves the question of whether there is a randomized algorithm that is smoother than any deterministic algorithm without sacrificing competitiveness. We answer this question in the affirmative by introducing LRU-Random, a randomized version of LRU that evicts older pages with a higher probability than younger ones. We show that LRU-Random is smoother than any deterministic algorithm for k = 2. While we conjecture that this is the case as well for general k, this remains an open problem.

The notion of smoothness we present is not meant to be an alternative to competitive analysis for the evaluation of the performance of a paging algorithm; rather, it is a complementary quantitative measure that provides guarantees about the performance of an algorithm under uncertainty of the input. In general, smoothness is useful in both testing and verification:
  • In testing: if a system is smooth, then a successful test run is indicative of the system’s correct behavior not only on the particular test input, but also in its neighborhood.

  • In verification, systems are shown to behave correctly under some assumption on their environment. Due to incomplete environment specifications, operator errors, faulty implementations, or other causes, the environment assumption may not always hold completely. In such a case, if the system is smooth, “small” violations of the environment assumptions will, in the worst case, result in “small” deviations from correct behavior.

An example of the latter case that motivates our present work appears in safety-critical real-time systems, where static analyses are employed to derive guarantees on the worst-case execution time (WCET) of a program on a particular microarchitecture [28]. While state-of-the-art WCET analyses are able to derive fairly precise bounds on execution times, they usually only hold for the uninterrupted execution of a single program with no interference from the environment whatsoever. These assumptions are increasingly difficult to satisfy with the adoption of preemptive scheduling or even multi-core architectures, which may introduce interference on shared resources such as caches and busses. Given a smooth cache hierarchy, it is possible to separately analyze the effects of interference on the cache, e.g., due to interrupts, preemptions, or even co-running programs on other cores. Our results may thus inform the design and analysis of microarchitectures for real-time systems [3].
Table 1 summarizes our results. An algorithm A is (α, β, δ)-smooth if the number of page faults A(σ′) of A on request sequence σ′ is bounded by α·A(σ) + β whenever σ can be transformed into σ′ by at most δ insertions, deletions, or substitutions of individual requests. Often, our results apply to a generic value of δ. In such cases, we express the smoothness of a paging algorithm by a pair (α, β), where α and β are functions of δ, and A is (α(δ), β(δ), δ)-smooth for every δ. Usually, the smoothness of an algorithm depends on the size of the cache, which we denote by k. As an example, under LRU the number of faults may increase by at most δ(k + 1), where δ is the number of changes in the sequence. A precise definition of these notions is given in Section 3.
Table 1

Upper and lower bounds on the smoothness of paging algorithms

 

| Algorithm | Lower bound | Upper bound |
|---|---|---|
| **Deterministic** | | |
| Demand-paging | (1, δ(k + 1)) | ∞ |
| c-competitive w/ additive constant β | (1, δ(k + 1)) | (c, 2δc + β) |
| Strongly-competitive | (1, δ(k + 1)) | (k, 2δk + β) |
| Optimal offline | (1, 2δ) | (1, 2δ) |
| LRU | (1, δ(k + 1)) | (1, δ(k + 1)) |
| FWF | (1, 2δk) | (1, 2δk) |
| FIFO | (k, γ, 1) | (k, 2δk) |
| **Randomized** | | |
| Demand-paging | (1, H_k + 1/k, 1) | ∞ |
| Strongly-competitive w/ additive constant β | (1, δH_k) | (H_k, 2δH_k + β) |
| Equitable, Partition, OnlineMin | (1 + ε, γ, 1) | (H_k, 2δH_k) |
| Mark | (Ω(H_k), γ, 1) | (2H_k − 1, δ(4H_k − 2)) |
| Random | (1, δ(k + 1)) | (1, δ(k + 1)) |
| Evict-On-Access | (1, δ(1 + k/(2k − 1))) | (1, δ(1 + k/(2k − 1))) |
| Smoothed-LRU_{k,i} | (1, δ·min{(2k − 1)/(2i + 1) + 1, (k + i − 1)/(2i + 1) + 2}) | (1, δ·min{(2k − 1)/(2i + 1) + 1, (k + i − 1)/(2i + 1) + 2}) |

In the table, k is the size of the cache, δ is the distance between input sequences, H_k denotes the kth harmonic number, and γ is an arbitrary constant.

The rest of this paper is organized as follows. In Section 2 we briefly describe other notions of smoothness and review the definition of the paging problem and its most relevant results. In Section 3 we formally define the smoothness of paging algorithms. We study the smoothness of deterministic paging algorithms in Section 4, while in Section 5 we analyze the smoothness of randomized algorithms and the trade-off between competitiveness and smoothness. For readability, we place some of the proofs of our results in the Appendix.

2 Related Work

2.1 Notions of Smoothness

Robust control is a branch of control theory that explicitly deals with uncertainty in its approach to controller design. Informally, a controller designed for a particular set of parameters is said to be robust if it would also work well under a slightly different set of assumptions. In computer science, the focus has long been on the binary property of correctness, as well as on average- and worst-case performance. Lately, however, various notions of smoothness have received increasing attention: Chaudhuri et al. [10] develop analysis techniques to determine whether a given program computes a Lipschitz-continuous function. Lipschitz continuity is a special case of our notion of smoothness. Continuity is also strongly related to differential privacy [13], where the result of a query may not depend strongly on the information about any particular individual. Differential privacy proofs with respect to cache side channels [11] may be achievable in a compositional manner for caches employing smooth paging algorithms.

Doyen et al. [12] consider the robustness of sequential circuits. They determine how long into the future a single disturbance in the inputs of a sequential circuit may affect the circuit’s outputs. Much earlier, but in a similar vein, Kleene [15], Perles, Rabin, Shamir [22], and Liu [18] developed the theory of definite events and definite automata. The outputs of a definite automaton are determined by a fixed-length suffix of its inputs. Definiteness is a sufficient condition for smoothness.

Beckmann and Sanchez [5] are concerned with cliffs in cache performance, where minor changes in cache size cause large changes in miss rate. They propose Talus, a scheme to remove such cliffs. The basic idea is to divide the access stream into two streams, which are served by two separate shadow partitions. By carefully controlling the division of the access stream, and the sizes of the two shadow partitions, Talus is able to achieve convex cache performance, i.e., the cache miss rate in terms of the cache size is a convex function. Our work is concerned with changes in the access stream rather than changes in the cache size.

The work of Reineke and Grund [24] is closest to ours: they study the maximal difference in the number of page faults on the same request sequence starting from two different initial states for various deterministic paging algorithms. In contrast, here, we study the effect of differences in the request sequences on the number of faults. Also, in addition to studying particular deterministic algorithms as in [24], in this paper we determine smoothness properties that apply to classes of algorithms, such as all demand-paging or strongly-competitive ones, as well as to randomized algorithms. One motivation for considering randomized algorithms in this work is the recent effort to employ randomized caches in the context of hard real-time systems [9].

Spielman and Teng [27] introduce the notion of smoothed analysis. In smoothed analysis, one measures the performance of an algorithm on arbitrary inputs averaged over slight random perturbations of the input. Smoothed analysis thus provides a compromise between purely worst-case analysis on the one hand and average-case analysis on the other hand, which sometimes better captures the real-world performance of algorithms. Some algorithms, such as the simplex algorithm, can be shown to have polynomial smoothed complexity while their worst-case complexity is exponential. Becchetti et al. [4] apply the idea of smoothed analysis to online algorithms by defining the notion of smoothed competitive analysis. Then they analyze the multi-level feedback algorithm in this setting. This work is different from ours in that smoothness is concerned with the worst-case effect of a small perturbation on an arbitrary input, while smoothed analysis is concerned with the worst-case performance of an algorithm on arbitrary inputs averaged over slight random perturbations. The two notions might be connected in the following way: the smoothed complexity of an algorithm can only be better than its worst-case complexity if the algorithm is not smooth. It is future work to capture the above claim more formally and to prove it. We also leave the definition of a smoothed notion of smoothness for future work.

2.2 The Paging Problem

Paging models a two-level memory system with a small fast memory known as cache and a large but slow memory, usually referred to simply as memory. During a program’s execution, data is transferred between the cache and memory in units of data known as pages. The size of the cache in pages is usually referred to as k. The size of the memory can be assumed to be infinite. The input to the paging problem is a sequence of page requests which must be made available in the cache as they arrive. When a request for a page arrives and this page is already in the cache, then no action is required. This is known as a hit. Otherwise, the page must be brought from memory to the cache, possibly requiring the eviction of another page from the cache. This is known as a page fault or miss. We use the terms fault, page fault, and miss interchangeably throughout the paper. A paging algorithm must decide which pages to keep in the cache in order to minimize the number of faults.

A paging algorithm is said to be demand paging if it only evicts a page from the cache upon a fault with a full cache. Any non-demand-paging algorithm can be made to be demand paging without sacrificing performance [7].

In general, paging algorithms must make decisions as requests arrive, with no knowledge of future requests. That is, paging is an online problem. The most prevalent way to analyze online algorithms is competitive analysis [26]. In this framework, the performance of an online algorithm is defined relative to an optimal algorithm with full knowledge of the input sequence, known as optimal offline or OPT. We denote by A(σ) the number of misses of an algorithm A when processing the request sequence σ. A paging algorithm A is said to be c-competitive if for all sequences σ, A(σ) ≤ c ⋅OPT(σ) + β, where β is a constant independent of σ. The competitive ratio of an algorithm is the infimum over all possible values of c satisfying the inequality above. An algorithm is called competitive if it has a constant competitive ratio and strongly competitive if its competitive ratio is the best possible [20].

Traditional paging algorithms are Least-recently-used (LRU)—evict the page in the cache that has been requested least recently—and First-in first-out (FIFO)—evict the page in the cache that was brought into the cache earliest. Another simple algorithm often considered is Flush-when-full (FWF)—empty the cache if the cache is full and a fault occurs. These algorithms are k-competitive, which is the best ratio that can be achieved by deterministic online algorithms [26]. An optimal offline algorithm for paging is Furthest-in-the-future, also known as Longest-forward-distance and Belady’s algorithm [6]. This algorithm evicts the page in the cache that will be requested at the latest time in the future.
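As an illustration, the three deterministic policies can be sketched as simple fault counters in Python (the function names and the toy sequence below are ours, not from the paper):

```python
def lru_faults(seq, k):
    """LRU: on a fault with a full cache, evict the least-recently-used page."""
    cache = []  # kept in recency order; front = least recently used
    faults = 0
    for p in seq:
        if p in cache:
            cache.remove(p)
        else:
            faults += 1
            if len(cache) == k:
                cache.pop(0)  # evict the least-recently-used page
        cache.append(p)  # p is now the most recently used page
    return faults

def fifo_faults(seq, k):
    """FIFO: on a fault with a full cache, evict the page brought in earliest."""
    cache = []  # front = oldest arrival
    faults = 0
    for p in seq:
        if p not in cache:
            faults += 1
            if len(cache) == k:
                cache.pop(0)  # evict the oldest arrival
            cache.append(p)
    return faults

def fwf_faults(seq, k):
    """FWF: on a fault with a full cache, flush the whole cache."""
    cache = set()
    faults = 0
    for p in seq:
        if p not in cache:
            faults += 1
            if len(cache) == k:
                cache = set()  # flush when full
            cache.add(p)
    return faults
```

For example, on the sequence 1, 2, 3, 1, 4, 1 with k = 3, LRU incurs 4 faults, while FIFO and FWF each incur 5: LRU keeps page 1 cached because it was recently used, whereas FIFO evicts it when page 4 arrives.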

A competitive ratio less than k can be achieved by the use of randomization. Important randomized paging algorithms are Random—evict a page chosen uniformly at random—and Mark [14]—mark a page when it is unmarked and requested, and upon a fault evict a page chosen uniformly at random among the unmarked pages (unmarking all pages first if no unmarked pages remain). Random achieves a competitive ratio of k, while Mark’s competitive ratio is 2H_k − 1 [1], where \(H_{k} = {\sum }_{i=1}^{k} \frac {1}{i}\) is the kth harmonic number. The strongly-competitive algorithms Partition [20], Equitable [1] and OnlineMin [8] achieve the optimal ratio of H_k.
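The two randomized policies just described can likewise be sketched in a few lines; this is our own illustrative rendering (the explicit random-number-generator parameter is an assumption for reproducibility, not part of the paper):

```python
import random

def random_policy_faults(seq, k, rng):
    """Random: on a fault with a full cache, evict a uniformly random page."""
    cache, faults = set(), 0
    for p in seq:
        if p not in cache:
            faults += 1
            if len(cache) == k:
                cache.remove(rng.choice(sorted(cache)))  # uniform victim
            cache.add(p)
    return faults

def mark_faults(seq, k, rng):
    """Mark: mark pages as they are requested; on a fault with a full cache,
    unmark everything if no unmarked page remains, then evict a uniformly
    random unmarked page."""
    cache, marked, faults = set(), set(), 0
    for p in seq:
        if p not in cache:
            faults += 1
            if len(cache) == k:
                if not (cache - marked):
                    marked = set()  # start of a new phase
                victim = rng.choice(sorted(cache - marked))
                cache.remove(victim)
            cache.add(p)
        marked.add(p)
    return faults

# Expected fault counts can be estimated by averaging over independent seeds:
est = sum(mark_faults([1, 2, 3, 4, 1, 2, 3, 4], 3,
                      random.Random(t)) for t in range(50)) / 50
```

Note that on a sequence containing at most k distinct pages, both policies fault exactly once per distinct page, independently of the random choices.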

3 Smoothness of Paging Algorithms

We now formalize the notion of smoothness of paging algorithms. We are interested in answering the following question: How does the number of misses of a paging algorithm vary as its inputs vary? We quantify the similarity of two request sequences by their edit distance:

Definition 1 (Distance)

Let σ = x_1, …, x_n and \(\sigma ^{\prime }=x^{\prime }_{1},\ldots ,x^{\prime }_{m}\) be two request sequences. Then we denote by Δ(σ, σ′) their edit distance, defined as the minimum number of substitutions, insertions, or deletions required to transform σ into σ′.
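This distance can be computed with the standard dynamic program over prefixes; the following sketch (function name ours) uses a single rolling row:

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of substitutions, insertions,
    or deletions transforming sequence a into sequence b."""
    m, n = len(a), len(b)
    d = list(range(n + 1))  # row for the empty prefix of a
    for i in range(1, m + 1):
        prev, d[0] = d[0], i  # prev holds d[i-1][j-1]
        for j in range(1, n + 1):
            cur = d[j]  # d[i-1][j] before overwriting
            d[j] = min(d[j] + 1,        # delete a[i-1]
                       d[j - 1] + 1,    # insert b[j-1]
                       prev + (a[i - 1] != b[j - 1]))  # substitute (or match)
            prev = cur
    return d[n]
```

For instance, `edit_distance("kitten", "sitting")` is 3, and deleting a single request gives distance 1, matching the perturbations considered throughout the paper.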

This is also referred to as the Levenshtein distance. Based on this notion of distance we define (α, β, δ)-smoothness:

Definition 2 ((α, β, δ)-smoothness)

Given a paging algorithm A, we say that A is (α, β, δ)-smooth, if for all pairs of sequences σ, σ′ with Δ(σ, σ′) ≤ δ,
$$A(\sigma^{\prime})\le \alpha \cdot A(\sigma)+\beta$$

For randomized algorithms, A(σ) denotes the algorithm’s expected number of faults when serving σ.

An algorithm that is (α, β, δ)-smooth may also be (α′, β′, δ)-smooth for some α′ > α and β′ < β. As the multiplicative factor α dominates the additive constant β in the long run, when analyzing the smoothness of an algorithm, we first look for the minimal α such that the algorithm is (α, β, δ)-smooth for some β.

We say that an algorithm is smooth if it is (1, β,1)-smooth for some β. In this case, the maximal increase in the number of page faults is proportional to the number of changes in the request sequence. This is called Lipschitz continuity in mathematical analysis. For smooth algorithms, we also analyze the additive part β in detail, otherwise we concentrate the analysis on the multiplicative factor α.

We use the above notation when referring to a specific distance δ. For a generic value of δ we omit this parameter and express the smoothness of a paging algorithm with a pair (α, β), where both α and β are functions of δ.

Definition 3 ((α, β)-smoothness)

Given a paging algorithm A, we say that A is (α, β)-smooth, if for all pairs of sequences σ, σ′,
$$A(\sigma^{\prime})\le \alpha(\delta) \cdot A(\sigma)+\beta(\delta), $$
where α and β are functions, and δ = Δ(σ, σ′).

Often, it is enough to determine the effects of one change in the inputs to characterize the smoothness of an algorithm A.

Lemma 1

If A is (α, β, γ)-smooth, then A is \((\alpha ^{\delta },\beta {\sum }_{i=0}^{\delta -1}\alpha ^{i},\delta \gamma )\)-smooth for any δ.

Proof

We proceed by induction on δ. The case δ = 1 is trivial. Assume the hypothesis is true for 1 ≤ δ ≤ h. Let σ_{h+1} and σ be any pair of sequences such that hγ < Δ(σ, σ_{h+1}) ≤ (h + 1)γ. Then there exists a sequence σ_h such that Δ(σ, σ_h) = hγ and Δ(σ_h, σ_{h+1}) ≤ γ. Since A is (α, β, γ)-smooth, A(σ_{h+1}) ≤ αA(σ_h) + β. By the inductive hypothesis, \(A(\sigma _h)\le \alpha ^h A(\sigma )+\beta {\sum }_{i=0}^{h-1}\alpha ^i\). Therefore, \(A(\sigma _{h+1})\le \alpha (\alpha ^h A(\sigma )+\beta {\sum }_{i=0}^{h-1}\alpha ^i)+\beta =\alpha ^{h+1} A(\sigma )+\beta {\sum }_{i=0}^{h}\alpha ^i\), and thus A is \((\alpha ^{\delta },\beta {\sum }_{i=0}^{\delta -1}\alpha ^i,\delta \gamma )\)-smooth. □

Corollary 1

If A is (1, β,1)-smooth, then A is (1, δβ)-smooth.

Corollary 2

If A is (1, β, γ)-smooth, then A is (1, δβ, δγ)-smooth for any δ.

4 Smoothness of Deterministic Paging Algorithms

4.1 Bounds on the Smoothness of Deterministic Paging Algorithms

Before considering particular deterministic online algorithms, we determine upper and lower bounds for several important classes of algorithms. Many natural algorithms are demand paging.

Theorem 1 (Lower bound for deterministic, demand-paging algorithms)

No deterministic, demand-paging algorithm is (1, δ(k + 1 − 𝜖), δ)-smooth for any 𝜖 > 0 and any δ > 0.

Proof

Let A be any deterministic, demand-paging algorithm, and let δ > 0 be arbitrary. Using k + 1 distinct pages, we can construct a sequence σ_A(δ) of length k + δ(k + 1) such that A faults on every request: first request the k + 1 distinct pages in any order; then arbitrarily extend the sequence by requesting the page that A has just evicted. Let p be the page that occurs least frequently in σ_A(δ). By removing all requests to p from σ_A(δ), we obtain a sequence \(\sigma ^{\prime }_A(\delta )\) that consists of k distinct pages only. By assumption A is demand paging. Thus, A incurs only k page faults on the entire sequence. Assume for a contradiction that A is (1, δ(k + 1 − 𝜖), δ)-smooth for some 𝜖 > 0. Then, we have by definition:
$$A(\sigma_A(\delta)) \leq 1\cdot A(\sigma^{\prime}_A(\delta)) + {\Delta}(\sigma^{\prime}_A(\delta), \sigma_A(\delta))\cdot(k+1-\epsilon). $$
Clearly, p occurs at most \(\left \lfloor \frac {k+\delta (k+1)}{k+1} \right \rfloor = \delta \) times in σ_A(δ). So \({\Delta }(\sigma ^{\prime }_A(\delta ), \sigma _A(\delta )) \leq \delta \), and we get:
$$\begin{array}{@{}rcl@{}} k+\delta(k+1) &\leq& k + \delta\cdot(k+1-\epsilon)\\ \Leftrightarrow \epsilon &\leq& 0, \end{array} $$
which contradicts the assumption that 𝜖 > 0. □
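The adversarial construction in this proof can be instantiated for any concrete deterministic demand-paging algorithm; the following sketch plays it out against FIFO with δ = 1 (the code and names are our own illustration, not from the paper):

```python
def fifo_step(cache, p, k):
    """One FIFO step on a mutable cache list (front = oldest arrival).
    Returns (fault?, evicted page or None)."""
    if p in cache:
        return False, None
    evicted = cache.pop(0) if len(cache) == k else None
    cache.append(p)
    return True, evicted

def fifo_faults(seq, k):
    cache, faults = [], 0
    for p in seq:
        fault, _ = fifo_step(cache, p, k)
        faults += fault
    return faults

k, delta = 3, 1
cache, seq, last_evicted = [], [], None
for p in range(k + 1):  # first, request k + 1 distinct pages
    seq.append(p)
    _, ev = fifo_step(cache, p, k)
    if ev is not None:
        last_evicted = ev
while len(seq) < k + delta * (k + 1):  # then always request the evicted page
    seq.append(last_evicted)
    _, last_evicted = fifo_step(cache, last_evicted, k)

# Page k is the least frequent page, occurring exactly delta times;
# dropping all its requests leaves a sequence over only k distinct pages.
seq_pruned = [p for p in seq if p != k]

assert fifo_faults(seq, k) == k + delta * (k + 1)  # a fault on every request
assert fifo_faults(seq_pruned, k) == k             # only k compulsory faults
```

With k = 3 and δ = 1 the construction yields 0, 1, 2, 3, 0, 1, 2: FIFO faults on all 7 requests, while deleting the single request to page 3 leaves just 3 faults, exactly matching the k + δ(k + 1) lower bound.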

While most algorithms are demand paging, it is not a necessary condition for an algorithm to be competitive, as demonstrated by FWF. However, we obtain the same lower bound for competitive algorithms as for demand-paging ones.

Theorem 2 (Lower bound for deterministic, competitive paging algorithms)

No deterministic, competitive paging algorithm is (1, δ(k + 1 − 𝜖), δ)-smooth for any 𝜖 > 0 and any δ > 0.

Proof

Let A be any c-competitive deterministic online paging algorithm. We use the same sequences σ_A(δ) and \(\sigma ^{\prime }_A(\delta )\) as in the proof of Theorem 1, where the choice of δ is determined below. Again, A faults k + δ(k + 1) times on σ_A(δ). As the optimal offline algorithm OPT faults exactly k times on \(\sigma ^{\prime }_A(\delta )\), we can conclude that A faults at most ck + β on \(\sigma ^{\prime }_A(\delta )\), for some constant β, due to the competitiveness of A.

Assuming for a contradiction that A is (1, δ(k + 1 − 𝜖))-smooth for some 𝜖, we get:
$$\begin{array}{lll} & A(\sigma_A(\delta)) & \leq 1\cdot A(\sigma^{\prime}_A(\delta)) + {\Delta}(\sigma^{\prime}_A(\delta), \sigma_A(\delta))\cdot(k+1-\epsilon)\\ \implies & k+\delta(k+1) & \leq ck + \beta + \delta\cdot(k+1-\epsilon)\\ \implies & \delta\epsilon & \leq (c-1)k+\beta\\ \implies & \delta & \leq \frac{(c-1)k+\beta}\epsilon \end{array} $$
Thus, A is not (1, δ(k + 1 − 𝜖), δ)-smooth for any \(\delta > \frac {(c-1)k+\beta }\epsilon \).

To complete the proof, assume for a contradiction that A is (1, δ(k + 1 − 𝜖), δ)-smooth for some positive \(\delta \leq \frac {(c-1)k+\beta }\epsilon \). By Corollary 2 this would imply that A is also (1, λδ(k + 1 − 𝜖), λδ)-smooth for any λ. However, for \(\lambda > \frac {(c-1)k+\beta }{\delta \cdot \epsilon }\) this contradicts the result of the first part of the proof. □

Intuitively, the optimal offline algorithm should be very smooth, and this is indeed the case:

Theorem 3 (Smoothness of OPT)

OPT is (1, 2δ)-smooth. This is tight.

The idea of the proof is as follows. Given two sequences σ and σ′ with Δ(σ, σ′) = 1, we show that there exists an offline algorithm A serving σ′ such that A(σ′) ≤ OPT(σ) + 2. A acts like OPT until the single difference between σ and σ′. Right after that, the contents of the caches of OPT serving σ and A serving σ′ differ by at most one page. On the equal suffix of both sequences, A can behave so that this difference in cache contents translates into at most one extra fault compared to OPT. In fact, while the page that A is missing from OPT’s cache is not requested, A incurs no more faults than OPT does, and its cache is missing at most one page from OPT’s cache. Whenever the missing page is requested, A faults and evicts the page that makes both caches equal. From then on, both algorithms behave exactly the same. Then, including the initial fault, A(σ′) ≤ OPT(σ) + 2. The theorem follows from the optimality of OPT and Corollary 1.
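The (1, 2δ) bound can be checked empirically by running Belady's Furthest-in-the-future algorithm on randomly perturbed sequence pairs; the sketch below (our own code, with a deliberately brute-force next-use scan) asserts OPT(σ′) ≤ OPT(σ) + 2 for single edits:

```python
import random

def belady_faults(seq, k):
    """Furthest-in-the-future (Belady): on a fault with a full cache, evict
    the cached page whose next request is furthest away (or never comes)."""
    faults, cache = 0, set()
    for i, p in enumerate(seq):
        if p in cache:
            continue
        faults += 1
        if len(cache) == k:
            def next_use(q):
                for j in range(i + 1, len(seq)):
                    if seq[j] == q:
                        return j
                return float('inf')  # never requested again
            cache.remove(max(cache, key=next_use))
        cache.add(p)
    return faults

rng = random.Random(1)
k, pages = 3, list(range(6))
for _ in range(200):
    s = [rng.choice(pages) for _ in range(30)]
    t = list(s)
    op = rng.choice(["sub", "ins", "del"])
    i = rng.randrange(len(t))
    if op == "sub":
        t[i] = rng.choice(pages)
    elif op == "ins":
        t.insert(i, rng.choice(pages))
    else:
        del t[i]
    # Theorem 3 with delta = 1: OPT(t) <= OPT(s) + 2
    assert belady_faults(t, k) <= belady_faults(s, k) + 2
```

Since Belady's algorithm is an optimal offline policy (with arbitrary tie-breaking among furthest pages), the assertion is guaranteed by the theorem; the experiment merely illustrates it.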

With Theorem 3 it is easy to show the following upper bound on the smoothness of any competitive algorithm:

Theorem 4 (Smoothness of competitive algorithms)

Let A be any paging algorithm such that for all sequences σ, A(σ) ≤ c · OPT(σ) + β. Then A is (c, 2δc + β)-smooth.

Proof

Let σ and σ′ be sequences such that Δ(σ, σ′) = δ. By Theorem 3, OPT(σ′) ≤ OPT(σ) + 2δ. Therefore, A(σ′) ≤ c · OPT(σ′) + β ≤ c · (OPT(σ) + 2δ) + β ≤ c · A(σ) + 2δc + β, where the final step uses that OPT(σ) ≤ A(σ). □

Note that the above theorem applies to both deterministic and randomized algorithms. Given that every competitive algorithm is (α, β)-smooth for some α and β, the natural question to ask is whether the converse also holds. Below, we answer this question in the affirmative for deterministic, bounded-memory, demand-paging algorithms. By bounded memory we mean algorithms that, in addition to the contents of their fast memory, only have a finite amount of additional state. For a more formal definition consult [7, page 93]. Paging algorithms implemented in hardware caches are bounded memory. Our proof requires the notion of a k-phase partition:

Definition 4 (k-phase partition)

The k-phase partition of a sequence σ is a partition of σ into contiguous subsequences called k-phases, or simply phases. The first phase starts with the first request of σ, and a new phase starts with the request that constitutes the (k + 1)st distinct page requested since the beginning of the previous phase. Thus except for possibly the last phase of the k-phase partition, each phase consists of exactly k distinct pages.
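The k-phase partition is easy to compute in one pass; the following sketch (our own helper) counts the number of phases Φ(σ):

```python
def num_phases(seq, k):
    """Number of phases in the k-phase partition of seq: a new phase starts
    at the request for the (k+1)-st distinct page since the phase began."""
    phases, current = 0, None  # current = distinct pages of the open phase
    for p in seq:
        if current is None or (p not in current and len(current) == k):
            phases += 1
            current = set()  # open a new phase
        current.add(p)
    return phases
```

For k = 2, the sequence a, b, c, a, d splits into the phases (a, b), (c, a), (d), so `num_phases` returns 3.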

Theorem 5 (Competitiveness of smooth algorithms)

If algorithm A is deterministic, bounded memory, demand paging, and (α, β)-smooth for some α and β, then A is also competitive.

Proof

Assume algorithm A is not competitive. Then, there is no bound on the number of misses of A in a single k-phase: otherwise, if r were a bound on the number of misses of A in every phase, then A would be competitive with competitive ratio c ≤ r, since OPT must fault at least once in every k-phase.

Let n be the number of states of A. Within a k-phase, a demand-paging algorithm can reach at most (2k)^k different configurations: each of the k slots can either contain one of the k “old” pages cached at the start of the k-phase, or one of the up to k “new” pages requested within the phase. Let σ be a sequence of minimal length that ends in a phase in which A misses more than n · (2k)^k times. By the pigeonhole principle, A must assume the same state and configuration pair twice within that phase. Due to the minimality of the sequence, A must fault at least once between those two occurrences. By repeating the sequence of requests between the two occurrences, we can thus pump up the sequence and the number of faults arbitrarily without increasing the number of phases. By removing the finite prefix of the pumped sequence that comprises all but the final phase, we can construct a sequence σ′ containing at most k distinct pages. Any demand-paging algorithm, in particular A, faults at most k times on σ′. The edit distance between the pumped sequence and σ′ is bounded by the constant length of the removed prefix, but the difference in faults is unbounded. This shows that A is not (α, β)-smooth for any α and β, thus proving the theorem. □

Also, note that the smoothness condition is necessary, as an algorithm can be bounded memory and demand paging but not smooth, and thus not competitive (by the contrapositive of Theorem 4). An example of such an algorithm is Last-in first-out (LIFO), which evicts the page in the cache that was brought into the cache latest. LIFO is clearly bounded memory and demand paging, but it is not smooth: consider the sequence σ = x_1, x_2, …, x_k, x_1, x_k, x_1, x_k, …, where x_i ≠ x_j for all i ≠ j. Let σ′ be the sequence obtained by substituting the first request to x_1 by a request to a page x_1′ with x_1′ ≠ x_i for all i. Thus, Δ(σ, σ′) = 1. LIFO faults k times on σ but an unbounded number of times on σ′, and hence it is not smooth. We conjecture that the above theorem also holds without the bounded-memory assumption.
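The LIFO counterexample can be replayed directly; in the sketch below (our own code, with 99 standing in for the fresh page x_1′), the single substitution makes every one of the 2m trailing requests fault:

```python
def lifo_faults(seq, k):
    """LIFO: on a fault with a full cache, evict the page brought in latest."""
    stack, faults = [], 0
    for p in seq:
        if p not in stack:
            faults += 1
            if len(stack) == k:
                stack.pop()  # evict the most recently brought-in page
            stack.append(p)
    return faults

k, m = 3, 10
sigma = list(range(1, k + 1)) + [1, k] * m  # x1..xk, then x1, xk alternating
sigma2 = [99] + sigma[1:]                   # substitute the first request

assert lifo_faults(sigma, k) == k           # all trailing requests hit
assert lifo_faults(sigma2, k) == k + 2 * m  # every trailing request faults
```

After the substitution the cache fills with 99, x_2, …, x_k, so x_1 and x_k keep evicting each other: the fault count grows linearly in m while the edit distance stays 1.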

4.2 Smoothness of Particular Deterministic Algorithms

Now let us turn to the analysis of three well-known deterministic algorithms: LRU, FWF, and FIFO. We show that both LRU and FWF are smooth. On the other hand, FIFO is not smooth, as a single change in the request sequence may increase the number of misses by a factor of k.

Theorem 6 (Smoothness of Least-recently-used)

LRU is (1, δ(k + 1))-smooth. This is tight.

Proof

We show that LRU is (1, k + 1, 1)-smooth. Corollary 1 then immediately implies that LRU is (1, δ(k + 1))-smooth. Tightness follows from Theorem 1, as LRU is demand paging. To analyze LRU, it is convenient to introduce the notion of age. The age of page p is the number of distinct pages that have been requested since the previous request to p. Before their first request, all pages have age ∞. A request to page p results in a fault if and only if p’s age is greater than or equal to k, the size of the cache. Finite ages are unique, i.e., no two pages have the same age less than ∞. At any time, at most k pages are cached, and at most k pages have an age less than k.

Let us now consider how the insertion of one request may affect ages and the expected number of faults. By definition, the age of any page is only affected from the point of insertion up to its next request. Only the next request to a page may thus turn from a hit into a miss. So at most k requests may turn from hits into misses. As the inserted request itself may also introduce a fault, the overall number of faults may thus increase by at most k + 1.

Substitutions are similar to insertions: they turn at most k succeeding hits into misses, and the substituted request itself may introduce one additional fault. The deletion of a request to page p does not increase the ages of other pages. Only the next request to p may turn from a hit into a miss. □
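The k + 1 bound for a single insertion is attained, for instance, by inserting one fresh page into a cyclic sequence over k pages; a minimal self-contained sketch (our own code):

```python
def lru_faults(seq, k):
    """LRU fault count (list kept in recency order, front = LRU)."""
    cache, faults = [], 0
    for p in seq:
        if p in cache:
            cache.remove(p)
        else:
            faults += 1
            if len(cache) == k:
                cache.pop(0)  # evict the least-recently-used page
        cache.append(p)
    return faults

k = 4
sigma = list(range(1, k + 1)) * 3           # 1..k cycled: only k cold faults
sigma2 = sigma[:k] + [k + 1] + sigma[k:]    # insert one new page after round 1

assert lru_faults(sigma, k) == k
# The inserted request faults and turns the next k hits into misses:
assert lru_faults(sigma2, k) == lru_faults(sigma, k) + (k + 1)
```

The inserted page k + 1 raises the age of every cached page by one, so each of the k following requests faults exactly once before the cache recovers, after which the remaining cycle hits again.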

So LRU matches the lower bound for both demand-paging and competitive paging algorithms. We now show that FWF is also smooth, with a factor that is almost twice that of LRU. The smoothness of FWF follows from the fact that it always misses k times per phase, and the number of phases can only change marginally when perturbing a sequence, as we show in Lemma 2.

For a sequence σ, let Φ(σ) denote the number of phases in its k-phase partition.

Proposition 1

Let σ be a sequence, let ρ be a suffix of σ, and let ℓ and ℓ′ denote the number of distinct pages in the last phase of σ and ρ, respectively. Then Φ(ρ) ≤ Φ(σ). Furthermore, if Φ(ρ) = Φ(σ), then ℓ′ ≤ ℓ.

Proof

Let i_j and i_j′ denote the indices in σ of the first request of the jth phase of σ and ρ, respectively, with i_j = |σ| + 1 for j > Φ(σ) and i_j′ = |σ| + 1 for j > Φ(ρ). Then, for all j it holds that i_j ≤ i_j′. We prove this by induction on j. The case j = 1 is trivially true, as i_1 = 1 and i_1′ ≥ 1 since ρ is a suffix of σ. Suppose that the hypothesis holds for 1 ≤ j ≤ n. It is easy to see that it holds for j = n + 1: since there are at most k distinct pages between i_j and i_{j+1} − 1, inclusive, and by the inductive hypothesis i_j′ ≥ i_j, the request that ends the jth phase of ρ cannot come earlier than i_{j+1}, and hence i_{j+1}′ ≥ i_{j+1}. Since this is true for all phases including the last one, Φ(ρ) ≤ Φ(σ). Now, assume that Φ(ρ) = Φ(σ) = m. Then i_m′ < |σ| + 1 and, by the argument above, i_m ≤ i_m′, so the last phase of ρ is contained in the last phase of σ, which implies that ℓ′ ≤ ℓ. □

Lemma 2

Let σ and σ′ be two sequences such that Δ(σ, σ′) = 1. Then Φ(σ′) ≤ Φ(σ) + 2. Furthermore, let ℓ and ℓ′ be the number of distinct pages in the last phase of σ and σ′, respectively. If Φ(σ′) = Φ(σ) + 2, then ℓ′ ≤ ℓ.

Proof

Let ij and \(i_j^{\prime }\) be the indices of the requests that start the jth phase in σ and σ′, respectively, with ij = |σ| + 1 for j > Φ(σ) and \(i_j^{\prime }=|\sigma ^{\prime }|+1\) for j > Φ(σ′). Let Φ(σ, j) denote the number of phases of σ starting from the jth phase (with Φ(σ, j) = 0 if j > Φ(σ)). Let h − 1 be the phase in σ′ where the difference between the sequences occurs. For simplicity, assume that if the difference is an insertion (deletion) at position i of σ′ (σ), then i refers to an empty page in σ (σ′), i.e., unaffected requests have equal indices in both sequences. If the difference is a deletion, then \(i_{h}\le i^{\prime }_{h}\) and by Proposition 1, Φ(σ′, h) ≤ Φ(σ, h), which implies the lemma. If it is a substitution, suppose that q in σ is changed to p in σ′. Then consider σ″ resulting from the deletion of q from σ. By the argument above, Φ(σ″) ≤ Φ(σ). Hence, showing that Φ(σ′) ≤ Φ(σ″) + 2 implies as well that Φ(σ′) ≤ Φ(σ) + 2. Since σ′ is the result of inserting p into σ″, it suffices to consider the insertion case (we argue later that if Φ(σ′) = Φ(σ) + 2, then ℓ′ ≤ ℓ also holds).

Let p be the page that is added to σ to make σ′. We analyze Φ(σ′) in terms of Φ(σ). We have the following cases:
  • [p is not the first page of phase h − 1]. If p occurs again in the phase, then Φ(σ′) = Φ(σ). This is also the case if h − 1 is the last phase of σ′. Otherwise, \(i_{h}^{\prime } < i_{h} \le i_{h+1}^{\prime }\) (ih cannot be larger than \(i_{h+1}^{\prime }\), as in this case phase h of σ′ would include the k + 1 distinct pages in σ[ih..ih+1]). Then by Proposition 1, Φ(σ′, h + 1) ≤ Φ(σ, h), and therefore Φ(σ′) ≤ Φ(σ) + 1.

  • [p is the first page of phase h − 1]. Then \(i_{h-1}>i_{h-1}^{\prime }\). We have two cases:
    • If \(i_{h-1} \le i_h^{\prime }\), then we have the same case as above but with ih−1 and \(i_h^{\prime }\). Thus, Φ(σ′) ≤ Φ(σ) + 1.

    • If \(i_h^{\prime } < i_{h-1} \le i_{h+1}^{\prime }\) (again, ih−1 cannot be greater than \(i^{\prime }_{h+1}\), as in this case the (h − 2)nd phase of σ would include all k + 1 distinct pages in \(\sigma ^{\prime }[i^{\prime }_h..i^{\prime }_{h+1}]\)), then by Proposition 1, Φ(σ′, h + 1) ≤ Φ(σ, h − 1). If Φ(σ′, h + 1) = Φ(σ, h − 1), then ℓ′ ≤ ℓ and Φ(σ′) = Φ(σ) + 2. Otherwise, Φ(σ′, h + 1) < Φ(σ, h − 1) and Φ(σ′) ≤ Φ(σ) + 1.

In all cases above, either Φ(σ′) ≤ Φ(σ) + 1, or Φ(σ′) = Φ(σ) + 2 with ℓ′ ≤ ℓ. If the difference is a substitution, let q be the page in σ that is replaced by p in σ′. Note that the case Φ(σ′) = Φ(σ) + 2 can only happen if q is requested earlier in the same phase in σ. Then, removing the request to q would not change the k-phase partition of σ, and hence the same analysis above for an insertion applies, and thus ℓ′ ≤ ℓ as well. □

Theorem 7 (Smoothness of Flush-when-full)

FWF is (1, 2δk)-smooth. This is tight.

Proof

Let σ and σ′ be two sequences such that Δ(σ, σ′) = 1. Let Φ(σ) (resp. Φ(σ′)) be the number of phases in the k-phase partition of σ (resp. σ′), and let ℓ (resp. ℓ′) be the number of distinct pages in the last phase of the partition of σ (resp. σ′). FWF misses exactly k times in any phase of a sequence, except possibly for the last one, in which it misses a number of times equal to the number of distinct pages in the phase. Then, FWF(σ) = k ⋅ (Φ(σ) − 1) + ℓ, and FWF(σ′) = k ⋅ (Φ(σ′) − 1) + ℓ′. By Lemma 2, if Φ(σ′) = Φ(σ) + 2, then ℓ′ ≤ ℓ, and thus FWF(σ′) ≤ k(Φ(σ) + 2 − 1) + ℓ′ ≤ FWF(σ) + 2k. Otherwise, if Φ(σ′) ≤ Φ(σ) + 1, then FWF(σ′) ≤ k(Φ(σ) + 1 − 1) + ℓ′ = FWF(σ) − ℓ + k + ℓ′. Since ℓ ≥ 0 and ℓ′ ≤ k, FWF(σ′) ≤ FWF(σ) + 2k. The upper bound in the theorem then follows by Corollary 1. To see that this upper bound is tight, let σ = (x1,…, xk)2δ+1, where xi ≠ xj for all i ≠ j. Let σ′ = x1,…, xk(xk+1, x1,…, xk, x1,…, xk)δ, where xk+1 ≠ xi for all i ≤ k. Thus, Δ(σ, σ′) = δ. Clearly FWF(σ) = k, while FWF(σ′) = k + 2δk, and hence FWF(σ′) = FWF(σ) + 2δk. □
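The tightness construction can be replayed mechanically. A minimal sketch (our own illustrative simulation of FWF, assuming k = 4 and δ = 3):

```python
def fwf_misses(seq, k):
    """Flush-When-Full: on a miss with a full cache, flush the whole cache."""
    cache, misses = set(), 0
    for p in seq:
        if p not in cache:
            if len(cache) == k:
                cache = set()  # flush
            cache.add(p)
            misses += 1
    return misses

k, delta = 4, 3
base = list(range(1, k + 1))                      # x1, ..., xk
sigma = base * (2 * delta + 1)                    # (x1,...,xk)^(2δ+1)
sigma_p = base + ([k + 1] + base + base) * delta  # x_{k+1} inserted every 2k requests
assert fwf_misses(sigma, k) == k
assert fwf_misses(sigma_p, k) == k + 2 * delta * k   # = FWF(σ) + 2δk
```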

We now show that FIFO is not smooth. In fact, we show that with only a single difference in the sequences, the number of misses of FIFO can be k times higher than the number of misses in the original sequence. On the other hand, since FIFO is strongly competitive, the multiplicative factor k is also an upper bound for FIFO’s smoothness.

Theorem 8 (Smoothness of First-in first-out)

FIFO is (k, 2δk)-smooth. FIFO is not (k − 𝜖, γ, 1)-smooth for any 𝜖 > 0 and any γ.

Proof

The upper bound follows from the competitiveness of FIFO and Theorem 4.

In the following, we use ⋰ to denote ascending, and ⋱ to denote descending sequences. For example 3, ⋰, 7 denotes the ascending sequence 3, 4, 5, 6, 7. If a < b, then a, ⋱, b and b, ⋰, a denote empty sequences. For the lower bound, we show how to construct two sequences σk and \(\sigma ^{\prime }_k\) for each cache size k, such that \({\Delta }(\sigma ^{\prime }_k, \sigma _k) = 1\), that yield configurations c = [1, ⋰, k] and \(c^{\prime } = [k, \ddots , 1]\), where pages are sorted from last-in to first-in from left to right. Then, the sequence 0, ⋰, k − 1 yields k misses starting from configuration c′ and only one miss starting from c. The resulting configurations are [0, ⋰, k − 1] and \([k-1, \ddots , 0]\), which are equal to c and c′ up to renaming. So we can construct an arbitrarily long sequence that yields k times as many misses starting from configuration c′ as it does starting from configuration c.

For k = 2, σ2 = 2,1 and \(\sigma ^{\prime }_2 = 1,2,1 = 1 \circ \sigma _2\) have edit distance \({\Delta }(\sigma ^{\prime }_2, \sigma _2) = 1\) and yield configurations [1,2] and [2,1], respectively. For k = 3, σ3 = 2,3,1,4,2,1,5,1,4 and \(\sigma ^{\prime }_3 = 1 \circ \sigma _3\) yield configurations [4,1,5] and [5,1,4], respectively, which are equal up to renaming to [1,2,3] and [3,2,1].
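The behavior of the two configurations c and c′ under the probe sequence 0, ⋰, k − 1 can be simulated directly. A sketch (our own illustrative FIFO simulator; configurations are listed from last-in to first-in, as in the text):

```python
from collections import deque

def fifo_run(config, seq, k):
    """Serve seq with FIFO starting from the given configuration.
    Configurations list pages from last-in (left) to first-in (right)."""
    cache, misses = deque(config), 0
    for p in seq:
        if p not in cache:
            misses += 1
            if len(cache) == k:
                cache.pop()        # evict the first-in page (rightmost)
            cache.appendleft(p)
    return misses, list(cache)

k = 5
c  = list(range(1, k + 1))   # c  = [1, ⋰, k]
cp = list(range(k, 0, -1))   # c′ = [k, ⋱, 1]
probe = list(range(k))       # the sequence 0, ⋰, k−1
assert fifo_run(c,  probe, k)[0] == 1   # one miss from c
assert fifo_run(cp, probe, k)[0] == k   # k misses from c′
```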

For k > 3, we present a recursive construction of σk and \(\sigma ^{\prime }_k\) based on σk−1 and \(\sigma ^{\prime }_{k-1}\). Notice that \(\sigma ^{\prime }_2 = 1 \circ \sigma _2\) and \(\sigma ^{\prime }_3 = 1 \circ \sigma _3\). We will maintain the invariant \(\sigma ^{\prime }_k = 1 \circ \sigma _k\) in the recursive construction.

As σk−1 and \(\sigma ^{\prime }_{k-1}\) are constructed for a cache of size k − 1, they will behave differently on a larger cache of size k. However, we can pad σk−1 and \(\sigma ^{\prime }_{k-1}\) with requests to one additional page x that fills up the additional space in the cache. This can be achieved as follows: Add a request to x at the start of the two sequences (following the request to 1 in \(\sigma ^{\prime }_k\)). Also, whenever x is evicted in either of the two sequences, in a cache of size k, add a request to x in both sequences. By construction, the additional requests do not increase the edit distance between the two sequences. Further, the additional requests ensure that every request that belongs to the original sequences faults in the new sequence on a cache of size k if and only if it faults in the original sequence on a cache of size k − 1. We call the resulting sequences σk, pre and \(\sigma _{k, pre}^{\prime }\). The two sequences yield configurations c = [1, ⋰, i, x, i + 1, ⋰, k − 1] and \(c^{\prime } = [k-1, \ddots , j^{\prime }+1, x, j^{\prime }, \ddots , 1]\), respectively, which, unless i = j′, are almost solutions to the original problem.

We distinguish two cases, depending on whether i < j′ (Case A) or i > j′ (Case B):

Case A: Observe that if i < j′, then c = [1, ⋰, i, x, i + 1, ⋰, k − 1] and \(c^{\prime } = [k-1, \ddots , j^{\prime }+1, x, j^{\prime }, \ddots , 1]\) are equal up to renaming to d = [1, ⋰, k] and \(d^{\prime } = [k , \ddots , j+1, i, j, \ddots , i+1, i-1, \ddots , 1]\) for some i, j with 1 ≤ i < j ≤ k.

We distinguish five cases depending on the values of i and j:

Case A.1: 1 < i < j < k. Below we incrementally build a suffix σk, post that finishes the construction:

Requests | State for prefix σk,pre | State for prefix σ′k,pre
(start) | [1, ⋰, k] | [k, ⋱, j+1, i, j, ⋱, i+1, i−1, ⋱, 1]
v | [v, 1, ⋰, k−1] | [v, k, ⋱, j+1, i, j, ⋱, i+1, i−1, ⋱, 2]
k, ⋱, i+1 | [i+1, ⋰, k, v, 1, ⋰, i−1] | [v, k, ⋱, j+1, i, j, ⋱, i+1, i−1, ⋱, 2]
1, ⋰, i−1 | [i+1, ⋰, k, v, 1, ⋰, i−1] | [i−1, ⋱, 1, v, k, ⋱, j+1, i, j, ⋱, i+2]

Here v is a page not contained in either configuration; each row shows the configurations after serving the requests in its first column.

If j = i + 1, then the two final states above simplify to [j, ⋰, k, v, 1, ⋰, i − 1] and \([i-1,\ddots, 1,v,k,\ddots ,j+1,j-1]\), and the following sequence finishes the construction:

Requests | State for prefix σk,pre | State for prefix σ′k,pre
(start) | [j, ⋰, k, v, 1, ⋰, i−1] | [i−1, ⋱, 1, v, k, ⋱, j+1, j−1]
w | [w, j, ⋰, k, v, 1, ⋰, i−2] | [w, i−1, ⋱, 1, v, k, ⋱, j+1]
j, ⋰, k, v, 1, ⋰, i−2 | [w, j, ⋰, k, v, 1, ⋰, i−2] | [i−2, ⋱, 1, v, k, ⋱, j, w]

Here w is a fresh page not contained in either configuration.

Otherwise, if j ≥ i + 2, the construction can be finished as follows:

Requests | State for prefix σk,pre | State for prefix σ′k,pre
(start) | [i+1, ⋰, k, v, 1, ⋰, i−1] | [i−1, ⋱, 1, v, k, ⋱, j+1, i, j, ⋱, i+2]
w | [w, i+1, ⋰, k, v, 1, ⋰, i−2] | [w, i−1, ⋱, 1, v, k, ⋱, j+1, i, j, ⋱, i+3]
i+1, ⋰, k, v, 1, ⋰, i−2 | [w, i+1, ⋰, k, v, 1, ⋰, i−2] | [i−2, ⋱, 1, v, k, ⋱, i+1, w]

Here w is a fresh page not contained in either configuration.
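The state transitions in the Case A.1 tables can be checked mechanically with a small FIFO simulator. A sketch for the concrete values k = 6, i = 3, j = 5 (the request sequence v; k, ⋱, i+1; 1, ⋰, i−1 is inferred from the displayed states and should be taken as an illustration, not as the paper's exact notation):

```python
from collections import deque

def fifo_serve(config, seq, k=6):
    """FIFO from `config` (pages listed last-in to first-in); returns the
    final configuration in the same convention."""
    cache = deque(config)
    for p in seq:
        if p not in cache:
            if len(cache) == k:
                cache.pop()        # evict first-in
            cache.appendleft(p)
    return list(cache)

# Case A.1 with k = 6, i = 3, j = 5:
d  = [1, 2, 3, 4, 5, 6]            # d  = [1, ⋰, k]
dp = [6, 3, 5, 4, 2, 1]            # d′ = [k, ⋱, j+1, i, j, ⋱, i+1, i−1, ⋱, 1]
reqs = ['v'] + [6, 5, 4] + [1, 2]  # v; then k, ⋱, i+1; then 1, ⋰, i−1
assert fifo_serve(d, reqs)  == [4, 5, 6, 'v', 1, 2]  # [i+1, ⋰, k, v, 1, ⋰, i−1]
assert fifo_serve(dp, reqs) == [2, 1, 'v', 6, 3, 5]  # [i−1, ⋱, 1, v, k, ⋱, j+1, i, j, ⋱, i+2]
```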

Case A.2: 1 = i < j < k − 1. Consider the following suffix:

Requests | State for prefix σk,pre | State for prefix σ′k,pre
(start) | [1, ⋰, k] | [k, ⋱, j+1, 1, j, ⋱, 2]
y | [y, 1, ⋰, k−1] | [y, k, ⋱, j+1, 1, j, ⋱, 3]
2, ⋰, j | [y, 1, ⋰, k−1] | [j, ⋱, 2, y, k, ⋱, j+1]
1 | [y, 1, ⋰, k−1] | [1, j, ⋱, 2, y, k, ⋱, j+2]
j+1, ⋰, k−1 | [y, 1, ⋰, k−1] | [k−1, ⋱, j+1, 1, j, ⋱, 2, y]

Here y is a fresh page not contained in either configuration.

The final pair of states is equal up to renaming to the pair d = [1, ⋰ , k] and \(d^{\prime } = [k, \ddots , j+2, 2, j+1, \ddots , 3, 1]\), and so it fulfills the conditions under which the suffix σk, post constructed in Case A.1 finishes the construction.

Case A.3: 1 = i < j = k − 1. Consider the following suffix:

Requests | State for prefix σk,pre | State for prefix σ′k,pre
(start) | [1, ⋰, k] | [k, 1, k−1, ⋱, 2]
x | [x, 1, ⋰, k−1] | [x, k, 1, k−1, ⋱, 3]
2, ⋰, k−1 | [x, 1, ⋰, k−1] | [k−1, ⋱, 2, x, k]
y, z | [z, y, x, 1, ⋰, k−3] | [z, y, k−1, ⋱, 2]
x, 1, ⋰, k−3 | [z, y, x, 1, ⋰, k−3] | [k−3, ⋱, 1, x, z, y]

Here x, y, and z are fresh pages not contained in either configuration.

The final pair of states is equal up to renaming to the pair d = [1, ⋰, k] and \(d^{\prime } = [k, \ddots , 3, 1, 2]\), and so it fulfills the conditions under which Case A.2 continues the construction.

Case A.4: 1 < i < j = k. In this case, d = [1, ⋰, k] and \(d^{\prime } = [i, k, \ddots , i+1, i-1, \ddots , 1]\). These are equal up to renaming to \(e = [k, \ddots , j^{\prime }+1, 1, j^{\prime }, \ddots , 2]\) and e′ = [1, ⋰, k], with j′ = k − (i − 1). Thus, exchanging σk, pre and \(\sigma ^{\prime }_{k, pre}\) yields states that fulfill the conditions of either Case A.2 or Case A.3.

Case A.5: 1 = i < j = k. Consider the following suffix:

Requests | State for prefix σk,pre | State for prefix σ′k,pre
(start) | [1, ⋰, k] | [1, k, ⋱, 2]
x | [x, 1, ⋰, k−1] | [x, 1, k, ⋱, 3]
2, ⋰, k−1 | [x, 1, ⋰, k−1] | [k−1, ⋱, 2, x, 1]

Here x is a fresh page not contained in either configuration.

The resulting pair of states is equal up to renaming to [1, ⋰ , k] and \([k, \ddots , 3, 1, 2]\), which corresponds to Case A.2.

Case B: Observe that if i > j′, then c = [1, ⋰, i, x, i + 1, ⋰, k − 1] and \(c^{\prime } = [k-1, \ddots , j^{\prime }+1, x, j^{\prime }, \ddots , 1]\) are equal up to renaming to d = [1, ⋰, k] and \(d^{\prime } = [k , \ddots , j+1, j-1, \ddots , i+1, j, i, \ddots , 1]\) for some i, j with 1 ≤ i < j ≤ k. Consider the following suffix:

Requests | State for prefix σk,pre | State for prefix σ′k,pre
(start) | [1, ⋰, k] | [k, ⋱, j+1, j−1, ⋱, i+1, j, i, ⋱, 1]
x | [x, 1, ⋰, k−1] | [x, k, ⋱, j+1, j−1, ⋱, i+1, j, i, ⋱, 2]
1, ⋰, i | [x, 1, ⋰, k−1] | [i, ⋱, 1, x, k, ⋱, j+1, j−1, ⋱, i+1]

Here x is a fresh page not contained in either configuration.

At this point, we distinguish two cases:

Case B.1: j = i + 1. Then the suffix \(j-1, \ddots , i+1\) is empty, and the final state for prefix \(\sigma ^{\prime }_{k, pre}\) above can be simplified to \([j-1, \ddots , 1, x, k, \ddots , j+1]\). If, in addition, j = k, the suffix \(k, \ddots , j+1\) is also empty, and this state can further be simplified to \([k-1, \ddots , 1, x]\) and the construction is finished. Otherwise, if j < k, we continue the construction as follows:

Requests | State for prefix σk,pre | State for prefix σ′k,pre
(start) | [x, 1, ⋰, k−1] | [j−1, ⋱, 1, x, k, ⋱, j+1]
y | [y, x, 1, ⋰, k−2] | [y, j−1, ⋱, 1, x, k, ⋱, j+2]
j, ⋰, k−2 | [y, x, 1, ⋰, k−2] | [k−2, ⋱, j, y, j−1, ⋱, 1, x]

Here y is a fresh page not contained in either configuration.

The final pair of states is equal up to renaming to the pair d = [1, ⋰ , k] and \(d^{\prime }=[k, \ddots , j+2, 1, j+1, \ddots , 3, 2]\), which corresponds to Case A.2, A.3, or A.5.

Case B.2: j > i + 1. Then, we can continue the construction as follows:

Requests | State for prefix σk,pre | State for prefix σ′k,pre
(start) | [x, 1, ⋰, k−1] | [i, ⋱, 1, x, k, ⋱, j+1, j−1, ⋱, i+1]
y | [y, x, 1, ⋰, k−2] | [y, i, ⋱, 1, x, k, ⋱, j+1, j−1, ⋱, i+2]
i+1, ⋰, k−2 | [y, x, 1, ⋰, k−2] | [k−2, ⋱, i+1, y, i, ⋱, 1, x]

Here y is a fresh page not contained in either configuration.

The final pair of states is equal up to renaming to the pair d = [1, ⋰ , k] and \(d^{\prime }=[k, \ddots , i+3, 1, i+2, \ddots , 3, 2]\), which corresponds to Case A.2, A.3, or A.5. □

We have shown upper and lower bounds for the smoothness of several classes of algorithms. In particular, we have shown that any c-competitive algorithm is also (c, β)-smooth for some β. On the other hand, we have shown that no online deterministic demand-paging or competitive algorithm can be better than (1, δ(k + 1))-smooth. We have then analyzed the smoothness of LRU, FIFO, and FWF. LRU matches the lower bound while FIFO matches the upper bound for any strictly k-competitive algorithm, demonstrating that the upper and lower bounds for the smoothness of strongly-competitive algorithms are tight. Furthermore, for deterministic algorithms there is no trade-off between competitiveness and smoothness. Next, we turn our attention to randomized algorithms, for which this trade-off does exist.

5 Smoothness of Randomized Paging Algorithms

Randomized algorithms have been shown to be more competitive than deterministic ones. As for deterministic algorithms, in the following, we first derive lower bounds for the smoothness of demand-paging and strongly-competitive algorithms. These suggest that randomization might also help with smoothness.

However, we then go on to show that the well-known competitive randomized algorithms Mark, Equitable, and Partition are not smooth. The simple randomized algorithm that evicts one of the cached pages uniformly at random is shown to be as smooth as LRU, but not more. With randomized algorithms it is possible to sacrifice competitiveness for smoothness, a trade-off we explore by introducing an algorithm called Smoothed-LRU. We conclude the study of randomized algorithms by introducing LRU-Random, a randomized version of LRU that is as competitive as LRU, but smoother, at least for a cache of size 2.

5.1 Bounds on the Smoothness of Randomized Paging Algorithms

Similarly to deterministic algorithms, we can show a lower bound on the smoothness of any randomized demand-paging algorithm. Notice that the lower bound only applies to δ = 1 and so additional disturbances might have a smaller effect than the first one.

Theorem 9 (Lower bound for randomized, demand-paging algorithms)

No randomized, demand-paging algorithm is\((1, H_{k}+\frac {1}{k}-\epsilon , 1)\)-smoothfor any𝜖 > 0.

Proof

For a given randomized, demand-paging algorithm A, we show how an oblivious adversary can construct two sequences, a “bad” sequence \(\sigma ^{\prime }_A\) and a “good” sequence σA, with edit distance 1, such that \(A(\sigma ^{\prime }_A)\) is at least \(k+H_k+\frac {1}{k}\) and A(σA) is exactly k. The existence of such sequences immediately implies the theorem. The construction is inspired by the nemesis sequence devised by Fiat et al. [14] in their proof of a lower bound for the competitiveness of randomized algorithms.

The sequence \(\sigma ^{\prime }_A\) consists of requests to n = k + 1 distinct pages. During the construction of the sequence, the adversary maintains for each of the n pages its probability pi of not being in the cache. This is possible, because the adversary knows the probability distribution used by A. We have \({\sum }_i p_i \geq 1\), as only k of the n = k + 1 pages can be in the fast memory.

The “bad” sequence \(\sigma ^{\prime }_A\) begins with n requests, a single request to each of the n pages in an arbitrary order. Initially, the fast memory is empty, and so these requests result in k + 1 faults. After those requests, the most recently requested page has probability 0 of being outside the cache, and so there is at least one page i with \(p_i \geq \frac {1}{k}\). The next request in \(\sigma ^{\prime }_A\) is to such a page. We will later refer to this page as m. The remainder of \(\sigma ^{\prime }_A\) is composed of k − 1 subphases, the ith of which will contribute an expected number of \(\frac {1}{k-i+1}\) page faults. By linearity of expectation, we can sum up the expected faults on the entire sequence, and obtain \(A(\sigma ^{\prime }_A) \geq k+1 + \frac {1}{k} + {\sum }_{i=1}^{k-1} \frac {1}{k-i+1} = k+1+\frac {1}{k} + H_k-1 = k + H_k + \frac {1}{k}\). It remains to show how to construct the remaining k − 1 subphases and the “good” sequence σA.

Each of the k − 1 subphases consists of zero or more requests to marked pages followed by exactly one request to an unmarked page. A page is marked at the start of subphase j if it is page m or if it has been requested in at least one of the preceding subphases. Let M be the set of marked pages at the start of the jth subphase. Then the number of marked pages is |M| = j and the number of unmarked pages is u = k + 1 − j. Let \(p_M = {\sum }_{i \in M} p_i\). If pM = 0, then there must be an unmarked page with probability at least \(\frac {1}{u}\) of being outside the cache, and the adversary can pick this page to end the subphase. Otherwise, if pM > 0, there must be a marked page l with pl > 0. The first request of subphase j is to page l. Let 𝜖 = pl. The adversary can now generate requests to marked pages using the following loop:

While the expected number of faults in subphase j is less than \(\frac {1}{u}\), and while pM > 𝜖, request the marked page l such that l = argmaxiMpi.

Note that the loop must terminate, as each iteration will contribute \(p_l \geq \frac {p_M}{|M|} > \frac {\epsilon }{|M|}\) expected faults. If the loop terminates due to the first condition, the adversary can request an arbitrary unmarked page to end the subphase. Otherwise, the adversary requests the unmarked page i with the highest probability value. Clearly, \(p_i \geq \frac {1-p_M}{u} > \frac {1-\epsilon }{u}\). The total expected number of faults of the subphase is then \(\epsilon + p_i > \frac {1}{u}\). This concludes the construction of \(\sigma ^{\prime }_A\).

Notice that there is one unmarked page that has only been requested in the initial n requests of \(\sigma ^{\prime }_A\). We obtain the “good” sequence σA by deleting the request to this unmarked page from \(\sigma ^{\prime }_A\). By construction, σA contains requests to only k distinct pages. As A is by assumption demand-paging, σA thus incurs only k page faults. □

For strongly-competitive randomized algorithms we can show a similar statement using a similar yet more complex construction:

Theorem 10 (Lower bound for strongly-competitive randomized paging algorithms)

No strongly-competitive randomized paging algorithm is (1, δ(Hk − 𝜖), δ)-smooth for any 𝜖 > 0 and any δ > 0.

In contrast to the deterministic case, this lower bound only applies to strongly-competitive algorithms, as opposed to simply competitive. So with randomization there might be a trade-off between competitiveness and smoothness. There might be competitive algorithms that are smoother than all strongly-competitive ones.

5.2 Smoothness of Particular Randomized Algorithms

Two known strongly-competitive randomized paging algorithms are Partition, introduced by McGeoch and Sleator [20] and Equitable, introduced by Achlioptas, Chrobak, and Noga [1]. We show that neither of the two algorithms is smooth.

Theorem 11 (Smoothness of Partition and Equitable)

For any cache size k ≥ 2, there is an 𝜖 > 0 such that neither Partition nor Equitable is (1 + 𝜖, γ, 1)-smooth for any γ. Also, Partition and Equitable are (Hk, 2δHk)-smooth.

The lower bound in the theorem above may not be tight, but it shows that neither of the two algorithms matches the lower bound from Theorem 10. This leaves open the question whether the lower bound from Theorem 10 is tight.

Note that the lower bound for Equitable applies equally to OnlineMin [8], as OnlineMin has the same expected number of faults as Equitable on all request sequences.

Mark [14] is a simpler randomized algorithm that is (2Hk − 1)-competitive. We show that it is not smooth either.

Theorem 12 (Smoothness of Mark)

Let \(\alpha =\max _{1< \ell \leq k}\left \{\frac {\ell (1+H_{k}-H_{\ell })}{\ell -1+H_{k}-H_{\ell -1}}\right \}={\Omega }(H_{k})\), where k is the cache size. Mark is not (α − 𝜖, γ, 1)-smooth for any 𝜖 > 0 and any γ. Also, Mark is (2Hk − 1, δ(4Hk − 2))-smooth.

We conjecture that the lower bound for Mark is tight, i.e., that Mark is (α, β)-smooth for α as defined in Theorem 12 and some β.

We now prove that Random achieves the same bounds for smoothness as LRU, and thus the best possible for any deterministic, demand-paging or competitive algorithm. For simplicity, we prove the theorem for a non-demand-paging definition of Random in which each page gets evicted upon a miss with probability 1/k, even if the cache is not yet full. This modification with respect to the demand-paging version does not change the competitiveness of the algorithm, and it allows us to avoid the analysis of special cases when proving properties about smoothness. In fact, for the non-demand-paging version of the algorithm, given any pair of sequences σ, σ′, it is possible to construct two sequences ρ and ρ′ with Δ(ρ, ρ′) = Δ(σ, σ′) such that the number of faults on σ and σ′ starting from an empty cache equals the number of faults on ρ and ρ′ starting with a full cache containing an arbitrary set of pages. This can be achieved by renaming in σ and σ′ any occurrences of the pages in the initial cache so that these pages do not appear in the rest of the sequences. This implies that any property derived on the smoothness of the algorithm starting with an empty cache also holds when the cache is assumed to be initially full. Note that the same property holds for LRU-Random, which is introduced in Section 5.4.

Intuitively, the additive term k + 1 in the smoothness of Random is explained by the fact that a single difference between two sequences can make the caches of both executions differ by one page p. Since Random evicts a page with probability 1/k, the expected number of faults until p is evicted is k.
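This expectation is that of a geometric distribution with success probability 1/k, which can be confirmed numerically (a small illustrative computation, not from the paper):

```python
# On each fault, Random evicts the distinguished page with probability 1/k,
# so the number of faults until it is evicted is geometrically distributed
# with mean k.
k = 8
p = 1.0 / k
expected = sum(t * p * (1 - p) ** (t - 1) for t in range(1, 10000))
assert abs(expected - k) < 1e-6
```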

Theorem 13 (Smoothness of Random)

Random is (1, δ(k + 1))-smooth. This is tight.

Proof

For the lower bound, we use a construction similar to the one used for the lower bound on Random's competitiveness in [23]. Consider the sequences σ = σ1...kσ1...k and σ′ = σ1...kxk+1σ1...k with σ1...k = (x1, x2,…, xk)n. The sequences are identical but for the insertion of xk+1 in σ′, and thus Δ(σ, σ′) = 1. For any 𝜖 > 0, the expected number of faults in the second half of σ is less than 𝜖 for sufficiently large n. On σ′, Random faults on xk+1 and evicts one of x1,…, xk. Then, on each of the n subsequences of k requests in the second part of σ′, and while xk+1 is still in its cache, RAND will incur a fault. If in one of these faults RAND evicts xk+1, then it does not incur any faults for the rest of the sequence. Since on every fault RAND evicts xk+1 with probability 1/k, the expected number of faults until this happens exceeds k − 𝜖 for any 𝜖 > 0 for sufficiently large n. This, plus the initial request to xk+1, yields RAND(σ′) ≥ RAND(σ) + k + 1 − 2𝜖 for any 𝜖 and sufficiently large n. Now, for general δ we follow the same idea: instead of one, we have δ subsequences (x1…xk)n in σ and δ subsequences yi(x1…xk)n in σ′, where yi (1 ≤ i ≤ δ) is a new page not requested so far, distinct in every repetition. The expected number of extra faults in each repetition of σ′ is at least k + 1 − 𝜖 for any 𝜖 and sufficiently large n, while RAND does not incur extra faults on σ. Thus, RAND(σ′) ≥ RAND(σ) + δ(k + 1) − 𝜖(δ + 1).

In order to prove the upper bound, we look at the state distributions of Random when serving two sequences σ and σ′ with Δ(σ, σ′) = 1. We use a potential function defined as the distance between two state distributions. For this distance, we define a version of the earth mover's distance. Let D and D′ be two probability distributions over cache states. We define the distance between D and D′ as the minimum cost of transforming D into D′ by means of transferring probability mass from the states of D to the states of D′.

Let s and s′ be two cache states in D and D′ with probabilities ps and \(p_{s^{\prime }}\), respectively. Let α be a function that denotes the amount of probability mass to be transferred from states in D to states in D′. The earth mover's distance between D and D′ is defined as
$${\Delta}(D, D^{\prime}) := \underset{\alpha}{\min} \sum\limits_{s,s^{\prime}} \alpha(s, s^{\prime})\cdot d(s,s^{\prime}), $$
where for all s, \({\sum }_{s^{\prime }} \alpha (s,s^{\prime })=p_s\); for all s′, \({\sum }_{s} \alpha (s,s^{\prime })=p_{s^{\prime }}\); and d(s, s′) is the distance between states s and s′. We define \(d(s,s^{\prime })=k \cdot H_{c(s,s^{\prime })}\), where c(s, s′) = max{|s ∖ s′|, |s′ ∖ s|}, and Hℓ is the ℓth harmonic number. Note that |s ∖ s′| might not equal |s′ ∖ s| if either state does not represent a full cache. For example, let k = 3, s = [1,2,3] and s′ = [1,4,5]. Then, c(s, s′) = 2 and d(s, s′) = kH2 = 3 ⋅ 3/2 = 9/2. For convenience we let H0 = 0. Figure 1 shows the distance between two example distributions.
Fig. 1

Example of the distance between state distributions. Labels over arrows show the probability mass that is transferred between corresponding states. The cost of the transfer is \({\sum }_{s,s^{\prime }} \alpha (s, s^{\prime })\cdot d(s,s^{\prime })=1/3\cdot d([1,2,3],[1,2,3]) + 1/3\cdot d([1,3,4],[1,2,3]) + 1/3\cdot d([1,2,4],[1,2,5]) = 1/3\cdot 3 \cdot H_0 + 2\cdot 1/3\cdot 3\cdot H_1 = 2\). Since this assignment minimizes the transfer cost, Δ(D, D′) = 2
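Since all probability masses in the example of Fig. 1 are multiples of 1/3, the earth mover's distance reduces to an assignment problem over unit masses, which can be solved by brute force. A sketch (our own illustrative code; the two distributions are read off the transfers listed in the caption):

```python
from fractions import Fraction
from itertools import permutations

def harmonic(n):
    return sum(Fraction(1, i) for i in range(1, n + 1))

def dist(s, t, k=3):
    """d(s, t) = k * H_c with c = max(|s \\ t|, |t \\ s|)."""
    c = max(len(set(s) - set(t)), len(set(t) - set(s)))
    return k * harmonic(c)

# The two distributions of Fig. 1, split into unit masses of 1/3 each;
# the state [1,2,3] carries mass 2/3 in D'.
D_units  = [(1, 2, 3), (1, 3, 4), (1, 2, 4)]
Dp_units = [(1, 2, 3), (1, 2, 3), (1, 2, 5)]

emd = min(sum(dist(s, t) for s, t in zip(D_units, perm)) / 3
          for perm in permutations(Dp_units))
assert emd == 2   # matches Δ(D, D′) from the caption
```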

We will prove the following claim:

Claim

Let D and D′ be two probability distributions over cache states. Let σ be any request sequence and let MD(σ) and \(M_{D^{\prime }}(\sigma )\) be two random variables equal to the number of misses on σ by RAND when starting from distributions D and D′, respectively. Then, \(E[M_{D^{\prime }}(\sigma )] - E[M_{D}(\sigma )] \le {\Delta }(D,D^{\prime })\).

Let us assume that the claim is true. Then, we prove the theorem by considering two sequences ρ and ρ′ such that δ = Δ(ρ, ρ′) = 1 and arguing that Δ(D, D′) ≤ k for any pair of distributions D and D′ that can be reached, respectively, by serving prefixes of ρ and ρ′ starting from an empty cache. If these prefixes include the single difference between both sequences, then the theorem follows by applying the claim above to the maximal suffix σ shared by both sequences.

Let j be the minimum index such that ρ[(j + 1)..|ρ|] = ρ′[(j + 1)..|ρ′|] = σ. Then, \(\rho _{j}\ne \rho ^{\prime }_{j}\) (one of the two might be empty) and ρ[1..(j − 1)] = ρ′[1..(j − 1)]. Since RAND(ρ) and RAND(ρ′) both start with an empty cache, their distributions and expected misses before serving ρj and \(\rho _j^{\prime }\) coincide. We now argue that after serving ρj and \(\rho _j^{\prime }\) the distance between the resulting distributions D and D′ is at most k.

Let F be the state distribution of both executions before serving ρj and \(\rho _j^{\prime }\). Suppose first that \(\rho _j^{\prime }\) is empty and thus D′ = F (the case when ρj is empty is symmetric). We look at the minimum cost to transfer the probability mass from each state of F to D. Let si be a state in F with probability pi. If ρj ∈ si, then si has probability at least pi in D and hence we can transfer pi mass between these states in F and D at cost zero. Otherwise, if ρj ∉ si, D contains k states \(s_{i_1}^{\prime },\ldots ,s_{i_k}^{\prime }\) resulting from the eviction of each of the k pages of si, with \(c(s_i,s_{i_r}^{\prime })=1\) and hence \(d(s_i,s_{i_r}^{\prime })=kH_1=k\) for all 1 ≤ r ≤ k. Moreover, the probability of each of these states is at least pi/k and hence we can transfer all the mass of si to these states at a total cost of \({\sum }_{r=1}^k k(p_i/k)= kp_i\). Adding up over all states si ∈ F, we can transfer all probability mass of F to D at a cost of at most \(k\sum {p_i}=k\), since \(\sum {p_i}=1\). Since the distance between D and F is the minimum cost of transferring the probability mass from F to D, this distance is at most k. For the case when \(\rho _j\ne \rho _j^{\prime }\) and neither is empty, we apply a similar argument. Let si be a state in F with probability pi. If both ρj and \(\rho _j^{\prime }\) are in si, then this state is also in D and D′, and we can transfer pi from D to D′ at cost zero. Assume that ρj ∈ si but \(\rho _j^{\prime }\notin s_i\). Then, as we argued above, in D′ there are k states with probability at least pi/k and distance 1 to si. Since si ∈ D, we can transfer a mass of pi to these states at a cost of kpi. Now, if ρj ∉ si but \(\rho _j^{\prime }\in s_i\), then si is in D′ and there are k states in D with distance 1 to si. We can transfer pi/k mass from each of these states in D to si in D′ at a cost of kpi. Finally, if ρj ∉ si and \(\rho _j^{\prime }\notin s_i\), then there are k pairs of states (s, s′) with s ∈ D and s′ ∈ D′ resulting from the replacement of the same page in si by ρj and \(\rho _j^{\prime }\), respectively, and thus c(s, s′) = 1. In the distance Δ(D, D′) we can transfer pi/k from s to s′ at a cost of kH1 = k per unit of mass. Since there are k such pairs for each si, the total cost contributed by these pairs is pik. Since in all cases the cost contributed by a state si ∈ F when transferring mass from D to D′ is at most kpi, the distance Δ(D, D′) is at most \(k\sum {p_i}=k\).

Since serving ρj and \(\rho _j^{\prime }\) can add at most 1 to the difference in expected misses, and by the claim above the difference in expected misses on the suffix σ is at most Δ(D, D′) ≤ k, it follows that E[RAND(ρ′) − RAND(ρ)] ≤ k + 1. The theorem follows by Corollary 1.

We now prove the claim. Let MD(σi) be a random variable equal to the number of misses of Random when σi is requested and the state distribution of RAND is D. Let D0 = D and \(D^{\prime }_0 = D^{\prime }\), and let Di and \(D^{\prime }_i\) denote the respective distributions after serving σ1,…, σi. Then, it is sufficient to prove that for every request σi ∈ σ, for 1 ≤ i ≤ |σ|,
$$ E[M_{D^{\prime}_{i-1}}(\sigma_i)]-E[M_{D_{i-1}}(\sigma_i)] \le {\Delta}(D_{i-1},D^{\prime}_{i-1})-{\Delta}(D_{i},D^{\prime}_{i}). $$
(1)

This implies that \(E[M_{D^{\prime }}(\sigma )] - E[M_{D}(\sigma )] \le {\Delta }(D_{0},D^{\prime }_{0}) - {\Delta }(D_{f},D^{\prime }_{f}) \le {\Delta }(D_{0},D^{\prime }_{0}) = {\Delta }(D,D^{\prime })\), where Df and \(D^{\prime }_f\) are the distributions after serving all of σ, since Δ(⋅,⋅) ≥ 0 for any pair of distributions.

Let Di−1 and \(D_{i-1}^{\prime }\) be the distributions before the request to σi, and let
$${\Delta}(D_{i-1},D^{\prime}_{i-1})=\sum\limits_{s_u,s_v}\alpha(s_u,s_v)d(s_u,s_v), $$
where α(su, sv) is the amount of mass transferred from su to sv in a minimum-cost assignment (which could be zero). We consider two states su ∈ Di−1 with probability pu and \(s_v \in D_{i-1}^{\prime }\) with probability pv and construct a valid assignment α′ after the request to σi for the distance \({\Delta }(D_{i},D^{\prime }_{i})\).
We separate the analysis into the following cases:
  1.

    [σi ∈ su, sv] In this case su ∈ Di with probability at least pu and \(s_v \in D^{\prime }_i\) with probability at least pv. Hence, since α(su, sv) ≤ min{pu, pv}, we can set α′(su, sv) = α(su, sv). The contribution of this pair of states to \({\Delta }(D_{i},D^{\prime }_{i})\) is \(\alpha ^{\prime }(s_u,s_v)d(s_u,s_v)= \alpha (s_u,s_v)kH_{c(s_u,s_v)}\).

     
  2.
    [σi ∉ su, sv] There are k states r = {r1,…, rk} in Di and t = {t1,…, tk} in \(D_i^{\prime }\) resulting from the eviction of each page of su and sv, respectively. The probability of each state of r and t is at least pu/k and pv/k, respectively. Let c = c(su, sv) and α = α(su, sv). If c = 0, then we pair states in r and t such that \(r_{j_1}=t_{j_2}\) and we set \(\alpha ^{\prime }(r_{j_1},t_{j_2})=\alpha /k\). Otherwise, there are c pages that su and sv do not have in common. We sort the states in r and t such that the first c states are those that result from evicting a page from su that is not in sv and vice versa, while the rest of the states are the ones resulting from evicting a common page. We pair the states in order and set α′(rj, tj) = α/k. Note that c(rj, tj) = c − 1 for all j ≤ c and c(rj, tj) = c for all j > c (see Fig. 2). The contribution of this pair of states to \({\Delta }(D_{i},D^{\prime }_{i})\) is at most
    $$(\alpha/k) (ckH_{c-1}+(k-c)kH_{c})=\alpha(kH_{c}+c(H_{c-1}-H_c))=\alpha(kH_{c}-1) $$
     
  3.

    [σi ∈ su, σi ∉ sv] We transfer α(su, sv)/k to each of the k states in \(D^{\prime }_{i}\) resulting from evictions from sv. As in case 2, there are c = c(su, sv) states that result from evicting a non-common page with su, and the rest evict a common page. Each of the first c states has c − 1 non-common pages with su, while the rest have c non-common pages. Hence, the contribution of these states to \({\Delta }(D_{i},D^{\prime }_{i})\) is \(\alpha (s_u,s_v)(kH_{c(s_u,s_v)}-1)\).

     
  4.

    [σi ∉ su, σi ∈ sv] This case is analogous to case 3. We transfer α(su, sv)/k mass to \(s_v \in D^{\prime }_{i}\) from each of the k states in Di that result from evictions from su. The contribution of these states is \(\alpha (s_u,s_v)(kH_{c(s_u,s_v)}-1)\).

     
Fig. 2

States su ∈ Di−1 and \(s_v \in D_{i-1}^{\prime }\) and the resulting states in Di and \(D_i^{\prime }\) after a request to σi ∉ su, sv. If c = c(su, sv) > 0, su and sv have m = k − c pages in common and the resulting states can be paired such that c of the pairs have m + 1 pages in common while k − c pairs have m pages in common (if c = 0, we pair equal states). If α = α(su, sv) is the probability mass transferred from su to sv in the minimum-cost assignment between Di−1 and \(D_{i-1}^{\prime }\), then we assign α/k probability mass between the paired states for the assignment between Di and \(D_i^{\prime }\)

Since in the cases above we account for all the probability mass of all possible states in Di and \(D_i^{\prime }\), the described mass transfer is a valid assignment between the two distributions, and its cost upper-bounds the distance:
$$\begin{array}{@{}rcl@{}} {\Delta}(D_{i},D^{\prime}_{i}) & \le &\sum\limits_{s_u,s_v |\sigma_i\in s_u,s_v} \alpha(s_u,s_v)kH_{c(s_u,s_v)}+\sum\limits_{s_u,s_v |\sigma_i\notin s_u,s_v} \alpha(s_u,s_v)(kH_{c(s_u,s_v)}-1)\\ & & + \sum\limits_{s_u,s_v |\sigma_i\in s_u, \sigma_i \notin s_v} \alpha(s_u,s_v)(kH_{c(s_u,s_v)}-1)\\ &&+ \sum\limits_{s_u,s_v |\sigma_i\notin s_u, \sigma_i \in s_v} \alpha(s_u,s_v)(kH_{c(s_u,s_v)}-1)\\ & = & {\Delta}(D_{i-1},D^{\prime}_{i-1}) - \sum\limits_{s_u,s_v |\sigma_i \notin s_u \vee \sigma_i \notin s_v}\alpha(s_u,s_v)\\ & \le & {\Delta}(D_{i-1},D^{\prime}_{i-1}) - \sum\limits_{s_u,s_v |\sigma_i \notin s_u}\alpha(s_u,s_v) \end{array} $$

Therefore, \({\Delta }(D_{i-1},D^{\prime }_{i-1})-{\Delta }(D_{i},D^{\prime }_{i})\ge {\sum }_{s_u,s_v |\sigma _i \notin s_u}\alpha (s_u,s_v)\). On the other hand, \(E[M_{D_{i-1}}(\sigma _i)] - E[M_{D^{\prime }_{i-1}}(\sigma _i)] \le E[M_{D_{i-1}}(\sigma _i)] = {\sum }_{s_u,s_v |\sigma _i \notin s_u}\alpha (s_u,s_v)\), and hence \(E[M_{D_{i-1}}(\sigma _i)] - E[M_{D^{\prime }_{i-1}}(\sigma _i)] \le {\Delta }(D_{i-1},D^{\prime }_{i-1})-{\Delta }(D_{i},D^{\prime }_{i})\). □
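The algebraic simplification used in cases 2 to 4 above rests on the harmonic-number identity \(c\cdot kH_{c-1}+(k-c)\cdot kH_{c}=k(kH_{c}-1)\), which follows from \(H_{c-1} = H_c - 1/c\). As a quick sanity check (our code, not part of the original proof), it can be verified with exact rational arithmetic:

```python
from fractions import Fraction

def H(n):
    # n-th harmonic number, computed exactly (H(0) = 0)
    return sum(Fraction(1, j) for j in range(1, n + 1))

# Verify c*k*H_{c-1} + (k-c)*k*H_c == k*(k*H_c - 1) for all 1 <= c <= k.
for k in range(1, 12):
    for c in range(1, k + 1):
        lhs = c * k * H(c - 1) + (k - c) * k * H(c)
        rhs = k * (k * H(c) - 1)
        assert lhs == rhs, (k, c)
print("identity verified")
```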

5.3 Trading Competitiveness for Smoothness

We have seen that none of the well-known randomized algorithms are particularly smooth. Random is the only known randomized algorithm that is (1, δc)-smooth for some c. However, it is neither smoother nor more competitive than LRU, the smoothest deterministic algorithm. In this section we show that greater smoothness can be achieved at the expense of competitiveness. First, as an extreme example of this, we show that Evict-on-access (EOA) [9]—the policy that evicts each page with a probability of \(\frac {1}{k}\) upon every request, i.e., not only on faults but also on hits—beats the lower bounds of Theorems 9 and 10 and is strictly smoother than OPT. This policy is non–demand paging and it is obviously not competitive. We then introduce Smoothed-LRU, a parameterized randomized algorithm that trades competitiveness for smoothness.

Theorem 14 (Smoothness of EOA5)

EOA is \((1,\delta (1+\frac {k}{2k-1}))\)-smooth. This is tight.

5.3.1 Smoothed-LRU

We now describe Smoothed-LRU. The main idea of this algorithm is to smooth out the transition from the hit to the miss case.

Recall the notion of age that is convenient in the analysis of LRU: The age of page p is the number of distinct pages that have been requested since the previous request to p. LRU faults if and only if the requested page’s age is greater than or equal to k, the size of the cache. An inserted request may increase the ages of k cached pages by one. At the next request to each of the cached pages, the page’s age may thus increase from k − 1 to k, and turn the request from a hit into a miss, resulting in k additional misses. By construction, under Smoothed-LRU, the hit probability of a request decreases only gradually with increasing age. The speed of the transition from definite hit to definite miss is controlled by a parameter i, with 0 ≤ i < k. Under Smoothed-LRU, the hit probability \(P(\textit {hit}_{\textsc {Smoothed}-LRU_{k,i}}(a))\) of a request to a page with age a is:
$$ P(\textit{hit}_{\textsc{Smoothed}-LRU_{k,i}}(a)) = \left\{\begin{array}{ll} 1 & : a < k-i\\ \frac{k+i-a}{2i+1} & : k-i \leq a < k+i\\ 0 & : a \geq k+i \end{array}\right. $$
(2)
where k is the size of the cache. Figure 3 illustrates this graphically in relation to LRU for cache size k = 8 and i = 4. It is easy to see that for i = 0, Smoothed-LRU reduces to LRU. We will later demonstrate how to realize an algorithm with the hit probabilities defined above. Before doing so we analyze such an algorithm’s smoothness and competitiveness.
Fig. 3

Hit probabilities of LRU and Smoothed-LRU in terms of the age of the requested page for k = 8 and i = 4
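The piecewise hit probability (2) translates directly into code. The following sketch is our own illustration (the function name is ours); a page requested for the first time can be treated as having infinite age:

```python
def smoothed_lru_hit_prob(a, k, i):
    """Hit probability of Smoothed-LRU_{k,i} for a request to a page of
    age a, following the piecewise definition (2)."""
    assert 0 <= i < k
    if a < k - i:
        return 1.0
    if a >= k + i:
        return 0.0
    return (k + i - a) / (2 * i + 1)

# For i = 0 the definition degenerates to LRU: hit iff age < k.
assert smoothed_lru_hit_prob(7, 8, 0) == 1.0
assert smoothed_lru_hit_prob(8, 8, 0) == 0.0
# For k = 8, i = 4 (as in Fig. 3) the hit probability decays in steps of 1/9.
assert smoothed_lru_hit_prob(4, 8, 4) == 8 / 9
```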

Theorem 15 (Smoothness of Smoothed-LRU)

Smoothed-LRUk, i is \((1,\delta (\frac {k+i-1}{2i+1}+2))\)-smooth for k > 3i, and \((1,\delta (\frac {2k-1}{2i+1}+1))\)-smooth for k ≤ 3i. This is tight.6

Proof

The proof of the upper bound is similar to that for LRU. The key difference is that, in contrast to LRU, an age increase may only increase the miss probability of a page by \(\frac {1}{2i+1}\). We show that Smoothed-LRUk, i is \((1,\frac {k+i-1}{2i+1}+2,1)\)-smooth for k > 3i and \((1,\frac {2k-1}{2i+1}+1,1)\)-smooth for k ≤ 3i. Corollary 1 then implies the theorem.

Let us first consider how the insertion of one request may affect ages and the expected number of faults. By definition, the age of any page is only affected from the point of insertion up to its next request. Only the hit probability of the next request to a page may thus change due to an additional request. Under Smoothed-LRUk, i, at most k + i pages have a non-zero hit probability at any time. Other than the inserted request itself, only the next requests to these k + i pages may increase the expected number of misses. By construction, increasing the age of a requested page by one may only decrease its hit probability by \(\frac {1}{2i+1}\). As the inserted request itself may also introduce a fault, the overall number of faults may thus increase by at most \(\frac {k+i}{2i+1}+1\), which is no larger than either of the bounds given above.

The deletion of a request to page p does not increase the ages of other pages. (a) Only the miss probability of the next request to p may increase. (b) On the other hand, deleting the request to p reduces the expected number of misses of the sequence by the deleted request's own miss probability. Thus, a deletion cannot increase the expected number of faults by more than one, but a more careful bound will be useful in the analysis of substitutions.
  (a)

    The increase in p’s fault probability on its next request depends on p’s age right before the deletion of the request, which we denote by a in the following. Upon the next request to p, p’s age is at most a higher than it would have been without the deletion. Thus, its miss probability increases by at most \(\frac {a}{2i+1}\). The increase is also bounded by 1, as the next request to p may only cause a single additional miss.

     
  (b)
    According to (2) the miss probability of the deleted request to p is:
    $$ P(\textit{miss}_{\textsc{Smoothed}-LRU_{k,i}}(a)) = \left\{\begin{array}{ll} 0 & : a < k-i\\ 1-\frac{k+i-a}{2i+1} & : k-i \leq a < k+i\\ 1 & : a \geq k+i \end{array}\right. $$
    (3)
    Observe that for all three cases in (3), the increase from (a) minus the decrease from (b) is bounded by both \(\frac {k-i-1}{2i+1}\) and 1, and thus a deletion may increase the expected number of misses by at most \(\min \{\frac {k-i-1}{2i+1}, 1\}\).
     
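The case distinction above can also be checked mechanically: for every possible age a of the deleted request, the bound (a) minus the decrease (b) never exceeds \(\min \{\frac {k-i-1}{2i+1}, 1\}\). A small verification sketch (our code, with illustrative names, not part of the proof):

```python
from fractions import Fraction

def deletion_slack(a, k, i):
    """Bound (a) on the increase at p's next request minus the miss
    probability (b) of the deleted request, for a deleted request of age a."""
    w = 2 * i + 1
    increase = min(Fraction(a, w), Fraction(1))        # bound (a)
    if a < k - i:                                      # miss probability (b)
        decrease = Fraction(0)
    elif a < k + i:
        decrease = 1 - Fraction(k + i - a, w)
    else:
        decrease = Fraction(1)
    return increase - decrease

# The net effect of a deletion never exceeds min{(k-i-1)/(2i+1), 1}.
for k in range(2, 15):
    for i in range(0, k):
        bound = min(Fraction(k - i - 1, 2 * i + 1), Fraction(1))
        assert all(deletion_slack(a, k, i) <= bound for a in range(3 * k))
```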
For the most complicated case of a substitution, consider the sequence σ = σpre p σpost. A substitution of the request to p by a request to q can be achieved by first deleting the request to p and then inserting the request to q, yielding the sequences σ′ = σpreσpost and σ′′ = σpreqσpost. From the considerations of insertions and deletions above, we know that \(\textsc {Smoothed}-\text {LRU}_{k,i}(\sigma ^{\prime })-\textsc {Smoothed}-\text {LRU}_{k,i}(\sigma ) \leq \min \{\frac {k-i-1}{2i+1}, 1\}\) and \(\textsc {Smoothed}-\text {LRU}_{k,i}(\sigma ^{\prime \prime })-\textsc {Smoothed}-\text {LRU}_{k,i}(\sigma ^{\prime }) \leq \frac {k+i}{2i+1}+1\).

Now, consider the case where the deletion causes a whole additional fault, i.e., Smoothed-LRUk, i(σ′) − Smoothed-LRUk, i(σ) = 1. Then, the page p must be cached with probability 1 after σpre, and the next request to p in σpost causes a fault with probability 1 in σ′. Thus, the insertion of q cannot further increase that request's miss probability; the insertion of q then only increases the miss probabilities of k + i − 1 pages. Combining the arguments above, we get \(\textsc {Smoothed}-LRU_{k,i}(\sigma ^{\prime \prime })-\textsc {Smoothed}-LRU_{k,i}(\sigma ) \leq \min \{\frac {k-i-1}{2i+1}+\frac {k+i}{2i+1}+1, 1+\frac {k+i-1}{2i+1}+1\} = \min \{\frac {2k-1}{2i+1}+1, \frac {k+i-1}{2i+1}+2\}\).

For k > 3i, the difference simplifies to \(\frac {k+i-1}{2i+1}+2\), as \(\frac {k+i-1}{2i+1}+2 = \frac {k+3i}{2i+1}+1 \leq \frac {2k-1}{2i+1}+1\). On the other hand, for k ≤ 3i, \(\frac {k+i-1}{2i+1}+2 = \frac {k+3i}{2i+1}+1 > \frac {2k-1}{2i+1}+1\).

For tightness, consider the two sequences σ = 1,2,…, k + i, k − i,1,2,…, k + i, y and σ′ = 1,2,…, k + i, x,1,2,…, k + i, y with y > x > k + i. Then, Δ(σ, σ′) = 1. The request to x incurs one additional fault. The second request to each page in the set {1,…, k − i − 1, k − i + 1,…, k + i} has an age of k + i in σ′, while it only has an age of k + i − 1 in σ. Thus each of these requests incurs an additional \(\frac {1}{2i+1}\) expected faults, for a total of \(\frac {k+i-1}{2i+1}\) additional expected faults. By construction, the third request to k − i in σ incurs no faults, as it has an age of k − i − 1. The corresponding request in σ′ has an age of k + i, thus incurring one additional fault. Finally, the second request to k − i in σ, which is missing in σ′, has an age of k + i − (k − i) = 2i, and thus causes no faults if 2i < k − i ⇔ k > 3i. In this case, \(\textsc {Smoothed}-LRU_{k,i}(\sigma ^{\prime })-\textsc {Smoothed}-LRU_{k,i}(\sigma ) = 1+\frac {k+i-1}{2i+1}+1-0 = \frac {k+i-1}{2i+1}+2\), matching the upper bound. If 2i ≥ k − i ⇔ k ≤ 3i, the second request to k − i in σ incurs an expected \(1-\frac {k+i-2i}{2i+1} = 1-\frac {k-i}{2i+1}\) faults. In this case, \(\textsc {Smoothed}-LRU_{k,i}(\sigma ^{\prime })-\textsc {Smoothed}-LRU_{k,i}(\sigma ) = 1+\frac {k+i-1}{2i+1}+1-(1-\frac {k-i}{2i+1}) = \frac {2k-1}{2i+1}+1\). For δ > 1, consider the sequences σδ and \(\sigma ^{\prime }_{\delta }\) obtained by concatenating δ copies of σ and σ′, respectively. □
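Because the hit probability of Smoothed-LRU depends only on the requested page's age, the expected number of faults of any sequence can be computed directly from ages. The following sketch (our code; `expected_misses` is a name we introduce, not from the paper) reproduces the tightness gap for k = 8 and i = 2, where k > 3i:

```python
def expected_misses(seq, k, i):
    """Expected faults of Smoothed-LRU_{k,i} on seq: the sum of per-request
    miss probabilities, where a request's age is the number of distinct
    pages requested since the previous request to the same page."""
    last = {}  # page -> index of its most recent request
    misses = 0.0
    for t, p in enumerate(seq):
        if p not in last:
            misses += 1.0  # first request: a definite miss
        else:
            age = len(set(seq[last[p] + 1:t]))
            if age >= k + i:
                misses += 1.0
            elif age >= k - i:
                misses += 1.0 - (k + i - age) / (2 * i + 1)
        last[p] = t
    return misses

k, i = 8, 2
x, y = k + i + 1, k + i + 2            # fresh pages, y > x > k + i
base = list(range(1, k + i + 1))       # 1, 2, ..., k + i
sigma = base + [k - i] + base + [y]    # the sequence with the extra k - i
sigma_p = base + [x] + base + [y]      # k - i substituted by the new page x
gap = expected_misses(sigma_p, k, i) - expected_misses(sigma, k, i)
assert abs(gap - ((k + i - 1) / (2 * i + 1) + 2)) < 1e-9  # 9/5 + 2 = 3.8
```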

For i = 0, Smoothed-LRU is identical to LRU and thus (1, δ(k + 1))-smooth. At the other extreme, for i = k − 1, Smoothed-LRU is (1, 2δ)-smooth, like the optimal offline algorithm. However, for larger i, Smoothed-LRU is less competitive than LRU:

Lemma 3 (Competitiveness of Smoothed-LRU)

For any sequence σ and l ≤ k − i,
$$\textsc{Smoothed}\text{-LRU}_{k,i}(\sigma) \leq \frac{k-i}{k-i-l+1}\cdot \text{OPT}_{l}(\sigma) + l, $$
where OPTl(σ) denotes the number of faults of the optimal offline algorithm processing σ on a fast memory of size l. For l > k − i, on the other hand, for any α and β there is a sequence σ such that Smoothed-LRUk, i(σ) > α ⋅ OPTl(σ) + β.

Proof

Let LRUl(σ) denote the number of faults of LRU on a cache of size l. As Smoothed-LRUk, i caches all pages younger than k − i with probability one, we have Smoothed-LRUk, i(σ) ≤ LRUk−i(σ) for any sequence σ. From Sleator and Tarjan [26], we know that \(LRU_{k-i}(\sigma ) \leq \frac {k-i}{k-i-l+1}\cdot \text {OPT}_l(\sigma ) + l\).

For the second part of the lemma, consider the sequence σn = (1,…, l)n, which contains l distinct pages. The optimal offline algorithm misses exactly l times on this sequence, independently of n. For k − i < l, on the other hand, every request beyond the first pass has an age of l − 1 ≥ k − i, so Smoothed-LRUk, i has a non-zero miss probability of at least \(\frac {1}{2i+1}\) on each such request. Hence, for every α and β there is an n such that Smoothed-LRUk, i(σn) > α ⋅ OPTl(σn) + β. □
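The argument can be replayed numerically: on σn the expected cost of Smoothed-LRU grows linearly in n, while OPTl pays only l. A small sketch (our code, with illustrative names), using the fact that after the first pass every request in σn has age l − 1:

```python
def miss_prob(age, k, i):
    # Per-request miss probability of Smoothed-LRU_{k,i} as a function of age.
    if age < k - i:
        return 0.0
    if age >= k + i:
        return 1.0
    return 1.0 - (k + i - age) / (2 * i + 1)

def cost_on_cycle(l, n, k, i):
    """Expected faults of Smoothed-LRU_{k,i} on (1,...,l)^n: the first pass
    faults l times; afterwards every request has age l - 1."""
    return l + l * (n - 1) * miss_prob(l - 1, k, i)

k, i = 8, 2
l = k - i + 1                  # smallest fast-memory size on which OPT wins
opt = l                        # OPT_l faults exactly l times on (1,...,l)^n
assert abs(miss_prob(l - 1, k, i) - 1 / (2 * i + 1)) < 1e-12
# The cost grows linearly in n, so no alpha, beta can bound it against OPT_l:
assert cost_on_cycle(l, 1000, k, i) > 100 * opt + 100
```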

So far we have analyzed Smoothed-LRU based on the hit probabilities given in (2). We have yet to show that a randomized algorithm satisfying (2) can be realized.

In the following, we construct a probability distribution on the set of all deterministic algorithms using a fast memory of size k that satisfies Equation (2). This is commonly referred to as a mixed strategy [7].

First, we decompose an instance of Smoothed-LRU into i + 1 instances of a simpler algorithm called Step-LRU. Then we show how Step-LRU can be realized as a mixed strategy. Like Smoothed-LRU, Step-LRU is parameterized by i, and it exhibits the following hit probabilities in terms of the age of a requested page:
$$ P(\textit{hit}_{\textsc{Step}-\text{LRU}_{k,i}}(a)) = \left\{\begin{array}{ll} 1 & : a < k-i\\ \frac{1}{2} & : k-i \leq a < k+i\\ 0 & : a \geq k+i \end{array}\right. $$
(4)

Lemma 4 (Decomposition of Smoothed-LRU in terms of Step-LRU7)

For all ages a,
$$P(\textit{hit}_{\textsc{Smoothed}\text{-}LRU_{k,i}}(a)) = \frac{1}{2i+1}\left( 1\cdot P(\textit{hit}_{\textsc{Step}\text{-}\text{LRU}_{k,0}}(a)) + \sum\limits_{j=1}^{i} 2\cdot P(\textit{hit}_{\textsc{Step}\text{-}\text{LRU}_{k,j}}(a))\right). $$
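The decomposition can be verified numerically over all ages for concrete values of k and i. The following check (our code; the hit-probability functions simply transcribe (2) and (4)) confirms the identity with exact rational arithmetic:

```python
from fractions import Fraction

def smoothed(a, k, i):
    # P(hit) of Smoothed-LRU_{k,i} at age a, transcribing (2).
    if a < k - i:
        return Fraction(1)
    if a >= k + i:
        return Fraction(0)
    return Fraction(k + i - a, 2 * i + 1)

def step(a, k, j):
    # P(hit) of Step-LRU_{k,j} at age a, transcribing (4).
    if a < k - j:
        return Fraction(1)
    if a >= k + j:
        return Fraction(0)
    return Fraction(1, 2)

for k in range(2, 10):
    for i in range(0, k):
        for a in range(2 * k + 2):
            mix = step(a, k, 0) + sum(2 * step(a, k, j) for j in range(1, i + 1))
            assert smoothed(a, k, i) == mix / (2 * i + 1), (k, i, a)
print("decomposition verified")
```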

As a consequence, we can realize Smoothed-LRU as a mixed strategy if we can realize Step-LRU as a mixed strategy.

While the hit probabilities \(P(\textit {hit}_{\textsc {Step}-\text {LRU}_{k,i}}(a))\) do not fully define Step-LRU, by linearity of expectation they are sufficient to determine the expected number of faults on any sequence σ, which we denote by Step-LRUk, i(σ). We are able to show that Step-LRU can be realized as a mixed strategy:

Proposition 2 (Step-LRU as a mixed strategy8)

There is a probability distribution\(d : {\mathcal A} \rightarrow \mathbb {R}\)over a finite set of deterministic paging algorithms\({\mathcal A}\)using a fast memory of size k, such that for allsequencesσ,
$$\textsc{Step}\text{-LRU}_{k,i}(\sigma) = \sum\limits_{A \in {\mathcal A}} d(A)\cdot A(\sigma). $$

This immediately implies that Smoothed-LRU can be realized as a mixed strategy:

Corollary 3 (Smoothed-LRU as a mixed strategy)

There is a probability distribution\(d : {\mathcal A} \rightarrow \mathbb {R}\)over a finite set of deterministic paging algorithms\({\mathcal A}\)using a fast memory of size k, such that for all sequencesσ,
$$\textsc{Smoothed}-LRU_{k,i}(\sigma) = \sum\limits_{A \in {\mathcal A}} d(A)\cdot A(\sigma). $$

Proof

This follows immediately from Lemma 4 and Proposition 2. □

5.4 A Competitive and Smooth Randomized Paging Algorithm: LRU-Random

In this section we introduce and analyze LRU-Random, a competitive randomized algorithm that is smoother than any competitive deterministic algorithm. Like LRU, LRU-Random orders the pages in the fast memory by their recency of use. Upon a miss, LRU-Random evicts older pages with a higher probability than younger pages. More precisely, the ith oldest page in the cache is evicted with probability \(\frac {1}{i\cdot H_{k}}\). By construction, the eviction probabilities sum up to 1: \({\sum }_{i=1}^{k} \frac {1}{i\cdot H_{k}} = \frac {1}{H_{k}}\cdot {\sum }_{i=1}^{k} \frac {1}{i} = 1\). LRU-Random is not demand paging: if the cache is not yet entirely filled, it may still evict cached pages according to the probabilities above. For a cache of size 8, Fig. 4 illustrates the probabilities of evicting the ith oldest page from the cache upon a miss under LRU, LRU-Random, and Random.
Fig. 4

Eviction probability of the ith oldest page in the cache upon a miss under LRU, LRU-Random, and Random for a cache of size 8
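The eviction distribution of LRU-Random is simple to implement. The sketch below is our own illustration (function names are ours); position 1 denotes the oldest, least-recently-used cached page:

```python
import random
from fractions import Fraction

def harmonic(n):
    return sum(Fraction(1, j) for j in range(1, n + 1))

def eviction_probs(k):
    """Probability that LRU-Random evicts the i-th oldest cached page
    (i = 1 is the least-recently-used page) upon a miss."""
    Hk = harmonic(k)
    return [Fraction(1, i) / Hk for i in range(1, k + 1)]

def evict_position(k, rng=random):
    # Sample the position (1 = oldest) of the victim page on a miss.
    weights = [float(p) for p in eviction_probs(k)]
    return rng.choices(range(1, k + 1), weights=weights)[0]

probs = eviction_probs(8)
assert sum(probs) == 1                                  # probabilities sum to 1
assert all(probs[j] > probs[j + 1] for j in range(7))   # older pages more likely
```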

LRU-Random is at least as competitive as strongly-competitive deterministic algorithms:

Theorem 16 (Competitiveness of LRU-Random)

For anysequenceσ,
$$\text{LRU-}\textsc{Random}(\sigma) \leq k \cdot \text{OPT}(\sigma). $$

Proof

We actually prove a stronger statement, namely that LRU-Random is k-competitive against any adaptive online adversary [21]. Our proof is based on a potential argument.

Let SADV and SLRUR be the set of pages contained in the adversary’s and LRU-Random’s fast memory, respectively. Further, let age(p) be the age of page pSLRUR, i.e., age(p) is 0 for the most-recently-used page and k − 1 for the least-recently-used one among those pages that are in SLRUR. Based on age(p), we define s(p) = kage(p). In other words, s(p) is 1 for the oldest cached page, and k for the youngest, most-recently-used. Using this notation we define the following potential function:
$${\Phi} = H_k\cdot \sum\limits_{p \in S_{\text{LRUR}}\setminus S_{\text{ADV}}} \frac{s(p)}{H_{s(p)}}. $$
We will show that for any page x and any decision of the adversary to evict a page from its memory, we have
$$ \text{LRU-}\textsc{Random}(x) + {\Delta}{\Phi}(x) \leq k\cdot \text{ADV}(x), $$
(5)
where LRU-Random(x) and ADV(x) denote the cost of the request, and ΔΦ(x) is the expected change in the potential function. Note that the potential function is initially zero, given that both caches are initially empty. Further it is never negative. From this and Inequality (5) the k-competitiveness of LRU-Random against an adaptive online adversary follows. To prove Inequality (5), we distinguish four cases upon a request to page x:
  1.

    LRU-Random hits and ADV hits. Then, LRU-Random(x) = ADV(x) = 0. The request does not decrease the ages of pages in SLRUR ∖ SADV and so the potential does not increase, as \(\frac {s}{H_s}\) is monotonically increasing in s.

     
  2.
    LRU-Random hits and ADV misses. As LRU-Random(x) = 0 and ADV(x) = 1, we have to show that ΔΦ(x) ≤ k. The contribution of each page p ∈ SLRUR ∖ SADV to the potential drops or stays the same, as the ages of these pages may not decrease. The potential only increases if ADV chooses to evict a page in SLRUR ∩ SADV. The maximal increase is achieved by evicting the youngest such page p. After the request, p's age is at least 1, as it was not the requested page. Therefore it contributes at most \(H_k\cdot \frac {k-1}{H_{k-1}}\) to the potential, which is
    $$H_k\cdot \frac{k-1}{H_{k-1}} = \left( H_{k-1} + \frac{1}{k}\right)\cdot \frac{k-1}{H_{k-1}} = k-1 + \frac{k-1}{k\cdot H_{k-1}} < k. $$
     
  3.
    LRU-Random misses and ADV hits. Then, we have to show that the potential decreases by at least 1 in expectation. Again, the contribution of no page p ∈ SLRUR ∖ SADV may increase. Further, as ADV does not evict a page, no new page contributes to the potential. We show that the contribution of each page p ∈ SLRUR ∖ SADV drops by at least 1 in expectation. There are three possible cases for a page p with s(p) = s:
    (a)

      A younger page is replaced, and p’s contribution to the potential does not change. This happens with probability \({\sum }_{i=s+1}^k \frac {1}{i\cdot H_k} = 1 - \frac {H_s}{H_k}\).

       
    (b)

      Page p gets replaced. This happens with probability \(\frac {1}{s\cdot H_k}\) and it reduces the potential by \(\frac {H_k\cdot s}{H_s}\).

       
    (c)

      An older page is replaced, and p’s age increases by one. This happens with probability \({\sum }_{i=1}^{s-1} \frac {1}{i\cdot H_k} = \frac {H_{s-1}}{H_k}\) and it reduces the potential by \(H_k\cdot \left (\frac {s}{H_{s}}-\frac {s-1}{H_{s-1}}\right ) = H_k\cdot \frac {H_s-1}{H_sH_{s-1}}.\)

       
    So the expected change in potential due to page p is
    $$\frac{1}{s\cdot H_k}\cdot \frac{-H_k\cdot s}{H_s} + \frac{H_{s-1}}{H_k}\cdot H_k\cdot\frac{1-H_s}{H_sH_{s-1}} = -\frac{1}{H_s} + \frac{1-H_s}{H_s} = -1. $$
     
  4.
    LRU-Random misses and ADV misses. If, before the request, SLRUR ≠ SADV, then we can combine the arguments from cases 2 and 3 to show that the potential increases by at most k − 1. This does not cover the case where SLRUR = SADV. In this case, the potential is increased maximally if the adversary chooses to evict the most-recently-used page. If LRU-Random replaces a different page, the potential increases by \(H_k\cdot \frac {k-1}{H_{k-1}}\). However, with probability \(\frac {1}{k\cdot H_k}\), LRU-Random also replaces the most-recently-used page (in which case the potential remains the same). The expected change in potential is thus bounded by
    $$\begin{array}{@{}rcl@{}} \left( 1-\frac{1}{k H_k}\right) H_k \frac{k-1}{H_{k-1}} &=& \frac{kH_k-1}{kH_k} H_k \frac{k-1}{H_{k-1}} = \frac{(kH_k-1)(k-1)}{H_{k-1}k}\\ &=& \frac{kH_{k-1}(k-1)}{H_{k-1}k} = k-1. \end{array} $$
     

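The per-page arithmetic in cases 3 and 4 above can be checked exactly with rational arithmetic. The following sketch (our code, not part of the proof) confirms the expected change of −1 per page in case 3 and the k − 1 bound in case 4:

```python
from fractions import Fraction

def H(n):
    return sum(Fraction(1, j) for j in range(1, n + 1))

# Case 3: for a page p with s(p) = s >= 2, the expected change of its
# contribution is exactly -1 (for s = 1 only sub-cases (a) and (b) apply):
for k in range(2, 12):
    for s in range(2, k + 1):
        change = (Fraction(1, s) / H(k)) * (-H(k) * s / H(s)) \
            + (H(s - 1) / H(k)) * H(k) * (1 - H(s)) / (H(s) * H(s - 1))
        assert change == -1, (k, s)

# Case 4: (1 - 1/(k*H_k)) * H_k * (k-1)/H_{k-1} = k - 1, using k*H_k - 1 = k*H_{k-1}.
for k in range(2, 12):
    assert (1 - 1 / (k * H(k))) * H(k) * (k - 1) / H(k - 1) == k - 1
```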
The proof of Theorem 16 applies to an adaptive online adversary. An analysis for an oblivious adversary might yield a lower competitive ratio. On the other hand, the result is tight for adaptive adversaries. This can be seen by considering cases 3 and 4 of the previous proof. An optimal adversary forces case 3 by accessing a page in SADV ∖ SLRUR, as long as SLRUR ≠ SADV. Whenever SLRUR = SADV, the adversary accesses a page contained in neither SADV nor SLRUR and evicts the most-recently-used page of SLRUR from its own fast memory, resulting in case 4. In both cases the expected change in potential is equal to the difference in the expected number of misses.

For k = 2, we also show that LRU-Random is (1, δc)-smooth, where c is less than k + 1, which is the best possible among deterministic, demand-paging or competitive algorithms. Specifically, c is \(1+11/6=2.8\bar {3}\). Although our proof technique does not scale beyond k = 2, we conjecture that this algorithm is in fact smoother than (1, δ(k + 1)) for all k.

Theorem 17 (Smoothness of LRU-Random9)

Let k = 2. LRU-Random is \((1,\frac {17}{6}\delta )\)-smooth.

Conjecture 1 (Smoothness of LRU-Random)

LRU-Random is\((1,{\Theta }({H_{k}^{2}})\delta )\)-smooth.

6 Discussion

We have determined fundamental limits on the smoothness of deterministic and randomized paging algorithms. No deterministic competitive algorithm can be smoother than (1, δ(k + 1))-smooth. Under the restriction to bounded-memory algorithms, which is natural for hardware implementations of caches, smoothness implies competitiveness. We conjecture that smoothness generally implies competitiveness without the restriction to bounded-memory algorithms. LRU is strongly competitive, and it matches the lower bound for deterministic competitive algorithms, while FIFO matches the upper bound. There is no trade-off between smoothness and competitiveness for deterministic algorithms.

In contrast, among randomized algorithms, we have identified Smoothed-LRU, an algorithm that is very smooth, but not competitive. In particular, it is smoother than any strongly-competitive randomized algorithm can be. The well-known randomized algorithms Mark, Partition, and Equitable are not smooth. It is an open question whether there is a randomized “LRU sibling” that is both strongly-competitive and (1, δHk)-smooth. With LRU-Random we have introduced a randomized algorithm that is at least as competitive as any deterministic algorithm, yet provably smoother, at least for k = 2. While its exact smoothness remains open, we conjecture that LRU-Random is \((1,{\Theta }({H_{k}^{2}})\delta )\)-smooth. Figure 5 schematically illustrates many of our results.
Fig. 5

Schematic view of the smoothness and competitiveness landscape. Crosses indicate tight results, whereas ellipses indicate upper bounds. Braces denote upper and lower bounds on the smoothness or competitiveness of classes of algorithms. For simplicity of exposition, γ1 and γ2 are left unspecified; γ can be chosen arbitrarily. More precise statements are provided in the respective theorems

Randomization introduces variance in the number of faults even on the same request sequence. In our current framework this variance is hidden, as we analyze the expected number of faults. In the analysis of real-time systems, however, the tail of the probability distribution is of greater interest than its expected value. It would thus be interesting to study how the probability distribution and in particular its tail changes in response to perturbations in the request sequence. Recent results by Komm et al. [16] suggest that smoothness “with high probability” is possible.

Footnotes

  1. See the Appendix for the proof.
  2. See the Appendix for the proof.
  3. See the Appendix for the proof.
  4. See the Appendix for the proof.
  5. See the Appendix for the proof.
  6. Note that the conference version of this article [25] incorrectly claimed Smoothed-LRUk, i to be \((1,\delta (\frac {k+i}{2i+1}+1))\)-smooth.
  7. See the Appendix for the proof.
  8. See the Appendix for the proof.
  9. See the Appendix for the proof.

References

  1. Achlioptas, D., Chrobak, M., Noga, J.: Competitive analysis of randomized paging algorithms. Theoretical Comput. Sci. 234(1-2), 203–218 (2000). https://doi.org/10.1016/S0304-3975(98)00116-9
  2. Aho, A., Denning, P., Ullman, J.: Principles of optimal page replacement. J. ACM 18(1), 80–93 (1971)
  3. Axer, P., et al.: Building timing predictable embedded systems. ACM Trans. Embed. Comput. Syst. 13(4), 82:1–82:37 (2014). https://doi.org/10.1145/2560033
  4. Becchetti, L., Leonardi, S., Marchetti-Spaccamela, A., Schäfer, G., Vredeveld, T.: Average-case and smoothed competitive analysis of the multilevel feedback algorithm. Math. Oper. Res. 31(1), 85–108 (2006). https://doi.org/10.1287/moor.1050.0170
  5. Beckmann, N., Sanchez, D.: Talus: a simple way to remove cliffs in cache performance. In: 21st IEEE International Symposium on High Performance Computer Architecture, HPCA 2015, Burlingame, CA, USA, February 7-11, 2015, pp. 64–75 (2015). https://doi.org/10.1109/HPCA.2015.7056022
  6. Belady, L.A.: A study of replacement algorithms for a virtual-storage computer. IBM Syst. J. 5(2), 78–101 (1966)
  7. Borodin, A., El-Yaniv, R.: Online Computation and Competitive Analysis. Cambridge University Press, New York (1998)
  8. Brodal, G.S., Moruz, G., Negoescu, A.: OnlineMin: a fast strongly competitive randomized paging algorithm. Theory Comput. Syst. 56(1), 22–40 (2015). https://doi.org/10.1007/s00224-012-9427-y
  9. Cazorla, F.J., et al.: PROARTIS: probabilistically analyzable real-time systems. ACM Trans. Embed. Comput. Syst. 12(2s), 94:1–94:26 (2013). https://doi.org/10.1145/2465787.2465796
  10. Chaudhuri, S., Gulwani, S., Lublinerman, R.: Continuity and robustness of programs. Commun. ACM 55(8), 107–115 (2012). https://doi.org/10.1145/2240236.2240262
  11. Doychev, G., et al.: CacheAudit: a tool for the static analysis of cache side channels. ACM Trans. Inf. Syst. Secur. 18(1), 4:1–4:32 (2015). https://doi.org/10.1145/2756550
  12. Doyen, L., Henzinger, T., Legay, A., Nickovic, D.: Robustness of sequential circuits. In: ACSD '10, pp. 77–84 (2010). https://doi.org/10.1109/ACSD.2010.26
  13. Dwork, C.: Differential privacy. In: Bugliesi, M., Preneel, B., Sassone, V., Wegener, I. (eds.) ICALP '06, Part II, LNCS, vol. 4052, pp. 1–12. Springer (2006). https://doi.org/10.1007/11787006_1
  14. Fiat, A., Karp, R.M., Luby, M., McGeoch, L.A., Sleator, D.D., Young, N.E.: Competitive paging algorithms. J. Algorithms 12(4), 685–699 (1991)
  15. Kleene, S.: Representation of events in nerve nets and finite automata. In: Automata Studies. Princeton University Press, Princeton (1956)
  16. Komm, D., Královic, R., Královic, R., Mömke, T.: Randomized online algorithms with high probability guarantees. In: STACS '14, vol. 25, pp. 470–481 (2014)
  17. Koutsoupias, E., Papadimitriou, C.: Beyond competitive analysis. SIAM J. Comput. 30(1), 300–317 (2000). https://doi.org/10.1137/S0097539796299540
  18. Liu, C.L.: Some memory aspects of finite automata. Tech. Rep. 411, Massachusetts Institute of Technology (1963)
  19. Mattson, R.L., Gecsei, J., Slutz, D.R., Traiger, I.L.: Evaluation techniques for storage hierarchies. IBM Syst. J. 9(2), 78–117 (1970)
  20. McGeoch, L., Sleator, D.: A strongly competitive randomized paging algorithm. Algorithmica 6, 816–825 (1991). https://doi.org/10.1007/BF01759073
  21. Motwani, R., Raghavan, P.: Randomized Algorithms. Cambridge University Press, New York (1995)
  22. Perles, M., Rabin, M., Shamir, E.: The theory of definite automata. IEEE Trans. Electron. Comput. 12(3), 233–243 (1963). https://doi.org/10.1109/PGEC.1963.263534
  23. Raghavan, P., Snir, M.: Memory versus randomization in on-line algorithms (extended abstract). In: Ausiello, G., Dezani-Ciancaglini, M., Rocca, S.R.D. (eds.) ICALP '89, Lecture Notes in Computer Science, vol. 372, pp. 687–703. Springer (1989). https://doi.org/10.1007/BFb0035792
  24. Reineke, J., Grund, D.: Sensitivity of cache replacement policies. ACM Trans. Embed. Comput. Syst. 12(1s), 42:1–42:18 (2013). https://doi.org/10.1145/2435227.2435238
  25. Reineke, J., Salinger, A.: On the smoothness of paging algorithms. In: Sanità, L., Skutella, M. (eds.) Approximation and Online Algorithms - 13th International Workshop, WAOA 2015, Patras, Greece, September 17-18, 2015, Revised Selected Papers. Lecture Notes in Computer Science, vol. 9499, pp. 170–182. Springer (2015). https://doi.org/10.1007/978-3-319-28684-6_15
  26. Sleator, D.D., Tarjan, R.E.: Amortized efficiency of list update and paging rules. Commun. ACM 28(2), 202–208 (1985)
  27. Spielman, D.A., Teng, S.H.: Smoothed analysis of algorithms: why the simplex algorithm usually takes polynomial time. J. ACM 51(3), 385–463 (2004). https://doi.org/10.1145/990308.990310
  28. Wilhelm, R., et al.: The worst-case execution-time problem—overview of methods and survey of tools. ACM Trans. Embed. Comput. Syst. 7(3), 36:1–36:53 (2008). https://doi.org/10.1145/1347375.1347389

Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. Department of Computer Science, Saarland University, Saarbrücken, Germany
  2. SAP SE, Walldorf, Germany
