## Abstract

We study the smoothness of paging algorithms. How much can the number of page faults increase due to a perturbation of the request sequence? We call a paging algorithm *smooth* if the maximal increase in page faults is proportional to the number of changes in the request sequence. We also introduce quantitative smoothness notions that measure the smoothness of an algorithm. We derive lower and upper bounds on the smoothness of deterministic and randomized demand-paging and competitive algorithms. Among strongly-competitive deterministic algorithms, LRU matches the lower bound, while FIFO matches the upper bound. Well-known randomized algorithms such as Partition, Equitable, or Mark are shown not to be smooth. We introduce two new randomized algorithms, called Smoothed-LRU and LRU-Random. Smoothed-LRU allows sacrificing competitiveness for smoothness, where the trade-off is controlled by a parameter. LRU-Random is at least as competitive as any deterministic algorithm but smoother.

### Keywords

Paging · Caching · Smoothness · Online algorithms · Real-time systems

## 1 Introduction

Due to their strong influence on system performance, paging algorithms have been studied extensively since the 1960s. Early studies were based on probabilistic request models [2, 6, 19]. In their seminal work, Sleator and Tarjan [26] introduced the notion of competitiveness, which relates the performance of an online algorithm to that of the optimal offline algorithm. By now, the competitiveness of well-known deterministic and randomized paging algorithms is well understood, and various strongly-competitive online algorithms [1, 20, 26] have been identified.

In this paper, we study the *smoothness* of paging algorithms. We seek to answer the following question: How strongly may the performance of a paging algorithm change when the sequence of memory requests is slightly perturbed? This question is relevant in various domains: Can the cache performance of an algorithm suffer significantly due to the occasional execution of interrupt handling code? Can the execution time of a safety-critical real-time application be safely and tightly bounded in the presence of interference on the cache? Can secret-dependent memory requests have a significant influence on the number of cache misses of a cryptographic protocol and thus give rise to a timing side-channel attack?

We formalize the notion of smoothness by identifying the performance of a paging algorithm with the number of page faults and the magnitude of a perturbation with the edit distance between two request sequences.

We show that for any deterministic, demand-paging or competitive algorithm, a single additional memory request may cause *k* + 1 additional faults, where *k* is the size of the cache. Least-recently-used (LRU) matches this lower bound, indicating that there is no trade-off between competitiveness and smoothness for deterministic algorithms. In contrast, First-in first-out (FIFO) is shown to be least smooth among all strongly-competitive deterministic algorithms. Interestingly, our model shows a significant difference between these two algorithms, whose theoretical performance has proven difficult to separate.

Randomized algorithms have been shown to be more competitive than deterministic ones. We derive lower bounds for the smoothness of randomized, demand-paging and randomized strongly-competitive algorithms that indicate that randomization might also help with smoothness. However, we show that none of the well-known randomized algorithms Mark, Equitable, and Partition is smooth. The simple randomized algorithm that evicts one of the cached pages uniformly at random is shown to be as smooth as LRU, but not more.

We then introduce a new parameterized randomized algorithm, Smoothed-LRU, that allows sacrificing competitiveness for smoothness. For some parameter values Smoothed-LRU is smoother than any randomized strongly-competitive algorithm can possibly be, indicating a trade-off between smoothness and competitiveness for randomized algorithms. This leaves the question of whether there is a randomized algorithm that is smoother than any deterministic algorithm without sacrificing competitiveness. We answer this question in the affirmative by introducing LRU-Random, a randomized version of LRU that evicts older pages with a higher probability than younger ones. We show that LRU-Random is smoother than any deterministic algorithm for *k* = 2. While we conjecture that this is the case as well for general *k*, this remains an open problem.

In testing, if a system is smooth, then a successful test run is indicative of the system’s correct behavior not only on the particular test input, but also in its neighborhood.

In verification, systems are shown to behave correctly under some assumption on their environment. Due to incomplete environment specifications, operator errors, faulty implementations, or other causes, the environment assumption may not always hold completely. In such a case, if the system is smooth, “small” violations of the environment assumptions will, in the worst case, result in “small” deviations from correct behavior.

A paging algorithm *A* is (*α*, *β*, *δ*)-smooth if the number of page faults *A*(*σ*^{′}) of *A* on request sequence *σ*^{′} is bounded by *α* ⋅ *A*(*σ*) + *β* whenever *σ* can be transformed into *σ*^{′} by at most *δ* insertions, deletions, or substitutions of individual requests. Often, our results apply to a generic value of *δ*. In such cases, we express the smoothness of a paging algorithm by a pair (*α*, *β*), where *α* and *β* are functions of *δ*, and *A* is (*α*(*δ*), *β*(*δ*), *δ*)-smooth for every *δ*. Usually, the smoothness of an algorithm depends on the size of the cache, which we denote by *k*. As an example, under LRU the number of faults may increase by at most *δ*(*k* + 1), where *δ* is the number of changes in the sequence. A precise definition of these notions is given in Section 3.

Upper and lower bounds on the smoothness of paging algorithms

| | Algorithm | Lower bound | Upper bound |
|---|---|---|---|
| Deterministic | Demand-paging | (1, *δ*(*k* + 1)) | \(\infty \) |
| | Competitive | (1, *δ*(*k* + 1)) | (*c*, 2*δc* + *β*) |
| | Strongly-competitive | (1, *δ*(*k* + 1)) | (*k*, 2*δk* + *β*) |
| | Optimal offline | (1, 2*δ*) | (1, 2*δ*) |
| | LRU | (1, *δ*(*k* + 1)) | (1, *δ*(*k* + 1)) |
| | FWF | (1, 2*δk*) | (1, 2*δk*) |
| | FIFO | (*k*, …) | (*k*, …) |
| Randomized | Demand-paging | \((1,H_{k}+\frac {1}{k},1)\) | \(\infty \) |
| | Strongly-competitive w/add. const. | (1, …) | (…) |
| | Equitable, Partition, OnlineMin | (1 + …, …) | (…) |
| | Mark | (Ω(…), …) | (2*H*_{k} − 1, …) |
| | Random | (1, …) | (1, …) |
| | Evict-On-Access | \((1,\delta (1+\frac {k}{2k-1}))\) | \((1,\delta (1+\frac {k}{2k-1}))\) |
| | Smoothed-LRU | \((1, \delta \min \{\frac {2k-1}{2i+1}+1, \frac {k+i-1}{2i+1}+2\})\) | \((1, \delta \min \{\frac {2k-1}{2i+1}+1, \frac {k+i-1}{2i+1}+2\})\) |

The rest of this paper is organized as follows. In Section 2 we briefly describe other notions of smoothness and review the definition of the paging problem and its most relevant results. In Section 3 we formally define the smoothness of paging algorithms. We study the smoothness of deterministic paging algorithms in Section 4, while in Section 5 we analyze the smoothness of randomized algorithms and the trade-off between competitiveness and smoothness. For readability, we place some of the proofs of our results in the Appendix.

## 2 Related Work

### 2.1 Notions of Smoothness

Robust control is a branch of control theory that explicitly deals with uncertainty in its approach to controller design. Informally, a controller designed for a particular set of parameters is said to be robust if it would also work well under a slightly different set of assumptions. In computer science, the focus has long been on the binary property of correctness, as well as on average- and worst-case performance. Lately, however, various notions of smoothness have received increasing attention: Chaudhuri et al. [10] develop analysis techniques to determine whether a given program computes a *Lipschitz-continuous* function. Lipschitz continuity is a special case of our notion of smoothness. Continuity is also strongly related to differential privacy [13], where the result of a query may not depend strongly on the information about any particular individual. Differential privacy proofs with respect to cache side channels [11] may be achievable in a compositional manner for caches employing smooth paging algorithms.

Doyen et al. [12] consider the robustness of sequential circuits. They determine how long into the future a single disturbance in the inputs of a sequential circuit may affect the circuit’s outputs. Much earlier, but in a similar vein, Kleene [15], Perles, Rabin, Shamir [22], and Liu [18] developed the theory of definite events and definite automata. The outputs of a definite automaton are determined by a fixed-length suffix of its inputs. Definiteness is a sufficient condition for smoothness.

Beckmann and Sanchez [5] are concerned with cliffs in cache performance, where minor changes in cache size cause large changes in miss rate. They propose Talus, a scheme to remove such cliffs. The basic idea is to divide the access stream into two streams, which are served by two separate shadow partitions. By carefully controlling the division of the access stream, and the sizes of the two shadow partitions, Talus is able to achieve convex cache performance, i.e., the cache miss rate in terms of the cache size is a convex function. Our work is concerned with changes in the access stream rather than changes in the cache size.

The work of Reineke and Grund [24] is closest to ours: they study the maximal difference in the number of page faults on the *same* request sequence starting from two different initial states for various deterministic paging algorithms. In contrast, here, we study the effect of differences in the request sequences on the number of faults. Also, in addition to studying particular deterministic algorithms as in [24], in this paper we determine smoothness properties that apply to classes of algorithms, such as all demand-paging or strongly-competitive ones, as well as to randomized algorithms. One motivation to consider randomized algorithms in this work is the recent effort to employ randomized caches in the context of hard real-time systems [9].

Spielman and Teng [27] introduce the notion of smoothed analysis. In smoothed analysis, one measures the performance of an algorithm on arbitrary inputs averaged over slight random perturbations of the input. Smoothed analysis thus provides a compromise between purely worst-case analysis on the one hand and average-case analysis on the other hand, which sometimes better captures the real-world performance of algorithms. Some algorithms, such as the simplex algorithm, can be shown to have polynomial smoothed complexity while their worst-case complexity is exponential. Becchetti et al. [4] apply the idea of smoothed analysis to online algorithms by defining the notion of smoothed competitive analysis. Then they analyze the multi-level feedback algorithm in this setting. This work is different from ours in that smoothness is concerned with the worst-case effect of a small perturbation on an arbitrary input, while smoothed analysis is concerned with the worst-case performance of an algorithm on arbitrary inputs averaged over slight random perturbations. The two notions might be connected in the following way: the smoothed complexity of an algorithm can only be better than its worst-case complexity if the algorithm is *not* smooth. It is future work to capture the above claim more formally and to prove it. We also leave the definition of a smoothed notion of smoothness for future work.

### 2.2 The Paging Problem

Paging models a two-level memory system with a small fast memory known as cache and a large but slow memory, usually referred to simply as memory. During a program’s execution, data is transferred between the cache and memory in units of data known as pages. The size of the cache in pages is usually referred to as *k*. The size of the memory can be assumed to be infinite. The input to the paging problem is a sequence of page requests which must be made available in the cache as they arrive. When a request for a page arrives and this page is already in the cache, then no action is required. This is known as a *hit*. Otherwise, the page must be brought from memory to the cache, possibly requiring the eviction of another page from the cache. This is known as a *page fault* or *miss*. We use the terms fault, page fault, and miss interchangeably throughout the paper. A paging algorithm must decide which pages to keep in the cache in order to minimize the number of faults.

A paging algorithm is said to be *demand paging* if it only evicts a page from the cache upon a fault with a full cache. Any non-demand-paging algorithm can be made to be demand paging without sacrificing performance [7].

In general, paging algorithms must make decisions as requests arrive, with no knowledge of future requests. That is, paging is an online problem. The most prevalent way to analyze online algorithms is competitive analysis [26]. In this framework, the performance of an online algorithm is defined relative to an optimal algorithm with full knowledge of the input sequence, known as optimal offline or OPT. We denote by *A*(*σ*) the number of misses of an algorithm *A* when processing the request sequence *σ*. A paging algorithm *A* is said to be *c*-competitive if for all sequences *σ*, *A*(*σ*) ≤ *c* ⋅OPT(*σ*) + *β*, where *β* is a constant independent of *σ*. The *competitive ratio* of an algorithm is the infimum over all possible values of *c* satisfying the inequality above. An algorithm is called competitive if it has a constant competitive ratio and *strongly competitive* if its competitive ratio is the best possible [20].

Traditional paging algorithms are Least-recently-used (LRU)—evict the page in the cache that has been requested least recently—and First-in first-out (FIFO)—evict the page in the cache that was brought into the cache the earliest. Another simple algorithm often considered is Flush-when-full (FWF)—empty the cache if the cache is full and a fault occurs. These algorithms are *k*-competitive, which is the best ratio that can be achieved for deterministic online algorithms [26]. An optimal offline algorithm for paging is Furthest-in-the-future, also known as Longest-forward-distance and Belady’s algorithm [6]. This algorithm evicts the page in the cache that will be requested at the latest time in the future.
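As a concrete reference for these policies, the following Python sketch counts the faults of LRU, FIFO, and FWF on a request sequence (function names and structure are ours, not from the paper):

```python
def lru_faults(sigma, k):
    """LRU: on a miss with a full cache, evict the least-recently-requested page."""
    cache = []  # ordered from least to most recently used
    faults = 0
    for p in sigma:
        if p in cache:
            cache.remove(p)          # hit: refresh recency
        else:
            faults += 1
            if len(cache) == k:
                cache.pop(0)         # evict least recently used
        cache.append(p)
    return faults

def fifo_faults(sigma, k):
    """FIFO: on a miss with a full cache, evict the page brought in earliest."""
    cache = []  # ordered from oldest to newest insertion
    faults = 0
    for p in sigma:
        if p not in cache:
            faults += 1
            if len(cache) == k:
                cache.pop(0)
            cache.append(p)          # hits do not change FIFO order
    return faults

def fwf_faults(sigma, k):
    """FWF: on a miss with a full cache, flush the entire cache."""
    cache = set()
    faults = 0
    for p in sigma:
        if p not in cache:
            if len(cache) == k:
                cache = set()        # flush when full
            cache.add(p)
            faults += 1
    return faults
```

For instance, on the sequence 1, 2, 1, 3, 1 with *k* = 2, these simulators report 3 faults for LRU but 4 for FIFO, since FIFO does not refresh a page’s position on a hit.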

A competitive ratio less than *k* can be achieved by the use of randomization. Important randomized paging algorithms are Random—evict a page chosen uniformly at random—and Mark [14]—mark a page when it is requested while unmarked, and upon a fault evict a page chosen uniformly at random among unmarked pages (unmarking all pages first if no unmarked pages remain). Random achieves a competitive ratio of *k*, while Mark’s competitive ratio is 2*H*_{k} − 1 [1], where \(H_{k} = {\sum }_{i=1}^{k} \frac {1}{i}\) is the *k*^{th} harmonic number. The strongly-competitive algorithms Partition [20], Equitable [1] and OnlineMin [8] achieve the optimal ratio of *H*_{k}.
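The marking scheme can be sketched as follows (a sketch with our own naming; the random generator is injectable for reproducibility):

```python
import random

def mark_faults(sigma, k, rng=None):
    """Mark: mark a page when it is requested while unmarked; on a fault with a
    full cache, evict a uniformly random unmarked page, unmarking every cached
    page first if all of them are marked (i.e., at a phase boundary)."""
    rng = rng or random.Random(0)
    cache, marked = set(), set()
    faults = 0
    for p in sigma:
        if p not in cache:
            faults += 1
            if len(cache) == k:
                if not (cache - marked):
                    marked = set()                    # new phase: unmark all
                victim = rng.choice(sorted(cache - marked))
                cache.remove(victim)
            cache.add(p)
        marked.add(p)                                 # request marks the page
    return faults
```

Note that on any sequence with at most *k* distinct pages Mark never evicts, so its fault count there is independent of the random choices.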

## 3 Smoothness of Paging Algorithms

We now formalize the notion of smoothness of paging algorithms. We are interested in answering the following question: How does the number of misses of a paging algorithm vary as its inputs vary? We quantify the similarity of two request sequences by their edit distance:

**Definition 1** (Distance)

Let *σ* = *x*_{1},…, *x*_{n} and \(\sigma ^{\prime }=x^{\prime }_{1},\ldots ,x^{\prime }_{m}\) be two request sequences. Then we denote by Δ(*σ*, *σ*^{′}) their edit distance, defined as the minimum number of substitutions, insertions, or deletions required to transform *σ* into *σ*^{′}.
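For illustration, Δ can be computed with the standard dynamic program for the Levenshtein distance (a sketch; the name `edit_distance` is ours):

```python
def edit_distance(sigma, sigma_prime):
    """Levenshtein distance: minimum number of substitutions, insertions,
    or deletions required to transform sigma into sigma_prime."""
    n, m = len(sigma), len(sigma_prime)
    # prev[j] holds the distance from sigma[:i-1] to sigma_prime[:j]
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        cur = [i] + [0] * m
        for j in range(1, m + 1):
            sub = prev[j - 1] + (sigma[i - 1] != sigma_prime[j - 1])
            cur[j] = min(prev[j] + 1,      # delete sigma[i-1]
                         cur[j - 1] + 1,   # insert sigma_prime[j-1]
                         sub)              # substitute (or match)
        prev = cur
    return prev[m]
```

The function accepts any pair of sequences of hashable items, so it applies directly to request sequences of page identifiers.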

This is also referred to as the *Levenshtein distance*. Based on this notion of distance we define (*α*, *β*, *δ*)-smoothness:

**Definition 2** ((*α*, *β*, *δ*)-smoothness)

Given a paging algorithm *A*, we say that *A* is (*α*, *β*, *δ*)-smooth, if for all pairs of sequences *σ*, *σ*^{′} with Δ(*σ*, *σ*^{′}) ≤ *δ*,

$$A(\sigma^{\prime}) \leq \alpha \cdot A(\sigma) + \beta.$$

For randomized algorithms, *A*(*σ*) denotes the algorithm’s *expected* number of faults when serving *σ*.

An algorithm that is (*α*, *β*, *δ*)-smooth may also be (*α*^{′}, *β*^{′}, *δ*)-smooth for *α*^{′} > *α* and *β*^{′} < *β*. As the multiplicative factor *α* dominates the additive constant *β* in the long run, when analyzing the smoothness of an algorithm, we first look for the minimal *α* such that the algorithm is (*α*, *β*, *δ*)-smooth for any *β*.

We say that an algorithm is *smooth* if it is (1, *β*,1)-smooth for some *β*. In this case, the maximal increase in the number of page faults is proportional to the number of changes in the request sequence. This is called *Lipschitz continuity* in mathematical analysis. For smooth algorithms, we also analyze the additive part *β* in detail, otherwise we concentrate the analysis on the multiplicative factor *α*.

We use the above notation when referring to a specific distance *δ*. For a generic value of *δ* we omit this parameter and express the smoothness of a paging algorithm with a pair (*α*, *β*), where both *α* and *β* are functions of *δ*.

**Definition 3** ((*α*, *β*)-smoothness)

Given a paging algorithm *A*, we say that *A* is (*α*, *β*)-smooth, if for all pairs of sequences *σ*, *σ*^{′},

$$A(\sigma^{\prime}) \leq \alpha(\delta) \cdot A(\sigma) + \beta(\delta),$$

where *α* and *β* are functions, and *δ* = Δ(*σ*, *σ*^{′}).

Often, it is enough to determine the effects of one change in the inputs to characterize the smoothness of an algorithm *A*.

**Lemma 1**

*If A is* (*α*, *β*, *γ*)*-smooth, then A is* \((\alpha ^{\delta },\beta {\sum }_{i=0}^{\delta -1}\alpha ^{i},\delta \gamma )\)*-smooth for any* *δ*.

*Proof*

We proceed by induction on *δ*. The case *δ* = 1 is trivial. Assume the hypothesis is true for 1 ≤ *δ* ≤ *h*. Let *σ*_{h+1} and *σ* be any pair of sequences such that *h**γ* < Δ(*σ*, *σ*_{h+1}) ≤ (*h* + 1)*γ*. Then there exists a sequence *σ*_{h} such that Δ(*σ*, *σ*_{h}) = *h**γ* and Δ(*σ*_{h}, *σ*_{h+1}) ≤ *γ*. Since *A* is (*α*, *β*, *γ*)-smooth, then *A*(*σ*_{h+1}) ≤ *α**A*(*σ*_{h}) + *β*. By the inductive hypothesis, \(A(\sigma _h)\le \alpha ^h A(\sigma )+\beta {\sum }_{i=0}^{h-1}\alpha ^i\). Therefore, \(A(\sigma _{h+1})\le \alpha (\alpha ^h A(\sigma )+\beta {\sum }_{i=0}^{h-1}\alpha ^i)+\beta =\alpha ^{h+1} A(\sigma )+\beta {\sum }_{i=0}^{h}\alpha ^i\), and thus A is \((\alpha ^{\delta },\beta {\sum }_{i=0}^{\delta -1}\alpha ^i,\delta \gamma )\)-smooth. □

**Corollary 1**

*If A is* (1, *β*, 1)*-smooth, then A is* (1, *δβ*)*-smooth.*

**Corollary 2**

*If A is* (1, *β*, *γ*)*-smooth, then A is* (1, *δβ*, *δγ*)*-smooth for any* *δ*.

## 4 Smoothness of Deterministic Paging Algorithms

### 4.1 Bounds on the Smoothness of Deterministic Paging Algorithms

Before considering particular deterministic online algorithms, we determine upper and lower bounds for several important classes of algorithms. Many natural algorithms are demand paging.

**Theorem 1** (Lower bound for deterministic, demand-paging algorithms)

*No deterministic, demand-paging algorithm is* (1, *δ*(*k* + 1 − *𝜖*), *δ*)*-smooth for any* *𝜖* > 0 *and any* *δ* > 0*.*

*Proof*

Let *A* be any deterministic, demand-paging algorithm, and let *δ* > 0 be arbitrary. Using *k* + 1 distinct pages, we can construct a sequence *σ*_{A}(*δ*) of length *k* + *δ*(*k* + 1) such that *A* faults on every request: first request the *k* + 1 distinct pages in any order; then arbitrarily extend the sequence by requesting the page that *A* has just evicted. Let *p* be the page that occurs least frequently in *σ*_{A}(*δ*). By removing all requests to *p* from *σ*_{A}(*δ*), we obtain a sequence \(\sigma ^{\prime }_A(\delta )\) that consists of *k* distinct pages only. By assumption *A* is demand paging. Thus, *A* incurs only *k* page faults on the entire sequence. Assume for a contradiction that *A* is (1, *δ*(*k* + 1 − *𝜖*), *δ*)-smooth for some *𝜖* > 0. Note that *p* occurs at most \(\left \lfloor \frac {k+\delta (k+1)}{k+1} \right \rfloor = \delta \) times in *σ*_{A}(*δ*). So \({\Delta }(\sigma ^{\prime }_A(\delta ), \sigma _A(\delta )) \leq \delta \), and by definition we get:

$$k + \delta(k+1) = A(\sigma_A(\delta)) \leq A(\sigma^{\prime}_A(\delta)) + \delta(k + 1 - \epsilon) = k + \delta(k+1) - \delta\epsilon,$$

contradicting *𝜖* > 0. □

While most algorithms are demand paging, it is not a necessary condition for an algorithm to be competitive, as demonstrated by FWF. However, we obtain the same lower bound for competitive algorithms as for demand-paging ones.

**Theorem 2** (Lower bound for deterministic, competitive paging algorithms)

*No deterministic, competitive paging algorithm is* (1, *δ*(*k* + 1 − *𝜖*), *δ*)*-smooth for any* *𝜖* > 0 *and any* *δ* > 0*.*

*Proof*

Let *A* be any *c*-competitive deterministic online paging algorithm. We use the same sequences *σ*_{A}(*δ*) and \(\sigma ^{\prime }_A(\delta )\) as in the proof of Theorem 1, where the choice of *δ* is determined below. Again, *A* faults *k* + *δ*(*k* + 1) times on *σ*_{A}(*δ*). As the optimal offline algorithm *OPT* faults exactly *k* times on \(\sigma ^{\prime }_A(\delta )\), we can conclude that *A* faults at most *c**k* + *β* on \(\sigma ^{\prime }_A(\delta )\), for some constant *β*, due to the competitiveness of *A*.

If *A* is (1, *δ*(*k* + 1 − *𝜖*), *δ*)-smooth for some *𝜖*, we get:

$$k + \delta(k+1) = A(\sigma_A(\delta)) \leq A(\sigma^{\prime}_A(\delta)) + \delta(k + 1 - \epsilon) \leq ck + \beta + \delta(k+1) - \delta\epsilon.$$

This implies *δ𝜖* ≤ (*c* − 1)*k* + *β*, and thus *A* is not (1, *δ*(*k* + 1 − *𝜖*), *δ*)-smooth for any \(\delta > \frac {(c-1)k+\beta }{\epsilon }\).

To complete the proof, assume for a contradiction that *A* is (1, *δ*(*k* + 1 − *𝜖*), *δ*)-smooth for some positive \(\delta \leq \frac {(c-1)k+\beta }\epsilon \). By Corollary 2 this would imply that *A* is also (1, *λ**δ*(*k* + 1 − *𝜖*), *λ**δ*)-smooth for any *λ*. However, for \(\lambda > \frac {(c-1)k+\beta }{\delta \cdot \epsilon }\) this contradicts the result of the first part of the proof. □

Intuitively, the optimal offline algorithm should be very smooth, and this is indeed the case:

**Theorem 3** (Smoothness of OPT)

*OPT is* (1, 2*δ*)*-smooth. This is tight.*

The idea of the proof is as follows. Given two sequences *σ* and *σ*^{′} with Δ(*σ*, *σ*^{′}) = 1, we show that there exists an offline algorithm *A* serving *σ*^{′} such that *A*(*σ*^{′}) ≤ OPT(*σ*) + 2. *A* acts like OPT until the single difference between *σ* and *σ*^{′}. Right after that, the contents of the caches of OPT serving *σ* and *A* serving *σ*^{′} differ by at most one page. On the equal suffix of both sequences, *A* can behave so that this difference in cache contents translates into at most one extra fault compared to OPT. In fact, while the page that *A* is missing from OPT’s cache is not requested, *A* incurs no more faults than OPT does and its cache is missing at most one page from OPT’s cache. Whenever the missing page is requested, *A* faults and evicts the page that makes both caches equal. From then on both algorithms behave exactly the same. Then, including the initial fault, *A*(*σ*^{′}) ≤ OPT(*σ*) + 2. The theorem follows from the optimality of OPT and Corollary 1.

With Theorem 3 it is easy to show the following upper bound on the smoothness of any competitive algorithm:

**Theorem 4** (Smoothness of competitive algorithms)

*Let A be any paging algorithm such that for all sequences* *σ*, *A*(*σ*) ≤ *c* ⋅ *OPT*(*σ*) + *β*. *Then A is* (*c*, 2*δc* + *β*)*-smooth.*

*Proof*

Let *σ*^{′} be a sequence such that Δ(*σ*, *σ*^{′}) = *δ*. By Theorem 3, OPT(*σ*^{′}) ≤OPT(*σ*) + 2*δ*. Therefore, *A*(*σ*^{′}) ≤ *c* ⋅ (OPT(*σ*) + 2*δ*) + *β* ≤ *c* ⋅ *A*(*σ*) + 2*δ**c* + *β*. □

Note that the above theorem applies to both deterministic and randomized algorithms. Given that every competitive algorithm is (*α*, *β*)-smooth for some *α* and *β*, the natural question to ask is whether the converse also holds. Below, we answer this question in the affirmative for deterministic, bounded-memory, demand-paging algorithms. By *bounded memory* we mean algorithms that, in addition to the contents of their fast memory, only have a finite amount of additional state. For a more formal definition consult [7, page 93]. Paging algorithms implemented in hardware caches are bounded memory. Our proof requires the notion of a *k*-phase partition:

**Definition 4** (*k*-phase partition)

The *k*-phase partition of a sequence *σ* is a partition of *σ* into contiguous subsequences called *k*-phases, or simply phases. The first phase starts with the first request of *σ*, and a new phase starts with the request that constitutes the (*k* + 1)^{st} distinct page requested since the beginning of the previous phase. Thus except for possibly the last phase of the *k*-phase partition, each phase consists of exactly *k* distinct pages.
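The *k*-phase partition can be computed in a single pass over the sequence (a sketch; names are ours). Φ(*σ*) is then simply the length of the returned list of phases:

```python
def k_phases(sigma, k):
    """Split sigma into its k-phase partition: a new phase starts with the
    request to the (k+1)-st distinct page since the start of the previous
    phase."""
    phases, current, distinct = [], [], set()
    for p in sigma:
        if p not in distinct and len(distinct) == k:
            phases.append(current)       # close the current phase
            current, distinct = [], set()
        current.append(p)
        distinct.add(p)
    if current:
        phases.append(current)           # possibly incomplete last phase
    return phases
```

For example, with *k* = 2 the sequence 1, 2, 3, 1, 4 splits into the phases (1, 2), (3, 1), and (4), so Φ = 3.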

**Theorem 5** (Competitiveness of smooth algorithms)

*If algorithm A is deterministic, bounded memory, demand paging, and* (*α*, *β*)*-smooth for some* *α* *and* *β*, *then A is also competitive.*

*Proof*

Assume algorithm *A* is not competitive. Then, there is no bound on the number of misses in a single *k*-phase for *A*: otherwise, if *r* is a bound on the number of misses of *A* in every phase, then *A* is competitive with competitive ratio *c* ≤ *r*, since OPT must fault at least once in every *k*-phase.

Let *n* be the number of states of *A*. Within a *k*-phase, a demand-paging algorithm can reach at most (2*k*)^{k} different configurations: each of the *k* slots can either contain one of the *k* “old” pages cached at the start of the *k*-phase, or one of the up to k “new” pages requested within the phase. Let *σ* be a sequence of minimal length that ends on a phase in which *A* misses more than *n* ⋅ (2*k*)^{k} times. By the pigeon-hole principle, *A* must assume the same state and configuration pair twice within that phase. Due to the minimality of the sequence, *A* must fault at least once between those two occurrences. By repeating the sequence of requests between the two occurrences, we can thus pump up the sequence and the number of faults arbitrarily without increasing the number of phases. By removing the finite prefix of the pumped sequence that comprises all but the final phase, we can construct a sequence *σ*^{′} containing at most *k* distinct pages. Any demand-paging algorithm, in particular *A*, will fault at most *k* times on this sequence. The edit distance between *σ*^{′} and *σ* is constant, but the difference in faults is unbounded. This shows that *A* is not (*α*, *β*)-smooth for any *α* and *β*, thus proving the theorem. □

Also, note that the smoothness condition is necessary, as an algorithm can be bounded memory and demand paging but not smooth, and thus not competitive (by the contrapositive of Theorem 4). An example of such an algorithm is Last-in first-out (LIFO), which evicts the page in the cache that was brought into the cache the latest. LIFO is clearly bounded memory and demand paging, but it is not smooth: consider the sequence *σ* = *x*_{1}, *x*_{2},…, *x*_{k}, *x*_{1}, *x*_{k}, *x*_{1}, *x*_{k},..., where *x*_{i}≠*x*_{j} for all *i*≠*j*. Let *σ*^{′} be the sequence obtained by substituting the first request to *x*_{1} by a request to a page \(x_1^{\prime }\) with \(x_1^{\prime }\neq x_i\) for all *i*. Thus, Δ(*σ*, *σ*^{′}) = 1. LIFO faults *k* times on *σ* but an unbounded number of times on *σ*^{′} and hence it is not smooth. We conjecture that the above theorem also holds without the bounded-memory assumption.
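This counterexample is easy to reproduce. The sketch below (our own naming) counts LIFO faults and builds *σ* and *σ*^{′} from the text for *k* = 4 and a tail of six repetitions of *x*_{1}, *x*_{k}; lengthening the tail grows the gap without bound while Δ(*σ*, *σ*^{′}) stays 1:

```python
def lifo_faults(sigma, k):
    """LIFO (demand paging): on a miss with a full cache, evict the page
    that was brought into the cache most recently."""
    cache = []
    faults = 0
    for p in sigma:
        if p not in cache:
            faults += 1
            if len(cache) == k:
                cache.pop()          # evict the last page brought in
            cache.append(p)
    return faults

k = 4
tail = [1, k] * 6                                  # x_1, x_k, x_1, x_k, ...
sigma = list(range(1, k + 1)) + tail               # x_1, ..., x_k, then the tail
sigma_prime = [0] + list(range(2, k + 1)) + tail   # first request replaced by x_1'
```

On *σ*, LIFO faults only on the first *k* requests; on *σ*^{′}, every tail request faults, because the substituted page keeps *x*_{1} and *x*_{k} from residing in the cache simultaneously.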

### 4.2 Smoothness of Particular Deterministic Algorithms

Now let us turn to the analysis of three well-known deterministic algorithms: LRU, FWF, and FIFO. We show that both LRU and FWF are smooth. On the other hand, FIFO is not smooth, as a single change in the request sequence may increase the number of misses by a factor of *k*.

**Theorem 6** (Smoothness of Least-recently-used)

*LRU is* (1, *δ*(*k* + 1))*-smooth. This is tight.*

*Proof*

We show that *LRU* is (1, *k* + 1,1)-smooth. Corollary 1 then immediately implies that *LRU* is (1, *δ*(*k* + 1))-smooth. Tightness follows from Theorem 1 as *LRU* is demand paging. To analyze *LRU*, it is convenient to introduce the notion of *age*. The age of page *p* is the number of distinct pages that have been requested since the previous request to *p*. Before their first request, all pages have age *∞*. A request to page *p* results in a fault if and only if *p*’s age is greater than or equal to *k*, the size of the cache. Finite ages are unique, i.e., no two pages have the same age less than *∞*. At any time at most *k* pages are cached, and at most *k* pages have an age less than *k*.

Let us now consider how the insertion of one request may affect ages and the number of faults. By definition, the age of any page is only affected from the point of insertion up to its next request. Only the next request to a page may thus turn from a hit into a miss. So at most *k* requests may turn from hits into misses. As the inserted request itself may also introduce a fault, the overall number of faults may thus increase by at most *k* + 1.

Substitutions are similar to insertions: they turn at most *k* succeeding hits into misses, and the substituted request itself may introduce one additional fault. The deletion of a request to page *p* does not increase the ages of other pages. Only the next request to *p* may turn from a hit into a miss. □
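The tightness of the *k* + 1 bound for a single insertion can be checked directly: with *k* + 1 distinct pages, inserting one request between two identical rounds turns every hit of the second round into a miss (a sketch; names are ours):

```python
def lru_faults(sigma, k):
    """Count LRU faults: on a miss with a full cache, evict the
    least-recently-requested page."""
    cache, faults = [], 0
    for p in sigma:
        if p in cache:
            cache.remove(p)          # hit: refresh recency
        else:
            faults += 1
            if len(cache) == k:
                cache.pop(0)         # evict least recently used
        cache.append(p)
    return faults

k = 5
base = list(range(1, k + 1))
sigma = base + base                  # second round consists of hits
sigma_prime = base + [k + 1] + base  # one inserted request
extra = lru_faults(sigma_prime, k) - lru_faults(sigma, k)
```

Here LRU faults *k* times on *σ* but 2*k* + 1 times on *σ*^{′}, an increase of exactly *k* + 1.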

So LRU matches the lower bound for both demand-paging and competitive paging algorithms. We now show that FWF is also smooth, with a factor that is almost twice that of LRU. The smoothness of FWF follows from the fact that it always misses *k* times per phase, and the number of phases can only change marginally when perturbing a sequence, as we show in Lemma 2.

For a sequence *σ*, let Φ(*σ*) denote the number of phases in its *k*-phase partition.

**Proposition 1**

*Let σ be a sequence, let ρ be a suffix of σ, and let ℓ and ℓ*^{′} *denote the number of distinct pages in the last phase of σ and ρ, respectively. Then* Φ(*ρ*) ≤ Φ(*σ*)*. Furthermore, if* Φ(*ρ*) = Φ(*σ*)*, then ℓ*^{′} ≤ *ℓ*.

*Proof*

Let *i*_{j} and \(i_j^{\prime }\) denote the indices in *σ* of the first request of the *j*^{th} phase in *σ* and *ρ*, respectively, with *i*_{j} = |*σ*| + 1 for *j* > Φ(*σ*) and \(i_j^{\prime }=|\rho |+1\) for *j* > Φ(*ρ*). Then, for all *j* it holds that \(i_j\le i_j^{\prime }\). We prove this by induction on *j*. The case *j* = 1 is trivially true, as *i*_{1} = 1 and \(i_1^{\prime }\ge 1\) since *ρ* is a suffix of *σ*. Suppose that the hypothesis holds for 1 ≤ *j* ≤ *n*. It is easy to see that it holds for *j* = *n* + 1: since there are at most *k* distinct pages between *i*_{n} and *i*_{n+1} − 1, inclusive, and by the inductive hypothesis \(i_n^{\prime }\ge i_n\), the request that ends the *n*^{th} phase in *ρ* cannot be earlier than *i*_{n+1}, and hence \(i_{n+1}^{\prime }\ge i_{n+1}\). Since this is true for all phases including the last one, Φ(*ρ*) ≤ Φ(*σ*). Now, assume that Φ(*ρ*) = Φ(*σ*) = *m*. Then \(i_{m}^{\prime } < |\rho |+1\) and by the argument above \(i_{m}\le i_{m}^{\prime }\), which implies that *ℓ*^{′} ≤ *ℓ*. □

**Lemma 2**

*Let σ and σ*^{′} *be two sequences such that* Δ(*σ*, *σ*^{′}) = 1*. Then* Φ(*σ*^{′}) ≤ Φ(*σ*) + 2*. Furthermore, let ℓ and ℓ*^{′} *be the number of distinct pages in the last phase of σ and σ*^{′}*, respectively. If* Φ(*σ*^{′}) = Φ(*σ*) + 2*, then ℓ*^{′} ≤ *ℓ*.

*Proof*

Let *i*_{j} and \(i_j^{\prime }\) be the indices of the requests that mark the first page of the *j*^{th} phase in *σ* and *σ*^{′}, respectively, with *i*_{j} = |*σ*| + 1 for *j* > Φ(*σ*) and \(i_j^{\prime }=|\sigma ^{\prime }|+1\) for *j* > Φ(*σ*^{′}). Let Φ(*σ*, *j*) denote the number of phases of *σ* starting from the *j*^{th} phase (with Φ(*σ*, *j*) = 0 if *j* > Φ(*σ*)). Let *h* − 1 be the phase in *σ*^{′} where the difference between the sequences occurs. For simplicity, assume that if the difference is an insertion (deletion) at \(\sigma ^{\prime }_i\), then *i* refers to an empty page in *σ* (*σ*^{′}), i.e., unaffected requests have equal indices in both sequences. If the difference is a deletion, then \(i_{h}\le i^{\prime }_{h}\) and by Proposition 1, Φ(*σ*^{′}, *h*) ≤ Φ(*σ*, *h*), which implies the lemma. If it is a substitution, suppose that *q* in *σ* is changed to *p* in *σ*^{′}. Then consider *σ*^{″} resulting from the deletion of *q* from *σ*. By the argument above, Φ(*σ*^{″}) ≤ Φ(*σ*). Hence, showing that Φ(*σ*^{′}) ≤ Φ(*σ*^{″}) + 2 implies as well that Φ(*σ*^{′}) ≤ Φ(*σ*) + 2. Since *σ*^{′} is the result of inserting *p* into *σ*^{″}, it suffices to consider the insertion case (we argue later that if Φ(*σ*^{′}) = Φ(*σ*^{″}) + 2, then *ℓ*^{′} ≤ *ℓ* also holds).

Let *p* be the page that is added to *σ* to make *σ*^{′}. We analyze Φ(*σ*^{′}) in terms of Φ(*σ*). We have the following cases:

- [*p* is not the first page of phase *h* − 1]. If *p* occurs again in the phase, then Φ(*σ*) = Φ(*σ*^{′}). This is also the case if *h* − 1 is the last phase of *σ*^{′}. Otherwise, \(i_{h}^{\prime } < i_{h} \le i_{h+1}^{\prime }\) (*i*_{h} cannot be larger than \(i_{h+1}^{\prime }\), as in this case phase *h* − 1 of *σ* would include the *k* + 1 distinct pages in \(\sigma ^{\prime }[i^{\prime }_h..i^{\prime }_{h+1}]\)). Then by Proposition 1, Φ(*σ*^{′}, *h* + 1) ≤ Φ(*σ*, *h*), and therefore Φ(*σ*^{′}) ≤ Φ(*σ*) + 1.
- [*p* is the first page of phase *h* − 1]. Then \(i_{h-1}>i_{h-1}^{\prime }\). We have two cases:
  - If \(i_{h-1} \le i_h^{\prime }\), then we have the same case as above but with *i*_{h−1} and \(i_h^{\prime }\). Thus, Φ(*σ*^{′}) ≤ Φ(*σ*) + 1.
  - If \(i_h^{\prime } < i_{h-1} \le i_{h+1}^{\prime }\) (again, *i*_{h−1} cannot be greater than \(i^{\prime }_{h+1}\), as in this case the (*h* − 2)^{nd} phase of *σ* would include all *k* + 1 distinct pages in \(\sigma ^{\prime }[i^{\prime }_h..i^{\prime }_{h+1}]\)), then by Proposition 1, Φ(*σ*^{′}, *h* + 1) ≤ Φ(*σ*, *h* − 1). If Φ(*σ*^{′}, *h* + 1) = Φ(*σ*, *h* − 1), then *ℓ*^{′}≤ *ℓ* and Φ(*σ*^{′}) = Φ(*σ*) + 2. Otherwise, Φ(*σ*^{′}, *h* + 1) < Φ(*σ*, *h* − 1) and Φ(*σ*^{′}) ≤ Φ(*σ*) + 1.

In all cases above either Φ(*σ*^{′}) ≤ Φ(*σ*) + 1 or Φ(*σ*^{′}) = Φ(*σ*) + 2 with *ℓ*^{′} ≤ *ℓ*. If the difference is a substitution, let *q* be the page in *σ* that is replaced by p in *σ*^{′}. Note that the case Φ(*σ*^{′}) = Φ(*σ*) + 2 can only happen if *q* is requested earlier in the same phase in *σ*. Then, removing the request to *q* would not change the *k*-phase partition of *σ*, and hence the same analysis above for an insertion applies and thus *ℓ*^{′}≤ *ℓ* as well. □
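Lemma 2 can also be checked mechanically. The following sketch assumes the standard greedy definition of the *k*-phase partition (a new phase starts with the request that would bring the number of distinct pages in the current phase to *k* + 1); the function name `phases` and the page universe are illustrative choices, not from the paper.

```python
import random

def phases(sigma, k):
    """Greedy k-phase partition: return (number of phases Phi,
    number of distinct pages l in the last phase)."""
    num, cur = 0, set()
    for p in sigma:
        if num == 0 or (p not in cur and len(cur) == k):
            num += 1      # this request starts a new phase
            cur = set()
        cur.add(p)
    return num, len(cur)

# Small example with k = 2: the phases are (1,2), (3,1), (2,3)
assert phases([1, 2, 3, 1, 2, 3], 2) == (3, 2)

# Lemma 2 on random single insertions (edit distance 1):
# Phi(sigma') <= Phi(sigma) + 2, and if equality with +2 holds, l' <= l.
random.seed(0)
k = 3
for _ in range(1000):
    sigma = [random.randrange(6) for _ in range(30)]
    pos = random.randrange(len(sigma) + 1)
    sigma_p = sigma[:pos] + [random.randrange(6)] + sigma[pos:]
    (phi, l), (phi_p, l_p) = phases(sigma, k), phases(sigma_p, k)
    assert phi_p <= phi + 2
    if phi_p == phi + 2:
        assert l_p <= l
```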

**Theorem 7** (Smoothness of Flush-when-full)

*FWF is (1, 2δk)-smooth. This is tight.*

*Proof*

Let *σ* and *σ*^{′} be two sequences such that Δ(*σ*, *σ*^{′}) = 1. Let Φ(*σ*) (resp. Φ(*σ*^{′})) be the number of phases in the *k*-phase partition of *σ* (resp. *σ*^{′}), and let *ℓ* (resp. *ℓ*^{′}) be the number of distinct pages in the last phase of the partition of *σ* (resp. *σ*^{′}). FWF misses exactly *k* times in any phase of a sequence, except possibly for the last one, in which it misses a number of times equal to the number of distinct pages in the phase. Then, FWF(*σ*) = *k* ⋅ (Φ(*σ*) − 1) + *ℓ*, and FWF(*σ*^{′}) = *k* ⋅ (Φ(*σ*^{′}) − 1) + *ℓ*^{′}. By Lemma 2, if Φ(*σ*^{′}) = Φ(*σ*) + 2, then *ℓ*^{′}≤ *ℓ*, and thus FWF(*σ*^{′}) ≤ *k*(Φ(*σ*) + 2 − 1) + *ℓ* = FWF(*σ*) + 2*k*. Otherwise, if Φ(*σ*^{′}) ≤ Φ(*σ*) + 1, then FWF(*σ*^{′}) ≤ *k*(Φ(*σ*) + 1 − 1) + *ℓ*^{′} = FWF(*σ*) − *ℓ* + *k* + *ℓ*^{′}. Since *ℓ* ≥ 0 and *ℓ*^{′}≤ *k*, FWF(*σ*^{′}) ≤ FWF(*σ*) + 2*k*. The upper bound in the theorem follows by Corollary 1. To see that this upper bound is tight, let *σ* = (*x*_{1},…, *x*_{k})^{2δ+1}, where *x*_{i}≠*x*_{j} for all *i*≠*j*, and let *σ*^{′} = *x*_{1},…, *x*_{k}(*x*_{k+1}, *x*_{1},…, *x*_{k}, *x*_{1},…, *x*_{k})^{δ}, where *x*_{k+1}≠*x*_{i} for all *i* ≤ *k*. Thus, Δ(*σ*, *σ*^{′}) = *δ*. Clearly FWF(*σ*) = *k*, while FWF(*σ*^{′}) = *k* + 2*δ**k*, and hence FWF(*σ*^{′}) = FWF(*σ*) + 2*δ**k*. □
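The counting in this proof is easy to reproduce with a small simulator. The sketch below is a minimal FWF model, assuming an initially empty cache; the concrete values *k* = 4, *δ* = 3 are arbitrary.

```python
def fwf_misses(sigma, k):
    """Flush-When-Full: on a miss with a full cache, flush everything,
    then load the requested page. Returns the number of misses."""
    cache, misses = set(), 0
    for p in sigma:
        if p not in cache:
            if len(cache) == k:
                cache = set()   # flush
            cache.add(p)
            misses += 1
    return misses

k, delta = 4, 3
block = list(range(1, k + 1))                       # x_1, ..., x_k
sigma = block * (2 * delta + 1)
sigma2 = block + ([k + 1] + block + block) * delta  # x_{k+1} inserted delta times
assert fwf_misses(sigma, k) == k                    # one phase, k misses
assert fwf_misses(sigma2, k) == k + 2 * delta * k   # matches FWF(sigma) + 2*delta*k
```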

We now show that FIFO is not smooth. In fact, we show that with only a single difference in the sequences, the number of misses of FIFO can be *k* times higher than the number of misses in the original sequence. On the other hand, since FIFO is strongly competitive, the multiplicative factor *k* is also an upper bound for FIFO’s smoothness.

**Theorem 8** (Smoothness of First-in first-out)

*FIFO is (k, 2δk)-smooth. FIFO is not (k − 𝜖, γ, 1)-smooth for any 𝜖 > 0 and γ.*

*Proof*

The upper bound follows from the competitiveness of FIFO and Theorem 4.

In the following, we use ⋰ to denote ascending, and ⋱ to denote descending sequences. For example, 3, ⋰, 7 denotes the ascending sequence 3, 4, 5, 6, 7. If *a* < *b*, then *a*, ⋱, *b* and *b*, ⋰, *a* denote empty sequences. For the lower bound, we show how to construct two sequences *σ*_{k} and \(\sigma ^{\prime }_k\) for each cache size *k*, with \({\Delta }(\sigma ^{\prime }_k, \sigma _k) = 1\), that yield configurations *c* = [1, ⋰ , *k*] and \(c^{\prime } = [k, \ddots , 1]\), where pages are sorted from last-in to first-in from left to right. Then, the sequence 0, ⋰ , *k* − 1 yields *k* misses starting from configuration *c*^{′} and only one miss starting from *c*. The resulting configurations are [0, ⋰ , *k* − 1] and \([k-1, \ddots , 0]\), which are equal to *c* and *c*^{′} up to renaming. So we can construct an arbitrarily long sequence that yields *k* times as many misses starting from configuration *c*^{′} as it does from configuration *c*.

For *k* = 2, *σ*_{2} = 2,1 and \(\sigma ^{\prime }_2 = 1,2,1 = 1 \circ \sigma _2\) have edit distance \({\Delta }(\sigma ^{\prime }_2, \sigma _2) = 1\) and yield configurations [1,2] and [2,1], respectively. For *k* = 3, *σ*_{3} = 2,3,1,4,2,1,5,1,4 and \(\sigma ^{\prime }_3 = 1 \circ \sigma _3\) yield configurations [4,1,5] and [5,1,4], respectively, which are equal up to renaming to [1,2,3] and [3,2,1].
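These seed sequences can be replayed with a few lines of code. The following sketch is a minimal FIFO model (the state lists pages from last-in to first-in, matching the proof's convention); it confirms the *k* = 3 example and the *k*-versus-one miss gap of the reversed configurations.

```python
from collections import deque

def fifo(sigma, k, cache=None):
    """FIFO cache; returns (misses, state), state ordered last-in first."""
    q = deque(cache or [])      # leftmost = last-in, rightmost = first-in
    misses = 0
    for p in sigma:
        if p not in q:
            misses += 1
            if len(q) == k:
                q.pop()         # evict the first-in page
            q.appendleft(p)
    return misses, list(q)

# k = 3 seed sequences from the proof, differing in one leading request
sigma3 = [2, 3, 1, 4, 2, 1, 5, 1, 4]
sigma3p = [1] + sigma3                         # edit distance 1
assert fifo(sigma3, 3)[1] == [4, 1, 5]
assert fifo(sigma3p, 3)[1] == [5, 1, 4]        # reversed up to renaming

# 0, 1, ..., k-1 gives one miss from c = [1,2,3], k misses from c' = [3,2,1]
assert fifo([0, 1, 2], 3, cache=[1, 2, 3])[0] == 1
assert fifo([0, 1, 2], 3, cache=[3, 2, 1])[0] == 3
```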

For *k* > 3, we present a recursive construction of *σ*_{k} and \(\sigma ^{\prime }_k\) based on *σ*_{k−1} and \(\sigma ^{\prime }_{k-1}\). Notice that \(\sigma ^{\prime }_2 = 1 \circ \sigma _2\) and \(\sigma ^{\prime }_3 = 1 \circ \sigma _3\). We will maintain that \(\sigma ^{\prime }_k = 1 \circ \sigma _k\) in the recursive construction.

As *σ*_{k−1} and \(\sigma ^{\prime }_{k-1}\) are constructed for a cache of size *k* − 1, they will behave differently on a larger cache of size *k*. However, we can pad *σ*_{k−1} and \(\sigma ^{\prime }_{k-1}\) with requests to one additional page *x* that fills up the additional space in the cache. This can be achieved as follows: Add a request to *x* at the start of the two sequences (following the request to 1 in \(\sigma ^{\prime }_k\)). Also, whenever *x* is evicted in either of the two sequences, in a cache of size *k*, add a request to *x* in both sequences. By construction, the additional requests do not increase the edit distance between the two sequences. Further, the additional requests ensure that every request that belongs to the original sequences faults in the new sequence on a cache of size *k* if and only if it faults in the original sequence on a cache of size *k* − 1. We call the resulting sequences *σ*_{k, pre} and \(\sigma _{k, pre}^{\prime }\). The two sequences yield configurations *c* = [1, ⋰, *i*^{′}, *x*, *i*^{′} + 1, ⋰ , *k* − 1] and \(c^{\prime } = [k-1, \ddots , j^{\prime }+1, x, j^{\prime }, \ddots , 1]\), respectively, which, unless *i*^{′} = *j*^{′}, are almost solutions to the original problem.

We distinguish two cases, depending on whether *i*^{′} < *j*^{′} (case A) or *i*^{′} > *j*^{′} (case B):

Case A: Observe that if *i*^{′} < *j*^{′}, then *c* = [1,..., *i*^{′}, *x*, *i*^{′} + 1, ⋰ , *k* − 1] and \(c^{\prime } = [k-1, \ddots , j^{\prime }+1, x, j^{\prime }, \ddots , 1]\) are equal up to renaming to *d* = [1, ⋰ , *k*] and \(d^{\prime } = [k , \ddots , j+1, i, j, \ddots , i+1, i-1, \ddots , 1]\) for some *i*, *j* with 1 ≤ *i* < *j* ≤ *k*.

We distinguish five cases depending on the values of i and j:

Case A.1: 1 < *i* < *j* < *k*. Below we incrementally build a suffix *σ*_{k, post} that finishes the construction:

Requests | State for prefix \(\sigma _{k, pre}\) | State for prefix \(\sigma ^{\prime }_{k, pre}\) |
---|---|---|
… | [1, ⋰ , *k*] | \([k,\ddots , j+1,i,j,\ddots ,i+1,i-1,\ddots , 1]\) |
… | … | \([v, k, \ddots ,j+1,i,j, \ddots , i+1, i-1, \ddots , 2]\) |
… | … | \([v,k, \ddots , j+1,i,j,\ddots , i+1, i-1, \ddots , 2]\) |
… | … | \([i-1, \ddots , 1,v,k, \ddots , j+1,i,j,\ddots , i+2]\) |

If *j* = *i* + 1, then the two final states above simplify to [*j*, ⋰ , *k*, *v*, 1, ⋰ , *i* − 1] and \([i-1,\ddots, 1,v,k,\ddots ,j+1,j-1]\), and the following sequence finishes the construction:

Requests | State for prefix \(\sigma _{k, pre}\) | State for prefix \(\sigma ^{\prime }_{k, pre}\) |
---|---|---|
… | … | \([w,i-1, \ddots , 1, v, k, \ddots , j+1]\) |
… | … | \([i-1,\ddots , 1,v,k, \ddots , j+1,j-1]\) |
… | … | \([i-2, \ddots , 1,v,k, \ddots , j,w]\) |

If *j* ≥ *i* + 2, the construction can be finished as follows:

Requests | State for prefix \(\sigma _{k, pre}\) | State for prefix \(\sigma ^{\prime }_{k, pre}\) |
---|---|---|
… | … | \([i-1, \ddots , 1,v,k, \ddots , j+1,i,j, \ddots , i+2]\) |
… | … | \([w,i-1, \ddots , 1, v, k, \ddots , j+1,i,j, \ddots , i+3]\) |
… | … | \([i-2, \ddots , 1,v,k, \ddots , i+1,w]\) |

Case A.2: 1 = *i* < *j* < *k* − 1. Consider the following suffix:

Requests | State for prefix \(\sigma _{k, pre}\) | State for prefix \(\sigma ^{\prime }_{k, pre}\) |
---|---|---|
… | [1, ⋰ , *k*] | \([k, \ddots , j+1,1,j, \ddots , 2]\) |
… | … | \([y,k, {\ddots } ,j+1,1,j, \ddots , 3]\) |
… | … | \([j, \ddots , 2, y,k ,\ddots , j+1]\) |
… | … | \([1, j, \ddots , 2, y, k, \ddots , j+2]\) |
… | … | \([k-1, \ddots , j+1, 1, j,\ddots , 2,y]\) |

The final pair of states is equal up to renaming to the pair *d* = [1, ⋰ , *k*] and \(d^{\prime } = [k, \ddots , j+2, 2, j+1, \ddots , 3, 1]\), and so it fulfills the conditions under which the suffix *σ*_{k, post} constructed in Case A.1 finishes the construction.

Case A.3: 1 = *i* < *j* = *k* − 1. Consider the following suffix:

Requests | State for prefix \(\sigma _{k, pre}\) | State for prefix \(\sigma ^{\prime }_{k, pre}\) |
---|---|---|
… | [1, ⋰ , *k*] | \([k, 1, k-1, \ddots , 2]\) |
… | … | \([x, k, 1, k-1, \ddots , 3]\) |
… | … | \([k-1, \ddots , 2, x, k]\) |
… | … | \([z, y, k-1, \ddots , 2]\) |
… | … | \([k-3, \ddots , 1, x, z, y]\) |

The final pair of states is equal up to renaming to the pair *d* = [1, ⋰ , *k*] and \(d^{\prime } = [k, \ddots , 3, 1, 2]\), and so it fulfills the conditions under which Case A.2 continues the construction.

Case A.4: 1 < *i* < *j* = *k*. In this case, *d* = [1, ⋰ , *k*] and \(d^{\prime } = [i, k, \ddots , i+1, i-1, \ddots , 1]\). These are equal up to renaming to \(e = [k, \ddots , j^{\prime }+1, 1, j^{\prime }, \ddots , 2]\) and *e*^{′} = [1, ⋰, *k*], with *j*^{′} = *k* − (*i* − 1). Thus, exchanging *σ*_{k, pre} and \(\sigma ^{\prime }_{k, pre}\) yields states that fulfill the conditions of either Case A.2 or Case A.3.

Case A.5: 1 = *i* < *j* = *k*. Consider the following suffix:

Requests | State for prefix \(\sigma _{k, pre}\) | State for prefix \(\sigma ^{\prime }_{k, pre}\) |
---|---|---|
… | [1, ⋰ , *k*] | \([1, k, \ddots , 2]\) |
… | … | \([x,1, k ,\ddots , 3]\) |
… | … | \([k-1, \ddots , 2, x, 1]\) |

The resulting pair of states is equal up to renaming to [1, ⋰ , *k*] and \([k, \ddots , 3, 1, 2]\), which corresponds to Case A.2.

Case B: If *i*^{′} > *j*^{′}, then *c* = [1, ⋰ , *i*^{′}, *x*, *i*^{′} + 1, ⋰ , *k* − 1] and \(c^{\prime } = [k-1, \ddots , j^{\prime }+1, x, j^{\prime }, \ddots , 1]\) are equal up to renaming to *d* = [1, ⋰ , *k*] and \(d^{\prime } = [k , \ddots , j+1, j-1, \ddots , i+1, j, i, \ddots , 1]\) for some *i*, *j* with 1 ≤ *i* < *j* ≤ *k*. Consider the following suffix:

Requests | State for prefix \(\sigma _{k, pre}\) | State for prefix \(\sigma ^{\prime }_{k, pre}\) |
---|---|---|
… | [1, ⋰ , *k*] | \([k, \ddots , j+1, j-1, \ddots , i+1, j, i, \ddots , 1]\) |
… | … | \([x, k, \ddots , j+1, j-1, \ddots , i+1, j, i, \ddots , 2]\) |
… | … | \([i, \ddots , 1, x, k, \ddots , j+1, j-1, \ddots , i+1]\) |

At this point, we distinguish two cases:

Case B.1: *j* = *i* + 1. Then the suffix \(j-1, \ddots , i+1\) is empty, and the final state for prefix \(\sigma ^{\prime }_{k, pre}\) above can be simplified to \([j-1, \ddots , 1, x, k, \ddots , j+1]\). If, in addition, *j* = *k*, the suffix \(k, \ddots , j+1\) is also empty, and this state can further be simplified to \([k-1, \ddots , 1, x]\), and the construction is finished. Otherwise, if *j* < *k*, we continue the construction as follows:

Requests | State for prefix \(\sigma _{k, pre}\) | State for prefix \(\sigma ^{\prime }_{k, pre}\) |
---|---|---|
… | … | \([j-1, \ddots , 1, x, k, \ddots , j+1]\) |
… | … | \([y, j-1, \ddots , 1, x, k, \ddots , j+2]\) |
… | … | \([k-2, \ddots , j, y, j-1, \ddots , 1, x]\) |

The final pair of states is equal up to renaming to the pair *d* = [1, ⋰ , *k*] and \(d^{\prime }=[k, \ddots , j+2, 1, j+1, \ddots , 3, 2]\), which corresponds to Case A.2, A.3, or A.5.

Case B.2: *j* > *i* + 1. Then, we can continue the construction as follows:

Requests | State for prefix \(\sigma _{k, pre}\) | State for prefix \(\sigma ^{\prime }_{k, pre}\) |
---|---|---|
… | … | \([i, \ddots , 1, x, k, \ddots , j+1, j-1, \ddots , i+1]\) |
… | … | \([y, i, \ddots , 1, x, k, \ddots , j+1, j-1, \ddots , i+2]\) |
… | … | \([k-2, \ddots , i+1, y, i, \ddots , 1, x]\) |

The final pair of states is equal up to renaming to the pair *d* = [1, ⋰ , *k*] and \(d^{\prime }=[k, \ddots , i+3, 1, i+2, \ddots , 3, 2]\), which corresponds to Case A.2, A.3, or A.5. □

We have shown upper and lower bounds for the smoothness of several classes of algorithms. In particular, we have shown that any *c*-competitive algorithm is also (*c*, *β*)-smooth for some *β*. On the other hand, we have shown that no online deterministic demand-paging or competitive algorithm can be better than (1, *δ*(*k* + 1))-smooth. We have then analyzed the smoothness of LRU, FIFO, and FWF. LRU matches the lower bound while FIFO matches the upper bound for any strictly *k*-competitive algorithm, demonstrating that the upper and lower bounds for the smoothness of strongly-competitive algorithms are tight. Furthermore, for deterministic algorithms there is no trade-off between competitiveness and smoothness. Next, we turn our attention to randomized algorithms, for which this trade-off does exist.

## 5 Smoothness of Randomized Paging Algorithms

Randomized algorithms have been shown to be more competitive than deterministic ones. As for deterministic algorithms, in the following, we first derive lower bounds for the smoothness of demand-paging and strongly-competitive algorithms. These suggest that randomization might also help with smoothness.

However, we then go on to show that the well-known competitive randomized algorithms Mark, Equitable, and Partition are *not* smooth. The simple randomized algorithm that evicts one of the cached pages uniformly at random is shown to be as smooth as LRU, but not more. With randomized algorithms it is possible to sacrifice competitiveness for smoothness, a trade-off we explore by introducing an algorithm called Smoothed-LRU. We conclude the study of randomized algorithms by introducing LRU-Random, a randomized version of LRU that is as competitive as LRU, but smoother, at least for a cache of size 2.

### 5.1 Bounds on the Smoothness of Randomized Paging Algorithms

Similarly to deterministic algorithms, we can show a lower bound on the smoothness of any randomized demand-paging algorithm. Notice that the lower bound only applies to *δ* = 1 and so additional disturbances might have a smaller effect than the first one.

**Theorem 9** (Lower bound for randomized, demand-paging algorithms)

*No randomized, demand-paging algorithm is \((1, H_{k}+\frac {1}{k}-\epsilon , 1)\)-smooth for any 𝜖 > 0.*

*Proof*

For a given randomized, demand-paging algorithm *A*, we show how an oblivious adversary can construct two sequences, a “bad” sequence \(\sigma ^{\prime }_A\) and a “good” sequence *σ*_{A}, with edit distance 1, such that \(A(\sigma ^{\prime }_A)\) is at least \(k+H_k+\frac {1}{k}\) and *A*(*σ*_{A}) is exactly *k*. The existence of such sequences immediately implies the theorem. The construction is inspired by the nemesis sequence devised by Fiat et al. [14] in their proof of a lower bound for the competitiveness of randomized algorithms.

The sequence \(\sigma ^{\prime }_A\) consists of requests to *n* = *k* + 1 distinct pages. During the construction of the sequence, the adversary maintains for each of the n pages its probability *p*_{i} of not being in the cache. This is possible, because the adversary knows the probability distribution used by *A*. We have \({\sum }_i p_i \geq 1\), as only k of the *n* = *k* + 1 pages can be in the fast memory.

The “bad” sequence \(\sigma ^{\prime }_A\) begins by *n* requests, a single request to each of the *n* pages in an arbitrary order. Initially, the fast memory is empty, and so these requests will result in *k* + 1 faults. After those requests, as *p*_{n} = 0, there will be at least one page i with \(p_i \geq \frac {1}{k}\). The next request in \(\sigma ^{\prime }_A\) is to such a page. We will later refer to this page as *m*. The remainder of \(\sigma ^{\prime }_A\) is composed of *k* − 1 subphases, the *i*^{th} subphase of which will contribute an expected number of \(\frac {1}{k-i+1}\) page faults. By linearity of expectation, we can sum up the expected faults on the entire sequence, and obtain \(A(\sigma ^{\prime }) \geq k+1 + \frac {1}{k} + {\sum }_{i=1}^{k-1} \frac {1}{k-i+1} = k+1+\frac {1}{k} + H_k-1 = k + H_k + \frac {1}{k}\). It remains to show how to construct the remaining *k* − 1 subphases and the “good” sequence *σ*_{A}.

Each of the *k* − 1 subphases consists of zero or more requests to *marked* pages followed by exactly one request to an *unmarked* page. A page is *marked* at the start of subphase *j* if it is page *m* or if it has been requested in at least one of the preceding subphases 1 ≤ *i* < *j*. Let *M* be the set of marked pages at the start of the *j*^{th} subphase. Then the number of marked pages is |*M*| = *j* and the number of unmarked pages is *u* = *k* + 1 − *j*. Let \(p_M = {\sum }_{i \in M} p_i\). If *p*_{M} = 0, then there must be an unmarked page *q* with \(p_q \geq \frac {1}{u}\), and the adversary can pick this page to end the subphase. Otherwise, if *p*_{M} > 0, there must be a marked page *l* with *p*_{l} > 0. The first request of subphase *j* is to page *l*. Let *𝜖* = *p*_{l}. The adversary can now generate requests to marked pages using the following loop:

While the expected number of faults in subphase *j* is less than \(\frac {1}{u}\), and while *p*_{M} > *𝜖*, request the marked page *l* such that *l* = argmax_{*i*∈*M*} *p*_{i}.

Note that the loop must terminate, as each iteration will contribute \(p_l \geq \frac {p_M}{|M|} > \frac {\epsilon }{|M|}\) expected faults. If the loop terminates due to the first condition, the adversary can request an arbitrary unmarked page to end the subphase. Otherwise, the adversary requests the unmarked page *i* with the highest probability value. Clearly, \(p_i \geq \frac {1-p_M}{u} > \frac {1-\epsilon }{u}\). The total expected number of faults of the subphase is then \(\epsilon + p_i > \frac {1}{u}\). This concludes the construction of \(\sigma ^{\prime }_A\).
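The accounting above can be double-checked with exact arithmetic; a quick sketch (the values of *k* are chosen arbitrarily):

```python
from fractions import Fraction

def H(n):
    """n-th harmonic number as an exact rational, with H(0) = 0."""
    return sum(Fraction(1, i) for i in range(1, n + 1))

# k+1 initial faults, 1/k for the request to page m, and 1/(k-i+1)
# for the i-th of the k-1 subphases sum to k + H_k + 1/k.
for k in range(2, 12):
    total = (k + 1) + Fraction(1, k) + sum(Fraction(1, k - i + 1) for i in range(1, k))
    assert total == k + H(k) + Fraction(1, k)
```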

Notice that there is one unmarked page that has only been requested in the initial *n* requests of \(\sigma ^{\prime }_A\). We obtain the “good” sequence *σ*_{A} by deleting the request to this unmarked page from \(\sigma ^{\prime }_A\). By construction, *σ*_{A} contains requests to only *k* distinct pages. As *A* is by assumption demand paging, *σ*_{A} will thus incur *k* page faults only. □

For strongly-competitive randomized algorithms we can show a similar statement using a similar yet more complex construction:

**Theorem 10** (Lower bound for strongly-competitive randomized paging algorithms^{2})

*No strongly-competitive randomized paging algorithm is (1, δ(H_{k} − 𝜖), δ)-smooth for any 𝜖 > 0 and any δ > 0.*

In contrast to the deterministic case, this lower bound only applies to strongly-competitive algorithms, as opposed to simply competitive. So with randomization there might be a trade-off between competitiveness and smoothness. There might be competitive algorithms that are smoother than all strongly-competitive ones.

### 5.2 Smoothness of Particular Randomized Algorithms

Two known strongly-competitive randomized paging algorithms are Partition, introduced by McGeoch and Sleator [20] and Equitable, introduced by Achlioptas, Chrobak, and Noga [1]. We show that neither of the two algorithms is smooth.

**Theorem 11** (Smoothness of Partition and Equitable^{3})

*For any cache size k ≥ 2, there is an 𝜖 > 0 such that neither Partition nor Equitable is (1 + 𝜖, γ, 1)-smooth for any γ. Also, Partition and Equitable are (H_{k}, 2δH_{k})-smooth.*

The lower bound in the theorem above may not be tight, but it shows that neither of the two algorithms matches the lower bound from Theorem 10. This leaves open the question whether the lower bound from Theorem 10 is tight.

Note that the lower bound for Equitable applies equally to OnlineMin [8], as OnlineMin has the same expected number of faults as Equitable on all request sequences.

Mark [14] is a simpler randomized algorithm that is (2*H*_{k} − 1)-competitive. We show that it is not smooth either.

**Theorem 12** (Smoothness of Mark^{4})

*Let \(\alpha =\max _{1< \ell \leq k}\left \{\frac {\ell (1+H_{k}-H_{\ell })}{\ell -1+H_{k}-H_{\ell -1}}\right \}={\Omega }(H_{k})\), where k is the cache size. Mark is not (α − 𝜖, γ, 1)-smooth for any 𝜖 > 0 and any γ. Also, Mark is (2H_{k} − 1, δ(4H_{k} − 2))-smooth.*

We conjecture that the lower bound for Mark is tight, i.e., that Mark is (*α*, *β*)-smooth for *α* as defined in Theorem 12 and some *β*.

We now prove that Random achieves the same bounds for smoothness as LRU and the best possible for any deterministic, demand-paging or competitive algorithm. For simplicity, we prove the theorem for a non-demand-paging definition of Random in which each page gets evicted upon a miss with probability 1/*k* even if the cache is not yet full. This modification with respect to the demand-paging version does not change the competitiveness of the algorithm and it allows us to avoid the analysis of special cases when proving properties about smoothness. In fact, for the non-demand-paging version of the algorithm, given any pair of sequences *σ*, *σ*^{′}, it is possible to construct two sequences *ρ* and *ρ*^{′} with Δ(*ρ*, *ρ*^{′}) = Δ(*σ*, *σ*^{′}) such that the number of faults of *σ* and *σ*^{′} starting from an empty cache equals the number of faults of *ρ* and *ρ*^{′} starting with a full cache containing an arbitrary set of pages. This can be achieved by renaming in *σ* and *σ*^{′} any occurrences of the pages in the initial cache so that these pages do not appear in the rest of the sequences. This implies that any property derived on the smoothness of the algorithm starting with an empty cache can also be achieved when the cache is assumed to be initially full. Note that the same property holds for LRU-Random, which is introduced in Section 5.4.

Intuitively, the additive term *k* + 1 in the smoothness of Random is explained by the fact that a single difference between two sequences can make the caches of both executions differ by one page *p*. Since Random evicts a page with probability 1/*k*, the expected number of faults until *p* is evicted is *k*.

**Theorem 13** (Smoothness of Random)

*Random is (1, δ(k + 1))-smooth. This is tight.*

*Proof*

For the lower bound, we use a construction similar to the one used for the lower bound on Random’s competitiveness in [23]. Consider the sequences *σ* = *σ*_{1...k} ⋅ *σ*_{1...k} and *σ*^{′} = *σ*_{1...k} ⋅ *x*_{k+1} ⋅ *σ*_{1...k} with *σ*_{1...k} = (*x*_{1}, *x*_{2},…, *x*_{k})^{n}. The sequences are identical but for the insertion of *x*_{k+1} in *σ*^{′}, and thus Δ(*σ*, *σ*^{′}) = 1. For any *𝜖* > 0, the expected number of faults in the second half of *σ* is less than *𝜖* for sufficiently large *n*. On *σ*^{′}, Random faults on *x*_{k+1} and evicts one of *x*_{1},…, *x*_{k}. Then, on each of the *n* subsequences of *k* requests in the second part of *σ*^{′}, and while *x*_{k+1} is still in its cache, Random will incur a fault. If on one of these faults Random evicts *x*_{k+1}, then it does not incur any faults for the rest of the sequence. Since on every fault Random evicts *x*_{k+1} with probability 1/*k*, the expected number of faults until this happens exceeds *k* − *𝜖* for any *𝜖* > 0 and sufficiently large *n*. This, plus the initial fault on *x*_{k+1}, yields Random(*σ*^{′}) ≥ Random(*σ*) + *k* + 1 − 2*𝜖* for any *𝜖* > 0 and sufficiently large *n*. For general *δ* we follow the same idea: instead of one, we have *δ* subsequences (*x*_{1}…*x*_{k})^{n} in *σ* and *δ* subsequences *y*_{i}(*x*_{1}…*x*_{k})^{n} in *σ*^{′}, where *y*_{i} (1 ≤ *i* ≤ *δ*) is a new page not requested so far, and it is distinct in every repetition. The expected number of faults in each repetition of *σ*^{′} is at least *k* + 1 − *𝜖* for any *𝜖* > 0 and sufficiently large *n*, while the corresponding repetitions of *σ* contribute an arbitrarily small expected number of additional faults. Thus, Random(*σ*^{′}) ≥ Random(*σ*) + *δ*(*k* + 1) − *𝜖*(*δ* + 1).
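The truncated geometric sum behind this argument is easy to evaluate numerically. In the sketch below (an illustration of the expectation, not the proof itself), each loop iteration corresponds to one fault incurred while *x*_{k+1} is still cached, and each such fault evicts *x*_{k+1} with probability 1/*k*; the expected gap converges to *k* + 1.

```python
def expected_gap(k, n):
    """1 (the fault on x_{k+1}) plus the expected number of subsequent
    faults before x_{k+1} is evicted, truncated after n faults."""
    survive = 1.0   # probability that x_{k+1} is still cached
    extra = 1.0     # the initial fault on x_{k+1}
    for _ in range(n):
        extra += survive           # one more fault occurs while cached
        survive *= 1.0 - 1.0 / k   # each fault evicts x_{k+1} w.p. 1/k
    return extra

# Closed form: 1 + k * (1 - (1 - 1/k)^n), which tends to k + 1
k = 4
assert abs(expected_gap(k, 10000) - (k + 1)) < 1e-9
```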

In order to prove the upper bound, we look at the state distributions of Random when serving two sequences *σ* and *σ*^{′} with Δ(*σ*, *σ*^{′}) = 1. We use a potential function defined as the distance between two state distributions. For this distance, we define a version of the earth mover’s distance. Let *D* and *D*^{′} be two probability distributions of cache states. We define the distance between *D* and *D*^{′} as the minimum cost of transforming *D* into *D*^{′} by means of transferring probability mass from the states of *D* to the states of *D*^{′}.

Let *s* and *s*^{′} be two cache states in *D* and *D*^{′} with probabilities *p*_{s} and \(p_{s^{\prime }}\), respectively, and let *α* be a function that denotes the amount of probability mass to be transferred from states in *D* to states in *D*^{′}. The earth mover’s distance between *D* and *D*^{′} is defined as \({\Delta }(D,D^{\prime }) = \min _{\alpha } {\sum }_{s,s^{\prime }} \alpha (s,s^{\prime }) \cdot d(s,s^{\prime })\), where for all *s*, \({\sum }_{s^{\prime }} \alpha (s,s^{\prime })=p_s\), for all *s*^{′}, \({\sum }_{s} \alpha (s,s^{\prime })=p_{s^{\prime }}\), and *d*(*s*, *s*^{′}) is the distance between states *s* and *s*^{′}. We define \(d(s,s^{\prime })=k \cdot H_{c(s,s^{\prime })}\), where *c*(*s*, *s*^{′}) = max{|*s* ∖ *s*^{′}|, |*s*^{′} ∖ *s*|}, and *H*_{ℓ} is the *ℓ*^{th} harmonic number; for convenience we let *H*_{0} = 0. Note that |*s* ∖ *s*^{′}| might not equal |*s*^{′} ∖ *s*| if either state does not represent a full cache. For example, let *k* = 3, *s* = [1,2,3] and *s*^{′} = [1,4,5]. Then, *c*(*s*, *s*^{′}) = 2 and *d*(*s*, *s*^{′}) = *k* ⋅ *H*_{2} = 3 ⋅ 3/2 = 9/2. Figure 1 shows the distance between two example distributions.
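The state distance is straightforward to compute. A small sketch (using exact rationals; the list representation of states is an illustrative choice) reproduces the example above:

```python
from fractions import Fraction

def harmonic(n):
    """n-th harmonic number as an exact rational, with harmonic(0) = 0."""
    return sum(Fraction(1, i) for i in range(1, n + 1))

def state_distance(s, sp, k):
    """d(s, s') = k * H_{c(s, s')}, c(s, s') = max(|s \\ s'|, |s' \\ s|)."""
    c = max(len(set(s) - set(sp)), len(set(sp) - set(s)))
    return k * harmonic(c)

# The example from the text: k = 3, s = [1, 2, 3], s' = [1, 4, 5]
assert state_distance([1, 2, 3], [1, 4, 5], 3) == Fraction(9, 2)
assert state_distance([1, 2, 3], [1, 2, 3], 3) == 0   # identical states
```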

We will prove the following claim:

### Claim

Let *D* and *D*^{′} be two probability distributions over cache states. Let *σ* be any request sequence and let *M*_{D}(*σ*) and \(M_{D^{\prime }}(\sigma )\) be two random variables equal to the number of misses on *σ* by RAND when starting from distributions *D* and *D*^{′}, respectively. Then, \(E[M_{D^{\prime }}(\sigma )] - E[M_{D}(\sigma )] \le {\Delta }(D,D^{\prime })\).

Let us assume that the claim is true. Then, we prove the theorem by considering two sequences *ρ* and *ρ*^{′} such that Δ(*ρ*, *ρ*^{′}) = 1 and arguing that Δ(*D*, *D*^{′}) ≤ *k* for any pair of distributions *D* and *D*^{′} that can be reached, respectively, by serving prefixes of *ρ* and *ρ*^{′} starting from an empty cache. If such a prefix includes the single difference between the sequences, then the theorem follows by applying the claim above to the maximal suffix *σ* shared by both sequences.

Let *j* be the minimum *j* such that *ρ*[(*j* + 1)..|*ρ*|] = *ρ*^{′}[(*j* + 1)..|*ρ*^{′}|] = *σ*. Then, \(\rho _{j}\ne \rho ^{\prime }_{j}\) (one of the two might be empty) and *ρ*[1..(*j* − 1)] = *ρ*^{′}[1..(*j* − 1)]. Since RAND(*ρ*) and RAND(*ρ*^{′}) both start with an empty cache, their distributions and expected misses before serving *ρ*_{j} and \(\rho _j^{\prime }\) coincide. We now argue that after serving *ρ*_{j} and \(\rho _j^{\prime }\) the distance between the resulting distributions *D* and *D*^{′} is at most *k*.

Let *F* be the state distribution of both executions before serving *ρ*_{j} and \(\rho _j^{\prime }\). Suppose first that \(\rho _j^{\prime }\) is empty and thus *D*^{′} = *F* (the case when *ρ*_{j} is empty is symmetric). We look at the minimum cost to transfer the probability mass from each state from *F* to *D*. Let *s*_{i} be a state in *F* with probability *p*_{i}. If *ρ*_{j} ∈ *s*_{i}, then *s*_{i} has probability at least *p*_{i} in *D* and hence we can transfer *p*_{i} mass between these states in *F* and *D* at cost zero. Otherwise, if *ρ*_{j}∉*s*_{i}, *D* contains *k* states \(s_{i_1}^{\prime },\ldots ,s_{i_k}^{\prime }\) resulting from the eviction of each of the *k* pages of *s*_{i}, with \(c(s_i,s_{i_r}^{\prime })=1\) and hence \(d(s_i,s_{i_r}^{\prime })=kH_1=k\) for all 1 ≤ *r* ≤ *k*. Moreover, the probability of these states is at least *p*_{i}/*k* and hence we can transfer all the mass of *s*_{i} to these states at a total cost of \({\sum }_{r=1}^k k(p_i/k)= kp_i\). Adding up over all states *s*_{i} ∈ *F*, we can transfer all probability mass of *F* to *D* at a cost of at most \(k\sum {p_i}=k\), since \(\sum {p_i}=1\). Since the distance between *D* and *F* is the minimum cost of transferring the probability mass from *F* to *D*, this cost is at most *k*. For the case when \(\rho _j\ne \rho _j^{\prime }\) and neither is empty, we apply a similar argument. Let *s*_{i} be a state in *F* with probability *p*_{i}. If both *ρ*_{j} and \(\rho _j^{\prime }\) are in *s*_{i}, then this state is also in *D* and *D*^{′}, and we can transfer *p*_{i} from *D* to *D*^{′} at cost zero. Assume that *ρ*_{j} ∈ *s*_{i} but \(\rho _j^{\prime }\notin s_i\). Then, as we argued above, in *D*^{′}, there are *k* states with probability at least *p*_{i}/*k* with distance 1 to *s*_{i}. Since *s*_{i} ∈ *D*, we can transfer a mass of *p*_{i} to these states at a cost of *k**p*_{i}. 
Now, if *ρ*_{j}∉*s*_{i} but \(\rho _j^{\prime }\in s_i\), then *s*_{i} is in *D*^{′} and there are *k* states in *D* with distance 1 to *s*_{i}. We can transfer *p*_{i}/*k* mass from each of these states in *D* to *s*_{i} in *D*^{′} at a cost of *k**p*_{i}. Finally, if *ρ*_{j}∉*s*_{i} and \(\rho _j^{\prime }\notin s_i\), then there are *k* pairs of states (*s*, *s*^{′}) with *s* ∈ *D* and *s*^{′}∈ *D*^{′} resulting from the replacement of the same page in *s*_{i} by *ρ*_{j} and \(\rho _j^{\prime }\), respectively, and thus *c*(*s*, *s*^{′}) = 1. In the distance Δ(*D*, *D*^{′}) we can transfer *p*_{i}/*k* from *s* to *s*^{′} at a cost of *k*. Since there are *k* such pairs for each *s*_{i}, the total cost contributed by these pairs is *p*_{i}*k*. Since in all cases the cost contributed by a state *s*_{i} ∈ *F* when transferring mass from *D* to *D*^{′} is at most *k**p*_{i}, the distance Δ(*D*, *D*^{′}) is at most \(k\sum {p_i}=k\).

Since serving *ρ*_{j} and \(\rho _j^{\prime }\) can add at most 1 to the difference in expected misses, and by the claim above the difference in expected misses on the suffix *σ* is at most Δ(*D*, *D*^{′}) ≤ *k*, it follows that *E*[RAND(*ρ*^{′}) −RAND(*ρ*)] ≤ *k* + 1. The theorem follows by Corollary 1.

It remains to prove the claim. Let *M*_{D}(*σ*_{i}) be a random variable equal to the number of misses of Random when *σ*_{i} is requested and the state distribution of RAND is *D*. Let *D*_{0} = *D* and \(D_0^{\prime } = D^{\prime }\), and let *D*_{i} and \(D_i^{\prime }\) denote the distributions after serving the first *i* requests, with *D*_{f} = *D*_{|σ|} and \(D_f^{\prime } = D_{|\sigma |}^{\prime }\). Then, it is sufficient to prove that for every request *σ*_{i} ∈ *σ*, for 1 ≤ *i* ≤ |*σ*|,

$$E[M_{D_{i-1}}(\sigma_i)] - E[M_{D^{\prime}_{i-1}}(\sigma_i)] \le {\Delta}(D_{i-1},D^{\prime}_{i-1}) - {\Delta}(D_{i},D^{\prime}_{i}). $$

This implies that \(E[M_D(\sigma )] - E[M_{D^{\prime }}(\sigma )] \le {\Delta }(D_{0},D^{\prime }_{0}) - {\Delta }(D_{f},D^{\prime }_{f}) \le {\Delta }(D_{0},D^{\prime }_{0}) = {\Delta }(D,D^{\prime })\), since Δ(⋅,⋅) ≥ 0 for any pair of distributions.

Let *D*_{i−1} and \(D_{i-1}^{\prime }\) be the distributions before the request to *σ*_{i}, and let *α* be an optimal assignment realizing \({\Delta }(D_{i-1},D_{i-1}^{\prime })\), i.e., *α*(*s*_{u}, *s*_{v}) is the amount of mass transferred from *s*_{u} to *s*_{v} (which could be zero). We look at two states *s*_{u} ∈ *D*_{i−1} with probability *p*_{u} and \(s_v \in D_{i-1}^{\prime }\) with probability *p*_{v} and construct a valid assignment *α*^{′} after the request to *σ*_{i} for the distance \({\Delta }(D_{i},D^{\prime }_{i})\). We distinguish four cases:

1. [*σ*_{i} ∈ *s*_{u}, *s*_{v}] In this case *s*_{u} ∈ *D*_{i} with probability at least *p*_{u} and \(s_v \in D^{\prime }_i\) with probability at least *p*_{v}. Hence, since *α*(*s*_{u}, *s*_{v}) ≤ min{*p*_{u}, *p*_{v}}, we can make *α*^{′}(*s*_{u}, *s*_{v}) = *α*(*s*_{u}, *s*_{v}). The contribution of this pair of states to \({\Delta }(D_{i},D^{\prime }_{i})\) is \(\alpha (s_u,s_v)d(s_u,s_v)= \alpha (s_u,s_v)kH_{c(s_u,s_v)}\).
2. [*σ*_{i} ∉ *s*_{u}, *s*_{v}] There are *k* states *r* = {*r*_{1},…, *r*_{k}} in *D*_{i} and *t* = {*t*_{1},…, *t*_{k}} in \(D_i^{\prime }\) resulting from the eviction of each page of *s*_{u} and *s*_{v}, respectively. The probability of each state of *r* and *t* is at least *p*_{u}/*k* and *p*_{v}/*k*, respectively. Let *c* = *c*(*s*_{u}, *s*_{v}) and *α* = *α*(*s*_{u}, *s*_{v}). If *c* = 0, then we pair states in *r* and *t* such that \(r_{j_1}=t_{j_2}\), and we make \(\alpha ^{\prime }(r_{j_1},t_{j_2})=\alpha /k\). Otherwise, there are *c* pages that *s*_{u} and *s*_{v} do not have in common. We sort the states in *r* and *t* such that the first *c* states are those that result from evicting a page from *s*_{u} that is not in *s*_{v} and vice versa, while the rest of the states are the ones resulting from evicting a common page. We pair the states in order and set *α*^{′}(*r*_{j}, *t*_{j}) = *α*/*k*. Note that *c*(*r*_{j}, *t*_{j}) = *c* − 1 for all *j* ≤ *c* and *c*(*r*_{j}, *t*_{j}) = *c* for all *j* > *c* (see Fig. 2). The contribution of this pair of states to \({\Delta }(D_{i},D^{\prime }_{i})\) is at most $$(\alpha/k) (ckH_{c-1}+(k-c)kH_{c})=\alpha(kH_{c}+c(H_{c-1}-H_c))=\alpha(kH_{c}-1) $$
3. [*σ*_{i} ∈ *s*_{u}, *σ*_{i} ∉ *s*_{v}] We transfer *α*(*s*_{u}, *s*_{v})/*k* to the *k* states in \(D^{\prime }_{i}\) resulting from evictions from *s*_{v}. As in case 2, there are *c* = *c*(*s*_{u}, *s*_{v}) states that result from evicting a non-common page with *s*_{u}, and the rest evict a common page. Each of the first *c* states has *c* − 1 non-common pages with *s*_{u}, while the rest have *c* non-common pages. Hence, the contribution of these states to \({\Delta }(D_{i},D^{\prime }_{i})\) is \(\alpha (s_u,s_v)(kH_{c(s_u,s_v)}-1)\).
4. [*σ*_{i} ∉ *s*_{u}, *σ*_{i} ∈ *s*_{v}] This case is analogous to case 3. We transfer *α*(*s*_{u}, *s*_{v})/*k* mass to *s*_{v} ∈ *D*^{′} from each of the *k* states in *D* that result from evictions from *s*_{u}. The contribution of these states is \(\alpha (s_u,s_v)(kH_{c(s_u,s_v)}-1)\).

Since all mass is transferred to states in *D*_{i} and \(D_i^{\prime }\), the described mass transfer is a valid assignment between the distributions, and its cost is:

$$ {\Delta}(D_{i},D^{\prime}_{i}) \le \sum\limits_{s_u,s_v} \alpha(s_u,s_v)kH_{c(s_u,s_v)} - \sum\limits_{s_u,s_v | \sigma_i \notin s_u \vee \sigma_i \notin s_v} \alpha(s_u,s_v). $$

Therefore, \({\Delta }(D_{i-1},D^{\prime }_{i-1})-{\Delta }(D_{i},D^{\prime }_{i})\ge {\sum }_{s_u,s_v |\sigma _i \notin s_u}\alpha (s_u,s_v)\). On the other hand, \(E[M_{D_{i-1}}(\sigma _i)] - E[M_{D^{\prime }_{i-1}}(\sigma _i)] \le E[M_{D_{i-1}}(\sigma _i)] = {\sum }_{s_u,s_v |\sigma _i \notin s_u}\alpha (s_u,s_v)\), and hence \(E[M_{D_{i-1}}(\sigma _i)] - E[M_{D^{\prime }_{i-1}}(\sigma _i)] \le {\Delta }(D_{i-1},D^{\prime }_{i-1})-{\Delta }(D_{i},D^{\prime }_{i})\). □
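The harmonic-number simplification used in case 2 of the case distinction above, \((\alpha/k)(ckH_{c-1}+(k-c)kH_{c})=\alpha(kH_{c}-1)\), boils down to the identity \(c(H_{c-1}-H_c)=-1\). A small sketch (the helper name `H` is ours) verifies it with exact rational arithmetic:

```python
from fractions import Fraction

def H(n):
    # n-th harmonic number as an exact rational; H(0) = 0
    return sum(Fraction(1, j) for j in range(1, n + 1))

# Case 2 simplification: (1/k)*(c*k*H_{c-1} + (k-c)*k*H_c) = k*H_c - 1
for k in range(1, 12):
    for c in range(1, k + 1):
        lhs = Fraction(1, k) * (c * k * H(c - 1) + (k - c) * k * H(c))
        assert lhs == k * H(c) - 1, (k, c)
print("case-2 identity holds for all tested k, c")
```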

### 5.3 Trading Competitiveness for Smoothness

We have seen that none of the well-known randomized algorithms are particularly smooth. Random is the only known randomized algorithm that is (1, *δ**c*)-smooth for some *c*. However, it is neither smoother nor more competitive than LRU, the smoothest deterministic algorithm. In this section we show that greater smoothness can be achieved at the expense of competitiveness. First, as an extreme example of this, we show that Evict-on-access (EOA) [9]—the policy that evicts each page with a probability of \(\frac {1}{k}\) upon *every* request, i.e., not only on faults but also on hits—beats the lower bounds of Theorems 9 and 10 and is strictly smoother than OPT. This policy is non–demand paging and it is obviously not competitive. We then introduce Smoothed-LRU, a parameterized randomized algorithm that trades competitiveness for smoothness.

**Theorem 14** (Smoothness of EOA^{5})

*EOA is* \((1,\delta (1+\frac {k}{2k-1}))\)*-smooth. This is tight.*

#### 5.3.1 Smoothed-LRU

We now describe Smoothed-LRU. The main idea of this algorithm is to smooth out the transition from the hit to the miss case.

The *age* of a page *p* is the number of distinct pages that have been requested since the previous request to *p*. LRU faults if and only if the requested page’s age is greater than or equal to *k*, the size of the cache. An inserted request may increase the ages of *k* cached pages by one. At the next request to each of the cached pages, the page’s age may thus increase from *k* − 1 to *k*, and turn the request from a hit into a miss, resulting in *k* additional misses. By construction, under Smoothed-LRU, the hit probability of a request decreases only gradually with increasing age. The speed of the transition from definite hit to definite miss is controlled by a parameter *i*, with 0 ≤ *i* < *k*. Under Smoothed-LRU, the hit probability \(P(\textit {hit}_{\textsc {Smoothed}-LRU_{k,i}}(a))\) of a request to a page with age *a* is:

$$ P(\textit{hit}_{\textsc{Smoothed}-LRU_{k,i}}(a)) = \left\{\begin{array}{ll} 1 & : a < k-i\\ \frac{k+i-a}{2i+1} & : k-i \leq a < k+i\\ 0 & : a \geq k+i \end{array}\right. $$(2)

where *k* is the size of the cache. Figure 3 illustrates this graphically in relation to LRU for cache size *k* = 8 and *i* = 4. It is easy to see that for *i* = 0, Smoothed-LRU reduces to LRU. We will later demonstrate how to realize an algorithm with the hit probabilities defined above. Before doing so we analyze such an algorithm’s smoothness and competitiveness.

**Theorem 15** (Smoothness of Smoothed-LRU)

*Smoothed-LRU*_{k, i}* is* \((1,\delta (\frac {k+i-1}{2i+1}+2))\)*-smooth for* *k* > 3*i**, and* \((1,\delta (\frac {2k-1}{2i+1}+1))\)*-smooth for* *k* ≤ 3*i*. *This is tight*^{6}*.*

*Proof*

The proof of the upper bound is similar to that for LRU. The key difference is that, in contrast to LRU, an age increase may only increase the miss probability of a page by \(\frac {1}{2i+1}\). We show that Smoothed − *L**R**U*_{k, i} is \((1,\frac {k+i-1}{2i+1}+2,1)\)-smooth for *k* > 3*i* and \((1,\frac {2k-1}{2i+1}+1,1)\)-smooth for *k* ≤ 3*i*. Corollary 1 then implies the theorem.

Let us first consider how the insertion of one request may affect ages and the expected number of faults. By definition, the age of any page is only affected from the point of insertion up to its next request. Only the hit probability of the next request to a page may thus change due to an additional request. Under Smoothed − *L**R**U*_{k, i}, at most *k* + *i* pages have a non-zero hit probability at any time. Other than the inserted request, only the next requests to these *k* + *i* pages may increase the expected number of misses. By construction, increasing the age of a request by one may only decrease its hit probability by \(\frac {1}{2i+1}\). As the inserted request itself may also introduce a fault, the overall number of faults may thus increase by at most \(\frac {k+i}{2i+1}+1\), which is no larger than both bounds given above.

Next, consider deletions. Deleting a request to a page *p* does not increase the ages of other pages. (a) Only the miss probability of the next request to *p* may increase. (b) On the other hand, deleting the request to *p* reduces the expected number of misses of a sequence by the request’s own miss probability. Thus, a deletion cannot increase the expected number of faults by more than one, but a more careful analysis helps with the analysis for substitutions.

- (a) The increase in *p*’s fault probability on its next request depends on *p*’s age right before the deletion of the request, which we denote by *a* in the following. Upon the next request to *p*, *p*’s age is at most *a* higher than it would have been without the deletion. Thus, its miss probability increases by at most \(\frac {a}{2i+1}\). The increase is also bounded by 1, as the next request to *p* may only cause a single additional miss.
- (b) According to (2), the miss probability of the deleted request to *p* is: $$ P(\textit{miss}_{\textsc{Smoothed}-LRU_{k,i}}(a)) = \left\{\begin{array}{ll} 0 & : a < k-i\\ 1-\frac{k+i-a}{2i+1} & : k-i \leq a < k+i\\ 1 & : a \geq k+i \end{array}\right. $$(3)

Observe that for all three cases in (3), the difference between (a) and (b) is bounded by both \(\frac {k-i-1}{2i+1}\) and 1, and thus a deletion may increase the expected number of misses at most by \(\min \{\frac {k-i-1}{2i+1}, 1\}\).
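The bound \(\min \{\frac {k-i-1}{2i+1}, 1\}\) on the difference between (a) and (b) can be checked exhaustively over all ages. A quick sketch (helper name `miss_prob` is ours, implementing (3)):

```python
from fractions import Fraction

def miss_prob(a, k, i):
    # Miss probability of Smoothed-LRU_{k,i} for a request of age a, per (3)
    if a < k - i:
        return Fraction(0)
    if a < k + i:
        return 1 - Fraction(k + i - a, 2 * i + 1)
    return Fraction(1)

for k in range(2, 12):
    for i in range(k):
        # (a): the next request's miss probability grows by at most min{a/(2i+1), 1};
        # (b): deleting the request saves its own miss probability miss_prob(a, k, i).
        worst = max(
            min(Fraction(a, 2 * i + 1), Fraction(1)) - miss_prob(a, k, i)
            for a in range(2 * (k + i) + 1)
        )
        assert worst == min(Fraction(k - i - 1, 2 * i + 1), Fraction(1)), (k, i)
print("deletion bound verified")
```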

Finally, consider substitutions. Let *σ* = *σ*_{pre} ⋅ *p* ⋅ *σ*_{post}. A substitution of the request to *p* by a request to *q* can be achieved by first deleting the request to *p* and then inserting the request to *q*, yielding sequences *σ*^{′} = *σ*_{pre} ⋅ *σ*_{post} and *σ*^{″} = *σ*_{pre} ⋅ *q* ⋅ *σ*_{post}. From the considerations of insertions and deletions above, we know that \(\textsc {Smoothed}-\text {LRU}_{k,i}(\sigma ^{\prime })-\textsc {Smoothed}-\text {LRU}_{k,i}(\sigma ) \leq \min \{\frac {k-i-1}{2i+1}, 1\}\) and \(\textsc {Smoothed}-\text {LRU}_{k,i}(\sigma ^{\prime \prime })-\textsc {Smoothed}-\text {LRU}_{k,i}(\sigma ^{\prime }) \leq \frac {k+i}{2i+1}+1\).

Now, consider the case where the deletion causes a whole additional fault, i.e, Smoothed −LRU_{k, i}(*σ*^{′}) −Smoothed −LRU_{k, i}(*σ*) = 1. Then, the page *p* must be cached with probability 1 after *σ*_{pre}. Also, the next request to *p* in *σ*_{post} causes a fault with probability 1 in *σ*^{′}. Thus, the insertion of *q* does not increase the request’s miss probability in *σ*^{″}. In this case, the insertion of *q* thus only increases the miss probabilities of *k* + *i* − 1 pages. Combining the arguments above we get \(\textsc {Smoothed}-LRU_{k,i}(\sigma ^{\prime \prime })-\textsc {Smoothed}-LRU_{k,i}(\sigma ) \leq \min \{\frac {k-i-1}{2i+1}+\frac {k+i}{2i+1}+1, 1+\frac {k+i-1}{2i+1}+1\} = \min \{\frac {2k-1}{2i+1}+1, \frac {k+i-1}{2i+1}+2\}\).

For *k* > 3*i*, the difference simplifies to \(\frac {k+i-1}{2i+1}+2\), as \(\frac {k+i-1}{2i+1}+2 = \frac {k+3i}{2i+1}+1 \leq \frac {2k-1}{2i+1}+1\). On the other hand, for *k* ≤ 3*i*, \(\frac {k+i-1}{2i+1}+2 = \frac {k+3i}{2i+1}+1 > \frac {2k-1}{2i+1}+1\).

For tightness, consider the two sequences *σ* = 1,2,…, *k* + *i*, *k* − *i*,1,2,…, *k* + *i*, *y* and *σ*^{′} = 1,2,…, *k* + *i*, *x*,1,2,…, *k* + *i*, *y* with *y* > *x* > *k* + *i*. Then, Δ(*σ*^{′}, *σ*) = 1. The request to *x* incurs one additional fault. The second request to each page in the set {1,…, *k* − *i* − 1, *k* − *i* + 1,…*k* + *i*} has an age of *k* + *i* in *σ*^{′}, while it only has an age of *k* + *i* − 1 in *σ*. Thus each of these requests incurs an additional \(\frac {1}{2i+1}\) expected faults, for a total of \(\frac {k+i-1}{2i+1}\) additional expected faults. By construction, the third request to *k* − *i* in *σ* incurs no faults, as it has an age of *k* − *i* − 1. The corresponding request in *σ*^{′} has an age of *k* + *i*, thus incurring one additional fault. Finally, the second request to *k* − *i* in *σ*, which is missing in *σ*^{′}, has an age of *k* + *i* − (*k* − *i*) = 2*i*, and thus causes no faults if 2*i* < *k* − *i* ⇔ *k* > 3*i*. In this case, \(\textsc {Smoothed}-LRU_{k,i}(\sigma ^{\prime })-\textsc {Smoothed}-LRU_{k,i}(\sigma ) = 1+\frac {k+i-1}{2i+1}+1-0 = \frac {k+i-1}{2i+1}+2\), matching the upper bound. If 2*i* ≥ *k* − *i* ⇔ *k* ≤ 3*i*, the second request to *k* − *i* in *σ* incurs an expected \(1-\frac {k+i-2i}{2i+1} = 1-\frac {k-i}{2i+1}\) faults. In this case, \(\textsc {Smoothed}-LRU_{k,i}(\sigma ^{\prime })-\textsc {Smoothed}-LRU_{k,i}(\sigma ) = 1+\frac {k+i-1}{2i+1}+1-(1-\frac {k-i}{2i+1}) = \frac {2k-1}{2i+1}+1\). For *δ* > 1, consider the sequences *σ*_{δ} and \(\sigma ^{\prime }_{\delta }\) obtained by concatenating *δ* copies of *σ* and *σ*^{′}, respectively. □
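The tightness computation above can be replayed mechanically. This sketch (the helper `expected_misses` is our own) evaluates the expected number of faults of Smoothed-LRU by linearity of expectation from the miss probabilities in (3), taking a request's age to be the number of distinct pages requested since the previous request to the same page:

```python
from fractions import Fraction

def miss_prob(a, k, i):
    # Miss probability of Smoothed-LRU_{k,i} for a request of age a, per (3)
    if a < k - i:
        return Fraction(0)
    if a < k + i:
        return 1 - Fraction(k + i - a, 2 * i + 1)
    return Fraction(1)

def expected_misses(seq, k, i):
    # By linearity of expectation; first requests to a page always miss.
    total, last = Fraction(0), {}
    for pos, p in enumerate(seq):
        if p not in last:
            total += 1
        else:
            age = len(set(seq[last[p] + 1:pos]))
            total += miss_prob(age, k, i)
        last[p] = pos
    return total

k, i = 8, 2                      # a case with k > 3i
x, y = k + i + 1, k + i + 2      # y > x > k + i
base = list(range(1, k + i + 1))
sigma = base + [k - i] + base + [y]
sigma_prime = base + [x] + base + [y]
diff = expected_misses(sigma_prime, k, i) - expected_misses(sigma, k, i)
assert diff == Fraction(k + i - 1, 2 * i + 1) + 2   # matches the k > 3i bound
```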

For *i* = 0, Smoothed-LRU is identical to LRU and (1, *δ*(*k* + 1))-smooth. At the other extreme, for *i* = *k* − 1, Smoothed-LRU is (1, 2*δ*)-smooth, like the optimal offline algorithm. However, for larger *i*, Smoothed-LRU is less competitive than LRU:

**Lemma 3** (Competitiveness of Smoothed-LRU)

*For any sequence* *σ* *and* *l* ≤ *k* − *i**,*

$$ \textsc{Smoothed-LRU}_{k,i}(\sigma) \leq \frac{k-i}{k-i-l+1}\cdot\text{OPT}_l(\sigma) + l, $$

*where* OPT_{l}(*σ*) *denotes the number of faults of the optimal offline algorithm processing* *σ* *on a fast memory of size* *l*.

*For* *l* > *k* − *i* *and any* *α* *and* *β**, there is a sequence* *σ* *such that* Smoothed-LRU_{k, i}(*σ*) > *α* ⋅ OPT_{l}(*σ*) + *β*.

*Proof*

Let *LRU*_{l}(*σ*) denote the number of faults of LRU on a cache of size *l*. As Smoothed-LRU_{k, i} caches all pages younger than *k* − *i* with probability one, we have Smoothed-LRU_{k, i}(*σ*) ≤ *LRU*_{k−i}(*σ*) for any sequence *σ*. From Sleator and Tarjan [26], we know that \(LRU_{k-i}(\sigma ) \leq \frac {k-i}{k-i-l+1}\cdot \text {OPT}_l(\sigma ) + l\). Combining the two inequalities yields the first claim.

For the second part of the theorem consider the sequence *σ*_{n} = (1,…, *l*)^{n}, which contains *l* distinct pages. The optimal offline algorithm misses exactly *l* times on this sequence, independently of *n*. For *k* − *i* < *l*, on the other hand, Smoothed-LRU_{k, i} has a non-zero miss probability of at least \(\frac {1}{2i+1}\) on every request. Hence, for every *α* and *β* there is an *n* such that Smoothed-LRU_{k, i}(*σ*_{n}) > *α* ⋅OPT_{l}(*σ*_{n}) + *β*. □
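The second part of the argument can be illustrated numerically. Under the age-based miss probabilities in (3), every repeated request in the cyclic sequence has age *l* − 1, so the expected number of misses grows linearly in *n* while OPT stays at *l*. A sketch (helper names are ours):

```python
from fractions import Fraction

def miss_prob(a, k, i):
    # Miss probability of Smoothed-LRU_{k,i} per (3)
    if a < k - i:
        return Fraction(0)
    if a < k + i:
        return 1 - Fraction(k + i - a, 2 * i + 1)
    return Fraction(1)

k, i, l = 4, 2, 3   # l > k - i, so the cyclic sequence defeats Smoothed-LRU_{4,2}

def expected_misses_cyclic(n):
    # sigma_n = (1,...,l)^n: after the first round, every request has age l-1
    return l + l * (n - 1) * miss_prob(l - 1, k, i)

# Every repeated request misses with probability at least 1/(2i+1), so the
# expected number of misses grows linearly in n, while OPT_l misses only l times.
assert miss_prob(l - 1, k, i) >= Fraction(1, 2 * i + 1)
assert expected_misses_cyclic(101) - expected_misses_cyclic(1) == 100 * l * miss_prob(l - 1, k, i)
```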

So far we have analyzed Smoothed-LRU based on the hit probabilities given in (2). We have yet to show that a randomized algorithm satisfying (2) can be realized.

In the following, we construct a probability distribution on the set of all deterministic algorithms using a fast memory of size *k* that satisfies Equation (2). This is commonly referred to as a *mixed strategy* [7].

We first show how to express Smoothed-LRU in terms of *i* + 1 instances of a simpler algorithm called Step-LRU. Then we show how Step-LRU can be realized as a mixed strategy. Like Smoothed-LRU, Step-LRU is parameterized by *i*, and it exhibits the following hit probabilities in terms of the age of a requested page:

**Lemma 4** (Decomposition of Smoothed-LRU in terms of Step-LRU^{7})

*For all ages* *a**,*

As a consequence, we can realize Smoothed-LRU as a mixed strategy if we can realize Step-LRU as a mixed strategy.

While the hit probabilities \(P(\textit {hit}_{\textsc {Step}-\text {LRU}_{k,i}}(a))\) do not fully define Step-LRU, by linearity of expectation they are sufficient to determine the expected number of faults on any sequence *σ*, which we denote by Step- LRU_{k, i}(*σ*). We are able to show that Step-LRU can be realized as a mixed strategy:

**Proposition 2** (Step-LRU as a mixed strategy^{8})

*There is a probability distribution* \(d : {\mathcal A} \rightarrow \mathbb {R}\) *over a finite set of deterministic paging algorithms* \({\mathcal A}\) *using a fast memory of size* *k**, such that for all sequences* *σ**,*

This immediately implies that Smoothed-LRU can be realized as a mixed strategy:

**Corollary 3** (Smoothed-LRU as a mixed strategy)

*There is a probability distribution* \(d : {\mathcal A} \rightarrow \mathbb {R}\) *over a finite set of deterministic paging algorithms* \({\mathcal A}\) *using a fast memory of size* *k**, such that for all sequences* *σ**,*

*Proof*

This follows immediately from Lemma 4 and Proposition 2. □

### 5.4 A Competitive and Smooth Randomized Paging Algorithm: LRU-Random

We now introduce LRU-Random. Upon a miss, the *i*^{th} oldest page in the cache is evicted with probability \(\frac {1}{i\cdot H_{k}}\). By construction the eviction probabilities sum up to 1: \({\sum }_{i=1}^{k} \frac {1}{i\cdot H_{k}} = \frac {1}{H_{k}}\cdot {\sum }_{i=1}^{k} \frac {1}{i} = 1\). LRU-Random is *not* demand paging: if the cache is not yet entirely filled, it may still evict cached pages according to the probabilities mentioned above. For a cache of size 8, Fig. 4 illustrates the probabilities of evicting the *i*^{th} oldest page from the cache upon a miss under LRU, LRU-Random, and Random.
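The normalization claim can be confirmed with exact arithmetic; a small sketch (helper name `H` is ours):

```python
from fractions import Fraction

def H(n):
    # n-th harmonic number as an exact rational
    return sum(Fraction(1, j) for j in range(1, n + 1))

# Upon a miss, LRU-Random evicts the i-th oldest cached page with
# probability 1/(i*H_k); these probabilities form a distribution:
for k in range(1, 25):
    assert sum(Fraction(1, i) / H(k) for i in range(1, k + 1)) == 1
print("eviction probabilities sum to 1")
```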

LRU-Random is at least as competitive as strongly-competitive deterministic algorithms:

**Theorem 16** (Competitiveness of LRU-Random)

*For any sequence* *σ**,*

*Proof*

We actually prove a stronger statement, namely that LRU-Random is *k*-competitive against any *adaptive online adversary* [21]. Our proof is based on a potential argument.

Let *S*_{ADV} and *S*_{LRUR} be the set of pages contained in the adversary’s and LRU-Random’s fast memory, respectively. Further, let *age*(*p*) be the age of page *p* ∈ *S*_{LRUR}, i.e., *age*(*p*) is 0 for the most-recently-used page and *k* − 1 for the least-recently-used one among those pages that are in *S*_{LRUR}. Based on *age*(*p*), we define *s*(*p*) = *k* − *age*(*p*). In other words, *s*(*p*) is 1 for the oldest cached page, and *k* for the youngest, most-recently-used. Using this notation we define the following potential function:

$$ {\Phi} = H_k\cdot\sum\limits_{p \in S_{\text{LRUR}} \setminus S_{\text{ADV}}} \frac{s(p)}{H_{s(p)}} $$

We show that for any request *x* and any decision of the adversary to evict a page from its memory, we have

$$ \text{LRU-Random}(x) + {\Delta}{\Phi}(x) \leq k\cdot\text{ADV}(x), $$(5)

where LRU-Random(*x*) and ADV(*x*) denote the cost of the request, and ΔΦ(*x*) is the expected change in the potential function. Note that the potential function is initially zero, given that both caches are initially empty. Further, it is never negative. From this and Inequality (5) the *k*-competitiveness of LRU-Random against an adaptive online adversary follows. To prove Inequality (5), we distinguish four cases upon a request to page *x*:

- 1. LRU-Random hits and ADV hits. Then, LRU-Random(*x*) = ADV(*x*) = 0. The request does not decrease the ages of pages in *S*_{LRUR} ∖ *S*_{ADV} and so the potential does not increase, as \(\frac {s}{H_s}\) is monotonically increasing in *s*.
- 2. LRU-Random hits and ADV misses. As LRU-Random(*x*) = 0 and ADV(*x*) = 1, we have to show that ΔΦ(*x*) ≤ *k*. The contribution of each page *p* ∈ *S*_{LRUR} ∖ *S*_{ADV} to the potential drops or stays the same, as the ages of these pages may not decrease. The potential only increases if ADV chooses to evict a page in *S*_{LRUR} ∩ *S*_{ADV}. The maximal increase is achieved by evicting the youngest such page *p*. After the request, *p*’s age is at least 1, as it was not the requested page. Therefore it contributes at most \(H_k\cdot \frac {k-1}{H_{k-1}}\) to the potential, which is $$H_k\cdot \frac{k-1}{H_{k-1}} = \left( H_{k-1} + \frac{1}{k}\right)\cdot \frac{k-1}{H_{k-1}} = k-1 + \frac{k-1}{k\cdot H_{k-1}} < k. $$
- 3. LRU-Random misses and ADV hits. Then, we have to show that the potential decreases by at least 1 in expectation. Again, the contribution of no page *p* ∈ *S*_{LRUR} ∖ *S*_{ADV} may increase. Further, as ADV does not evict a page on a hit, no new page contributes to the potential. We show that the contribution of each page *p* ∈ *S*_{LRUR} ∖ *S*_{ADV} drops by at least 1 in expectation. There are three possible cases for a page *p* with *s*(*p*) = *s*:
  - (a) A younger page is replaced, and *p*’s contribution to the potential does not change. This happens with probability \({\sum }_{i=s+1}^k \frac {1}{i\cdot H_k} = 1 - \frac {H_s}{H_k}\).
  - (b) Page *p* gets replaced. This happens with probability \(\frac {1}{s\cdot H_k}\) and it reduces the potential by \(\frac {H_k\cdot s}{H_s}\).
  - (c) An older page is replaced, and *p*’s age increases by one. This happens with probability \({\sum }_{i=1}^{s-1} \frac {1}{i\cdot H_k} = \frac {H_{s-1}}{H_k}\) and it reduces the potential by \(H_k\cdot \left (\frac {s}{H_{s}}-\frac {s-1}{H_{s-1}}\right ) = H_k\cdot \frac {H_s-1}{H_sH_{s-1}}.\)

  So the expected change in potential due to page *p* is $$\frac{1}{s\cdot H_k}\cdot \frac{-H_k\cdot s}{H_s} + \frac{H_{s-1}}{H_k}\cdot H_k\cdot\frac{1-H_s}{H_sH_{s-1}} = -\frac{1}{H_s} + \frac{1-H_s}{H_s} = -1. $$
- 4. LRU-Random misses and ADV misses. If, before the request, *S*_{LRUR} ∖ *S*_{ADV} ≠ *∅*, then we can combine the arguments from cases 2 and 3 to show that the potential increases by at most *k* − 1. This does not cover the case where *S*_{LRUR} = *S*_{ADV}. In this case, the potential is increased maximally if the adversary chooses to evict the most-recently-used page. If LRU-Random replaces a different page, the potential increases by \(H_k\cdot \frac {k-1}{H_{k-1}}\). However, with probability \(\frac {1}{k\cdot H_k}\), LRU-Random also replaces the most-recently-used page (in which case the potential remains the same). The expected change in potential is thus bounded by $$\begin{array}{@{}rcl@{}} \left( 1-\frac{1}{k H_k}\right) H_k \frac{k-1}{H_{k-1}} &=& \frac{kH_k-1}{kH_k} H_k \frac{k-1}{H_{k-1}} = \frac{(kH_k-1)(k-1)}{H_{k-1}k}\\ &=& \frac{kH_{k-1}(k-1)}{H_{k-1}k} = k-1. \end{array} $$

The proof of Theorem 16 applies to an adaptive online adversary. An analysis for an oblivious adversary might yield a lower competitive ratio. On the other hand, the result is tight for adaptive adversaries. This can be seen by considering cases 3 and 4 of the previous proof. An optimal adversary forces case 3 by accessing a block in *S*_{ADV} ∖ *S*_{LRUR}, as long as *S*_{LRUR}≠*S*_{ADV}. Whenever *S*_{LRUR} = *S*_{ADV} the adversary accesses a page contained in neither *S*_{ADV} nor *S*_{LRUR} and replaces the most-recently-used page in *S*_{LRUR}, resulting in case 4. In both cases the expected change in potential is equal to the difference in the expected number of misses.

For *k* = 2, we also show that LRU-Random is (1, *δ**c*)-smooth, where *c* is less than *k* + 1, the best possible value among deterministic demand-paging or competitive algorithms. Specifically, *c* is \(\frac{17}{6}=2.8\bar {3}\). Although our proof technique does not scale beyond *k* = 2, we conjecture that this algorithm is in fact smoother than (1, *δ*(*k* + 1))-smooth for all *k*.

**Theorem 17** (Smoothness of LRU-Random^{9})

*Let* *k* = 2*. Then LRU-Random is* \((1,\frac {17}{6}\delta )\)*-smooth.*

**Conjecture 1** (Smoothness of LRU-Random)

LRU-Random* is*\((1,{\Theta }({H_{k}^{2}})\delta )\)*-smooth.*

## 6 Discussion

We have determined fundamental limits on the smoothness of deterministic and randomized paging algorithms. No deterministic competitive algorithm can be smoother than (1, *δ*(*k* + 1))-smooth. Under the restriction to bounded-memory algorithms, which is natural for hardware implementations of caches, smoothness implies competitiveness. We conjecture that smoothness generally implies competitiveness without the restriction to bounded-memory algorithms. LRU is strongly competitive, and it matches the lower bound for deterministic competitive algorithms, while FIFO matches the upper bound. There is no trade-off between smoothness and competitiveness for deterministic algorithms.

For randomized algorithms, no competitive or demand-paging algorithm can be better than (1, *δ**H*_{k})-smooth. With LRU-Random we introduce a randomized algorithm that is at least as competitive as any deterministic algorithm, yet provably smoother, at least for *k* = 2. While its exact smoothness remains open, we conjecture that LRU-Random is \((1,{\Theta }({H_{k}^{2}})\delta )\)-smooth. Figure 5 schematically illustrates many of our results.

Randomization introduces variance in the number of faults even on the same request sequence. In our current framework this variance is hidden, as we analyze the expected number of faults. In the analysis of real-time systems, however, the tail of the probability distribution is of greater interest than its expected value. It would thus be interesting to study how the probability distribution and in particular its tail changes in response to perturbations in the request sequence. Recent results by Komm et al. [16] suggest that smoothness “with high probability” is possible.

## Footnotes

- 1.
See the Appendix for the proof.

- 2.
See the Appendix for the proof.

- 3.
See the Appendix for the proof.

- 4.
See the Appendix for the proof.

- 5.
See the Appendix for the proof.

- 6.
Note that the conference version of this article [25] incorrectly claimed Smoothed-LRU_{k, i} to be \((1,\delta (\frac {k+i}{2i+1}+1))\)-smooth.

- 7.
See the Appendix for the proof.

- 8.
See the Appendix for the proof.

- 9.
See the Appendix for the proof.

### References

- 1. Achlioptas, D., Chrobak, M., Noga, J.: Competitive analysis of randomized paging algorithms. Theoretical Comput. Sci. **234**(1-2), 203–218 (2000). https://doi.org/10.1016/S0304-3975(98)00116-9
- 2. Aho, A., Denning, P., Ullman, J.: Principles of optimal page replacement. J. ACM **18**(1), 80–93 (1971)
- 3. Axer, P., et al.: Building timing predictable embedded systems. ACM Trans. Embed. Comput. Syst. **13**(4), 82:1–82:37 (2014). https://doi.org/10.1145/2560033
- 4. Becchetti, L., Leonardi, S., Marchetti-Spaccamela, A., Schäfer, G., Vredeveld, T.: Average-case and smoothed competitive analysis of the multilevel feedback algorithm. Math. Oper. Res. **31**(1), 85–108 (2006). https://doi.org/10.1287/moor.1050.0170
- 5. Beckmann, N., Sanchez, D.: Talus: A simple way to remove cliffs in cache performance. In: 21st IEEE International Symposium on High Performance Computer Architecture, HPCA 2015, Burlingame, CA, USA, February 7-11, 2015, pp. 64–75 (2015). https://doi.org/10.1109/HPCA.2015.7056022
- 6. Belady, L.A.: A study of replacement algorithms for a virtual-storage computer. IBM Syst. J. **5**(2), 78–101 (1966)
- 7. Borodin, A., El-Yaniv, R.: Online Computation and Competitive Analysis. Cambridge University Press, New York (1998)
- 8. Brodal, G.S., Moruz, G., Negoescu, A.: OnlineMin: A fast strongly competitive randomized paging algorithm. Theory Comput. Syst. **56**(1), 22–40 (2015). https://doi.org/10.1007/s00224-012-9427-y
- 9. Cazorla, F.J., et al.: PROARTIS: Probabilistically analyzable real-time systems. ACM Trans. Embed. Comput. Syst. **12**(2s), 94:1–94:26 (2013). https://doi.org/10.1145/2465787.2465796
- 10. Chaudhuri, S., Gulwani, S., Lublinerman, R.: Continuity and robustness of programs. Commun. ACM **55**(8), 107–115 (2012). https://doi.org/10.1145/2240236.2240262
- 11. Doychev, G., et al.: CacheAudit: A tool for the static analysis of cache side channels. ACM Trans. Inf. Syst. Secur. **18**(1), 4:1–4:32 (2015). https://doi.org/10.1145/2756550
- 12. Doyen, L., Henzinger, T., Legay, A., Nickovic, D.: Robustness of sequential circuits. In: ACSD ’10, pp. 77–84 (2010). https://doi.org/10.1109/ACSD.2010.26
- 13. Dwork, C.: Differential privacy. In: Bugliesi, M., Preneel, B., Sassone, V., Wegener, I. (eds.) ICALP ’06, Part II, LNCS, vol. 4052, pp. 1–12. Springer (2006). https://doi.org/10.1007/11787006_1
- 14. Fiat, A., Karp, R.M., Luby, M., McGeoch, L.A., Sleator, D.D., Young, N.E.: Competitive paging algorithms. J. Algorithms **12**(4), 685–699 (1991)
- 15. Kleene, S.: Representation of events in nerve nets and finite automata. In: Automata Studies. Princeton University Press, Princeton (1956)
- 16. Komm, D., Královič, R., Královič, R., Mömke, T.: Randomized online algorithms with high probability guarantees. In: STACS ’14, vol. 25, pp. 470–481 (2014)
- 17. Koutsoupias, E., Papadimitriou, C.: Beyond competitive analysis. SIAM J. Comput. **30**(1), 300–317 (2000). https://doi.org/10.1137/S0097539796299540
- 18. Liu, C.L.: Some memory aspects of finite automata. Tech. Rep. 411, Massachusetts Institute of Technology (1963)
- 19. Mattson, R.L., Gecsei, J., Slutz, D.R., Traiger, I.L.: Evaluation techniques for storage hierarchies. IBM Syst. J. **9**(2), 78–117 (1970)
- 20. McGeoch, L., Sleator, D.: A strongly competitive randomized paging algorithm. Algorithmica **6**, 816–825 (1991). https://doi.org/10.1007/BF01759073
- 21. Motwani, R., Raghavan, P.: Randomized Algorithms. Cambridge University Press, New York (1995)
- 22. Perles, M., Rabin, M., Shamir, E.: The theory of definite automata. IEEE Trans. Electron. Comput. **12**(3), 233–243 (1963). https://doi.org/10.1109/PGEC.1963.263534
- 23. Raghavan, P., Snir, M.: Memory versus randomization in on-line algorithms (extended abstract). In: Ausiello, G., Dezani-Ciancaglini, M., Rocca, S.R.D. (eds.) ICALP ’89, Lecture Notes in Computer Science, vol. 372, pp. 687–703. Springer (1989). https://doi.org/10.1007/BFb0035792
- 24. Reineke, J., Grund, D.: Sensitivity of cache replacement policies. ACM Trans. Embed. Comput. Syst. **12**(1s), 42:1–42:18 (2013). https://doi.org/10.1145/2435227.2435238
- 25. Reineke, J., Salinger, A.: On the smoothness of paging algorithms. In: Sanità, L., Skutella, M. (eds.) Approximation and Online Algorithms - 13th International Workshop, WAOA 2015, Patras, Greece, September 17-18, 2015. Revised Selected Papers, Lecture Notes in Computer Science, vol. 9499, pp. 170–182. Springer (2015). https://doi.org/10.1007/978-3-319-28684-6_15
- 26. Sleator, D.D., Tarjan, R.E.: Amortized efficiency of list update and paging rules. Commun. ACM **28**(2), 202–208 (1985)
- 27. Spielman, D.A., Teng, S.H.: Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time. J. ACM **51**(3), 385–463 (2004). https://doi.org/10.1145/990308.990310
- 28. Wilhelm, R., et al.: The worst-case execution-time problem—overview of methods and survey of tools. ACM Trans. Embed. Comput. Syst. **7**(3), 36:1–36:53 (2008). https://doi.org/10.1145/1347375.1347389