General Caching Is Hard: Even with Small Pages

Published in: Algorithmica

Abstract

Caching (also known as paging) is a classical problem concerning page replacement policies in two-level memory systems. General caching is the variant with pages of different sizes and fault costs. The strong NP-hardness of its two important cases, the fault model (each page has unit fault cost) and the bit model (each page has the same fault cost as size) has been established, but under the assumption that there are pages as large as half of the cache size. We prove that this already holds when page sizes are bounded by a small constant: The bit and fault models are strongly NP-complete even when page sizes are limited to \(\{1, 2, 3\}\). Considering only the decision versions of the problems, general caching is equivalent to the unsplittable flow on a path problem and therefore our results also improve the hardness results about this problem.

References

  1. Achlioptas, D., Chrobak, M., Noga, J.: Competitive analysis of randomized paging algorithms. Theor. Comput. Sci. 234(1–2), 203–218 (2000). doi:10.1016/S0304-3975(98)00116-9. (A preliminary version appeared at ESA 1996)

  2. Adamaszek, A., Czumaj, A., Englert, M., Räcke, H.: An O(log k)-competitive algorithm for generalized caching. In: Proceedings of the 23rd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 1681–1689 (2012). doi:10.1137/1.9781611973099

  3. Albers, S., Arora, S., Khanna, S.: Page replacement for general caching problems. In: Proceedings of the 10th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 31–40 (1999). http://dl.acm.org/citation.cfm?id=314500.314528

  4. Anagnostopoulos, A., Grandoni, F., Leonardi, S., Wiese, A.: A mazing \(2+\epsilon \) approximation for unsplittable flow on a path. In: Chekuri, C. (ed.) Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2014, Portland, Oregon, USA, January 5–7, 2014, pp. 26–41. SIAM (2014). doi:10.1137/1.9781611973402.3

  5. Bansal, N., Buchbinder, N., Naor, J.: Randomized competitive algorithms for generalized caching. SIAM J. Comput. 41(2), 391–414 (2012). doi:10.1137/090779000. (A preliminary version appeared at STOC 2008)

  6. Bansal, N., Chakrabarti, A., Epstein, A., Schieber, B.: A quasi-PTAS for unsplittable flow on line graphs. In: Kleinberg, J.M. (ed.) Proceedings of the 38th Annual ACM Symposium on Theory of Computing, pp. 721–729. ACM (2006). doi:10.1145/1132516.1132617

  7. Bar-Noy, A., Bar-Yehuda, R., Freund, A., Naor, J., Schieber, B.: A unified approach to approximating resource allocation and scheduling. J. ACM 48(5), 1069–1090 (2001). doi:10.1145/502102.502107. (A preliminary version appeared at STOC 2000)

  8. Batra, J., Garg, N., Kumar, A., Mömke, T., Wiese, A.: New approximation schemes for unsplittable flow on a path. In: Indyk, P. (ed.) Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 47–58. SIAM (2015). doi:10.1137/1.9781611973730.5

  9. Belady, L.A.: A study of replacement algorithms for a virtual-storage computer. IBM Syst. J. 5(2), 78–101 (1966). doi:10.1147/sj.52.0078

  10. Bonsma, P.S., Schulz, J., Wiese, A.: A constant-factor approximation algorithm for unsplittable flow on paths. SIAM J. Comput. 43(2), 767–799 (2014). doi:10.1137/120868360

  11. Borodin, A., El-Yaniv, R.: Online Computation and Competitive Analysis. Cambridge University Press, Cambridge (1998)

  12. Brodal, G.S., Moruz, G., Negoescu, A.: Onlinemin: a fast strongly competitive randomized paging algorithm. Theory Comput. Syst. 56(1), 22–40 (2015). doi:10.1007/s00224-012-9427-y. (A preliminary version appeared at WAOA 2011)

  13. Chrobak, M., Karloff, H.J., Payne, T.H., Vishwanathan, S.: New results on server problems. SIAM J. Discrete Math. 4(2), 172–181 (1991). doi:10.1137/0404017. (A preliminary version appeared at SODA 1990)

  14. Chrobak, M., Larmore, L.L., Lund, C., Reingold, N.: A better lower bound on the competitive ratio of the randomized 2-server problem. Inf. Process. Lett. 63(2), 79–83 (1997). doi:10.1016/S0020-0190(97)00099-9

  15. Chrobak, M., Woeginger, G.J., Makino, K., Xu, H.: Caching is hard–even in the fault model. Algorithmica 63(4), 781–794 (2012). doi:10.1007/s00453-011-9502-9. (A preliminary version appeared at ESA 2010)

  16. Darmann, A., Pferschy, U., Schauer, J.: Resource allocation with time intervals. Theor. Comput. Sci. 411(49), 4217–4234 (2010). doi:10.1016/j.tcs.2010.08.028

  17. Fiat, A., Karp, R.M., Luby, M., McGeoch, L.A., Sleator, D.D., Young, N.E.: Competitive paging algorithms. J. Algorithms 12(4), 685–699 (1991). doi:10.1016/0196-6774(91)90041-V. (A preliminary version appeared in 1988)

  18. Folwarczný, L., Sgall, J.: General caching is hard: Even with small pages. In: Elbassioni, K.M., Makino, K. (eds.) Algorithms and Computation—26th International Symposium, ISAAC 2015, Nagoya, Japan, December 9–11, 2015, Proceedings, Lecture Notes in Computer Science, vol. 9472, pp. 116–126. Springer (2015). doi:10.1007/978-3-662-48971-0_11

  19. Irani, S.: Page replacement with multi-size pages and applications to web caching. Algorithmica 33(3), 384–409 (2002). doi:10.1007/s00453-001-0125-4. (A preliminary version appeared at STOC 1997)

  20. McGeoch, L.A., Sleator, D.D.: A strongly competitive randomized paging algorithm. Algorithmica 6(6), 816–825 (1991). doi:10.1007/BF01759073. (A preliminary version appeared in 1989)

  21. Young, N.E.: On-line file caching. Algorithmica 33(3), 371–383 (2002). doi:10.1007/s00453-001-0124-5. (A preliminary version appeared at SODA 1998)

Acknowledgments

We are grateful to the anonymous reviewers, in particular for bringing the articles [10, 16] to our attention and for suggestions that helped us to improve the presentation.

Author information

Correspondence to Lukáš Folwarczný.

Additional information

Partially supported by the Center of Excellence—ITI, Project P202/12/G061 of GA ČR (J. Sgall) and by the Project 14-10003S of GA ČR (L. Folwarczný).

Appendix: The Simple Proof

In this appendix, we present a simple variant of the proof for the almost-fault model with two distinct costs. This completes the sketch of the proof presented at the end of Sect. 2. We present it with a complete description of the simplified reduction, so that it can be read independently of the rest of the paper. This appendix can therefore serve as a short proof of the hardness of general caching.

Theorem 6.1

General caching is strongly NP-hard, even in the case when page sizes are limited to 1, 2, 3 and there are only two distinct fault costs.

We prove the theorem for the optional policy; the theorem for the forced policy follows in the same way as in the proof of Theorem 5.1.

1.1 The Reduction

The reduction described here will be equivalent to Reduction 2.1 with \(H = 1\) and the fault cost of each vertex-page set to \(1/(n+1)\).

Suppose we have a graph \(G=(V,E)\) with n vertices and m edges. We construct an instance of general caching whose optimal solution encodes a maximum independent set in G. Fix an arbitrary numbering of the edges \(e_1, \ldots , e_m\).

The cache size is \(C=2m+1\). For each vertex v, we have a vertex-page \(p_v\) with size one and cost \(1/(n+1)\). For each edge e, we have six associated edge-pages \(a^e, \bar{a}^e, \alpha ^e, b^e, \bar{b}^e, \beta ^e\); all have cost one, pages \(\alpha ^e,\beta ^e\) have size three and the remaining pages have size two.

The request sequence is organized in phases and blocks. There is one phase for each vertex v. In the phase for v, there are two adjacent blocks associated with each edge e incident to v; the incident edges are processed in an arbitrary order. In addition, there is one initial block I before all phases and one final block F after all phases. Altogether, there are \(d=4m+2\) blocks. There are four blocks associated with each edge e; denote them \(B^e_1\), \(B^e_2\), \(B^e_3\), \(B^e_4\), in the order in which they appear in the request sequence.

For each \(v\in V\), the associated page \(p_v\) is requested exactly twice, right before the beginning of the v-phase and right after the end of the v-phase; these requests do not belong to any phase. An example of the structure of phases and blocks is given in Fig. 6.

Fig. 6: An example of phases, blocks and requests on vertex-pages for a graph with three vertices u, v, w and two edges \(e_1 = \{u, w\}\), \(e_2 = \{v, w\}\) when \(H = 2\)

Even though each block is associated with some fixed edge, it contains one or more requests to the associated pages for every edge e. In each block, we process the edges \(e_1,\ldots ,e_m\) in this order. For each edge e, we make one or more requests to the associated pages according to Table 3.

Table 3 Requests associated with an edge e

Figure 7 shows an example of the requests on edge-pages associated with one particular edge.

Fig. 7: Requests on all pages associated with the edge e. Each column represents some block(s). The four labeled columns represent the blocks in the heading, the first column represents every block before \(B^e_1\), the middle column represents every block between \(B^e_3\) and \(B^e_4\), and the last column represents every block after \(B^e_4\). The requests in one column are ordered from top to bottom.
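The overall shape of the reduction (cache size, page sizes and costs, and the phase/block skeleton) can be summarized in code. The following Python sketch builds this skeleton for a given graph; the function and page names are ours, and the per-block requests on edge-pages (Table 3) are not reproduced here.

```python
from fractions import Fraction

def build_instance(vertices, edges):
    """Sketch of the reduction: cache size, pages, and the phase/block
    skeleton for a graph G = (V, E).  Blocks are labeled by (phase vertex,
    edge index, 1 or 2); the paper's labels B^e_1..B^e_4 correspond to
    these four blocks in request-sequence order."""
    n, m = len(vertices), len(edges)
    C = 2 * m + 1  # cache size

    pages = {}
    for v in vertices:  # vertex-pages: size 1, cost 1/(n+1)
        pages[('p', v)] = (1, Fraction(1, n + 1))
    for e in range(m):  # six edge-pages per edge; alpha and beta have size 3
        for name, size in [('a', 2), ('abar', 2), ('alpha', 3),
                           ('b', 2), ('bbar', 2), ('beta', 3)]:
            pages[(name, e)] = (size, Fraction(1))

    # Initial block I, two blocks per (phase, incident edge), final block F.
    blocks = ['I']
    for v in vertices:
        for e, (x, y) in enumerate(edges):
            if v in (x, y):
                blocks += [(v, e, 1), (v, e, 2)]
    blocks.append('F')
    assert len(blocks) == 4 * m + 2  # d = 4m + 2 blocks in total
    return C, pages, blocks
```

Each edge is incident to exactly two vertices and contributes two blocks per incident phase, which is where the count \(d = 4m+2\) comes from.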

1.2 Proof of Correctness

Instead of minimizing the service cost, we maximize the saving compared to the service which does not use the cache at all. This is clearly equivalent when considering the decision version of the problem.

Without loss of generality, we assume that any page is brought into the cache only immediately before a request to that page and removed from the cache only immediately after a request to that page; furthermore, the cache is empty at the beginning and at the end. That is, a page may be in the cache only between two consecutive requests to this page, and it is either in the cache for the whole interval between them or not at all.

Each page of size three is requested only twice, in two consecutive blocks, and these blocks are distinct for all pages of size three. Thus, a service of edge-pages is valid if and only if at each time at most m edge-pages are in the cache. It is thus convenient to think of the cache as m slots for edge-pages.

Each vertex-page is requested twice. Thus, the saving on the n vertex-pages is at most \(n/(n+1)<1\). Since all edge-pages have cost one, an optimal service must serve the edge-pages optimally. Furthermore, a vertex-page can be cached if and only if, during its phase, it never happens that all m slots for edge-pages are full while a page of size three is cached.

Let \(S_B\) denote the set of all edge-pages cached at the beginning of the block B and let \(s_B = |S_B|\). Now observe that each edge-page is requested only in a contiguous segment of blocks, once in each block. It follows that the total saving on edge-pages is equal to \(\sum _B s_B\), where the sum is over all blocks. In particular, the maximum possible saving on the edge-pages is \((d-1)m\), using the fact that \(S_I\) is empty.
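The bound just derived, and the decision threshold \((d-1)m + K/(n+1)\) appearing below, can be computed exactly with rational arithmetic. This is a small illustrative helper with a name of our own choosing, not part of the reduction itself.

```python
from fractions import Fraction

def saving_threshold(n, m, K):
    """Decision threshold of the reduction: (d-1)m + K/(n+1), d = 4m + 2.
    For K = 0 this is the maximum edge-page saving (d-1)m.  Since K <= n,
    the fractional part K/(n+1) is always below 1, so the vertex-page
    saving can never substitute for a full unit of edge-page saving."""
    d = 4 * m + 2
    return (d - 1) * m + Fraction(K, n + 1)
```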

We prove that there is a service with the total saving at least \((d-1)m+K/(n+1)\) if and only if there is an independent set of size K in G. First the easy direction.

Lemma 6.2

Suppose that G has an independent set W of size K. Then there exists a service with the total saving \((d-1)m+K/(n+1)\).

Proof

For any e, write \(e=\{u, v\}\) so that u precedes v in the ordering of phases. If \(u\in W\), we keep \(\bar{a}^e, b^e, \bar{b}^e\) and \(\beta ^e\) in the cache from the first to the last request on each page, and we do not cache \(a^e\) and \(\alpha ^e\) at any time. Otherwise we cache \(\bar{b}^e, a^e, \bar{a}^e\) and \(\alpha ^e\), and do not cache \(b^e\) and \(\beta ^e\) at any time. In both cases, at each time at most one page associated with e is in the cache, and the total saving on edge-pages is \((d-1)m\). See Fig. 8 for an illustration.

Fig. 8: The two ways of caching in Lemma 6.2

For any \(v\in W\), we cache \(p_v\) between its two requests. To check that this is a valid service, observe that if \(v\in W\), then during the corresponding phase no page of size three is cached. Thus the page \(p_v\) always fits in the cache together with at most m pages of size two. \(\square \)
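The case analysis in this proof amounts to a simple invariant: for each edge, exactly one of its size-3 pages is cached, and it lies in the phase of an endpoint outside W. The sketch below (our own naming; it assumes vertices are listed in phase order) checks this invariant, and it succeeds precisely when W is independent.

```python
def valid_size3_placement(vertices, edges, W):
    """For each edge e = {u, v} with u's phase first, the proof of
    Lemma 6.2 caches beta^e (size 3, living in v's phase) when u is in W,
    and alpha^e (size 3, living in u's phase) otherwise.  The service is
    valid iff no size-3 page is cached during the phase of a vertex in W,
    since p_w plus m-1 size-2 pages plus a size-3 page would exceed C."""
    order = {v: i for i, v in enumerate(vertices)}  # phase order
    for x, y in edges:
        u, v = (x, y) if order[x] < order[y] else (y, x)
        size3_phase = v if u in W else u  # phase holding e's size-3 page
        if size3_phase in W:
            return False
    return True
```

On a triangle, any single vertex passes the check while any two adjacent vertices fail it, matching the independence requirement.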

Now we prove the converse in a sequence of claims. Fix a valid service with saving at least \((d-1)m\). For a block B, let \(B^{\prime }\) denote the following block.

Claim 6.3

For any block B, with the exception of \(B=I\), we have \(s_B=m\).

Proof

For each \(B\ne I\) we have \(s_B\le m\). Because \(s_I = 0\), the total saving on edge-pages is \(\sum _B s_B\le (d-1)m\). Since the saving on vertex-pages is less than 1 and the total saving is at least \((d-1)m\), equality must hold in every term, i.e., \(s_B=m\) for each \(B\ne I\). \(\square \)

We now prove that each edge occupies exactly one slot during the service.

Claim 6.4

For any block \(B\ne I\) and for any e, \(S_B\) contains exactly one page associated with e.

Proof

For a block B, let \(S_B^{e}\) denote the set of pages associated with the edge e in \(S_B\), and write \(S_B^{\le k} = S_B^{e_1} \cup \cdots \cup S_B^{e_k}\) and \(s_B^{\le k} = \left| S_B^{\le k}\right| \). First, we shall prove that for each \(k \le m\)

$$\begin{aligned} s_B^{\le k} = k. \qquad (8) \end{aligned}$$

This is true for \(B=F\), as only the m edge-pages \(\bar{b}^e\) can be cached there, and by the previous claim all of them are indeed cached. Similarly for \(B=I^{\prime }\), the block immediately following the initial block.

If (8) is not true, then for some k and \(B\not \in \{I,F\}\) we have \(s_B^{\le k}<s_{B^{\prime }}^{\le k}\). Then after processing the edge \(e_k\) in the block B we have in the cache all the pages in \((S_B{\setminus }S_B^{\le k})\cup S_{B^{\prime }}^{\le k}\). Their number is \((m-s_B^{\le k})+s_{B^{\prime }}^{\le k}>m\), a contradiction.

The statement of the claim is an immediate consequence of (8). \(\square \)

Claim 6.5

For any edge e, at least one of the pages \(\alpha ^e\) and \(\beta ^e\) is cached between its two requests.

Proof

Assume that none of the two pages is cached. It follows from the previous claim that \(b^e\in S_{B^e_2}\), as at this point \(\alpha ^e\) and \(b^e\) are the only pages associated with e that can be cached. Similarly, \(a^e\in S_{B^e_4}\).

It follows that there exists a block B between \(B^e_1\) and \(B^e_4\) such that \(S_B\) contains the page \(b^e\) and \(S_{B^{\prime }}\) contains the page \(a^e\). However, in B, the page \(a^e\) is requested before the page \(b^e\). Thus, at the point between the two requests, the cache contains two pages associated with e, plus one page associated with every other edge, for a total of \(m+1\) pages, a contradiction. \(\square \)

Now we are ready to complete this direction.

Lemma 6.6

Suppose that there exists a valid service with the total saving \((d-1)m+K/(n+1)\). Then G has an independent set W of size K.

Proof

Let W be the set of all v such that \(p_v\) is cached between its two requests. The total saving implies that \(|W|=K\).

Now we claim that W is independent. Suppose not, and let \(e=\{u,v\}\) be an edge with \(u,v\in W\). Then \(p_u\) and \(p_v\) are cached during the corresponding phases. Thus neither \(\alpha ^e\) nor \(\beta ^e\) can be cached: together with the \(m-1\) cached pages of size 2 associated with the remaining edges and the cached vertex-page, the cache size needed would be \(2m+2 > C\). However, this contradicts Claim 6.5. \(\square \)

Lemmas 6.2 and 6.6 together show that we have constructed a valid polynomial-time reduction from the independent set problem to general caching. This proves Theorem 6.1.

Cite this article

Folwarczný, L., Sgall, J. General Caching Is Hard: Even with Small Pages. Algorithmica 79, 319–339 (2017). https://doi.org/10.1007/s00453-016-0185-0
